mbsoft31/slr-ranking

Headless, auditable SLR pipeline for open-source evidence ranking — protocol → connectors (OpenAlex/Crossref/arXiv/Semantic Scholar) → normalization & dedup → enrichment (OA/code/data/citations) → scoring (venue, recency, OA/repro, novelty, realism, breadth) → exports. Built as a Laravel package (no UI), following Spatie Package Tools conventions.

Table of Contents

  • Features
  • Requirements
  • Installation
  • Configuration
  • Environment Variables
  • Data Model
  • Connectors
  • Normalization & Dedup
  • Enrichment
  • Scoring
  • CLI Commands
  • Quickstart
  • Scheduling (optional)
  • Search Integration (optional)
  • Testing
  • FAQ
  • Roadmap
  • Contributing
  • Security & ToS
  • License

Features

  • Package-only (no UI): pure Eloquent models, services, jobs, and CLI commands.
  • Protocol storage: projects keep objective, weights, half-life, search strings, and eligibility rules.
  • Open connectors: OpenAlex, Crossref, arXiv, Semantic Scholar (S2) with cursor/offset paging.
  • Normalization & dedup: unified “work” per paper (DOI favored; fuzzy title match fallback).
  • Enrichment: Unpaywall (OA), code/data link detection, citation counts & fields.
  • Scoring engine: venue quality, recency (half-life), OA & reproducibility, novelty, realism, breadth → composite 0–100.
  • Exports: JSON project bundle (CSV/Markdown planned).
  • Lookups: SJR quartiles & CORE ranks via CSV uploads (versioned snapshots).
  • Auditable: minimal audit log events & immutable derived scores.

Requirements

  • PHP: 8.2+

  • Laravel: 10.x or 11.x

  • Database: any supported by Laravel (PostgreSQL recommended)

  • Queues: Redis (recommended) or any Laravel queue backend

  • HTTP: Guzzle (already required)

  • Optional:

    • Laravel Scout + Meilisearch for full-text indexing
    • Laravel Horizon for queue monitoring

Installation

1) Require the package

Monorepo / local path:

// composer.json (of your host app)
{
  "repositories": [
    { "type": "path", "url": "packages/mbsoft/slr-ranking" }
  ]
}
composer require mbsoft/slr-ranking:dev-main

Standalone (from VCS):

composer require mbsoft/slr-ranking

2) Publish config & migrations

php artisan slr:install
php artisan migrate

3) Set environment

Add at least:

UNPAYWALL_EMAIL=you@example.com
OPENALEX_BASE=https://api.openalex.org
CROSSREF_BASE=https://api.crossref.org
S2_BASE=https://api.semanticscholar.org/graph/v1
ARXIV_BASE=https://export.arxiv.org/api

4) Run queues

php artisan queue:work
# (or) php artisan horizon

Configuration

config/slr-ranking.php (published by slr:install):

return [
    'half_life' => 3.0,  // years for recency decay
    'default_weights' => [
        'venue' => 0.30, 'recency' => 0.15, 'oa' => 0.05,
        'novelty' => 0.20, 'realism' => 0.20, 'breadth' => 0.10,
    ],
    'user_model' => \App\Models\User::class,

    'endpoints' => [
        'openalex' => env('OPENALEX_BASE', 'https://api.openalex.org'),
        'crossref' => env('CROSSREF_BASE', 'https://api.crossref.org'),
        'unpaywall'=> 'https://api.unpaywall.org/v2',
        's2'       => env('S2_BASE', 'https://api.semanticscholar.org/graph/v1'),
        'arxiv'    => env('ARXIV_BASE', 'https://export.arxiv.org/api'),
    ],
    'unpaywall_email' => env('UNPAYWALL_EMAIL'),

    // Optional search: set to 'scout' to enable Laravel Scout indexing
    'search_driver' => env('SLR_SEARCH', null),

    'features' => [
        // If true, venue quality falls back to a capped percentile of citations when SJR/CORE missing.
        'citations_percentile_fallback' => true,
    ],
];

Environment Variables

  • UNPAYWALL_EMAIL — required by Unpaywall.
  • OPENALEX_BASE, CROSSREF_BASE, S2_BASE, ARXIV_BASE — override endpoints if needed.
  • SLR_SEARCH=scout — enable Scout indexing (host app must configure Scout + Meilisearch).
  • Usual Laravel queue/database configuration as per your app.

Data Model

Tables (all prefixed slr_):

  • projects — protocol store: name, objective, weights, search strings, inclusion rules, half-life.
  • sources — connector catalog (openalex, crossref, arxiv, semanticscholar, manual).
  • raw_records — raw JSON per source pull (for audit/replay).
  • works — normalized unique works (per project).
  • enrichments — OA flags, OA URL, code/data links, citations, fields.
  • venue_metrics — versioned SJR/CORE snapshots.
  • screenings — title/abstract and full-text (TA/FT) screening decisions (reviewer morphs).
  • criterion_scores — per-criterion scores.
  • composite_scores — final composite (breakdown + timestamp).
  • expert_reviews — expert notes, adjustment deltas, overrides.
  • audit_logs — lightweight action trail.
  • lookups_sjr, lookups_core — uploaded CSV snapshots.
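
For orientation, the query below sketches how these tables fit together at read time. The table names match the list above; the column names (project_id, work_id, title, score) and the placeholder UUID are assumptions, so adjust them to the published migrations.

use Illuminate\Support\Facades\DB;

// Illustrative only: top 10 works of a project by composite score.
$projectUuid = 'YOUR-PROJECT-UUID';

$top = DB::table('slr_works as w')
    ->join('slr_composite_scores as c', 'c.work_id', '=', 'w.id')
    ->where('w.project_id', $projectUuid)
    ->orderByDesc('c.score')
    ->limit(10)
    ->get(['w.id', 'w.title', 'c.score']);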

ER (Mermaid)

erDiagram
  slr_projects ||--o{ slr_works : has
  slr_sources ||--o{ slr_raw_records : provides
  slr_projects ||--o{ slr_raw_records : collects
  slr_works ||--|| slr_enrichments : has
  slr_works ||--|| slr_criterion_scores : has
  slr_works ||--|| slr_composite_scores : has
  slr_works ||--o{ slr_screenings : reviewed_by
  slr_works ||--o{ slr_expert_reviews : adjusted_by

Connectors

Each connector is queued and writes raw_records → normalization job → enrichment → scoring.

  • OpenAlex: /works endpoint with cursor paging.
  • Crossref: /works with query + filter (e.g., type:journal-article).
  • arXiv: Atom feed (XML) paginated with start/max_results.
  • Semantic Scholar (S2): /paper/search offset/limit; optional DOI lookups for enrichment fallback.

Project search_strings structure (JSON):

{
  "openalex": { "q": "vision transformer agriculture", "filter": "from_publication_date:2023-01-01" },
  "crossref": { "q": "agriculture transformer", "filter": "type:journal-article,from-pub-date:2023-01-01" },
  "arxiv":    { "q": "ti:(agriculture) AND (cat:cs.CV OR cs.LG)" },
  "s2":       { "q": "agriculture transformer" }
}

Rate limits & ToS: Be polite; the package does not bypass rate limits. You are responsible for adhering to each source’s ToS.

Normalization & Dedup

  • Precedence: DOI > OpenAlex > Crossref > arXiv > S2 (field-by-field best-effort).
  • Title normalization: lowercase, strip punctuation, collapse whitespace.
  • Fuzzy match: token-set ratio ≥ 92 within year ±1 to merge when DOI missing.
  • Provenance: raw source preserved in slr_raw_records (replayable).
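
To make the merge rule concrete, here is a minimal sketch of the normalization and fuzzy comparison described above, using a crude Jaccard-style token ratio; it is illustrative only and does not mirror the internal DedupService.

// Sketch: normalize a title, then compare two titles with a simple token-set ratio.
function slr_normalize_title(string $title): string
{
    $title = mb_strtolower($title);
    $title = preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', $title); // strip punctuation
    return trim(preg_replace('/\s+/u', ' ', $title));          // collapse whitespace
}

function slr_title_similarity(string $a, string $b): float
{
    $ta = array_filter(array_unique(explode(' ', slr_normalize_title($a))));
    $tb = array_filter(array_unique(explode(' ', slr_normalize_title($b))));
    $union = count(array_unique(array_merge($ta, $tb)));

    return $union === 0 ? 0.0 : 100.0 * count(array_intersect($ta, $tb)) / $union;
}

// Two DOI-less records merge when similarity >= 92 and their years differ by at most 1.
$merge = slr_title_similarity(
        'Vision Transformers for Crop Mapping: A Survey',
        'Vision transformers for crop mapping, a survey'
    ) >= 92
    && abs(2024 - 2023) <= 1;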

Enrichment

  • OA (Unpaywall): is_oa + best OA URL (PDF or landing).
  • Code links: detects GitHub/GitLab/Zenodo record URLs from title/abstract.
  • Data links: detects Zenodo/Figshare/Dataverse from title/abstract.
  • Citations & fields: OpenAlex by ID; fallback to S2 by DOI if missing.
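
The link detection can be pictured as a couple of regular expressions over the title and abstract. The patterns below are illustrative guesses that cover the hosts named above, not the package's actual heuristics.

// Sketch: pull code and data repository links out of free text (patterns are illustrative).
function slr_detect_code_links(string $text): array
{
    preg_match_all(
        '~https?://(?:www\.)?(?:(?:github|gitlab)\.com/[\w.\-]+/[\w.\-]+|zenodo\.org/records?/\d+)~i',
        $text,
        $m
    );

    return array_values(array_unique($m[0]));
}

function slr_detect_data_links(string $text): array
{
    preg_match_all(
        '~https?://(?:www\.)?(?:zenodo\.org/records?/\d+|figshare\.com/\S+|dataverse\.[\w.\-]+/\S+)~i',
        $text,
        $m
    );

    return array_values(array_unique($m[0]));
}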

Scoring

Default weights (configurable per project):

  • Venue 0.30, Recency 0.15, OA/Repro 0.05, Novelty 0.20, Realism 0.20, Breadth 0.10

Venue quality:

  • Journal (SJR): Q1=1.00, Q2=0.75, Q3=0.50, Q4=0.25 (fallback 0.40)
  • Conference (CORE): A*=1.00, A=0.85, B=0.65, C=0.45 (fallback 0.50)
  • Preprint: 0.35
  • Optional fallback: min(0.6, citations_percentile) within project if no SJR/CORE.
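
Read as code, the venue component is a small lookup with the documented fallbacks; the helper below is a sketch that assumes quartiles and ranks arrive exactly as the strings listed above.

// Sketch: map an SJR quartile or CORE rank to the venue-quality component.
function slr_venue_quality(?string $sjrQuartile, ?string $coreRank, bool $isPreprint = false): float
{
    if ($isPreprint) {
        return 0.35;
    }

    if ($sjrQuartile !== null) {
        return ['Q1' => 1.00, 'Q2' => 0.75, 'Q3' => 0.50, 'Q4' => 0.25][$sjrQuartile] ?? 0.40;
    }

    if ($coreRank !== null) {
        return ['A*' => 1.00, 'A' => 0.85, 'B' => 0.65, 'C' => 0.45][$coreRank] ?? 0.50;
    }

    // No SJR/CORE match: a flat default or, when enabled,
    // the min(0.6, citations_percentile) fallback described above.
    return 0.40;
}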

Recency: exp(-ln(2) * Δt / H) with H = half-life years.

OA & Reproducibility: 0.6 if OA, +0.25 if a code link is found, +0.15 if a data link is found (capped at 1.0).

Novelty / Realism / Breadth: reviewer-provided 0..1 checklist scores (persisted in criterion_scores).

Composite: 100 * Σ (w_i * criterion_i) stored with breakdown & timestamp.
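
Putting the pieces together, a minimal end-to-end calculation (restating the formulas above, not the ScoreService implementation) looks like this:

// Sketch: recency decay, OA/reproducibility bonus, and the weighted 0-100 composite.
function slr_recency(float $ageYears, float $halfLifeYears = 3.0): float
{
    return exp(-M_LN2 * $ageYears / $halfLifeYears); // halves every half-life
}

function slr_oa_repro(bool $isOa, bool $hasCode, bool $hasData): float
{
    return min(1.0, ($isOa ? 0.6 : 0.0) + ($hasCode ? 0.25 : 0.0) + ($hasData ? 0.15 : 0.0));
}

function slr_composite(array $weights, array $criteria): float
{
    $sum = 0.0;
    foreach ($weights as $criterion => $weight) {
        $sum += $weight * ($criteria[$criterion] ?? 0.0);
    }

    return round(100 * $sum, 2);
}

// Example with the default weights published in config/slr-ranking.php:
$score = slr_composite(config('slr-ranking.default_weights'), [
    'venue'   => 0.75,                       // e.g. SJR Q2
    'recency' => slr_recency(1.5),           // 1.5-year-old paper
    'oa'      => slr_oa_repro(true, true, false),
    'novelty' => 0.7,
    'realism' => 0.6,
    'breadth' => 0.5,
]);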

CLI Commands

# 1) Publish config + migrations
php artisan slr:install

# 2) Run connectors (choose flags)
php artisan slr:run {project-uuid} --openalex --crossref --arxiv --s2

# 3) Recompute scores for all works in a project
php artisan slr:score {project-uuid}

# 4) Export bundle (JSON)
php artisan slr:export {project-uuid} --type=json --out=slr

# 5) Upload lookups
php artisan slr:upload-sjr  storage/app/sjr_snapshot.csv    # issn,quartile,snapshot_date
php artisan slr:upload-core storage/app/core_snapshot.csv   # conference,rank,snapshot_date

Queues: jobs are dispatched on the pull, normalize, enrich, and score queues (route them via your Laravel queue configuration).
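
If you monitor queues with Horizon, one way to cover these queue names is a dedicated supervisor in the host app's config/horizon.php; the excerpt below is illustrative (connection and process counts are assumptions).

// config/horizon.php (host app) excerpt, illustrative values only
'environments' => [
    'production' => [
        'slr-supervisor' => [
            'connection'   => 'redis',
            'queue'        => ['pull', 'normalize', 'enrich', 'score'],
            'balance'      => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 4,
            'tries'        => 3,
        ],
    ],
],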

Quickstart

  1. Create a project (via Tinker/Seeder):
use Illuminate\Support\Str;
use Mbsoft\SlrRanking\Models\Project;

$proj = Project::create([
  'id' => (string) Str::uuid(),
  'name' => 'Demo SLR',
  'objective' => 'custom',
  'weights' => config('slr-ranking.default_weights'),
  'search_strings' => [
    'openalex' => ['q' => 'vision transformer agriculture'],
    'crossref' => ['q' => 'agriculture transformer', 'filter' => 'type:journal-article,from-pub-date:2023-01-01'],
    'arxiv'    => ['q' => 'ti:(agriculture) AND (cat:cs.CV OR cs.LG)'],
    's2'       => ['q' => 'agriculture transformer']
  ],
  'inclusion_criteria' => [],
  'half_life' => 3
]);
$proj->id; // note this UUID: it is the {project-uuid} used by the CLI commands below
  2. Run pulls & processing:
php artisan slr:run {project-uuid} --openalex --crossref --arxiv --s2
php artisan slr:score {project-uuid}
  3. Export bundle:
php artisan slr:export {project-uuid} --type=json --out=slr
# => storage/app/slr/{project-uuid}.json

Scheduling (optional)

In your app’s app/Console/Kernel.php (Laravel 10; on Laravel 11, define the equivalent schedule in routes/console.php):

protected function schedule(Schedule $s): void {
    $s->command('slr:run YOUR-PROJECT-UUID --openalex --crossref')->dailyAt('01:30');
    $s->command('slr:score YOUR-PROJECT-UUID')->dailyAt('03:00');
}

Search Integration (optional)

Enable Scout + Meilisearch in your app:

composer require laravel/scout meilisearch/meilisearch-php http-interop/http-factory-guzzle

Set SLR_SEARCH=scout and configure Scout as usual. Note: this package does not make its models searchable by default; you can call Work::search($q) if Scout is enabled and you add the indexing setup in your app (see the sketch below).
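
One possible wiring, assuming the package exposes its works as Mbsoft\SlrRanking\Models\Work and stores them in slr_works (both consistent with this README but not verified here), is a host-app subclass that opts into Scout:

// app/Models/SearchableWork.php: hypothetical host-app wrapper; column names are assumptions.
namespace App\Models;

use Laravel\Scout\Searchable;
use Mbsoft\SlrRanking\Models\Work as BaseWork;

class SearchableWork extends BaseWork
{
    use Searchable;

    protected $table = 'slr_works';

    public function toSearchableArray(): array
    {
        return [
            'id'    => $this->getKey(),
            'title' => $this->title,   // assumed column
            'year'  => $this->year,    // assumed column
        ];
    }
}

// Index once (php artisan scout:import "App\Models\SearchableWork"), then:
// $hits = \App\Models\SearchableWork::search('vision transformer agriculture')->get();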

Testing

  • Uses Pest + Orchestra Testbench.
  • Run:
composer install
vendor/bin/pest

Write unit tests for:

  • NormalizationService mappings (OpenAlex/Crossref/arXiv/S2)
  • DedupService title/DOI logic
  • ScoreService math & edge cases
  • Command smoke tests for slr:run, slr:score, slr:export
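
As a starting point for the scoring edge cases, here is a small Pest example that checks the documented formulas directly (it deliberately avoids guessing the ScoreService method names):

// tests/Unit/ScoringMathTest.php, illustrative only
it('halves the recency score after one half-life', function () {
    $halfLife = 3.0;

    expect(exp(-M_LN2 * $halfLife / $halfLife))->toEqualWithDelta(0.5, 1e-9);
});

it('caps the OA & reproducibility score at 1.0', function () {
    $score = min(1.0, 0.6 + 0.25 + 0.15); // OA + code + data

    expect($score)->toEqualWithDelta(1.0, 1e-9);
});

it('ships default weights that sum to 1.0', function () {
    expect(array_sum(config('slr-ranking.default_weights')))->toEqualWithDelta(1.0, 1e-9);
});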

FAQ

Q: Does this store PDFs? A: No. Only metadata + OA links. Respect publisher ToS.

Q: Can I change weights or half-life later? A: Yes—update the project record and re-run slr:score.

Q: How do I add my own connector? A: Implement Contracts\Connector, create a pull job, and register a facade accessor (copy the existing patterns).
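
A rough skeleton of such a connector might look like the following; the method name and cursor handling are assumptions (check the shipped Contracts\Connector and the existing connectors for the real signatures):

// Hypothetical connector skeleton; adapt method names to the real Contracts\Connector.
namespace App\Slr\Connectors;

use Illuminate\Support\Facades\Http;

class MySourceConnector /* implements \Mbsoft\SlrRanking\Contracts\Connector */
{
    public function __construct(private string $baseUrl = 'https://api.example.org') {}

    /** Yield raw result pages for a project's search string (cursor paging as an example). */
    public function pull(array $search): iterable
    {
        $cursor = '*';

        do {
            $page = Http::acceptJson()
                ->get("{$this->baseUrl}/works", [
                    'search' => $search['q'] ?? '',
                    'cursor' => $cursor,
                ])
                ->throw()
                ->json();

            yield $page['results'] ?? [];

            $cursor = $page['meta']['next_cursor'] ?? null;
        } while ($cursor);
    }
}

A queued pull job would then persist each yielded page into slr_raw_records so normalization, enrichment, and scoring can pick it up, mirroring the shipped connectors.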

Roadmap

  • CSV & Markdown exports (tables for works, scores, top-N, by-venue heatmap)
  • Mermaid PRISMA diagram export
  • Crossref abstract JATS parsing (sanitized plain text)
  • Additional code/data link heuristics & GitHub API probe (stars, license)
  • Reviewer role policies (left to host app)

Contributing

  • Follow PSR-12, run Pest tests, and include fixtures for new connectors or mappings.
  • Open a PR with a clear description and reproduction steps.

Security & ToS

  • This package is read-only against public APIs; you are responsible for:

    • Obeying rate limits & identifying with a valid User-Agent (set this in your app’s HTTP client if desired).
    • Complying with the terms of service of each data source.
  • Report vulnerabilities via a private issue or email.

License

The MIT License (MIT). Please see License File for more information.