mbsoft31 / slr-ranking
Headless, auditable SLR pipeline for open-source evidence ranking, shipped as a Laravel package (no UI).
Requires
- php: ^8.3
- illuminate/contracts: ^11.0||^12.0
- spatie/laravel-package-tools: ^1.16
Requires (Dev)
- larastan/larastan: ^3.0
- laravel/pint: ^1.14
- nunomaduro/collision: ^8.8
- orchestra/testbench: ^10.0.0||^9.0.0
- pestphp/pest: ^4.0
- pestphp/pest-plugin-arch: ^4.0
- pestphp/pest-plugin-laravel: ^4.0
- phpstan/extension-installer: ^1.4
- phpstan/phpstan-deprecation-rules: ^2.0
- phpstan/phpstan-phpunit: ^2.0
README
Headless, auditable SLR pipeline for open-source evidence ranking — protocol → connectors (OpenAlex/Crossref/arXiv/Semantic Scholar) → normalization & dedup → enrichment (OA/code/data/citations) → scoring (venue, recency, OA/repro, novelty, realism, breadth) → exports. Built as a Laravel package (no UI), following Spatie Package Tools conventions.
Table of Contents
- Features
- Requirements
- Installation
- Configuration
- Environment Variables
- Data Model
- Connectors
- Normalization & Dedup
- Enrichment
- Scoring
- CLI Commands
- Quickstart
- Scheduling (optional)
- Search Integration (optional)
- Testing
- FAQ
- Roadmap
- Contributing
- Security & ToS
- License
Features
- Package-only (no UI): pure Eloquent models, services, jobs, and CLI commands.
- Protocol storage: projects keep objective, weights, half-life, search strings, and eligibility rules.
- Open connectors: OpenAlex, Crossref, arXiv, Semantic Scholar (S2) with cursor/offset paging.
- Normalization & dedup: unified “work” per paper (DOI favored; fuzzy title match fallback).
- Enrichment: Unpaywall (OA), code/data link detection, citation counts & fields.
- Scoring engine: venue quality, recency (half-life), OA & reproducibility, novelty, realism, breadth → composite 0–100.
- Exports: JSON project bundle (CSV/Markdown planned).
- Lookups: SJR quartiles & CORE ranks via CSV uploads (versioned snapshots).
- Auditable: minimal audit log events & immutable derived scores.
Requirements
- PHP: 8.3+
- Laravel: 11.x or 12.x
- Database: any supported by Laravel (PostgreSQL recommended)
- Queues: Redis (recommended) or any Laravel queue backend
- HTTP: Guzzle (already required)
- Optional:
  - Laravel Scout + Meilisearch for full-text indexing
  - Laravel Horizon for queue monitoring
Installation
1) Require the package
Monorepo / local path:
`composer.json` (of your host app):

```json
{
    "repositories": [
        { "type": "path", "url": "packages/mbsoft/slr-ranking" }
    ]
}
```

```bash
composer require mbsoft/slr-ranking:dev-main
```
Standalone (from VCS):
```bash
composer require mbsoft/slr-ranking
```
2) Publish config & migrations
```bash
php artisan slr:install
php artisan migrate
```
3) Set environment
Add at least:
```dotenv
UNPAYWALL_EMAIL=you@example.com
OPENALEX_BASE=https://api.openalex.org
CROSSREF_BASE=https://api.crossref.org
S2_BASE=https://api.semanticscholar.org/graph/v1
ARXIV_BASE=https://export.arxiv.org/api
```
4) Run queues
```bash
php artisan queue:work
# (or)
php artisan horizon
```
Configuration
`config/slr-ranking.php` (published by `slr:install`):
```php
return [
    'half_life' => 3.0, // years for recency decay

    'default_weights' => [
        'venue'   => 0.30,
        'recency' => 0.15,
        'oa'      => 0.05,
        'novelty' => 0.20,
        'realism' => 0.20,
        'breadth' => 0.10,
    ],

    'user_model' => \App\Models\User::class,

    'endpoints' => [
        'openalex'  => env('OPENALEX_BASE', 'https://api.openalex.org'),
        'crossref'  => env('CROSSREF_BASE', 'https://api.crossref.org'),
        'unpaywall' => 'https://api.unpaywall.org/v2',
        's2'        => env('S2_BASE', 'https://api.semanticscholar.org/graph/v1'),
        'arxiv'     => env('ARXIV_BASE', 'https://export.arxiv.org/api'),
    ],

    'unpaywall_email' => env('UNPAYWALL_EMAIL'),

    // Optional search: set to 'scout' to enable Laravel Scout indexing
    'search_driver' => env('SLR_SEARCH', null),

    'features' => [
        // If true, venue quality falls back to a capped percentile of citations when SJR/CORE missing.
        'citations_percentile_fallback' => true,
    ],
];
```
Environment Variables
- `UNPAYWALL_EMAIL` — required by Unpaywall.
- `OPENALEX_BASE`, `CROSSREF_BASE`, `S2_BASE`, `ARXIV_BASE` — override endpoints if needed.
- `SLR_SEARCH=scout` — enable Scout indexing (host app must configure Scout + Meilisearch).
- Usual Laravel queue/database configuration as per your app.
Data Model
Tables (all prefixed `slr_`):
- projects — protocol store: name, objective, weights, search strings, inclusion rules, half-life.
- sources — connector catalog (openalex, crossref, arxiv, semanticscholar, manual).
- raw_records — raw JSON per source pull (for audit/replay).
- works — normalized unique works (per project).
- enrichments — OA flags, OA URL, code/data links, citations, fields.
- venue_metrics — versioned SJR/CORE snapshots.
- screenings — TA/FT screening decisions (reviewer morphs).
- criterion_scores — per-criterion scores.
- composite_scores — final composite (breakdown + timestamp).
- expert_reviews — expert notes, adjustment deltas, overrides.
- audit_logs — lightweight action trail.
- lookups_sjr, lookups_core — uploaded CSV snapshots.
ER (Mermaid)
```mermaid
erDiagram
    slr_projects ||--o{ slr_works : has
    slr_sources ||--o{ slr_raw_records : provides
    slr_projects ||--o{ slr_raw_records : collects
    slr_works ||--|| slr_enrichments : has
    slr_works ||--|| slr_criterion_scores : has
    slr_works ||--|| slr_composite_scores : has
    slr_works ||--o{ slr_screenings : reviewed_by
    slr_works ||--o{ slr_expert_reviews : adjusted_by
```
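For orientation, here is a minimal sketch of traversing these tables with Eloquent from the host app. The relationship and attribute names (`works`, `enrichment`, `compositeScore`, `total`) are assumptions for illustration; check the package models for the actual accessors.

```php
use Mbsoft\SlrRanking\Models\Project;

$project = Project::query()->where('name', 'Demo SLR')->firstOrFail();

$project->works()
    ->with(['enrichment', 'compositeScore'])   // assumed relation names
    ->get()
    ->each(function ($work) {
        printf(
            "%s => %s\n",
            $work->title,
            optional($work->compositeScore)->total ?? 'unscored'   // assumed attribute
        );
    });
```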
Connectors
Each connector is queued and writes raw_records → normalization job → enrichment → scoring.
- OpenAlex: `/works` endpoint with cursor paging (see the paging sketch below).
- Crossref: `/works` with query + filter (e.g., `type:journal-article`).
- arXiv: Atom feed (XML) paginated with `start`/`max_results`.
- Semantic Scholar (S2): `/paper/search` with offset/limit; optional DOI lookups for enrichment fallback.
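To illustrate the cursor-paging pattern the OpenAlex connector relies on, here is a standalone sketch using Laravel's HTTP client. It is not the package's internal connector code, and the query parameters are only examples:

```php
use Illuminate\Support\Facades\Http;

$base   = config('slr-ranking.endpoints.openalex'); // https://api.openalex.org
$cursor = '*';                                      // OpenAlex cursor paging starts at '*'

do {
    $page = Http::acceptJson()->get("{$base}/works", [
        'search'   => 'vision transformer agriculture',
        'filter'   => 'from_publication_date:2023-01-01',
        'per-page' => 200,
        'cursor'   => $cursor,
    ])->throw()->json();

    foreach ($page['results'] ?? [] as $work) {
        // In the package, this is roughly where a raw_records row gets written.
    }

    $cursor = $page['meta']['next_cursor'] ?? null;
} while ($cursor !== null);
```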
Project `search_strings` structure (JSON):

```json
{
  "openalex": { "q": "vision transformer agriculture", "filter": "from_publication_date:2023-01-01" },
  "crossref": { "q": "agriculture transformer", "filter": "type:journal-article,from-pub-date:2023-01-01" },
  "arxiv":    { "q": "ti:(agriculture) AND (cat:cs.CV OR cs.LG)" },
  "s2":       { "q": "agriculture transformer" }
}
```
Rate limits & ToS: Be polite; the package does not bypass rate limits. You are responsible for adhering to each source’s ToS.
Normalization & Dedup
- Precedence: DOI > OpenAlex > Crossref > arXiv > S2 (field-by-field best-effort).
- Title normalization: lowercase, strip punctuation, collapse whitespace.
- Fuzzy match: token-set ratio ≥ 92 within year ±1 to merge when DOI missing (see the sketch after this list).
- Provenance: raw source preserved in `slr_raw_records` (replayable).
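A minimal sketch of the normalization and fuzzy-match rules above, using a deliberately simple token-set similarity. The package's actual implementation may differ:

```php
/**
 * Lowercase, strip punctuation, collapse whitespace — mirrors the
 * title-normalization rule described above.
 */
function normalizeTitle(string $title): string
{
    $title = mb_strtolower($title);
    $title = preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', $title);

    return trim(preg_replace('/\s+/', ' ', $title));
}

/**
 * Crude token-set ratio (0-100): overlap of unique tokens relative to the
 * smaller token set. Illustrative only; real token-set ratios are more involved.
 */
function tokenSetRatio(string $a, string $b): float
{
    $ta = array_unique(explode(' ', normalizeTitle($a)));
    $tb = array_unique(explode(' ', normalizeTitle($b)));
    $overlap = count(array_intersect($ta, $tb));

    return 100.0 * $overlap / max(1, min(count($ta), count($tb)));
}

// Example: decide whether two DOI-less records describe the same work.
$a = ['title' => 'Vision Transformers for Crop Disease Detection', 'year' => 2023];
$b = ['title' => 'Vision transformers for crop-disease detection.', 'year' => 2024];

$shouldMerge = tokenSetRatio($a['title'], $b['title']) >= 92
    && abs($a['year'] - $b['year']) <= 1;
```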
Enrichment
- OA (Unpaywall): `is_oa` + best OA URL (PDF or landing).
- Code links: detects GitHub/GitLab/Zenodo record URLs from title/abstract.
- Data links: detects Zenodo/Figshare/Dataverse from title/abstract.
- Citations & fields: OpenAlex by ID; fallback to S2 by DOI if missing.
Scoring
Default weights (configurable per project):
- Venue 0.30, Recency 0.15, OA/Repro 0.05, Novelty 0.20, Realism 0.20, Breadth 0.10
Venue quality:
- Journal (SJR): Q1=1.00, Q2=0.75, Q3=0.50, Q4=0.25 (fallback 0.40)
- Conference (CORE): A*=1.00, A=0.85, B=0.65, C=0.45 (fallback 0.50)
- Preprint: 0.35
- Optional fallback: `min(0.6, citations_percentile)` within project if no SJR/CORE.
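The venue-quality mapping above can be sketched as follows; the function name, parameters, and the "nothing known" default are illustrative, not the package's actual API:

```php
/**
 * Venue-quality component per the documented mapping.
 */
function venueQuality(?string $sjrQuartile, ?string $coreRank, bool $isPreprint, ?float $citationsPercentile): float
{
    if ($isPreprint) {
        return 0.35;
    }

    if ($sjrQuartile !== null) {
        return match ($sjrQuartile) {
            'Q1' => 1.00, 'Q2' => 0.75, 'Q3' => 0.50, 'Q4' => 0.25,
            default => 0.40, // journal fallback
        };
    }

    if ($coreRank !== null) {
        return match ($coreRank) {
            'A*' => 1.00, 'A' => 0.85, 'B' => 0.65, 'C' => 0.45,
            default => 0.50, // conference fallback
        };
    }

    // Optional citations-percentile fallback when neither SJR nor CORE is known;
    // the final 0.40 default is an assumption for this sketch.
    return $citationsPercentile !== null ? min(0.6, $citationsPercentile) : 0.40;
}
```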
Recency: `exp(-ln(2) * Δt / H)` with `H` = half-life in years.
OA & Reproducibility: base 0.6 if OA +0.25 code +0.15 data (capped 1.0).
Novelty / Realism / Breadth: reviewer-provided 0..1 checklist scores (persisted in `criterion_scores`).
Composite: `100 * Σ (w_i * criterion_i)`, stored with breakdown & timestamp.
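Putting the remaining formulas together, a compact sketch (function names and the example criterion values are illustrative; the package's scoring service may organize this differently):

```php
// Recency with half-life H (years): exp(-ln(2) * Δt / H)
function recencyScore(int $publicationYear, float $halfLifeYears): float
{
    $deltaYears = max(0, (int) date('Y') - $publicationYear);

    return exp(-M_LN2 * $deltaYears / $halfLifeYears);
}

// OA & reproducibility: 0.6 if OA, +0.25 for a code link, +0.15 for a data link, capped at 1.0.
function oaReproScore(bool $isOa, bool $hasCode, bool $hasData): float
{
    $score = ($isOa ? 0.6 : 0.0) + ($hasCode ? 0.25 : 0.0) + ($hasData ? 0.15 : 0.0);

    return min(1.0, $score);
}

// Composite: 100 * Σ (w_i * criterion_i)
function compositeScore(array $weights, array $criteria): float
{
    $total = 0.0;
    foreach ($weights as $criterion => $weight) {
        $total += $weight * ($criteria[$criterion] ?? 0.0);
    }

    return 100.0 * $total;
}

// Example with the default weights from config/slr-ranking.php:
$score = compositeScore(config('slr-ranking.default_weights'), [
    'venue'   => 0.75,
    'recency' => recencyScore(2023, config('slr-ranking.half_life')),
    'oa'      => oaReproScore(true, true, false),
    'novelty' => 0.8,
    'realism' => 0.7,
    'breadth' => 0.5,
]);
```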
CLI Commands
```bash
# 1) Publish config + migrations
php artisan slr:install

# 2) Run connectors (choose flags)
php artisan slr:run {project-uuid} --openalex --crossref --arxiv --s2

# 3) Recompute scores for all works in a project
php artisan slr:score {project-uuid}

# 4) Export bundle (JSON)
php artisan slr:export {project-uuid} --type=json --out=slr

# 5) Upload lookups
php artisan slr:upload-sjr storage/app/sjr_snapshot.csv    # issn,quartile,snapshot_date
php artisan slr:upload-core storage/app/core_snapshot.csv  # conference,rank,snapshot_date
```
Queues: jobs use the `pull`, `normalize`, `enrich`, and `score` queues (you can route these via Laravel's queue config; see the Horizon sketch below for one way to do it).
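For example, if you run Horizon, a dedicated supervisor for these queues might look like the following fragment of the host app's `config/horizon.php`; the supervisor name, process counts, and balancing strategy are assumptions to adjust for your workload:

```php
// config/horizon.php (host app)
'environments' => [
    'production' => [
        'slr-supervisor' => [
            'connection'   => 'redis',
            'queue'        => ['pull', 'normalize', 'enrich', 'score'],
            'balance'      => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 4,
            'tries'        => 3,
        ],
    ],
],
```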
Quickstart
- Create a project (via Tinker/Seeder):
```php
use Illuminate\Support\Str;
use Mbsoft\SlrRanking\Models\Project;

$proj = Project::create([
    'id'        => (string) Str::uuid(),
    'name'      => 'Demo SLR',
    'objective' => 'custom',
    'weights'   => config('slr-ranking.default_weights'),
    'search_strings' => [
        'openalex' => ['q' => 'vision transformer agriculture'],
        'crossref' => ['q' => 'agriculture transformer', 'filter' => 'type:journal-article,from-pub-date:2023-01-01'],
        'arxiv'    => ['q' => 'ti:(agriculture) AND (cat:cs.CV OR cs.LG)'],
        's2'       => ['q' => 'agriculture transformer'],
    ],
    'inclusion_criteria' => [],
    'half_life' => 3,
]);

$proj->id;
```
- Run pulls & processing:
```bash
php artisan slr:run {project-uuid} --openalex --crossref --arxiv --s2
php artisan slr:score {project-uuid}
```
- Export bundle:
```bash
php artisan slr:export {project-uuid} --type=json --out=slr
# => storage/app/slr/{project-uuid}.json
```
Scheduling (optional)
In your app's `app/Console/Kernel.php`:
```php
protected function schedule(Schedule $s): void
{
    $s->command('slr:run YOUR-PROJECT-UUID --openalex --crossref')->dailyAt('01:30');
    $s->command('slr:score YOUR-PROJECT-UUID')->dailyAt('03:00');
}
```
Search Integration (optional)
Enable Scout + Meilisearch in your app:
```bash
composer require laravel/scout meilisearch/meilisearch-php http-interop/http-factory-guzzle
```
Set `SLR_SEARCH=scout` and configure Scout as usual.
Note: This package does not ship searchable repositories by default; you can call `Work::search($q)` if Scout is enabled and you add indexing calls in your app.
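If you wire up indexing in your app, a query could look like the sketch below; the `project_id` filter assumes you have made it a filterable attribute in Meilisearch, and Scout's `where()` support varies by driver:

```php
use Mbsoft\SlrRanking\Models\Work;

// Assumes SLR_SEARCH=scout and a configured Meilisearch index for works.
$hits = Work::search('vision transformer agriculture')
    ->where('project_id', $projectUuid)
    ->take(25)
    ->get();
```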
Testing
- Uses Pest + Orchestra Testbench.
- Run:
```bash
composer install
vendor/bin/pest
```
Write unit tests for:
- `NormalizationService` mappings (OpenAlex/Crossref/arXiv/S2)
- `DedupService` title/DOI logic
- `ScoreService` math & edge cases (see the example below)
- Command smoke tests for `slr:run`, `slr:score`, `slr:export`
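As a starting point, a Pest test for the recency math might look like this; the `ScoreService` namespace, the `recency()` method name, and its signature are assumptions to adapt to the actual API:

```php
use Mbsoft\SlrRanking\Services\ScoreService;

it('halves the recency score after one half-life', function () {
    $service = app(ScoreService::class);

    // Hypothetical method: recency(yearsSincePublication, halfLifeYears).
    expect($service->recency(3.0, 3.0))->toEqualWithDelta(0.5, 1e-9)
        ->and($service->recency(0.0, 3.0))->toEqualWithDelta(1.0, 1e-9);
});
```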
FAQ
Q: Does this store PDFs?
A: No. Only metadata + OA links. Respect publisher ToS.
Q: Can I change weights or half-life later?
A: Yes, update the project record and re-run `slr:score`.
Q: How do I add my own connector?
A: Implement `Contracts\Connector`, create a pull job, and register a facade accessor (copy the existing patterns).
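As a rough sketch of that answer (the `Contracts\Connector` method shown here is assumed; mirror whatever the package's interface actually declares):

```php
use Illuminate\Support\Facades\Http;
use Mbsoft\SlrRanking\Contracts\Connector;   // actual interface methods may differ
use Mbsoft\SlrRanking\Models\Project;

class MyIndexConnector implements Connector
{
    /**
     * Hypothetical contract method: pull raw records for a project and
     * yield them so the caller can persist slr_raw_records rows.
     */
    public function pull(Project $project): iterable
    {
        $query = $project->search_strings['myindex']['q'] ?? '';

        $response = Http::acceptJson()
            ->get('https://api.example.org/search', ['q' => $query])
            ->throw()
            ->json();

        foreach ($response['items'] ?? [] as $item) {
            yield $item;
        }
    }
}
```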
Roadmap
- CSV & Markdown exports (tables for works, scores, top-N, by-venue heatmap)
- Mermaid PRISMA diagram export
- Crossref abstract JATS parsing (sanitized plain text)
- Additional code/data link heuristics & GitHub API probe (stars, license)
- Reviewer role policies (left to host app)
Contributing
- Follow PSR-12, run Pest tests, and include fixtures for new connectors or mappings.
- Open a PR with a clear description and reproduction steps.
Security & ToS
- This package is read-only against public APIs; you are responsible for:
  - Obeying rate limits & identifying with a valid User-Agent (set this in your app's HTTP client if desired).
  - Complying with the terms of service of each data source.
- Report vulnerabilities via a private issue or email.
License
The MIT License (MIT). Please see License File for more information.