📰 extrablatt 📰

a simple news aggregator in politically charged times. it pulls articles from configurable rss feeds (plus reddit, hacker news and x via cookie-authenticated scraping), stores them in sqlite, detects paywalls, fetches thumbnails, categorises them through an llm, and opens single articles through an archive.ph proxy with mobile-friendly css rewrites. installable as a progressive web app.

installation

mkdir extrablatt
cd extrablatt
composer require vielhuber/extrablatt
./vendor/bin/extrablatt-init

after install, edit:

.data/config.json: define your papers (see schema below)
.data/.env: set AI_API_KEY / AUTH_PASSWORD / AI_PROVIDER / AI_MODEL
.data/cookies/: drop cookie exports here, one per host
.data/database.sqlite: restore an existing database (optional)

config.json schema

{
    "papers": {
        "<paper-key>": {
            "url": "https://example.com",
            "label": "Display Name",
            "rss": "https://example.com/feed.xml",
            "default_image": "https://example.com/fallback.png",
            "stub_markers": ["Subscribe to read", "Premium content"]
        }
    }
}

default_image (optional): fallback thumbnail when the RSS item carries no image.
stub_markers (optional): substrings that appear in the archive.ph snapshot of a paywalled (PLUS) article when the snapshot holds only a teaser. A matching snapshot is dropped instead of being surfaced as if it were the full text.
Special rss schemes: reddit://home and x://home activate the cookie-authenticated JSON scrapers in place of XML parsing.
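
To catch typos in config.json before a scrape runs, a small check along the following lines may help. It is a sketch in Python, not the package's own (PHP) validation: the names feed_kind and check_config are hypothetical, and treating url, label and rss as mandatory is an assumption drawn from the fact that only the other two keys are marked optional above.

```python
import json

REQUIRED_KEYS = {"url", "label", "rss"}            # assumption: mandatory keys
OPTIONAL_KEYS = {"default_image", "stub_markers"}  # marked optional in the schema


def feed_kind(rss: str) -> str:
    """Classify an rss value by its scheme (hypothetical helper)."""
    scheme, sep, _ = rss.partition("://")
    if not sep:
        raise ValueError(f"no scheme in rss value: {rss!r}")
    if scheme in ("reddit", "x"):
        return f"{scheme}-scraper"   # cookie-authenticated JSON scraper
    if scheme in ("http", "https"):
        return "xml"                 # regular RSS/XML feed
    raise ValueError(f"unknown rss scheme: {scheme!r}")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    papers = config.get("papers")
    if not isinstance(papers, dict):
        return ['top-level "papers" object is missing']
    problems = []
    for key, paper in papers.items():
        for name in sorted(REQUIRED_KEYS - set(paper)):
            problems.append(f"{key}: missing {name}")
        for name in sorted(set(paper) - REQUIRED_KEYS - OPTIONAL_KEYS):
            problems.append(f"{key}: unknown key {name}")
        markers = paper.get("stub_markers", [])
        if not isinstance(markers, list) or not all(isinstance(m, str) for m in markers):
            problems.append(f"{key}: stub_markers must be a list of strings")
        if "rss" in paper:
            try:
                feed_kind(paper["rss"])
            except ValueError as err:
                problems.append(f"{key}: {err}")
    return problems


if __name__ == "__main__":
    # Path taken from the installation section above.
    with open(".data/config.json", encoding="utf-8") as fh:
        for problem in check_config(json.load(fh)):
            print(problem)
```

Run from the project root, the script prints one line per problem and nothing when the file passes these checks.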

categories, AI defaults (temperature, timeout, max_tries), and the archive fulltext minimum (8000 chars) are hardcoded in the package.
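
When choosing stub_markers for a paper, it can help to picture how they might interact with the 8000-character minimum. The sketch below is a guess at that logic in Python, not the package's code: the function name is_teaser and the constant name are hypothetical, and both the order of the two checks and the case-sensitive matching are assumptions.

```python
ARCHIVE_FULLTEXT_MIN = 8000  # chars; value from this README, name is made up


def is_teaser(snapshot_text: str, stub_markers: list[str],
              min_chars: int = ARCHIVE_FULLTEXT_MIN) -> bool:
    """Guess whether an archive snapshot is only a paywall teaser."""
    # Assumption: a snapshot shorter than the minimum is never full text.
    if len(snapshot_text) < min_chars:
        return True
    # Assumption: any configured marker found verbatim flags a teaser.
    return any(marker in snapshot_text for marker in stub_markers)
```

Under these assumptions a long snapshot still counts as a teaser once it contains a marker such as "Subscribe to read", which is why markers should be phrases that never occur in a full article.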

usage

php -S 127.0.0.1:8080 -t .

cron

0 6,18 * * * curl -s 'https://your-host/?scrape=1&key=<AUTH_PASSWORD>' >/dev/null
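
If AUTH_PASSWORD contains characters such as &, = or spaces, it has to be URL-encoded before it goes into the key parameter. A minimal Python sketch for producing the encoded URL follows; the function name scrape_url is hypothetical, and the scrape and key parameter names are taken from the cron line above.

```python
from urllib.parse import urlencode


def scrape_url(base_url: str, auth_password: str) -> str:
    """Build the scrape trigger URL with a safely encoded key."""
    query = urlencode({"scrape": 1, "key": auth_password})
    return f"{base_url.rstrip('/')}/?{query}"


if __name__ == "__main__":
    # Placeholder host and password, replace with your own values.
    print(scrape_url("https://your-host", "p&ss w=rd"))
```

Note that crontab treats an unescaped % as a line break, so any % in the encoded URL has to be written as \% in the cron entry.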

backup

zip -r backup.zip .data
