survos / fetch-bundle
HTTP fetch utilities for Symfony: resumable chunk downloads, caching, retry/backoff, and concurrent fetch into JSONL
Requires
- php: ^8.4
- psr/log: ^3.0
- survos/jsonl-bundle: ^2.6
- survos/kit-bundle: ^2.6
- symfony/console: ^7.3||^8.0
- symfony/framework-bundle: ^7.3||^8.0
- symfony/http-client: ^7.3||^8.0
Requires (Dev)
- roave/security-advisories: dev-latest
This package is auto-updated.
Last update: 2026-06-06 20:35:53 UTC
README
Reusable HTTP fetch utilities for Symfony applications that harvest remote data into JSONL.
This bundle started as multi-fetch-bundle, an experiment around parallel API fetching. The useful core turned out to be broader than concurrency, because most dataset fetchers need the same small set of primitives:
- bounded-concurrency HTTP requests;
- retry and backoff for transient failures;
- page planning for offset, page-number, and cursor APIs;
- resumable downloads for large files;
- JSONL output through survos/jsonl-bundle;
- optional HTTP cache helpers for source APIs that do not send useful cache headers.
The package is survos/fetch-bundle. "Multi" (concurrent fetching) is just one execution mode; the resumable ChunkDownloader and cache helpers stand on their own.
Current Services
SymfonyConcurrentFetcher implements bounded concurrent HTTP fetching using Symfony HttpClient streaming. It accepts keyed request metadata and yields keyed response arrays as requests complete.
ExponentialBackoffRetry provides a simple retry policy for transport errors, HTTP 429, and 5xx responses.
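How such a policy might compute delays and decide whether to retry can be sketched as follows. This is an illustration of the idea, not the bundle's ExponentialBackoffRetry class: the function names, the 500 ms base delay, and the 30 s cap are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: delay in milliseconds before retry attempt $attempt (1-based).
 * The base delay doubles on each attempt and is capped at $maxMs.
 */
function backoffDelayMs(int $attempt, int $baseMs = 500, int $maxMs = 30_000): int
{
    if ($attempt < 1) {
        throw new InvalidArgumentException('Attempt numbers start at 1.');
    }
    // Clamp the exponent so the product stays an int for sane base values.
    $exponent = min($attempt - 1, 20);

    return min($maxMs, $baseMs * (2 ** $exponent));
}

/**
 * Hypothetical helper: null stands for a transport error (no HTTP status received).
 * Retries on transport errors, HTTP 429, and any 5xx status.
 */
function isRetryable(?int $statusCode): bool
{
    return $statusCode === null
        || $statusCode === 429
        || ($statusCode >= 500 && $statusCode <= 599);
}
```

A production policy would usually add random jitter to the delay and honor a Retry-After header on 429 responses; both are left out here to keep the sketch short.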
ChunkDownloader downloads large files to *.part, supports HTTP Range resume when the source honors it, retries transient failures, and reports byte progress.
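The Range resume logic can be sketched roughly as below. This is not ChunkDownloader's actual code: both helper names are hypothetical, and the sketch only covers building the request header and reacting to the two common response statuses.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build request headers for resuming into a *.part file.
 * Returns an empty array when there is nothing on disk to resume from.
 */
function resumeHeaders(string $partPath): array
{
    // filesize() results are cached per path, so clear the cache first.
    clearstatcache(true, $partPath);
    $offset = is_file($partPath) ? (int) filesize($partPath) : 0;

    // "bytes=N-" asks the server for everything from byte N onward.
    return $offset > 0 ? ['Range' => sprintf('bytes=%d-', $offset)] : [];
}

/**
 * Hypothetical helper: choose the fopen() mode from the response status.
 * 206 means the server honored the Range header, so append to the part file.
 * 200 means it sent the whole body again, so truncate and start over.
 */
function partFileMode(int $statusCode): string
{
    return match ($statusCode) {
        206 => 'ab',
        200 => 'wb',
        default => throw new RuntimeException("Unexpected status $statusCode"),
    };
}
```

Checking the status matters because a server that ignores Range replies with 200 and the full body; appending that to an existing part file would corrupt the download.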
multi:fetch is an experimental CLI for Solr/JSON/JSON-LD style sources. It writes rows with Survos\JsonlBundle\IO\JsonlWriter.
Dependencies
survos/jsonl-bundle is a hard dependency because JSONL is the canonical output format for Survos dataset harvesting. This bundle should depend on jsonl-bundle, not copy classes from it.
Example
use Survos\FetchBundle\Contract\ConcurrentFetcherInterface;
use Survos\FetchBundle\Contract\DTO\FetchOptions;

final class DatasetFetcher
{
    public function __construct(
        private readonly ConcurrentFetcherInterface $fetcher,
    ) {}

    public function fetch(array $urls): iterable
    {
        $requests = [];
        foreach ($urls as $i => $url) {
            $requests[$i] = ['url' => $url];
        }

        yield from $this->fetcher->fetchMany($requests, new FetchOptions(
            concurrency: 8,
            timeout: 60.0,
            defaultHeaders: ['Accept' => 'application/json'],
        ));
    }
}
Harvest References
The next design pass should extract repetition from Harvest dataset commands such as:
- dataset:fetch:belvedere: page-number API, XML parse, stop on empty page;
- dataset:fetch:victoria: page-number API, JSON parse, sidecar/count based resume;
- dataset:fetch:aust: offset/limit API, multiple raw output cores;
- dataset:fetch:walters: large archive download and local CSV-to-JSONL conversion.
Source-specific parsing and row normalization should stay in applications. Pagination, retry, resume, cache behavior, and JSONL output targets belong here.
TUI Progress
Symfony 8.1's TUI component is a good fit for visualizing concurrent fetches, but it should be an optional presentation layer over the fetch engine.
The core fetch service should emit structured progress events such as planned, started, bytes, pageComplete, retry, failed, and merged. A TUI renderer can show one row per active page/download, plus aggregate totals and a log pane. Non-interactive runs should use the same events for normal console progress output.
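One possible shape for those events is sketched below. None of these types exist in the bundle yet: ProgressType, ProgressEvent, and formatProgressLine are hypothetical names, and only the event names themselves come from the paragraph above.

```php
<?php
declare(strict_types=1);

/** Hypothetical enum of the event names listed above. */
enum ProgressType: string
{
    case Planned = 'planned';
    case Started = 'started';
    case Bytes = 'bytes';
    case PageComplete = 'pageComplete';
    case Retry = 'retry';
    case Failed = 'failed';
    case Merged = 'merged';
}

/** Hypothetical payload: the request key it concerns plus free-form scalar details. */
final class ProgressEvent
{
    public function __construct(
        public readonly ProgressType $type,
        public readonly string $key,
        public readonly array $context = [],
    ) {}
}

/** Hypothetical plain-console renderer; a TUI renderer would consume the same events. */
function formatProgressLine(ProgressEvent $event): string
{
    $details = [];
    foreach ($event->context as $name => $value) {
        $details[] = $name . '=' . $value;
    }

    // rtrim() drops the trailing space left when there are no details.
    return rtrim(sprintf('[%s] %s %s', $event->type->value, $event->key, implode(' ', $details)));
}
```

Keeping the event a plain value object means the fetch engine never has to know which renderer is listening.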
For precomputed page ranges, concurrent downloads can be displayed naturally: page number, URL/key, status, retries, bytes, rows, and elapsed time. For cursor or nextPage APIs, concurrency is usually limited because the next URL is discovered only after reading the current response; the TUI still helps by showing cursor progress, row counts, retries, and merge state.
A future TUI implementation should follow the tui-monitor pattern: keep the engine independent of TUI classes, then put dashboard/widgets in a separate namespace that is only registered when Symfony\Component\Tui\Tui exists.
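The "only when the class exists" guard could look something like the following. The chooseRenderer function and its return values are hypothetical; the only name taken from the text above is the Symfony\Component\Tui\Tui class, passed here as a default so the check can be exercised without the component installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decide which progress renderer to use.
 * Returns 'tui' only for interactive runs where the TUI class can be autoloaded.
 */
function chooseRenderer(bool $interactive, string $tuiClass = 'Symfony\\Component\\Tui\\Tui'): string
{
    // class_exists() triggers autoloading, so it is true only when the component is installed.
    if ($interactive && class_exists($tuiClass)) {
        return 'tui';
    }

    return 'console';
}
```

In a bundle, the same check would typically guard the service registration itself, so the dashboard classes are never loaded in applications without the component.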
Generic Pagination Flow
The target generic flow is:
- Build a fetch plan from endpoint configuration, auth headers/query params, and a pagination strategy.
- Fetch pages to a temporary directory as page-local JSON/JSONL files.
- Extract rows from each page using a source-specific selector/extractor.
- Merge page files in stable order into the final JSONL output with JsonlWriter.
- Write sidecar state so interrupted runs can resume or skip completed pages.
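The merge step in this flow can be sketched as follows. It assumes page files are named so that a natural sort yields page order (for example page-1.jsonl, page-2.jsonl), and it uses plain file functions in place of JsonlWriter, whose API is not shown in this README. The function name mergePageFiles is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: concatenate page-local JSONL files into one output file.
 * Returns the number of rows written.
 */
function mergePageFiles(string $pageDir, string $outputPath): int
{
    $files = glob($pageDir . '/*.jsonl') ?: [];
    // Natural sort so page-10 follows page-9 rather than page-1.
    sort($files, SORT_NATURAL);

    $out = fopen($outputPath, 'wb');
    if ($out === false) {
        throw new RuntimeException("Cannot open $outputPath for writing.");
    }

    $rows = 0;
    foreach ($files as $file) {
        $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
        foreach ($lines as $line) {
            fwrite($out, $line . "\n");
            $rows++;
        }
    }
    fclose($out);

    return $rows;
}
```

Because the order comes from file names rather than completion time, pages fetched concurrently still merge into the same output on every run.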
For page-number and offset/limit APIs, the plan can often be known up front and fetched concurrently. For nextPage/cursor APIs, planning and fetching are interleaved unless the API also exposes all cursors or a total count.
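For the up-front case, a plan might be built like this. The sketch assumes the total row count is already known and that the API takes query parameters named offset and limit; both the parameter names and the planOffsetPages function are hypothetical. The output uses the keyed ['url' => ...] request shape from the Example section.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical planner for an offset/limit API whose total row count is known.
 * Returns keyed request metadata, one entry per page.
 */
function planOffsetPages(string $baseUrl, int $total, int $limit): array
{
    if ($limit < 1) {
        throw new InvalidArgumentException('Limit must be at least 1.');
    }

    $requests = [];
    for ($offset = 0; $offset < $total; $offset += $limit) {
        // Key by offset so results can later be merged in a stable order.
        $requests[$offset] = [
            'url' => $baseUrl . '?' . http_build_query(['offset' => $offset, 'limit' => $limit]),
        ];
    }

    return $requests;
}
```

A cursor API cannot be planned this way, since each request's URL depends on the previous response.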
MVP Scope
The first useful extraction should be sequential and resumable, not concurrent.
A practical v1 should cover the common Harvest loop:
- read existing JSONL sidecar/count state;
- resume from the correct page or offset;
- fetch one page at a time with retry/backoff and optional delay;
- extract rows from JSON;
- append rows with JsonlWriter;
- stop on empty page, missing nextPage, or explicit limit.
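The loop above can be sketched in a few lines. Everything here is an assumption made for illustration: the harvest function, the .state.json sidecar name and its nextPage key, and the $fetchPage callable, which stands in for an HTTP call with retry/backoff. Plain file functions replace JsonlWriter, whose API is not shown in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sequential harvest loop.
 * $fetchPage(int $page): array returns the decoded rows of one page; an empty array ends the run.
 * Returns the number of rows appended during this call.
 */
function harvest(callable $fetchPage, string $jsonlPath, ?int $maxPages = null): int
{
    $statePath = $jsonlPath . '.state.json';
    // Resume from the page recorded by a previous run, if any.
    $state = is_file($statePath) ? json_decode((string) file_get_contents($statePath), true) : null;
    $page = is_array($state) ? (int) ($state['nextPage'] ?? 1) : 1;
    $written = 0;
    $fetched = 0;

    while ($maxPages === null || $fetched < $maxPages) {
        $rows = $fetchPage($page);
        if ($rows === []) {
            break; // stop on empty page
        }

        $lines = '';
        foreach ($rows as $row) {
            $lines .= json_encode($row, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES) . "\n";
        }
        file_put_contents($jsonlPath, $lines, FILE_APPEND | LOCK_EX);

        $written += count($rows);
        $fetched++;
        $page++;
        // Record progress only after the rows are on disk.
        file_put_contents($statePath, json_encode(['nextPage' => $page], JSON_THROW_ON_ERROR));
    }

    return $written;
}
```

Writing the sidecar after the rows means a crash between the two writes can duplicate one page on resume but never skip one; a consumer that needs exact-once rows would have to deduplicate or write both atomically.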
Victoria is the best first consumer. Belvedere is a good second consumer. Multi-output fetchers such as Aust and archive converters such as Walters should wait until the small sequential API is stable.