jersey-mike/civilview-salesweb-scraper

Laravel 12 package that scrapes foreclosure/sheriff sale listings from salesweb.civilview.com.

Package info

github.com/jersey-mike/civilview-salesweb-scraper

Language:HTML

pkg:composer/jersey-mike/civilview-salesweb-scraper

dev-main 2026-05-02 01:24 UTC

This package is auto-updated.

Last update: 2026-05-02 02:44:30 UTC


README

A Laravel 12 package that turns the public foreclosure / sheriff-sale listings on salesweb.civilview.com into clean PHP objects you can use, store, or pipe into your own systems.

What this does

salesweb.civilview.com is a Tyler Technologies-hosted portal that ~50 U.S. counties use to publish their upcoming foreclosure / sheriff-sale dockets — places like Philadelphia County, PA; Burlington County, NJ; Maricopa County, AZ; Orleans Parish, LA; and so on. Each county's page lists the open and recently-sold cases: case number, sale date, plaintiff (typically a bank or mortgage servicer), defendant, property address, and a "View Details" link to a richer page showing the judgment amount, attorney of record, and an adjournment history.

The site is public and read-only — there's no API, no login, and no robots restriction on the listings themselves. This package scrapes the HTML and returns plain readonly DTOs. It does not save anything to your database, mutate any state on the upstream site, or sign requests with credentials. It is a thin, well-tested adapter layer.

What you get back per scrape:

  • A list of every county the portal serves, with their numeric IDs.
  • Every sale on a county's docket — open or sold — with all the columns the county chose to expose, plus the property's detail-page URL and ID.
  • For any property, the rich detail-page key/value pairs and full status history (every adjournment with its date and who requested it).

What you do with it is your problem (and the point): persist it to your schema, push it to a queue, diff it against last week's pull to detect changes, surface it in a dashboard, mail yourself when a specific plaintiff files, etc.
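As one illustration of the "diff it against last week's pull" idea, here is a minimal sketch in plain PHP. The diffPulls() helper is hypothetical and not part of the package; it assumes each pull has been reduced to a list of arrays with a propertyId and an attributes map, which is an assumed shape rather than the documented output of toArray().

```php
<?php
// Hypothetical helper, not part of the package. Each pull is assumed to be
// a list of ['propertyId' => int, 'attributes' => array<string,string>].

/**
 * @return array{added: list<int>, removed: list<int>, changed: list<int>}
 */
function diffPulls(array $previous, array $current): array
{
    // Index both pulls by property ID.
    $old = array_column($previous, 'attributes', 'propertyId');
    $new = array_column($current, 'attributes', 'propertyId');

    $changed = [];
    foreach (array_intersect_key($new, $old) as $id => $attributes) {
        $before = $old[$id];
        // Sort by key so column order alone never counts as a change.
        ksort($attributes);
        ksort($before);
        if ($attributes !== $before) {
            $changed[] = $id;
        }
    }

    return [
        'added'   => array_keys(array_diff_key($new, $old)),
        'removed' => array_keys(array_diff_key($old, $new)),
        'changed' => $changed,
    ];
}
```

A row that moves to a new sale date shows up under changed; a row that drops off the docket shows up under removed.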

Why you might want it

A few real-world use cases:

  • Real estate investors tracking new foreclosure inventory in target counties and getting alerted when properties matching certain criteria hit the docket.
  • Title companies and attorneys monitoring filings in counties they practice in.
  • Researchers and journalists studying foreclosure trends — which plaintiffs are most active, where, and how the volume changes month over month.
  • Internal back-office tools at firms that already have to check this site daily — automating it means one less manual rotation.

If you've ever opened the Civilview site, ctrl-F'd through 1,500 rows looking for one plaintiff name, and thought "this should be an API," this package is that API.

How it works

The site is a server-rendered ASP.NET MVC app. There's no client-side rendering, so there's no JS engine required — a plain HTTP client and an HTML parser do the job.

                      ┌────────────────────────────────────┐
                      │  CivilviewClient (the public API)  │
                      └──┬──────────┬──────────┬───────────┘
                         │          │          │
                         ▼          ▼          ▼
              ┌─────────────┐ ┌──────────┐ ┌──────────────┐
              │ HttpClient  │ │ Scrapers │ │   DTOs       │
              │ (Laravel    │ │ (DOM-    │ │   County     │
              │  Http +     │ │ Crawler) │ │   Sale       │
              │  Guzzle)    │ │          │ │   SaleDetail │
              └─────────────┘ └──────────┘ └──────────────┘
                         │          │          ▲
                         ▼          ▼          │
                  HTTPS GET/POST → salesweb.civilview.com
                                    │          │
                                    └──────────┘
                                  raw HTML response

Three high-level moving parts:

  1. HttpClient — A thin wrapper around Laravel's Http facade plus Guzzle's cookie jar. Handles the base URL, user agent, timeouts, and retries (configurable in config/civilview.php). Crucially, it can produce a session-aware request handle whose cookies survive across calls — needed because the site keeps state in ASP.NET sessions (see The session quirk).
  2. Scrapers — Three small classes, one per page type (CountyDirectoryScraper, SalesListScraper, SaleDetailScraper), each using Symfony\Component\DomCrawler with CSS selectors. They take raw HTML and produce DTOs. The scraping logic is intentionally defensive: column counts and labels vary between counties, and the scrapers cope.
  3. CivilviewClient — The orchestration layer. Picks the right HTTP request shape (GET vs. POST, cold vs. session-bound), routes the response to the right scraper, and returns a typed result. This is the class you actually call (usually via the Civilview facade).

A queueable job, an Artisan command, and two events sit on top for ergonomics, but the core is just those three pieces.
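To make the scraper step concrete, here is a rough sketch of header-driven table parsing. It is not the package's code: the package uses Symfony DomCrawler, while this stand-in uses PHP's bundled DOM extension so it runs without Composer. The parseSalesTable() name and the sample markup are assumptions for illustration.

```php
<?php
// Sketch only. Uses ext-dom instead of Symfony DomCrawler; the function
// name and the expected markup are illustrative assumptions.

/** @return list<array<string,string>> one [header => cell] map per data row */
function parseSalesTable(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Column names come from the table itself, never from a fixed list.
    $headers = [];
    foreach ($xpath->query('//table//th') as $th) {
        $headers[] = trim($th->textContent);
    }

    $rows = [];
    foreach ($xpath->query('//table//tr[td]') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Defensive: skip rows whose cell count does not match the header.
        if (count($cells) !== count($headers)) {
            continue;
        }
        $rows[] = array_combine($headers, $cells);
    }

    return $rows;
}
```

Because the keys are read from the header row, the same function copes with counties that expose different columns.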

Installation

Requirements:

  • PHP ^8.3
  • Laravel ^12.0

Install via Composer:

composer require jersey-mike/civilview-salesweb-scraper

Laravel's package discovery picks up the service provider and the Civilview facade automatically — no config/app.php edits.

If you want to override the defaults (base URL, user agent, timeouts, politeness delay), publish the config file:

php artisan vendor:publish --tag=civilview-config

That puts a config/civilview.php in your app where you can tweak things or pull values from .env.

Quick start

use JerseyMike\Civilview\Facades\Civilview;

// Discover what counties exist
$counties = Civilview::counties();
$counties->where('state', 'NJ')->pluck('name');
// Collection: ["Atlantic County", "Bergen County", "Burlington County", ...]

// Pull every sale on the docket for Burlington County, NJ
$sales = Civilview::sales(countyId: 3);
$sales->count();                                   // 89
$sales->first()->plaintiff();                      // "WELLS FARGO BANK, N.A."
$sales->first()->address();                        // "5 PEAR AVENUE BROWNS MILLS NJ 08015"
$sales->first()->saleDate();                       // "5/7/2026"

A few lines of code, 89 property records.

Usage in depth

All examples assume use JerseyMike\Civilview\Facades\Civilview; at the top of the file. If you'd rather inject the client (for testing or DI preferences), type-hint JerseyMike\Civilview\CivilviewClient instead — the facade is just sugar over a singleton binding.

Listing counties

The Civilview portal's homepage doubles as a county directory. We scrape it once and give you the list as DTOs:

$counties = Civilview::counties();

foreach ($counties as $county) {
    $county->id;     // 3
    $county->name;   // "Burlington County"
    $county->state;  // "NJ"
}

About 47 counties are returned at the time of writing, spanning AZ, CO, DE, FL, IA, ID, IL, KS, LA, MN, NJ, OH, OR, PA, TX, and WA. Some Texas counties have multiple sub-jurisdictions (Constable Precincts 1–5, Sheriff's Office) — those come back as separate County rows with distinct IDs and a descriptive name.

Listing sales for a county

Pass a countyId from the directory:

$sales = Civilview::sales(60); // Philadelphia County, PA → ~1,600 rows

The result is a Laravel Collection of Sale DTOs. Each has:

  • countyId: echoed back so you can group/filter downstream
  • propertyId: the upstream identifier (used to fetch the detail page)
  • detailsUrl: the relative path on salesweb.civilview.com
  • attributes: an associative array [columnHeader => cellValue] preserving exactly what the table showed

Convenience accessors normalize the most common columns:

$sale->plaintiff();      // "WELLS FARGO BANK, N.A."
$sale->defendant();      // "KENNETH YAUGER; ..."
$sale->address();        // "423 EAST 9TH STREET FLORENCE NJ 08518"
$sale->saleDate();       // "4/30/2026"
$sale->sheriffNumber();  // "25001681"     (NJ counties)
$sale->bookAndWrit();    // "2308-381"     (Philadelphia)
$sale->opaNumber();      // "1313111130"   (Philadelphia)

These accessors are case-insensitive and return null when the column isn't present for that county. For anything county-specific, fall back to $sale->get('Some Custom Header') or $sale->attributes.
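The case-insensitive, null-on-missing behaviour can be pictured with a few lines of plain PHP. This is an illustrative sketch, not the package's implementation; getAttribute() is a hypothetical free function standing in for $sale->get().

```php
<?php
// Illustrative only; the package's real accessor code may differ.

/** @param array<string,string> $attributes header => cell value */
function getAttribute(array $attributes, string $key): ?string
{
    foreach ($attributes as $header => $value) {
        // PHP turns numeric-looking keys into ints, so cast before comparing.
        if (strcasecmp(trim((string) $header), trim($key)) === 0) {
            return $value;
        }
    }

    // Column not exposed by this county.
    return null;
}
```

With a Philadelphia-style row, getAttribute($row, 'opa #') finds the "OPA #" column, while getAttribute($row, 'Defendant') returns null.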

Filtering

Build a SalesFilter and pass it as the second argument:

use JerseyMike\Civilview\Data\SalesFilter;

$wellsFargoOpen = Civilview::sales(3, new SalesFilter(
    isOpen: true,                       // open vs. sold/cancelled
    plaintiff: 'WELLS FARGO',           // partial match, the site does the filtering
    defendant: null,
    address: null,
    city: null,
    salesDate: '5/7/2026',              // a specific sale date — exact match
    monthNumber: null,                  // or 1–12 for "all of May"
    sheriffNumber: null,
));

Mechanically: the upstream site stores its countyId in the session and expects a POST to /Sales/SalesSearch with form fields like PlaintiffTitle, IsOpen, PropertyStatusDate, etc. The package handles both — the GET-then-POST dance and the field name mapping — so you stay in PHP-land.

From a real session:

php artisan civilview:scrape --county=3 --filter-status=open --filter-plaintiff="WELLS FARGO"
# → 6 rows

Fetching a sale's detail page

$detail = Civilview::saleDetails(propertyId: 1950595109, countyId: 3);

$detail->propertyId;              // 1950595109
$detail->get('Plaintiff');        // "NEW JERSEY HOUSING AND MORTGAGE FINANCE AGENCY"
$detail->get('Approx. Judgment'); // "$185,432.10"
$detail->get('Attorney');         // "STERN & EISENBERG, PC"

$detail->statusHistory;
// [
//   ['status' => 'Scheduled for',                  'date' => '10/23/2025'],
//   ['status' => '1st Debtor Adjournment to',      'date' => '11/20/2025'],
//   ['status' => '1st Attorney Adjournment to',    'date' => '1/8/2026'],
//   ['status' => '2nd Attorney Adjournment to',    'date' => '2/5/2026'],
//   ['status' => 'Adjourned by Court to',          'date' => '3/19/2026'],
//   ['status' => 'Adjourned by Court to',          'date' => '5/7/2026'],
// ]

Why the second argument matters — the Civilview detail endpoint requires a session that has previously visited a SalesSearch for a matching county. We seat that session for you when you pass countyId. If you don't, you'll get an empty attributes array and a redirect to the homepage. See The session quirk.
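Once you have a statusHistory array, a common next step is to reduce it to a couple of facts. The sketch below assumes the entry shape shown above (status and date keys, dates as M/D/YYYY); summarizeHistory() is a hypothetical helper, and treating any status containing "adjourn" as an adjournment is a heuristic, not something the package defines.

```php
<?php
// Hypothetical helper built on the statusHistory shape shown above.

/**
 * @param list<array{status: string, date: string}> $history
 * @return array{adjournments: int, latestDate: ?string}
 */
function summarizeHistory(array $history): array
{
    $adjournments = 0;
    $latest = null;

    foreach ($history as $entry) {
        // Heuristic: any status mentioning "adjourn" counts as an adjournment.
        if (stripos($entry['status'], 'adjourn') !== false) {
            $adjournments++;
        }

        // "!" zeroes the time fields so the comparison is date-only.
        $date = DateTimeImmutable::createFromFormat('!n/j/Y', $entry['date']);
        if ($date !== false && ($latest === null || $date > $latest)) {
            $latest = $date;
        }
    }

    return [
        'adjournments' => $adjournments,
        'latestDate'   => $latest?->format('Y-m-d'),
    ];
}
```

For the six-entry history in the example above, this yields five adjournments and a latest date of 2026-05-07.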

Listing + detail in one call

$detailed = Civilview::salesWithDetails(3); // → Collection<SaleDetail>

Internally:

  1. One scrape of the list page (1 HTTP request).
  2. One reused session for the whole batch (1 HTTP request to seat it).
  3. N detail-page GETs, one per sale, with a small usleep between each (configurable; default 250 ms) to avoid hammering the server.

For a 90-row Burlington pull, that's ~92 requests over ~25 seconds. For 1,600-row Philadelphia, plan for several minutes — push it to a queue.
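The arithmetic behind those figures can be sketched as a small estimator. estimateBatch() is a hypothetical helper, and the average latency parameter is an assumption you would measure yourself; the source only states the request breakdown and the 250 ms default delay.

```php
<?php
// Back-of-the-envelope estimate for salesWithDetails(); not part of the package.

/** @return array{requests: int, seconds: float|int} */
function estimateBatch(int $rows, int $delayMs = 250, int $avgLatencyMs = 0): array
{
    // 1 list scrape + 1 session-seating request + one detail GET per row.
    $requests = 2 + $rows;

    // A pause sits between consecutive detail fetches, so there are rows - 1 of them.
    $pauses = max(0, $rows - 1);

    $seconds = ($pauses * $delayMs + $requests * $avgLatencyMs) / 1000;

    return ['requests' => $requests, 'seconds' => $seconds];
}
```

With the defaults, estimateBatch(90) gives 92 requests and 22.25 seconds of pauses alone; network time comes on top. Assuming 300 ms per request, estimateBatch(1600, 250, 300) comes to roughly 880 seconds, which is why the large counties belong on a queue.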

The data model

Three readonly DTOs, all with toArray() for JSON encoding. The listings below show each class's public surface; constructors and method bodies are omitted:

County

final readonly class County {
    public int $id;
    public string $name;
    public ?string $state;  // null when the directory heading does not parse as a US state
}

Sale

final readonly class Sale {
    public int $countyId;
    public int $propertyId;
    public string $detailsUrl;        // e.g. "/Sales/SaleDetails?PropertyId=1950595109"
    public array $attributes;         // ["Plaintiff" => "...", "Address" => "...", ...]

    // Case-insensitive convenience accessors. Return null when missing.
    public function plaintiff(): ?string;
    public function defendant(): ?string;
    public function address(): ?string;
    public function saleDate(): ?string;
    public function sheriffNumber(): ?string;
    public function bookAndWrit(): ?string;
    public function opaNumber(): ?string;
    public function get(string $key): ?string; // for anything else
}

SaleDetail

final readonly class SaleDetail {
    public int $propertyId;
    public array $attributes;        // every key/value pair on the detail page
    public array $statusHistory;     // [['status' => '...', 'date' => '...'], ...]

    public function get(string $key): ?string;
}

The DTOs are deliberately dumb — no Eloquent, no traits, no ORM. They serialize cleanly to JSON, are safe to ship across queues, and can be mapped into whatever schema your application uses.

Cross-county schema variance

Different counties expose different columns. This is the single most important thing to understand if you're storing the data.

Burlington County, NJ (countyId=3):

Sheriff # | Sales Date | Plaintiff | Defendant | Address

Philadelphia County, PA (countyId=60):

Book & Writ | OPA # | Address | Plaintiff

Notice: Burlington has a Sales Date column and a Defendant column; Philadelphia has neither. Philadelphia has OPA # (the Office of Property Assessment number — a Philly-specific tax ID); Burlington has no equivalent.

The package handles this by never assuming a fixed schema. The Sale DTO carries an attributes map whose keys come straight from the table's <th> text. The convenience accessors do case-insensitive lookups and fall back to null when a column isn't there.

What this means for you:

  • If you're storing to a relational DB, a JSON/JSONB column for attributes is the path of least friction. You can also pluck fixed columns out (plaintiff, address, sale_date) into typed columns and leave the rest in a JSON blob.
  • If you only care about a few counties, you can hard-code the columns you expect after one quick exploration call.
  • If you're aggregating across counties, do not assume any column exists. Use the convenience accessors (which return null gracefully) and design your reports around what's actually populated.
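The "typed columns plus a JSON blob" approach from the first bullet might look like the sketch below. toStorageRow() and the column names are hypothetical; the header spellings ("Plaintiff", "Address", "Sales Date") are taken from the Burlington example and may differ in other counties.

```php
<?php
// Hypothetical mapping from a Sale's data to a database row; adjust the
// column names and header spellings to your own schema and counties.

/** @param array<string,string> $attributes header => cell value */
function toStorageRow(int $countyId, int $propertyId, array $attributes): array
{
    // Typed column => upstream header we hope to find (matched case-insensitively).
    $typed = [
        'plaintiff' => 'Plaintiff',
        'address'   => 'Address',
        'sale_date' => 'Sales Date',
    ];

    $lower = array_change_key_case($attributes, CASE_LOWER);

    $row = ['county_id' => $countyId, 'property_id' => $propertyId];
    foreach ($typed as $column => $header) {
        // Missing columns become NULL rather than an error.
        $row[$column] = $lower[strtolower($header)] ?? null;
    }

    // Keep everything, including county-specific columns, in the JSON blob.
    $row['attributes'] = json_encode($attributes, JSON_THROW_ON_ERROR);

    return $row;
}
```

A Philadelphia row ends up with sale_date set to null, while its Book & Writ and OPA # values survive inside the attributes JSON.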

The session quirk

This is the one piece of weirdness in the upstream site, and the most likely source of confusion if you go off-script.

The detail endpoint, GET /Sales/SaleDetails?PropertyId=NNNN, looks self-sufficient — the URL contains a property ID, you'd expect a row back. But the controller looks up the matching countyId from the ASP.NET session, not the request. If the session has no countyId (or one that doesn't match this property), the response is a 302 to the homepage — sometimes via /Home/Index?aspxerrorpath=/Sales/SaleDetails, which makes it look like a server-side exception when it's really an auth-style guard.

Demonstrated:

# Cold call — no session
curl -i "https://salesweb.civilview.com/Sales/SaleDetails?PropertyId=1950595109"
# → HTTP/2 302
# → location: /

# With a session that has visited SalesSearch?countyId=3 first
curl -c jar -o /dev/null "https://salesweb.civilview.com/Sales/SalesSearch?countyId=3"
curl -i -b jar "https://salesweb.civilview.com/Sales/SaleDetails?PropertyId=1950595109"
# → HTTP/2 200, ~10 KB of HTML

The package handles this for you — saleDetails($propertyId, $countyId) seats the session, and salesWithDetails() reuses one session across the whole batch (so the seating cost is paid once, not N times). The same applies to filtered list searches: filtered POSTs to /Sales/SalesSearch also need the session-stored countyId, and the package issues the seating GET first.

The one failure mode you are likely to hit is calling saleDetails($propertyId) without a countyId. The package keeps that signature for backward compatibility and for advanced cases where you've already seated the session yourself, but in normal use, always pass the countyId from the corresponding Sale.
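If you make raw requests yourself, it helps to recognise the guard redirect rather than mistake it for a server error. The check below is a sketch based only on the behaviour described in this section (a 302 to / or to /Home/Index); looksLikeSessionGuard() is a hypothetical helper, not a package function.

```php
<?php
// Hypothetical check for the "no countyId in session" redirect described above.

function looksLikeSessionGuard(int $status, ?string $location): bool
{
    if ($status !== 302 || $location === null) {
        return false;
    }

    // Location may be relative ("/") or carry a query string
    // ("/Home/Index?aspxerrorpath=/Sales/SaleDetails"); compare the path only.
    $path = parse_url($location, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    return $path === '/' || stripos($path, '/Home/Index') === 0;
}
```

When this returns true, the fix is to seat the session with a SalesSearch request for the right county and retry once, not to keep retrying the detail URL.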

Artisan command

The package ships a CLI tool useful for ad-hoc queries, JSON exports, and sanity-checking that the upstream site hasn't changed shape.

# List every county and its ID
php artisan civilview:scrape

# Pull all sales for a county (formatted table)
php artisan civilview:scrape --county=3

# JSON output (pipe to jq, redirect to a file)
php artisan civilview:scrape --county=3 --json > burlington.json

# Apply filters
php artisan civilview:scrape --county=3 \
    --filter-status=open \
    --filter-plaintiff="WELLS FARGO" \
    --filter-date="5/7/2026" \
    --json

# Include each sale's detail page (slow — be patient)
php artisan civilview:scrape --county=3 --with-details --json

Available flags:

Flag                        Description
--county=ID                 County ID to scrape. Omit to list available counties.
--with-details              Also fetch each sale's detail page.
--filter-status=open|sold   Filter by docket status.
--filter-plaintiff=NAME     Plaintiff substring filter.
--filter-defendant=NAME     Defendant substring filter.
--filter-date=MM/DD/YYYY    Specific sale date.
--filter-city=NAME          City filter (only meaningful for counties that expose it).
--json                      Emit JSON instead of an ASCII table.

Scheduling, queues, and events

For periodic scraping, dispatch the job from your scheduler:

use Illuminate\Support\Facades\Schedule;
use JerseyMike\Civilview\Data\SalesFilter;
use JerseyMike\Civilview\Jobs\ScrapeCountyJob;

// routes/console.php (Laravel 12's default location for scheduled tasks)
Schedule::call(function () {
    foreach ([3, 6, 19, 60] as $countyId) {
        ScrapeCountyJob::dispatch(
            countyId: $countyId,
            filter: new SalesFilter(isOpen: true),
            withDetails: false,
        );
    }
})->dailyAt('06:00');

The job fires events you can listen for to do the actual persistence:

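use Illuminate\Support\Facades\Event;
use JerseyMike\Civilview\Events\CountyScraped;
use JerseyMike\Civilview\Events\SaleDetailScraped;

// app/Providers/AppServiceProvider.php (boot method)
// ForeclosureSale is your own Eloquent model; the package does not ship one.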
Event::listen(function (CountyScraped $event) {
    $event->countyId;  // 3
    $event->sales;     // Collection<Sale>

    foreach ($event->sales as $sale) {
        ForeclosureSale::updateOrCreate(
            ['property_id' => $sale->propertyId],
            [
                'county_id'   => $sale->countyId,
                'plaintiff'   => $sale->plaintiff(),
                'address'     => $sale->address(),
                'sale_date'   => $sale->saleDate(),
                'attributes'  => $sale->attributes,    // JSON column
                'last_seen_at'=> now(),
            ],
        );
    }
});

Event::listen(function (SaleDetailScraped $event) {
    // Only fires when the job was dispatched with withDetails: true
    $event->detail;  // SaleDetail
});

This event-driven design is intentional. The package never imposes a schema on you, but gives you a single hook where persistence belongs. Your tests can fake the events, your listeners can be queued for further async work, and swapping persistence implementations doesn't touch anything in this package.

Configuration

config/civilview.php (after php artisan vendor:publish --tag=civilview-config):

Key               Default                                                    Notes
base_url          https://salesweb.civilview.com                             Override only for testing/proxy.
user_agent        Mozilla/5.0 (compatible; civilview-salesweb-scraper/1.0)   Set a contact URL/email here when scraping at volume; it's polite and makes you debuggable to the site operators.
timeout           30                                                         Per-request seconds.
retry_times       3                                                          Wraps Http::retry(). The site occasionally returns transient 5xxs.
retry_sleep_ms    200                                                        Backoff between retries.
detail_delay_ms   250                                                        Pause between consecutive detail fetches in salesWithDetails().

All of these can be backed by .env variables (e.g. CIVILVIEW_DETAIL_DELAY_MS=500) — see the published config for the env keys.

Testing

The package's own test suite:

composer install
vendor/bin/pest

14 tests, all offline — they parse against captured HTML fixtures in tests/Fixtures/ so they're deterministic and don't touch the network. The fixtures cover both schema variants (Burlington's Sheriff#/Defendant shape and Philadelphia's Book & Writ/OPA shape), the homepage, and a synthetic detail page.

For your own application's tests, fake the HTTP layer:

use Illuminate\Support\Facades\Http;
use JerseyMike\Civilview\Facades\Civilview;

Http::fake([
    'salesweb.civilview.com/Sales/SalesSearch*' =>
        Http::response(file_get_contents(__DIR__ . '/fixtures/burlington.html')),
]);

$sales = Civilview::sales(3);
expect($sales)->toHaveCount(89);

Or fake the events:

// Fake the HTTP layer as well, so the job does not hit the network.
Event::fake();
ScrapeCountyJob::dispatchSync(3);
Event::assertDispatched(CountyScraped::class);

For a smoke test against the real site, the package includes Orchestra Testbench, so you can run the Artisan command without a host app:

vendor/bin/testbench civilview:scrape --county=3 --json | jq '.[0]'

Politeness and legal

This is a public government portal, but you're still hitting someone else's servers. A few rules of the road:

  • Don't hammer. Keep detail_delay_ms reasonable (the default 250 ms is a sensible floor). Don't run twenty-county scrapes in parallel from one IP.
  • Identify yourself. Set a descriptive user_agent with a contact URL or email. If you cause a problem, the operators can email you instead of just blocking your IP.
  • Cache aggressively. The data updates daily at most. Re-pulling the same county every five minutes is wasteful and rude.
  • Respect changes. If the site breaks or starts returning errors, back off — don't retry-loop indefinitely.
  • The data is public, but how you use it isn't always. Some jurisdictions have rules about how foreclosure data can be republished or aggregated commercially. If you're building a product on top of this, talk to a lawyer.
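For your own outer loops (for example, a nightly script that walks several counties), "back off, don't retry-loop indefinitely" can be sketched as a bounded retry with exponential delays. This is a generic pattern, separate from the package's built-in retry_times setting; withBackoff() and its parameters are hypothetical.

```php
<?php
// Generic bounded retry with exponential backoff; not part of the package.

/**
 * @param callable(): mixed        $fetch   work to attempt; should throw RuntimeException on failure
 * @param callable(int): void|null $sleepMs injectable sleeper so tests need not really wait
 */
function withBackoff(
    callable $fetch,
    int $maxAttempts = 3,
    int $baseDelayMs = 200,
    ?callable $sleepMs = null,
): mixed {
    $sleepMs ??= static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $fetch();
        } catch (RuntimeException $e) {
            // Give up for good once the attempt budget is spent.
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            // Delay doubles each time: base, 2x base, 4x base, ...
            $sleepMs($baseDelayMs * 2 ** ($attempt - 1));
        }
    }
}
```

The hard cap matters more than the exact delays: once the attempts are used up the exception propagates, and the caller can log it and try again on the next scheduled run.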

This package is provided as-is and doesn't grant you any rights to the underlying data. Tyler Technologies (the platform vendor) and the counties themselves are the source of truth. The author of this package is unaffiliated with both.

Troubleshooting

"Civilview::saleDetails() returns empty attributes." You called it without a countyId, or with a wrong one. Pass countyId: $sale->countyId from the corresponding Sale.

"My filter returns the same rows as no filter." Most likely you used a column the county doesn't expose (e.g. SalesFilter(defendant: '...') against Philadelphia, which has no defendant column). The site silently ignores unknown filters. Check the column list with an unfiltered scrape first.

"I'm getting 5xx errors from the upstream." The package retries 3 times by default. If errors persist, the site is down — give it a few minutes. You can bump retry_times and retry_sleep_ms in config, but more retries won't fix a server that's genuinely offline.

"My PestPHP tests can't find the package classes." Run composer dump-autoload after pulling new code. The PSR-4 namespace is JerseyMike\Civilview\… (StudlyCase, no hyphens).

"The number of rows changed unexpectedly." The dockets are live. Open cases get adjourned, sold, or cancelled constantly. If you need a stable snapshot, capture and timestamp your own copy.

"A column I rely on disappeared." Counties occasionally rearrange their schemas (it's their portal, not ours). Use $sale->get('Header Name') for tolerant access, log unexpected nulls, and don't hard-code column orders.

License

MIT. See LICENSE.md.