rauwang/paging-steal

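Crawl paginated data page by page, record a breakpoint as the crawl progresses, and resume from that breakpoint after an interruption.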

dev-master 2020-12-16 07:21 UTC

This package is auto-updated.

Last update: 2024-06-16 15:11:44 UTC


README

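A small plugin that wraps resumable (breakpoint) crawling. To use it, you only need to extend three abstract classes and implement one interface.

The abstract classes abstract the operations on three data models; the interface abstracts the paging operations.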

Abstract classes

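  • StealTarget: records the pagination information;
  • StealBreakpoint: records the breakpoint information;
  • StealDataPage: records the pages (nodes) that have already been crawled.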

Interface

  • PagingSteal: the interface for the concrete paging operations.
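The breakpoint-resume idea behind these pieces can be illustrated without the package. The following is a minimal, self-contained PHP sketch assuming an in-memory store; `PageFetcher`, `InMemoryBreakpoint`, `crawl()` and `FlakyFetcher` are hypothetical names that are not part of rauwang/paging-steal. They only loosely mirror the roles of the paging interface, the breakpoint model, and the crawled-page records, and the sketch does not describe how the package itself is implemented.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: none of these names come from rauwang/paging-steal.

interface PageFetcher
{
    /** Returns the items on page $page (1-based); an empty array means there are no more pages. */
    public function fetch(int $page): array;
}

final class InMemoryBreakpoint
{
    /** @var array<string, int> last fully processed page, keyed by target name */
    private array $lastPage = [];

    public function get(string $target): int
    {
        return $this->lastPage[$target] ?? 0;
    }

    public function save(string $target, int $page): void
    {
        $this->lastPage[$target] = $page;
    }
}

/** Crawls $target page by page, resuming after the last saved breakpoint. */
function crawl(string $target, PageFetcher $fetcher, InMemoryBreakpoint $bp, array &$store, int $maxPages): void
{
    for ($page = $bp->get($target) + 1; $page <= $maxPages; $page++) {
        $items = $fetcher->fetch($page);
        if ($items === []) {
            break; // ran past the last page
        }
        $store[$page] = $items;    // record the crawled page first...
        $bp->save($target, $page); // ...then advance the breakpoint
    }
}

// Demo fetcher: four pages of data, and a simulated failure on the first request for page 3.
final class FlakyFetcher implements PageFetcher
{
    public bool $failOnce = true;
    /** @var int[] every page number requested, in order */
    public array $calls = [];

    public function fetch(int $page): array
    {
        $this->calls[] = $page;
        if ($page === 3 && $this->failOnce) {
            $this->failOnce = false;
            throw new RuntimeException('simulated network error');
        }
        return $page <= 4 ? ["item-$page"] : [];
    }
}

$fetcher = new FlakyFetcher();
$bp = new InMemoryBreakpoint();
$store = [];

try {
    crawl('demo', $fetcher, $bp, $store, 10); // stops with the breakpoint at page 2
} catch (RuntimeException $e) {
    echo "first run failed: {$e->getMessage()}\n";
}
crawl('demo', $fetcher, $bp, $store, 10); // resumes from page 3

echo implode(',', $fetcher->calls), "\n"; // 1,2,3,3,4,5
```

Because the breakpoint is saved only after a page has been stored, an interruption leaves at most one page to be requested again (page 3 above) and never skips a page.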

Implementing the abstract classes and the interface

  • TestStealTarget extends the StealTarget abstract class:
use Rauwang\PagingSteal\Driver\Repositories\StealTarget;

class TestStealTarget extends StealTarget {
    // ...
}
  • TestStealDataPage extends the StealDataPage abstract class:
use Rauwang\PagingSteal\Driver\Repositories\StealDataPage;

class TestStealDataPage extends StealDataPage {
    // ...
}
  • TestStealBreakpoint extends the StealBreakpoint abstract class:
use Rauwang\PagingSteal\Driver\Repositories\StealBreakpoint;

class TestStealBreakpoint extends StealBreakpoint {
    // ...
}
  • PagingStealDemo1 implements the PagingSteal interface:
use Rauwang\PagingSteal\Driver\PagingSteal;

class PagingStealDemo1 implements PagingSteal {
    // ...
}

Configuration

\Rauwang\PagingSteal\PagingSteal::init(
    TestStealTarget::class,
    TestStealBreakpoint::class,
    TestStealDataPage::class,
);

Usage

\Rauwang\PagingSteal\PagingSteal::build(PagingStealDemo1::class)->steal();