yggverse / yo
Yo! Micro Web Crawler in PHP & Manticore
Type: project
pkg:composer/yggverse/yo
Requires
- gregwar/captcha: ^1.2
- jdenticon/jdenticon: ^1.0
- manticoresoftware/manticoresearch-php: ^3.1
- symfony/css-selector: ^6.3
- symfony/dom-crawler: ^6.3
- yggverse/ftp: ^1.0
- yggverse/net: ^1.3
- yggverse/yo-tools: dev-main
This package is auto-updated.
Last update: 2025-10-18 10:51:01 UTC
README
Micro Web Crawler in PHP & Manticore
Yo! is a super-thin client-server crawler built on Manticore full-text search.
It is compatible with different networks and includes flexible settings, snap history, CLI tools and an adaptive JS-less UI.
An alternative branch is available for the Gemini protocol!
Features
- MIME-based crawler with flexible filter settings: regular expressions, selectors, external links, etc.
- Page snap history with local and remote mirrors support (including FTP protocol)
- CLI tools for index administration and crontab tasks
- JS-less frontend to run local or public search web portal
Components
- Manticore Server
- PHP library for Manticore
- PHP library for Network operations
- Symfony DOM crawler
- Symfony CSS selector
- FTP client for snap mirrors
- Hostname ident icons
- Captcha
- Bootstrap icons
Install
Environment
Debian
wget https://repo.manticoresearch.com/manticore-repo.noarch.deb
dpkg -i manticore-repo.noarch.deb
apt update
apt install git composer manticore manticore-extra php-fpm php-curl php-mbstring php-gd
The Yo search engine uses Manticore as its primary database. If your server is prone to power loss,
change the default binlog flush strategy to binlog_flush = 1, which flushes and syncs the binary log after every transaction.
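After editing the configuration, you can read the value back to confirm it. The following is a minimal sketch: `binlog_flush_value` is a hypothetical helper name, and the default path /etc/manticoresearch/manticore.conf is an assumption about a Debian package install. The option itself belongs in the searchd section.

```shell
# Hypothetical helper: print the binlog_flush value set in a Manticore config file.
# Assumes the Debian package default path when no argument is given.
binlog_flush_value() {
  conf=${1:-/etc/manticoresearch/manticore.conf}
  # Match "binlog_flush = N" lines; commented-out lines (starting with #) do not match.
  # If the option appears more than once, report the last occurrence.
  sed -n 's/^[[:space:]]*binlog_flush[[:space:]]*=[[:space:]]*\([0-9]\).*/\1/p' "$conf" | tail -n 1
}
```

For example, `[ "$(binlog_flush_value)" = 1 ] || echo "binlog_flush is not 1"`. No output means the option is not set, so Manticore's default strategy (2) applies.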
Deployment
The project is in development; to create a new search project, use the dev-main branch:
composer create-project yggverse/yo:dev-main
Development
git clone https://github.com/YGGverse/Yo.git
cd Yo
composer update
git checkout -b pr-branch
git commit -m 'new fix'
git push
Update
cd Yo
git pull
composer update
Init
cp example/config.json config.json
php src/cli/index/init.php
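When you rerun the init step, an unconditional `cp` would overwrite a config.json you have already customized. Below is a minimal sketch that copies the example config only if none exists. `yo_init` and the `DRY_RUN` switch are hypothetical conveniences that are not part of Yo.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical first-run helper, meant to be called from the project root.
yo_init() {
  if [ -e config.json ]; then
    # Never overwrite an existing, possibly customized, config
    echo "config.json exists, keeping it" >&2
  else
    cp example/config.json config.json || return 1
  fi
  run php src/cli/index/init.php
}
```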
Usage
php src/cli/document/add.php URL
php src/cli/document/crawl.php
php src/cli/document/search.php '*'
Web UI
cd src/webui
php -S 127.0.0.1:8080
- open http://127.0.0.1:8080 in a browser
Documentation
CLI
Index
Init
Create initial index
php src/cli/index/init.php [reset]
reset - optional, reset the existing index
Alter
Change existing index
php src/cli/index/alter.php {operation} {column} {type}
operation - operation name, supported values: add|drop
column - target column name
type - target column type, supported values: text|integer
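Because alter.php changes the index schema, it is worth rejecting typos before the call. This sketch checks the arguments against the supported values listed above. `yo_alter`, the `DRY_RUN` switch, and the example column name `rating` are hypothetical.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical wrapper: validate operation and type before altering the index.
yo_alter() {
  if [ $# -ne 3 ]; then
    echo "usage: yo_alter add|drop COLUMN text|integer" >&2
    return 2
  fi
  case $1 in add|drop) ;; *) echo "unsupported operation: $1" >&2; return 2 ;; esac
  case $3 in text|integer) ;; *) echo "unsupported type: $3" >&2; return 2 ;; esac
  run php src/cli/index/alter.php "$1" "$2" "$3"
}
```

For example, `yo_alter add rating integer` would add an integer column named `rating`.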
Document
Add
php src/cli/document/add.php URL
URL - the URL to add to the crawl queue
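add.php takes one URL per call, so seeding the queue from a list means calling it in a loop. Here is a minimal sketch that reads a plain-text file with one URL per line and skips blank lines and `#` comments. `yo_add_file` and the `DRY_RUN` switch are hypothetical.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: queue every URL listed in a file.
yo_add_file() {
  # "|| [ -n "$url" ]" also handles a last line without a trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in ''|'#'*) continue ;; esac
    # stdin is redirected so the PHP script cannot consume the rest of the file
    run php src/cli/document/add.php "$url" < /dev/null
  done < "$1"
}
```

For example, `yo_add_file seeds.txt`, where seeds.txt is a hypothetical file name.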
Crawl
php src/cli/document/crawl.php
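The features list mentions crontab tasks. When crawling runs on a schedule, a slow pass can still be running when the next one starts. This sketch wraps the crawl in `flock -n` (from util-linux), so an overlapping run is skipped. It assumes that crawl.php processes the queue and then exits. `yo_crawl_locked`, the lock path, and the `DRY_RUN` switch are hypothetical.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical cron wrapper. The body runs in a subshell ( ... ),
# so the cd does not change the caller's working directory.
yo_crawl_locked() (
  cd "${1:?project directory required}" || exit 1
  # -n: give up immediately if a previous crawl still holds the lock
  run flock -n "${2:-/tmp/yo-crawl.lock}" php src/cli/document/crawl.php
)
```

A crontab entry can use the same idea directly, e.g. `*/10 * * * * cd /path/to/Yo && flock -n /tmp/yo-crawl.lock php src/cli/document/crawl.php`, with the path and interval adjusted to your setup.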
Clean
Optimize the index and apply new configuration rules
php src/cli/document/clean.php [limit]
limit - integer, number of documents per queue
Search
php src/cli/document/search.php '@title "*"' [limit]
query - required
limit - optional, search results limit
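The query in the usage line combines Manticore's field operator (`@title`) with a quoted phrase. This sketch builds that form for an arbitrary phrase. It escapes embedded backslashes and double quotes with a backslash, which assumes Manticore's backslash-escaping rules. `yo_search_title` and the `DRY_RUN` switch are hypothetical.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: exact-phrase search in the title field, with an optional limit.
yo_search_title() {
  # Prefix every backslash and double quote with a backslash
  phrase=$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')
  if [ -n "${2:-}" ]; then
    run php src/cli/document/search.php "@title \"$phrase\"" "$2"
  else
    run php src/cli/document/search.php "@title \"$phrase\""
  fi
}
```

For example, `yo_search_title 'yggdrasil mesh' 5` searches titles for that phrase and returns at most 5 results.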
Migration
YGGo
Import index from YGGo database
php src/cli/yggo/import.php 'host' 'port' 'user' 'password' 'database' [unique=off] [start=0] [limit=100]
Source DB fields required:
host
port
user
password
database
unique - optional, check URLs for uniqueness (takes more time)
start - optional, offset to start the queue from
limit - optional, queue size limit
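For a large YGGo database, importing with successive start offsets keeps each run bounded. This is a sketch under stated assumptions. The arguments are passed positionally in the order of the usage line, and `limit` is taken to mean rows per run. `UNIQUE` defaults to the literal `off` shown there, although the values import.php actually accepts are not documented here. `yggo_import_batches` and the `DRY_RUN` switch are hypothetical, and you must supply the total row count yourself.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical batch runner: HOST PORT USER PASSWORD DATABASE TOTAL [BATCH]
yggo_import_batches() {
  if [ $# -lt 6 ]; then
    echo "usage: yggo_import_batches HOST PORT USER PASSWORD DATABASE TOTAL [BATCH]" >&2
    return 2
  fi
  total=$6
  batch=${7:-100}
  start=0
  while [ "$start" -lt "$total" ]; do
    # Stop at the first failing batch so the offset can be resumed manually
    run php src/cli/yggo/import.php "$1" "$2" "$3" "$4" "$5" "${UNIQUE:-off}" "$start" "$batch" || return 1
    start=$((start + batch))
  done
}
```

Both this sketch and the original command pass the password on the command line, so it is visible in the process list while the import runs.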
Backup
Logical
SQL text dumps can be useful for public index distribution, but they require more computing resources.
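One way to produce such a dump is to point `mysqldump` at Manticore's MySQL-protocol port (9306 by default). This is a sketch that assumes a Manticore release whose SQL interface supports mysqldump. The table name is whatever your config.json defines, and it is not known here. `yo_dump` and the `DRY_RUN` switch are hypothetical.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical logical backup: $1 = table name, $2 = output file.
yo_dump() {
  # 127.0.0.1 rather than "localhost": MySQL clients treat "localhost" as a Unix socket connection
  run mysqldump -h 127.0.0.1 -P 9306 manticore "$1" > "$2"
}
```

Under the same assumption, a dump would be loaded back through the same port, e.g. `mysql -h 127.0.0.1 -P 9306 < dump.sql`.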
Physical
Better suited to infrastructure administration, and includes the original data binaries.
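The Manticore project ships a `manticore-backup` tool for physical backups, and its `--backup-dir` option selects where the backup is written. The sketch below creates that directory first, in case the tool requires it to exist, and then runs the backup. It assumes `manticore-backup` is installed on the same host as searchd. `yo_backup`, the default path /var/backups/manticore, and the `DRY_RUN` switch are hypothetical.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical physical backup: $1 = backup directory (optional).
yo_backup() {
  dir=${1:-/var/backups/manticore}
  mkdir -p "$dir" || return 1
  run manticore-backup --backup-dir="$dir"
}
```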
Instances
Yggdrasil
http://[201:23b4:991a:634d:8359:4521:5576:15b7]/yo/ - IPv6 0200::/7 addresses only | index
http://yo.ygg
http://yo.ygg.at
http://ygg.yo.index