haikson / sitemap-generator
Sitemap crawler and generator class
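As a rough illustration of what a sitemap crawler does (not pysitemap's actual code), here is a breadth-first, same-host crawl sketch. It assumes a caller-supplied `fetch` function (a hypothetical name, injected so the example runs without network access). It also uses the standard library's `html.parser` in place of beautifulsoup4 to pull `href` values from `<a>` tags.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (stand-in for beautifulsoup4)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_site(start_url: str, fetch: Callable[[str], Optional[str]]) -> list:
    """Breadth-first crawl of pages on the same host as start_url.

    fetch(url) is a hypothetical callable returning page HTML, or None
    when the page cannot be loaded. Returns crawled URLs in visit order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it (a real crawler might log it)
        found.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links against the current page and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(link)
            same_site = parsed.scheme in ("http", "https") and parsed.netloc == host
            if same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

Links to other hosts are ignored, and each URL is queued at most once, which keeps the crawl finite on sites whose pages link back to each other.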
Language: Python
Requires
- beautifulsoup4: >=4.4.0
- mechanize: >=0.2.5
Last update: 2020-01-10 15:16:35 UTC
README
Sitemap generator
Installing
pip install sitemap-generator
Gevent
Sitemap-generator uses gevent for concurrent crawling. (gevent runs lightweight greenlets inside a single process rather than spawning separate OS processes.) Install gevent:
pip install gevent
Example
import pysitemap

if __name__ == '__main__':
    url = 'http://www.example.com/'  # URL to start crawling from
    logfile = 'errlog.log'  # path to the error log file
    oformat = 'xml'  # output format of the sitemap
    crawl = pysitemap.Crawler(url=url, logfile=logfile, oformat=oformat)
    crawl.crawl()
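With oformat = 'xml' the crawler writes an XML sitemap. The exact layout pysitemap emits is not shown in this README. Assuming it follows the sitemaps.org protocol, a minimal standard-library sketch of such a file looks like this (`build_sitemap_xml` is a hypothetical helper, not part of pysitemap).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_xml(urls):
    """Return UTF-8 bytes of a <urlset> document listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # de-duplicate, keep first-seen order
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the text automatically.
        ET.SubElement(url_el, "loc").text = url
    # xml_declaration requires Python 3.8+.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap_xml(["http://www.example.com/", "http://www.example.com/about"]).decode())
```

Each crawled page becomes one `<url><loc>...</loc></url>` entry. Optional protocol fields such as `<lastmod>` are left out of this sketch.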
Concurrent crawling example
import pysitemap

if __name__ == '__main__':
    url = 'http://www.example.com/'  # URL to start crawling from
    logfile = 'errlog.log'  # path to the error log file
    oformat = 'xml'  # output format of the sitemap
    crawl = pysitemap.Crawler(url=url, logfile=logfile, oformat=oformat)
    crawl.crawl(pool_size=10)  # crawl with a pool of 10 concurrent workers
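The pool_size argument caps how many pages are processed at once. The README attributes this to gevent. The sketch below illustrates the same bounded-pool idea with the standard library's `ThreadPoolExecutor` instead. This is a swapped-in technique, not pysitemap's code, and `fetch_all` and its `fetch` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, pool_size=10):
    """Call fetch(url) for every URL with at most pool_size calls in flight.

    Returns a dict mapping each URL to fetch's result.
    """
    urls = list(urls)  # materialize so the URLs can be paired with results
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() yields results in input order, so zip pairs them correctly.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Crawling is I/O-bound, since most time is spent waiting on the network. That is why greenlets or threads help even without multiple processes, and the pool size bounds the load placed on the target server.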