spekulatius/awesome-php-scrapers-and-crawlers

Awesome PHP Scrapers, Spiders and Crawlers

A collection of scrapers, spiders, crawlers, and related tools.

A curated list of open-source projects in the PHP crawling and scraping space: scrapers, crawlers, spiders, and tools, along with how-to guides, articles, etc.

Contents

Crawlers

  • Spatie/Crawler - An easy-to-use, powerful crawler implemented in PHP. It can execute JavaScript. A toolkit is available for those keen to use the full power of the Spatie crawler.
  • crawlzone/crawlzone - Crawlzone is a fast asynchronous crawling framework.
  • zrashwani/arachnid - An SEO-focused crawler that collects link information, etc.
  • nadar/crawler - A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory efficient.

Spiders

Scrapers

Tools and Related Libraries

  • spatie/robots-txt - Determine if a page may be crawled from robots.txt, robots meta tags and robot headers.
  • symfony/dom-crawler - The DomCrawler component eases DOM navigation for HTML and XML documents.
  • symfony/panther - A browser testing and web crawling library for PHP and Symfony.

Detection

HTML Handling: Serialization, Sanitization, etc

Contributing

Contributions of any kind are welcome; just follow the guidelines!

Contributors

Thanks go to these contributors!
