Big Data Spain

15th ~ 16th OCT 2015 MADRID, SPAIN #BDS15


THANK YOU FOR AN AMAZING CONFERENCE!


THE 4th EDITION OF BIG DATA SPAIN IN OCT 2015 WAS A RESOUNDING SUCCESS.

FRONTERA: OPEN SOURCE LARGE-SCALE WEB CRAWLING FRAMEWORK

Thursday 15th

from 17:00 to 17:45

Room 25


Technical

In this talk, Alex will introduce Frontera, a new open source framework. Frontera is a crawl frontier framework: it tells your web crawler what to crawl and when.
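To make "what to crawl and when" concrete, here is a minimal Python sketch of a crawl frontier. It is hypothetical and is not Frontera's API: the class `SimpleFrontier`, its `add`/`next_requests` methods and the per-host delay policy are assumptions for illustration. It deduplicates URLs, hands out the highest-scored ones first ("what") and holds a host back until a delay has passed since its last fetch ("when").

```python
import heapq
import itertools
import time
from urllib.parse import urlsplit


class SimpleFrontier:
    """Toy crawl frontier (hypothetical, not Frontera's API).

    URLs are handed out in order of score (highest first), and each host
    must wait `host_delay` seconds between two consecutive fetches.
    """

    def __init__(self, host_delay=1.0, clock=time.monotonic):
        self._heap = []                # entries: (-score, seq, url)
        self._seen = set()             # every URL ever scheduled, for deduplication
        self._next_ok = {}             # host -> earliest time it may be fetched again
        self._seq = itertools.count()  # tie-breaker: equal scores keep insertion order
        self._delay = host_delay
        self._clock = clock

    def add(self, url, score=0.0):
        """Schedule a URL once; later duplicates are ignored."""
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, next(self._seq), url))

    def next_requests(self, max_n=10):
        """Return up to max_n URLs whose hosts are currently allowed."""
        now = self._clock()
        ready, deferred = [], []
        while self._heap and len(ready) < max_n:
            entry = heapq.heappop(self._heap)
            host = urlsplit(entry[2]).netloc
            if self._next_ok.get(host, float("-inf")) <= now:
                ready.append(entry[2])
                self._next_ok[host] = now + self._delay
            else:
                deferred.append(entry)
        for entry in deferred:  # put back URLs whose host is still cooling down
            heapq.heappush(self._heap, entry)
        return ready
```

With a 5-second delay, two URLs from the same host are never returned in the same batch: the second one waits in the queue until the clock has advanced past that host's cooldown.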

Frontera is essentially the brain of your web crawler. It lets you build real-time, large-scale, distributed web crawlers, offering:

  • customizable storage (RDBMS or Key-Value based),
  • crawling strategies management,
  • transport layer abstraction,
  • fetcher abstraction.
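The storage and crawling-strategy abstractions in the list above can be pictured as two small interfaces that a frontier talks to. The sketch below is hypothetical: `QueueStorage`, `MemoryStorage`, `CrawlingStrategy`, `BreadthFirst` and `schedule_links` are invented names, not Frontera classes. A strategy turns each discovered link into a score, and any storage that implements `put` and `pop_top` can hold the queue, whether it is an in-memory dict, an RDBMS table or a key-value store.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple


class QueueStorage(ABC):
    """Hypothetical storage interface; an RDBMS or key-value store could sit behind it."""

    @abstractmethod
    def put(self, url: str, score: float) -> None: ...

    @abstractmethod
    def pop_top(self, n: int) -> List[str]: ...


class MemoryStorage(QueueStorage):
    """In-memory stand-in for a real backend."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def put(self, url: str, score: float) -> None:
        # If a queued URL is put again, keep the higher of the two scores.
        self._scores[url] = max(score, self._scores.get(url, score))

    def pop_top(self, n: int) -> List[str]:
        # Highest score first; ties broken alphabetically for determinism.
        top = sorted(self._scores, key=lambda u: (-self._scores[u], u))[:n]
        for url in top:
            del self._scores[url]
        return top


class CrawlingStrategy(ABC):
    """Hypothetical strategy interface: scores a newly discovered link."""

    @abstractmethod
    def score(self, url: str, depth: int) -> float: ...


class BreadthFirst(CrawlingStrategy):
    def score(self, url: str, depth: int) -> float:
        return 1.0 / (1 + depth)  # shallower pages get higher scores


def schedule_links(links: Iterable[Tuple[str, int]],
                   strategy: CrawlingStrategy,
                   storage: QueueStorage) -> None:
    """Score each (url, depth) pair with the strategy and hand it to storage."""
    for url, depth in links:
        storage.put(url, strategy.score(url, depth))
```

Swapping `BreadthFirst` for another scorer, or `MemoryStorage` for a database-backed class, leaves `schedule_links` unchanged, which is the reason for keeping the two concerns behind separate interfaces.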

Along with a description of the framework, Alex will share the technical problems he faced while developing it, and demonstrate how to build a distributed crawler using Scrapy, Apache Kafka and HBase. The talk is told as a story, in a fun and exciting form.
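One common way to spread a crawl over several workers, assumed here rather than taken from the talk, is to route each URL to a message-queue partition chosen by its host, so that a single worker owns each site and can enforce politeness on its own. The pure-Python sketch below covers only that routing step; `partition_for` and `split_batch` are hypothetical helpers, and no Kafka client is used.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def partition_for(url: str, num_partitions: int) -> int:
    """Pick a partition for a URL by hashing its host name (hypothetical helper).

    zlib.crc32 is used because Python's built-in hash() of a str is salted
    per process, so separate workers would disagree on the result.
    """
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def split_batch(urls: Iterable[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group a batch of URLs by target partition, preserving input order."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for url in urls:
        groups[partition_for(url, num_partitions)].append(url)
    return dict(groups)
```

In a real deployment each partition number would correspond to a Kafka topic partition consumed by one worker; that wiring, and the choice of client library, is deliberately left out of this sketch.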


Alexander Sibiryakov

Scrapinghub, Data Scientist