Facebook Scale Graph Processing using Giraph.


This talk will discuss the design and architecture of Giraph - an offline distributed graph processing system built on top of Hadoop. We have scaled Giraph to process very large graphs. For example at Facebook we run PageRank on a 400 billion edge graph in a matter of minutes. A similar workload in Hive or Hadoop takes many hours and requires an order of magnitude more machines. The talk will go through the design decisions we made in order to keep Giraph simple to use yet expressive and powerful. We will dive into the architecture that allows Giraph to scale to very large data sizes. Giraph utilizes Hadoop for job scheduling, resource management, and checkpointing, among other things. We ended up customizing core functionality for efficiency wins. In particular, we built on our own completely in-memory message passing system and use Netty I/O with a lot of caching for performance.

Slides are here: http://www.slideshare.net/nitayj/2013-0603-berlin-buzzwords

About the speaker: 
Nitay Joffe is a software engineer at Facebook working on large scale graph computation. He has contributed to several Apache projects including HBase, Thrift and ZooKeeper before working on Giraph. He likes wheat beer.

Schedule info

Time slot: 
3 June 12:00 - 12:45