Facebook Scale Graph Processing using Giraph.
This talk will discuss the design and architecture of Giraph, an offline distributed graph processing system built on top of Hadoop. We have scaled Giraph to process very large graphs. For example, at Facebook we run PageRank on a 400-billion-edge graph in a matter of minutes. A similar workload in Hive or Hadoop takes many hours and requires an order of magnitude more machines. The talk will go through the design decisions we made to keep Giraph simple to use yet expressive and powerful. We will dive into the architecture that allows Giraph to scale to very large data sizes. Giraph uses Hadoop for job scheduling, resource management, and checkpointing, among other things. We ended up customizing core functionality for efficiency wins. In particular, we built our own completely in-memory message passing system and use Netty I/O with a lot of caching for performance.
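To make the PageRank workload concrete, here is a minimal single-process sketch of the vertex-centric, superstep-based style of computation that Giraph is known for. It is not Giraph code and uses no Giraph API: the function name `pagerank_supersteps`, its parameters, and the toy graph are all hypothetical, and the sketch assumes every vertex has at least one out-edge and that every edge target is itself a key of the graph.

```python
from collections import defaultdict


def pagerank_supersteps(out_edges, num_supersteps=20, damping=0.85):
    """Toy simulation of vertex-centric PageRank (hypothetical helper).

    out_edges maps each vertex id to the list of its out-neighbours.
    Assumes no dangling vertices and that all targets are keys of out_edges.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(num_supersteps):
        # Message phase: each vertex sends rank / out_degree along every out-edge.
        # Messages are kept purely in memory and summed as they arrive.
        inbox = defaultdict(float)
        for v in vertices:
            targets = out_edges[v]
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share

        # Barrier: only after all messages are delivered does each vertex
        # compute its new value from the messages it received.
        rank = {v: (1.0 - damping) / n + damping * inbox[v] for v in vertices}

    return rank


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, value in sorted(pagerank_supersteps(toy_graph).items()):
        print(vertex, round(value, 4))
```

In a distributed system the vertices would be partitioned across workers and the inbox would be filled over the network, but the per-superstep structure (send messages, wait at a barrier, update vertex values) stays the same.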
Slides are here: http://www.slideshare.net/nitayj/2013-0603-berlin-buzzwords