Part 1: Algorithms.
Generally two categories:
1) Apriori-based
- AGM: Inokuchi, et al. (PKDD’00)
- FSG: Kuramochi and Karypis (ICDM’01)
- FFSM: Huan, et al. (ICDM’03) and SPIN: Huan et al. (KDD’04)
2) Pattern-growth
- MoFa: Borgelt and Berthold (ICDM’02)
- gSpan: Yan and Han (ICDM’02)
- Gaston: Nijssen and Kok (KDD’04)
Part 2: MapReduce/BSP implementations.
Here are some open source graph processing frameworks. Most follow the Bulk Synchronous Parallel model, where vertices send messages to each other in a series of iterations called supersteps.
1. Giraph [1], a Java BSP implementation that runs on existing Hadoop clusters and provides fault tolerance for both the workers and the master using ZooKeeper. Giraph came out of Yahoo! and is now an Apache Incubator project.
2. Apache Hama [2] is a pure BSP(Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Also an Apache Incubator project, and works with ZooKeeper.
3. GraphLab [3], a new, asynchronous programming model in C++ from Carnegie Mellon.
4. Phoebus [4], an Erlang BSP implementation that can use HDFS for storage.
5. Golden Orb [5], another Java BSP implementation on Hadoop.
6. Signal/Collect [6], a Scala BSP implementation from the University of Zurich. Signal/Collect also supports extensions including an asynchronous mode.
7. Spark [7], a Scala framework from UC Berkeley that aims to balance efficiency and fault tolerance by providing immutable, distributed, in-memory collections. There's a BSP implementation on top of Spark [8].
8. PEGASUS [9], a pure MapReduce implementation on Hadoop.
Problem domain:
- Hama: a pure BSP implemention. Target general computation, not only graph processing.
- Giraph: similar to Hama, but more specifically for graphs, such as page rank, shared connections, personalization-based popularity, etc. Giraph performs like a map-only job based Hadoop.
- GraphLab: Supports machine learning and graph computation, such as Jacobi, Gaussian Belief Propagation (GaBP), Conjugate gradient, Matrix operations, collaborative filtering, clutering, and belief propagation algorithm, etc.
- Phoebus, Golden Orb, Bagel: Based on Google Pregel. Currently support basic graph computation.
- Sigal/Collect: Support synchronous and asynchronous algorithms on graph, such as, shortest path, belief propagation, coloring, etc.
- PEGASUS: Supports computation for Degree, PageRank, Random Walk with Restart (RWR), Radius, Connected components
Reference:
[1] Giraph: http://incubator.apache.org/giraph/
[2] Hama: http://incubator.apache.org/hama/
[3] GraphLab: http://graphlab.org/
[4] Phoebus: https://github.com/xslogic/phoebus
[5] Golden Orb: http://www.goldenorbos.org/
[6] Signal/Collect: http://code.google.com/p/signal-collect/
[7] Spark: http://spark-project.org/
[8] Bagel (BSP on Spark): https://github.com/mesos/spark/wiki/Bagel-Programming-Guide
[9] PEGASUS: www.cs.cmu.edu/~pegasus/