Fighting bird分享 http://blog.sciencenet.cn/u/tonia

博文

About frequent graph mining

已有 8526 次阅读 2012-4-7 03:09 |个人分类:cloud|系统分类:科研笔记| Graph, mapreduce, bsp

Part 1: Algorithms.
Generally two categories:
1) Apriori-based
- AGM: Inokuchi, et al. (PKDD’00)
- FSG: Kuramochi and Karypis (ICDM’01)
- FFSM: Huan, et al. (ICDM’03) and SPIN: Huan et al. (KDD’04)
2) Pattern-growth
- MoFa: Borgelt and Berthold (ICDM’02)
- gSpan: Yan and Han (ICDM’02)
- Gaston: Nijssen and Kok (KDD’04)

Part 2: MapReduce/BSP implementations.
Here are some open source graph processing frameworks. Most follow the Bulk Synchronous Parallel model, where vertices send messages to each other in a series of iterations called supersteps.

1. Giraph [1], a Java BSP implementation that runs on existing Hadoop clusters and provides fault tolerance for both the workers and the master using ZooKeeper. Giraph came out of Yahoo! and is now an Apache Incubator project.
2. Apache Hama [2] is a pure BSP(Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms. Also an Apache Incubator project, and works with ZooKeeper.
3. GraphLab [3], a new, asynchronous programming model in C++ from Carnegie Mellon.
4. Phoebus [4], an Erlang BSP implementation that can use HDFS for storage.
5. Golden Orb [5], another Java BSP implementation on Hadoop.
6. Signal/Collect [6], a Scala BSP implementation from the University of Zurich. Signal/Collect also supports extensions including an asynchronous mode.
7. Spark [7], a Scala framework from UC Berkeley that aims to balance efficiency and fault tolerance by providing immutable, distributed, in-memory collections. There's a BSP implementation on top of Spark [8].
8. PEGASUS [9], a pure MapReduce implementation on Hadoop.

Problem domain:
- Hama: a pure BSP implemention. Target general computation, not only graph processing.
- Giraph: similar to Hama, but more specifically for graphs, such as page rank, shared connections, personalization-based popularity, etc. Giraph performs like a map-only job based Hadoop.
- GraphLab: Supports machine learning and graph computation, such as Jacobi, Gaussian Belief Propagation (GaBP), Conjugate gradient, Matrix operations, collaborative filtering, clutering, and belief propagation algorithm, etc.
- Phoebus, Golden Orb, Bagel: Based on Google Pregel. Currently support basic graph computation.
- Sigal/Collect: Support synchronous and asynchronous algorithms on graph, such as, shortest path, belief propagation, coloring,  etc.
- PEGASUS: Supports computation for Degree, PageRank, Random Walk with Restart (RWR), Radius, Connected components 

Reference:
[1] Giraph: http://incubator.apache.org/giraph/
[2] Hama: http://incubator.apache.org/hama/
[3] GraphLab: http://graphlab.org/
[4] Phoebus: https://github.com/xslogic/phoebus
[5] Golden Orb: http://www.goldenorbos.org/
[6] Signal/Collect: http://code.google.com/p/signal-collect/
[7] Spark: http://spark-project.org/
[8] Bagel (BSP on Spark): https://github.com/mesos/spark/wiki/Bagel-Programming-Guide
[9] PEGASUS: www.cs.cmu.edu/~pegasus/


https://blog.sciencenet.cn/blog-425672-556241.html

上一篇:Big Data: Principles and best practices of scalable realtime
下一篇:GibbsLDA++ 使用记录
收藏 IP: 64.134.175.*| 热度|

0

该博文允许注册用户评论 请点击登录 评论 (0 个评论)

数据加载中...

Archiver|手机版|科学网 ( 京ICP备07017567号-12 )

GMT+8, 2024-11-23 18:18

Powered by ScienceNet.cn

Copyright © 2007- 中国科学报社

返回顶部