Shared by Fighting bird http://blog.sciencenet.cn/u/tonia


[Repost] Hadoop jumps through hoops, becomes mainstream

2012-3-12 23:26 | Category: Blog news | Keywords: Hadoop | Source: repost

  

One of the things I love most about the software industry is the way new technologies can materialize from unlikely places and get applied in unexpected ways. Hadoop is a great example of this. Conceived by the open source community, Google, Yahoo and others, this programming framework has emerged as a promising solution to the big data problem.

I expect Hadoop to become enterprise-ready within the next 18 months. Encouraged by the arrival of innovative Hadoop vendors, many Fortune 500 companies, including eBay, Bank of America and JP Morgan, are experimenting with Hadoop deployments. As a technologist and an investor in this sector (Norwest Venture Partners, where I am a general partner, is an investor in Hadapt), I believe these investigations are quickly evolving into serious roll-outs. The following five key factors will accelerate mainstream adoption, making 2012 and 2013 Hadoop’s breakout years.

  • 1.  SQL provides a “fast pass” to Hadoop

The first hurdle Hadoop must clear is the stigma of its origins. As a product of the open source community, Hadoop and its countless siblings are regarded by traditional IT shops with confusion, suspicion, or even abject terror. Whatever their potential, these revolutionary interlopers threaten huge investments in expensive applications and proprietary technologies.

An SQL interface can help bridge the gap between future, current and legacy technologies. Organizations are already purchasing Hadoop tools that offer various levels of SQL compatibility. We expect Hadoop to acquire progressively deeper SQL support, and Hive, an open source SQL interface for Hadoop, is a good start.

In the next 18 months, I think we will see large retailers, financial services, Wall Street and the government using this “fast pass” SQL option to initiate much broader Hadoop deployments.

  • 2.  Hadoop performance gets a big boost

One of the leading reasons to use Hadoop is its extreme scalability. To date, that scalability has often come with significant performance penalties, including MapReduce query overhead and a storage layer that requires broad scans across file systems. If big data can’t produce information on demand, then it’s just an albatross.

Fortunately, the entire Hadoop industry is aggressively tackling these performance issues. That includes a rapidly proliferating group of startups (Cloudera, Hadapt, Hortonworks, MapR), the amazingly innovative open source community, and such established vendors as IBM. The forthcoming Hadoop v0.23 and subsequent releases will include performance-boosting enhancements, with improvements to basic file system performance, minimum MapReduce job latency, and the performance of higher-level query interfaces (e.g. Hive, Apache Pig).

  • 3.  Hadoop becomes increasingly reliable

To avoid having a single point of failure, Hadoop needs to address topology and deployment concerns left over from its initial incarnation. Hadoop employs a master node to keep track of data and to determine how to access it. If this “brain” goes down, everything could be at risk without the correct topology and redundancy. Over time, the Hadoop community will make improvements in this area. Cloudera, Hortonworks, MapR and other commercial vendors are already addressing this.

  • 4.  Mainstream case studies emerge

Hadoop is a grassroots phenomenon that emerged in the social networking and consumer Internet world. As always, there are early adopters who take risks on the cutting edge, and there are more conservative organizations watching the pioneers from the sidelines.

This played out in 2011 as early customer experiences with Hadoop were shared via conferences, online forums and vendor white papers. Experts think Hadoop is on the edge of a tipping point, as some of the earliest Hadoop implementers move from experiments to adoption. As a result, people implementing Hadoop today are benefiting from the lessons learned by the early pioneers.

In 2012 and 2013, we will see a growing body of case studies and the emergence of best practices as Hadoop technology matures and gets deployed in traditional enterprise environments. In short, Hadoop’s momentum will grow exponentially in the next 18 months.

If becoming mainstream is step four in the technology adoption process, Hadoop will move through step two and into step three this year and next.

  • 5.  The architecture evolves

Hadoop applications process vast amounts of data in parallel across many computers, relying upon MapReduce as the enabling distribution framework. Currently, Hadoop tightly couples distributed resource management and a single distributed programming paradigm (MapReduce) into one package. The Hadoop community is now decoupling the two functions. Separating these will provide more control over the different system functions and free up query processing.
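To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a thread pool merely stands in for the cluster nodes that would run map tasks in a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk is mapped independently of the others, which is what
    # allows the work to be spread out; threads are only an illustrative
    # stand-in for separate machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["big data big", "data platform"])` returns `{'big': 2, 'data': 2, 'platform': 1}`. The point of the sketch is the shape of the computation: because the map and reduce steps are independent per chunk and per key, the scheduling of that work can in principle be handled separately from the programming model itself.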

Future releases of Hadoop will have an enhanced MapReduce framework and will feature a growing array of alternative distributed computing paradigms. Likely candidates include Message Passing Interface (MPI), distributed shell systems, OpenDremel and Bulk Synchronous Parallel (BSP). With these additional programming and distribution options, Hadoop will be able to support an even greater variety of workloads.

Hadoop is here to stay

Over the next few years, Hadoop will become a common component of the standard IT tool belt. To meet this demand, vendors are starting to package Hadoop into commercial off-the-shelf (COTS) software.

Hadoop adoption will build on itself as organizations augment Hadoop solutions and grow ecosystems around them. Before our very eyes, Hadoop is becoming a platform.

Matt Howard is a general partner at Norwest Venture Partners (NVP), where he invests in mobile and wireless, big data, security, rich media, networking and storage sectors. He currently serves on the boards of Avere Systems, Blue Jeans Network, ConteXtream, Hadapt, MobileIron, Retrevo and Summit Microelectronics. He blogs at NVP Blog.

Reposted from: GIGAOM



http://blog.sciencenet.cn/blog-425672-547101.html
