Yahoo! Developer Network Blog
June 10, 2009
Announcing the Yahoo! Distribution of Hadoop
Today we're announcing the general availability of the Yahoo! Distribution of Hadoop, a source-only distribution of Apache Hadoop that we deploy here at Yahoo!. In my role as quality and release engineering manager for grid technologies at Yahoo!, including Hadoop, I'm really excited about what this release means for the larger Hadoop ecosystem. Here's why:
- We're opening up the results of our investment in quality engineering and scale deployments to the Apache Hadoop community and surrounding ecosystem.
- We're publishing a frequent source distribution that provides a robust foundation on which others can build and deploy their own enterprise distributions, support, and solutions.
- We're committing to keep all of our source code changes for our distributions available as patches in the Apache Hadoop community.
We spend thousands of machine hours testing each release of Hadoop that we deploy internally. Over a two-day period, we run automated unit, functional, system, and performance tests on our 500-machine test cluster. This includes interoperability testing of the cross-cluster data-copying tool (distcp), HDFS and MapReduce benchmarks, and various fault scenarios. All of the unit and performance tests are currently available in Apache Hadoop, and we are working toward contributing the functional and system tests back to the community.

We deploy Hadoop on tens of thousands of machines. These machines are divided into a few tiers, each with many large clusters. To support internal feature requests and reliability requirements, we test and deploy frequent bug-fix and feature releases to an experimental tier of clusters. Once a release is sufficiently stable, it progresses to additional tiers, eventually landing on a production tier, where Hadoop provides a mission-critical platform for many core business units at Yahoo!. As a release stabilizes and moves to new tiers, we inevitably discover, fix, test, and deploy new micro releases quickly. All of this investment in testing and stabilizing Hadoop is now available to anyone.

Providing a robust foundation for other distributions, support, and solutions
This distribution is largely a response to the numerous requests we have received to share Yahoo!'s internally tested and scale-proven releases. As the pace of Hadoop adoption has increased, so have requests for these releases. The Yahoo! Distribution of Hadoop provides a base on which others can build their own distributions, commercial support, and solutions. I believe this will broaden the use of Hadoop and speed its development, growth, and quality, from which we will all benefit.

To be clear, this is not a new business for Yahoo!. We will not be providing support or services for our distribution, but we hope that by releasing our internally tested version, third parties will build enterprise support and services on top of it.

Providing all our patches under the Apache License
The pace of our internal releases and the demand for new features have required us to back-port a number of features internally. With this release, we're committing to contribute these internally back-ported features back to the community and to ensure that all code in the Yahoo! Distribution of Hadoop is either in the Apache code repository or posted as patches in the Apache Hadoop community.

Hadoop is helping us solve key science and research problems in hours or days instead of months. It provides us with a platform for extreme problems that require massive amounts of data processing, and it underpins major revenue-generating systems. Opening our distribution enables a faster pace of innovation for the entire Hadoop ecosystem and broadens the use, and ultimately the quality, of this key platform across the industry.

Go get it!

Nigel Daley
Quality and Release Engineering Manager
Yahoo! Grid Technologies
Posted at June 10, 2009 9:30 AM
Comments
Fantastic news! Having access to this level of tested, production-grade code will give us much more confidence in moving more of our critical production infrastructure to Hadoop. Way to go, guys!
Posted by: Lance Riedel at June 10, 2009 9:57 AM
Awesome! Great news. Keep up the good work!
Posted by: Hamlet Khodaverdian at June 10, 2009 8:14 PM
That's good news! It would be nice if you could highlight the advantages of this version compared to the one provided by Apache.
Posted by: Ritesh M Nayak at June 10, 2009 10:26 PM
Why fork?
Why not just contribute back to Apache?
Posted by: Aaron at June 14, 2009 8:47 PM
This release is good news, thanks! Does your testing include Pig?
Posted by: David Fallside at June 23, 2009 11:41 AM
Is it different from the Cloudera or Apache distributions of Hadoop? Why didn't you simply make changes to Apache Hadoop?
Posted by: shahryar ghazi at December 4, 2009 6:35 AM
shahryar,
All the code in the Y! Distro is in Apache Hadoop mainline or patches on Apache Hadoop Jira. You could think of the Y! Distro as a preview of what's to come in later releases of Apache Hadoop. Plus it's been tested at large scale.
Posted by: Nigel at December 9, 2009 11:00 PM
Hadoop is a trademark of the Apache Software Foundation.