Distributed Computing Archive: June 2009
June 11, 2009
Hadoop Test-Related Issues
I'm getting together with some of the Hadoop committers tomorrow. Considering my quality engineering background, these are some of the discussion items at the top of my mind for the project:
- Code Review Guidelines: I wrote these up a couple years ago. Are they being followed? Are they the right set? How can we raise the quality of the code reviews being performed before patches are committed?
- Feature Design Documentation: Can we agree that each feature needs a design doc? A proposed template is attached to HADOOP-5587.
- Feature Test Plans: Can we agree that each feature needs a test plan? A proposed template is also attached to HADOOP-5587.
- Warnings: We're working to reduce static analysis (Findbugs), compiler (javac), and documentation (javadoc) warnings to zero. Can we commit to keeping them there?
- Fault Injection Framework: We're working on a fault-injection framework so that my team and others can write tests that inject faults and monitor the effects. The current work is being contributed on HADOOP-5974. What additional requirements might folks have?
- Usability: The web UI and command lines could use some work to be more consistent and user friendly. Can we agree that no stack trace should, by default, be output to the user when using the command line?
- Patch Testing: We've saturated the available hardware for testing patches. More hardware is on the way. What problems do we have with the current setup (other than speed)? What improvements can we make?
- Fast Commit Builds: We need a quick (10-minute) build and test target in Hadoop. Once committed, how does this new target fit into the contributor/committer workflow?
- Project Split: This is being tracked as HADOOP-4687. How do we manage build and runtime dependencies between all Hadoop projects?
- TestNG vs. JUnit 4: Should we convert to TestNG to take advantage of some of its unique features, such as data providers and test annotations? This is being tracked in HADOOP-4901.
- True Unit Tests: Many of our current JUnit tests are really mini system tests, since they use MiniMRCluster and MiniDFSCluster to bring up a cluster on a single node in a single process. How can we better support and monitor contributions of true unit-level tests?
- Testing for Backwards Compatibility: There's a strong desire to get to API and configuration backward compatibility from Hadoop 0.21 forward. After Hadoop 0.21, how do we ensure patches are not breaking backwards compatibility?
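To make the usability item above concrete, here is a minimal sketch of a command-line entry point that prints a one-line error by default and shows the stack trace only when the user opts in. This is not Hadoop code: the class name `CliErrorReporter`, the `cli.debug` system property, and the `fs -cat` command label are all hypothetical names chosen for illustration.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Illustrative only: none of these names come from Hadoop.
final class CliErrorReporter {
    // Hypothetical system property a user sets to opt in to stack traces,
    // e.g. java -Dcli.debug=true CliErrorReporter
    static final String DEBUG_PROPERTY = "cli.debug";

    // Builds the text shown to the user for a failed command.
    // The stack trace is appended only when verbose is true.
    static String format(String command, Throwable t, boolean verbose) {
        String reason = (t.getMessage() != null)
                ? t.getMessage()
                : t.getClass().getSimpleName();
        StringBuilder out = new StringBuilder(command).append(": ").append(reason);
        if (verbose) {
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            out.append(System.lineSeparator()).append(trace.toString());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Boolean.getBoolean reads the named system property; false if unset.
        boolean verbose = Boolean.getBoolean(DEBUG_PROPERTY);
        try {
            // Stand-in for a command that fails.
            throw new FileNotFoundException("/user/example/missing.txt does not exist");
        } catch (Exception e) {
            System.err.println(format("fs -cat", e, verbose));
        }
    }
}
```

Keeping the formatting in a separate method means the default (no stack trace) behavior can be checked in a plain unit test, without starting a cluster. A real tool would also return a non-zero exit code on failure.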
Quality and Release Engineering
Yahoo! Cloud Computing
Posted by ndaley at 4:52 PM
June 10, 2009
Announcing the Yahoo! Distribution of Hadoop
Today we're announcing the general availability of the Yahoo! Distribution of Hadoop, a source-only distribution of Apache Hadoop that we deploy here at Yahoo!. In my role as quality and release engineering manager for grid technologies at Yahoo!, including Hadoop, I'm really excited about what this release means for the larger Hadoop ecosystem. Here's why:
- We're opening up the results of our investment in quality engineering and scale deployments to the Apache Hadoop community and surrounding ecosystem.
- We're publishing a frequent source distribution that provides a robust foundation on which others can build and deploy their own enterprise distributions, support, and solutions.
- We're committing to keep all of our source code changes for our distributions available as patches in the Apache Hadoop community.
We spend thousands of machine hours testing each release of Hadoop that we deploy internally. We run automated unit, functional, system, and performance tests over a 2-day period on our 500-machine test cluster. This includes interoperability testing of the cross-cluster data-copying tool (distcp), HDFS and MapReduce benchmarks, and various fault scenarios. All of the unit and performance tests are currently available in Apache Hadoop. We are working towards contributing the functional and system tests back to the community.
We deploy Hadoop on tens of thousands of machines. These machines are divided into a few tiers, each with many large clusters. To support internal feature requests and reliability requirements, we test and deploy frequent bug fix and feature releases to an experimental tier of clusters. Once sufficiently stabilized, these releases progress to additional tiers, eventually landing on a production tier, where Hadoop provides a mission-critical platform for many core business units at Yahoo!. As a release stabilizes and progresses to new tiers, we inevitably discover new issues, and we fix, test, and deploy micro releases quickly. All of this investment in testing and stabilizing Hadoop is now available to anyone.
Providing a robust foundation for other distributions, support, and solutions
This distribution is largely a response to the numerous requests that we have received to share Yahoo!'s internally tested and scale-proven releases. As the pace of Hadoop adoption has increased, so have requests for these releases. The Yahoo! Distribution of Hadoop provides a base for others to build their own distributions, commercial support, and solutions. I believe this will broaden the use of Hadoop and speed its development, growth, and quality, by which we will all benefit.
To be clear, this is not a new business for Yahoo!. We will not be providing support or services for our distribution, but we hope that by releasing our internally tested version, third parties will build enterprise support and services on top of our distribution.
Providing all our patches under the Apache License
The pace of our internal releases and the demand for new features have required a number of features to be back-ported internally. With this release, we're committing to contribute these back-ported features to the community and to ensure that all code in the Yahoo! Distribution of Hadoop is either in the Apache code repository or posted as patches in the Apache Hadoop community.
Hadoop is helping us solve key science and research problems in hours or days instead of months. It provides us a platform to solve extreme problems requiring massive amounts of data processing. It underpins major revenue-generating systems. Opening our distribution enables a faster pace of innovation for the entire Hadoop ecosystem and broadens the use, and ultimately the quality, of this key platform across the industry.
Go get it!
Nigel Daley
Quality and Release Engineering Manager
Yahoo! Grid Technologies
Posted by ndaley at 9:30 AM

