Distributed Computing Archive: October 2008
October 28, 2008
Pig - The Road to an Efficient High-level language for Hadoop
Pig started as a research project within Yahoo! in the summer of 2006. The original prototype quickly became very popular with users. It was clear that a higher-level language than raw map-reduce was needed, both to quickly roll out prototypes and to build production-quality applications. Early adopters within Yahoo! have reported substantial increases in productivity after migrating from raw map-reduce to Pig.
In the summer of 2007 a team was put together to turn the project into a product. Working within the open source community was one of the project's important early goals. Pig has been part of the open source community for over a year, having joined the Apache Incubator in September 2007. During this time Pig has developed a community of users and developers and added two new committers. It has also gained wide popularity within Yahoo!: 30% of all Hadoop jobs, thousands per day, now use Pig!
A lot of great technical work went into the project, which helped drive the adoption and popularity of the system. The early work included the addition of a streaming operator, parameter substitution, and error handling, along with performance improvements such as using binary comparators and the combiner.
More recently the entire system, from the parser down, has been rebuilt, making the code much cleaner, more extensible, and more efficient. A type system was also added, further improving performance and allowing early error detection. This work is still in progress, but the early performance numbers are quite impressive: we are seeing speedups ranging from 40% to 10x between the old and the new code.
Our technical improvements and the growth of the Pig community allowed us to graduate from the Incubator and join Hadoop as a subproject. The entire Pig community is excited about reaching this important milestone and about the opportunities that being part of the Hadoop family provides! Long live the Pig! :)
Olga Natkovich
Yahoo!
Posted by aanand at 10:31 AM
October 16, 2008
Hadoop User Group Meeting
In response to a number of requests from folks outside the Bay Area that we record and post the Hadoop User Group presentations, here are the talks from the October meeting, held this week at the Yahoo! Mission College campus.
Jun Rao from IBM Almaden Research talked about “Exploiting database join techniques for analytics with Hadoop”. This was followed by an update on Jaql from Kevin Beyer of IBM, who announced that Jaql is now available as open source. The last talk was a lively discussion with Sriram Rao from Quantcast about his “Experiences moving a Petabyte Data Center”.
Bay Area Hadoop User Group meetings are usually held on the third Wednesday of each month at Yahoo! Mission College in Santa Clara.
Ajay Anand
Yahoo! Grid Computing
Posted by aanand at 2:44 PM
October 9, 2008
Hadoop Camp at ApacheCon
Following the interest in the Hadoop Summit we held a few months ago, we got together with the ApacheCon organizers to arrange a Hadoop Camp at their conference this year.
Hadoop Camp will be held on November 6th and 7th in New Orleans as part of ApacheCon. As at the summit, we have speakers from some of the leading companies developing on and using Hadoop, including Facebook, Amazon, IBM, Hewlett-Packard, Sun, Powerset, and Yahoo!, making this possibly the largest gathering of Hadoop committers, developers, and users outside the Bay Area.
In addition to the Camp, there will be a Hadoop tutorial on Monday, November 3rd, and we are also looking into coordinating a Hadoop “hack” contest that would run throughout the week at ApacheCon.
We are looking forward to a strong turnout!
Ajay Anand
Yahoo!
Posted by aanand at 7:58 AM