Yahoo! Developer Network Blog


October 28, 2008

Pig - The Road to an Efficient High-level language for Hadoop

Pig started as a research project within Yahoo! in the summer of 2006. The original prototype quickly became very popular with users. It was clear that a higher-level language than raw map-reduce was needed to quickly roll out prototypes as well as to build production-quality applications. Early adopters within Yahoo! have reported substantial increases in productivity after migrating from raw map-reduce to Pig.

In the summer of 2007 a team was put together to turn the project into a product. Working within an open source community was seen as one of the important early goals of the project. Pig has been part of the open source community for over a year, having joined the Apache Incubator in September of 2007. During this time Pig has developed a community of users and developers, and added two new committers. It has also gained wide popularity within Yahoo!, with 30% of all Hadoop jobs using Pig - which amounts to thousands of jobs per day!

A lot of great technical work went into the project, which helped with the adoption and popularity of the system. The early work included the addition of a streaming operator, parameter substitution, error handling, and some performance improvements such as the use of binary comparators and the combiner.

More recently the entire system, from the parser down, has been rebuilt, making the code much cleaner, more extensible, and more efficient. A type system was also added, further improving performance and allowing for early error detection. This work is still in progress, but the early performance numbers are quite impressive - we are seeing speedups ranging from 40% to 10x between the old and the new code.

Our technical improvements and the growth of the Pig community allowed us to graduate from the Incubator and to join Hadoop as a sub-project. The entire Pig community is excited about reaching this important milestone and the opportunities that being part of the Hadoop family provides! Long live the Pig! :)

Olga Natkovich
Yahoo!

Posted at October 28, 2008 10:31 AM

Comments

How can I use Pig to calculate a simple moving average of a databag of tuples that are set up like: ?

Posted by: lunk at November 8, 2008 10:07 PM | Permalink

Yahoo's Parand Darugar spoke about Hadoop at the DataServices World conference in San Jose. The slides, podcast and video of that presentation are available at

http://www.DataServicesWorld.com/People/PDarugar.htm

Video and podcast duration is 49:44.

Posted by: Ken North at December 16, 2008 12:22 PM | Permalink

Hadoop is a trademark of the Apache Software Foundation.

Copyright © 2009 Yahoo! Inc. All rights reserved.
