Yahoo! Developer Network Blog
February 19, 2008
Hadoop running in production on the Yahoo! Search Webmap
Some big news in the world of Hadoop comes out of Yahoo! today. We believe we're now running the world's largest Hadoop application: a 10,000-core Linux cluster producing data used by the Yahoo! Search Webmap.
As you can see from the announcement on the Hadoop Blog:
The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.
Some Webmap size data:
- Number of links between pages in the index: roughly 1 trillion links
- Size of output: over 300 TB, compressed!
- Number of cores used to run a single Map-Reduce job: over 10,000
- Raw disk used in the production cluster: over 5 Petabytes
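For readers new to Map-Reduce, here is a minimal single-machine sketch of the kind of computation a link graph lends itself to: counting inbound links per page. It is only an illustration and not the Webmap code. The function names (`map_links`, `shuffle`, `reduce_inlinks`, `count_inlinks`) and the three-page crawl are hypothetical, and Hadoop would distribute the map, shuffle, and reduce phases across many machines rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain


def map_links(page, outlinks):
    """Map step: emit (target, 1) for every link found on a page."""
    for target in outlinks:
        yield target, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_inlinks(target, counts):
    """Reduce step: sum the emitted counts for one target page."""
    return target, sum(counts)


def count_inlinks(crawl):
    """Run map, shuffle, and reduce over a dict of page -> list of outlinks."""
    pairs = chain.from_iterable(
        map_links(page, outlinks) for page, outlinks in crawl.items()
    )
    return dict(reduce_inlinks(key, values) for key, values in shuffle(pairs).items())


# Hypothetical three-page crawl, made up for this example.
crawl = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
inlinks = count_inlinks(crawl)
print(inlinks)
```

Because each map call looks at a single page and each reduce call looks at a single key, the work can be split across as many cores as are available, which is what makes a job on this scale feasible.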
In this video, YDN's Jeremy Zawodny interviews Arnab Bhattacharjee (manager of the Yahoo! Webmap Team) and Sameer Paranjpye (manager of our Hadoop development) to learn more about what all this means:
I'm sure there will be more to come on the Hadoop front, so watch this space.
Matt McAlister
Posted at February 19, 2008 10:44 AM | Permalink
Comments
thanks for the info.
Posted by: paisley at February 19, 2008 5:23 PM
Wow this is great. Thanks for sharing.
~Katie
Posted by: Katie at February 19, 2008 6:11 PM
Great to see that Hadoop scales this far. Thanks for the info.
Posted by: Johann at February 20, 2008 1:06 AM
Video quality has issues, can't hear the speakers. Next time use sound engineers.
Posted by: Thomas at February 20, 2008 12:04 PM
Thomas: tell me more. I can't reproduce the problem here. I've seen the video 3 or 4 times now.
Posted by: Jeremy Zawodny at February 20, 2008 12:08 PM
That's a sexy cluster!
note: it claims over 10,000 cores are required to run a Map-Reduce, but the cluster size is 10,000 cores?
Posted by: Sol Young at February 20, 2008 12:18 PM
It seems the video is down (for me anyways) on this blog post, but up on the hadoop blog. Also video ids don't match. This is the working ID: 6418984.
Posted by: Yvo at February 21, 2008 3:52 AM
Hadoop sounds like the Woot of web mapping technology
Posted by: CVOS man at February 21, 2008 12:30 PM
Dudes ... whoever thinks Microsoft is stupid for trying to buy Yahoo! may want to wake up and smell the coffee.
Yet another GREAT JOB GUYS!
Posted by: MattStark at March 6, 2008 11:47 AM