Yahoo! Developer Network Blog
February 22, 2008
Mailtrust Hadoop Talk in Virginia on Monday
Just a quick heads up to Hadoop fans in the Virginia area. Bill Boebel, CTO of Mailtrust, will be giving a MapReduce vs. SQL Talk on Monday the 25th. (Mailtrust is the email division of Rackspace, a large hosting provider.)
Stu Hood, one of Mailtrust's software engineers, wrote about MapReduce at Rackspace back in January, detailing how they use Hadoop for processing "several hundred gigabytes of email log data" every day.
The way it works is that raw logs get streamed from hundreds of mail servers to the Hadoop Distributed File System (“HDFS”) in real time, and scheduled MapReduce jobs run to index the new data using Apache Lucene and Solr. Once the indexes have been built, they are compressed and stored away in HDFS. Each Hadoop datanode also runs a Tomcat servlet container, which hosts a number of Solr instances that pull and merge the new indexes, and provide really fast search results to our support team.
Additionally, using MapReduce we are now able to look at our log data in all sorts of interesting ways. For example, we run nightly MapReduce jobs to collect statistics about our mail system, such as spam counts by domain, bytes transferred and number of logins. Now whenever we think of a complex question about our customers’ usage patterns, we can pull the answer from our logs within hours via MapReduce. This is powerful stuff.
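To make the nightly statistics jobs a bit more concrete, here is a minimal single-process sketch of the map and reduce steps behind something like "spam counts by domain". This is not Mailtrust's code: the log line format, the sample data, and every function name below are assumptions made purely for illustration, and a real job would run across a Hadoop cluster rather than inside one Python process.

```python
from collections import defaultdict

# Hypothetical log format (an assumption, not Mailtrust's real schema):
#   <recipient address> <verdict> <bytes>
SAMPLE_LOG = [
    "alice@example.com spam 2048",
    "bob@example.com ham 512",
    "carol@example.org spam 1024",
    "dave@example.com spam 4096",
]


def map_line(line):
    """Map step: emit (domain, 1) for each message flagged as spam."""
    address, verdict, _size = line.split()
    if verdict == "spam":
        domain = address.rsplit("@", 1)[1]
        yield domain, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(domain, counts):
    """Reduce step: sum the per-message counts for one domain."""
    return domain, sum(counts)


def spam_counts_by_domain(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(domain, counts) for domain, counts in grouped.items())


if __name__ == "__main__":
    print(spam_counts_by_domain(SAMPLE_LOG))
```

The same shape would cover the other statistics mentioned above: emitting the byte count instead of 1 in the map step would give bytes transferred per domain, for example.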
Read the whole posting for some interesting email stats they extracted.
Bill's talk should provide an excellent overview of Hadoop and some good insight into the Rackspace deployment.
Jeremy Zawodny
Yahoo! Developer Network
Posted at February 22, 2008 11:11 AM
Comments
Hello, everyone.
Does anyone have experience running Hadoop cluster nodes on Amazon S3 storage via vendors like RightScale? IO throughput would be a big concern, right?
If anyone can comment on running Hadoop nodes on Amazon S3, I would really appreciate it.
Thanks.
Posted by: Trung Nguyen at February 22, 2008 5:59 PM | Permalink
Hadoop is a trademark of the Apache Software Foundation.