Yahoo! Developer Network Blog



March 4, 2009

Using Hadoop to fight spam - Part 1

We interviewed Mark Risher and Jay Pujara, leaders in the war against spam for Yahoo! Mail. With over 300 million users and billions of messages, finding the problems and patterns that identify spammers can be a daunting task. Mark and Jay describe how their previous database-based approach quickly ran into scalability limits once they analyzed data aggregated over a month or more. They explain how Hadoop, with Pig and Streaming, now enables them to slice through billions of messages to isolate patterns and identify spammers. They can now create new queries and get results within minutes for problems that took hours, or were considered impossible, with their previous approach. Listen in as Mark and Jay describe their experiences fighting spam:


Posted at March 4, 2009 1:24 PM


Comments

Your readers, particularly Hadoop developers and users, might be interested in Aster Data Systems' upcoming webinar on MapReduce for Data Warehousing and Analytics. There is an overlapping interest in both Hadoop and MapReduce, and the two frameworks can sometimes be complementary. To register for the webinar, please visit www.asterdata.com/mapreduce_webinar.

Posted by: Ryan at March 17, 2009 11:20 AM

Here is a spam site I can't get rid of. How did my email address get attached to this site?
Official VIAGRA (R)Store;

Posted by: Terry Coleman at November 26, 2009 12:54 PM


Hadoop is a trademark of the Apache Software Foundation.

Copyright © 2010 Yahoo! Inc. All rights reserved.
