Yahoo! Developer Network Blog
March 4, 2009
Using Hadoop to fight spam - Part 1
We interviewed Mark Risher and Jay Pujara, leaders in the war against spam for Yahoo! Mail. With over 300 million users and billions of messages, looking for problems or patterns to identify spammers can be a daunting task. Mark and Jay describe how their previous approach using databases quickly ran into scalability limitations as they analyzed data aggregated over a month or more. They explain how Hadoop, with Pig and Streaming, now enables them to slice through billions of messages to isolate patterns and identify spammers. They can now create new queries and get results within minutes for problems that took hours or were considered impossible with their previous approach. Listen in as Mark and Jay describe their experiences fighting spam:
Posted at March 4, 2009 1:24 PM
Comments
Your readers, particularly Hadoop developers and users, might be interested in Aster Data Systems' upcoming webinar on MapReduce for Data Warehousing and Analytics. There is an overlapping interest in both Hadoop and MapReduce, and the two frameworks can sometimes be complementary. To register for the webinar, please visit www.asterdata.com/mapreduce_webinar.
Posted by: Ryan at March 17, 2009 11:20 AM | Permalink
Here is a spam site I can't get rid of. How did my email address get attached to this site?
Official VIAGRA (R)Store;
Posted by: Terry Coleman at November 26, 2009 12:54 PM | Permalink
Hadoop is a trademark of the Apache Software Foundation.

