Yahoo! Developer Network Blog
September 22, 2008
SearchMonkey Support for RDFa Enabled
Yahoo! Search is now extracting RDFa data across the World Wide Web and making this information available to the public via SearchMonkey. RDFa is an open standard for embedding structured data directly in HTML. Along with our previous support for eRDF and a number of popular microformats, SearchMonkey now supports a wide variety of popular semantic technologies.
What is structured data, and why is structured data good for search? Traditional search engines crawl the web and extract what metadata they can: the page title, an autogenerated summary, the file size, the MIME-type, the last-updated date, and so on. However, this sort of analysis pales in comparison to what a human being can do simply by glancing at the page. A human can look at the words "Joe's Home Page" and infer, "ah, this page probably belongs to Joe," or look at an image and infer, "ah, that's probably a picture of Joe, the owner of the page." That's easy enough for humans... but what if the search engine could pick out this info and display it directly in the search result?
RDFa uses attributes to embed structured data in XHTML. These attributes are not valid in HTML 4, but the W3C has provided an XHTML DTD to validate against. The following example illustrates a simple home page marked up with RDFa data (carried by the rel, property, and resource attributes):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      lang="en" xml:lang="en">
  <head>
    <title>The Amazing Home Page of Joe Smith</title>
  </head>
  <body>
    <h1 property="dc:title">Joe's Home Page</h1>
    <div rel="foaf:maker">
      <h2 property="foaf:name">Joe Smith</h2>
      <div rel="foaf:depiction" resource="http://joesmith.org/images/jsmith.png">
        <img src="/images/jsmith.png" alt="Smiling headshot of Joe" />
        <p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
      </div>
    </div>
  </body>
</html>
In this page, the designer has explicitly stated that the image is a "depiction of the person who made the web page." Adding this information as RDFa can potentially benefit many applications. In the case of Yahoo!, we've designed our search index to extract and store this information.
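To get a feel for what a crawler can pick up from markup like this, here is a rough sketch in Python using only the standard library. It is purely illustrative and says nothing about how Yahoo! Search actually extracts RDFa. The names AttributeCollector and collect are made up for this example, and the code is not a conforming RDFa processor: it does not resolve the dc: and foaf: prefixes, track subjects, or handle chaining and blank nodes. It only lists the property literals and the rel/resource links it finds in the body of the page above.

```python
from html.parser import HTMLParser

# Body of the example home page, copied from the markup above.
PAGE = """<body>
<h1 property="dc:title">Joe's Home Page</h1>
<div rel="foaf:maker">
<h2 property="foaf:name">Joe Smith</h2>
<div rel="foaf:depiction" resource="http://joesmith.org/images/jsmith.png">
<img src="/images/jsmith.png" alt="Smiling headshot of Joe" />
<p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
</div>
</div>
</body>"""


class AttributeCollector(HTMLParser):
    """Hypothetical helper: gathers RDFa-looking attributes, nothing more."""

    def __init__(self):
        super().__init__()
        self.literals = []    # (property, element text) pairs
        self.links = []       # (rel, resource) pairs
        self._current = None  # property of the element currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only record rel when an explicit resource is given; a bare rel
        # (like foaf:maker above) points at a blank node, which we skip.
        if attrs.get("rel") and attrs.get("resource"):
            self.links.append((attrs["rel"], attrs["resource"]))
        if attrs.get("property"):
            self._current = attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes property elements contain only text.
        if self._current is not None:
            self.literals.append((self._current, "".join(self._text).strip()))
            self._current = None


def collect(markup):
    """Return (literals, links) found in the given markup."""
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.literals, parser.links


if __name__ == "__main__":
    literals, links = collect(PAGE)
    for name, value in literals + links:
        print(name, "=>", value)
```

Even this naive pass surfaces the page title, Joe's name, the license text, and the URL of his picture, which is the kind of information a plain text crawl cannot reliably infer. A real processor would go further and turn these into full subject-predicate-object triples.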
RDFa support has already enabled some interesting new SearchMonkey applications. For instance, Creative Commons has recently started to deploy RDFa across the web in the form of copyright and licensing information. Every time a Creative Commons user selects a CC license, the generated HTML badge contains RDFa markup indicating the nature of the license. The Creative Commons Infobar uses this data to selectively trigger on pages that declare their license using structured markup.

To get started with RDFa:
- Learn the basics with the W3C RDFa Primer
- Dive into the details with the full RDFa Specification
- Join the community at the RDFa homepage
- Test your structured markup skills with the RDFa Distiller
- Filter on RDFa in Yahoo! searches with the searchmonkeyid:com.yahoo.rdf.rdfa parameter
- Start displaying RDFa to millions of users with the SearchMonkey developer tool
Evan Goer
Yahoo! SearchMonkey Team
Posted at September 22, 2008 2:00 PM
Comments
I thought you had been extracting RDFa for a while now... Wasn't that the case?
Posted by: David Peterson at September 24, 2008 7:06 AM
Nope, we launched back in May with eRDF and a few popular microformats such as hCard, hCalendar, and XFN. With the Candidate Recommendation on June 20, we were able to move forward with RDFa.
Posted by: Evan at September 24, 2008 11:51 AM
Is it possible to query on specific values set with RDFa markup? For example, I just added the following to http://www.snee.com/bobdc.blog/2008/02/the_future_of_rdfa.html:
<span xmlns:sn="http://www.snee.com/ns/whatever/"
about="http://www.flickr.com/photos/bobdc/1881958623"
property="sn:goofinessFactor" content="3.4"/>
Once the Yahoo crawlers know that this page has been updated, is there a query that could access the page or any of the information in the triple based on the information in the triple?
thanks,
Bob
Posted by: Bob DuCharme at September 27, 2008 12:55 PM
Hi Bob -- Right now, the only way to inspect and play with the actual contents of the structured markup is the SearchMonkey devtool. The front end of the search engine just supports simple filtering. You can say, "Give me all pages that A) have hresume data and B) happen to have the string 'PHP'", but not "Give me all pages with 'PHP' marked up as part of the hresume."
Posted by: Evan at September 29, 2008 9:27 AM
It sounds like the front end currently doesn't support anything specific to RDFa. Is this correct?
thanks,
Bob
Posted by: Bob DuCharme at October 1, 2008 12:28 PM

