Yahoo! Developer Network Blog
December 10, 2008
Opening the web and retrieving all the goodies
The internet is an interesting thing, as it is a bit like the matrix. Whilst normal end users see something like this:
Developers have the more outside-the-matrix point of view as we tend to look at the data behind the facade:
And if you are one of the true believers in web2.0/web3.0 where the web is the platform and the framework then it turns into something like this:
There is nothing better than yummy yummy data that you can retrieve, mix with the right other ingredients and spices to create something that is even healthier, more nutritious or even caters for special diets. In essence, giving access to data will make your product all the more successful as other chefs can cater for you.
Getting to the yummy parts of one or several sources can be a bit of a problem though. Imagine a tin of good solid food you want to get to. The easiest and most versatile tool would be a Swiss Army knife with a can opener.
The web equivalent of a pocket knife is cURL, a library that allows a developer to make scripts behave like a browser and get access to the source of any web site or web service. You can for example go to the command line and simply enter the following:
curl --url http://www.thedailypuppy.com
The result is the source code of the page that you could run through other commands to get to the bits you want to retrieve.
The same works for RSS feeds or other types of data:
curl --url http://thedailypuppy.com/rss
cURL is amazingly powerful when you know how to use it: you can simulate other user agents, send and retrieve data, even spoof cookies. However, just as with the Swiss Army knife, you'll have to put a lot of work and effort into getting to the goodies. Regular expressions are most likely the most versatile way to do it, and for most developers they are not something that comes easily.
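To give a feel for the effort involved, here is a minimal Python sketch of the regular-expression route. The feed snippet is made up for illustration (it is not real output from thedailypuppy.com), and the helper name item_titles is hypothetical. A pattern like this breaks easily on real-world markup, so treat it as a sketch rather than a robust parser.

```python
import re

# Hypothetical RSS snippet standing in for the source that curl would return.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example puppy feed</title>
    <item><title>Bella the Beagle</title></item>
    <item><title>Max the Pug</title></item>
  </channel>
</rss>"""


def item_titles(rss_source):
    """Return the text of each <title> that immediately follows an <item> tag."""
    pattern = re.compile(r"<item>\s*<title>(.*?)</title>", re.DOTALL)
    return pattern.findall(rss_source)


print(item_titles(SAMPLE_RSS))
```

Note that the channel's own title is skipped only because the pattern insists on a preceding item tag; any change in the feed layout would mean rewriting the expression.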
What the web needed was a very fast, electric can opener that also might be coupled with a microwave to pre-heat your dish. The equivalent for that would be Yahoo Pipes.
Yahoo Pipes is amazingly powerful as it gives you a very handy and beautiful interface to remix the web:
This pipe for example searches twitter.com for my name and filters common false positives. The outcome of your pipe laying is then available as a very simple URL that can take parameters and give you the output in a lot of different formats:
- Show as RSS feed:
http://pipes.yahoo.com/pipes/pipe.run?_id=92feb878651258ca1d4575d3568766e9&_render=rss&s=heilmann
- Show as JSON:
http://pipes.yahoo.com/pipes/pipe.run?_id=92feb878651258ca1d4575d3568766e9&_render=json&s=heilmann
- Show as JSON and wrap in foo():
http://pipes.yahoo.com/pipes/pipe.run?_id=92feb878651258ca1d4575d3568766e9&_render=json&_callback=foo&s=heilmann
- Show as JSON, wrap in foo() and search for "Christian":
http://pipes.yahoo.com/pipes/pipe.run?_id=92feb878651258ca1d4575d3568766e9&_render=json&_callback=foo&s=Christian
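The pattern behind these URLs can be sketched in a few lines of Python. The helper name pipe_url is hypothetical, and the s parameter is specific to this particular pipe (it is the search term input defined in it); only _id, _render and _callback are taken from the URLs above. The sketch only builds the URL and does not call the service.

```python
from urllib.parse import urlencode

PIPES_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"


def pipe_url(pipe_id, render, search, callback=None):
    """Build a pipe.run URL in the same parameter order as the examples."""
    params = {"_id": pipe_id, "_render": render}
    if callback is not None:
        # Wrap the JSON output in a function call of this name.
        params["_callback"] = callback
    # "s" is the input this pipe exposes for the search term.
    params["s"] = search
    return PIPES_RUN_URL + "?" + urlencode(params)


MY_PIPE = "92feb878651258ca1d4575d3568766e9"
print(pipe_url(MY_PIPE, "rss", "heilmann"))
print(pipe_url(MY_PIPE, "json", "Christian", callback="foo"))
```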
If that is too low-level for you and all you want is to show a badge whose look and feel you can change, this is possible, too:
And this is where it got tricky. Whenever you build an interface that is beautiful, intuitive and terribly powerful you will get one request: can we have a command line interface to this? This is just how developers roll, there is not much we can do about it.
The other issue with Pipes is that it is high maintenance to some degree. Whilst you can provide parameters, it is still a very graphical interface that is impossible to use for somebody who for example cannot use a mouse or see the interface. This might not be a large group, but in the end I myself find using a keyboard tool like Quicksilver for example easier than dragging and dropping and using my mouse a lot. When you want to change the functionality of a pipe beyond parameters then you'll need to go back to the editor, something that made several people unhappy, too. In other words, we needed a good, sturdy can opener that doesn't need batteries.
This is where the newest tool to open the web comes in: Yahoo Query Language, or YQL for short. With YQL you have a SQL-style syntax to get very detailed information from all the services Yahoo offers the world, and you can also access the web through it.
The main place to try out YQL is the interactive console at http://developer.yahoo.com/yql/console/. There you can select from a lot of demo queries and see the outcome live below your query.
The real power of YQL lies in using and mixing Yahoo services and - with authentication - the Yahoo Social graph. However, for now let's just look at another thing to do: remix the web. If you scroll down on the right hand side you'll find "Available Data Tables" and there is a "data" sub-menu with the items atom, csv, feed, html, json, rss and xml.
This can be used to create YQL queries for anything on the web. Say, for example, you only want the names of the latest dailypuppy.com puppies. This can be done with the statement select title from feed where url='http://feeds.feedburner.com/TheDailyPuppy', and wrapped in the correct REST call it becomes:
http://query.yahooapis.com/v1/public/yql?q=select%20title%20from%20feed%20where%20url%3D'http%3A%2F%2Ffeeds.feedburner.com%2FTheDailyPuppy'&format=xml
Notice that you need to add a "public" before the yql to use the information without authentication!
If you want the data in JSON and wrapped in a function called myPuppies, just add the correct parameters called format and callback:
http://query.yahooapis.com/v1/public/yql?q=select%20title%20from%20feed%20where%20url%3D'http%3A%2F%2Ffeeds.feedburner.com%2FTheDailyPuppy'&format=json&callback=myPuppies
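Encoding the statement by hand is error-prone, so here is a minimal Python sketch of how the REST call can be assembled. The helper name yql_url is hypothetical. The choice to leave ' and * unescaped is an assumption made only so the output matches the URLs written out in this post; the sketch builds the URL and does not call the service.

```python
from urllib.parse import quote

# The "public" path segment is what allows use without authentication.
YQL_PUBLIC_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(statement, fmt="xml", callback=None):
    """Wrap a YQL statement in a REST URL for the public endpoint."""
    # Percent-encode everything except ' and * (assumption: mirrors this post).
    url = YQL_PUBLIC_ENDPOINT + "?q=" + quote(statement, safe="'*")
    url += "&format=" + fmt
    if callback is not None:
        # Name of the function the JSON output gets wrapped in.
        url += "&callback=" + quote(callback, safe="")
    return url


PUPPY_TITLES = (
    "select title from feed "
    "where url='http://feeds.feedburner.com/TheDailyPuppy'"
)
print(yql_url(PUPPY_TITLES))
print(yql_url(PUPPY_TITLES, fmt="json", callback="myPuppies"))
```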
Where it gets really interesting is the html option. Whilst Pipes has the option to retrieve an HTML document and get it as a string, YQL goes further and actually allows you to run XPath queries over the HTML document. Say you want to get all the latest images in my blog posts. You could use select * from html where url="http://www.wait-till-i.com" and xpath='//div[@id="content"]//img' for this:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.wait-till-i.com%22%20and%20xpath%3D'%2F%2Fdiv%5B%40id%3D%22content%22%5D%2F%2Fimg'&format=xml
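To see what that XPath expression selects, it can be tried locally on a small document. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, so it illustrates the expression and is not a reproduction of what YQL does with real HTML. The sample page and the helper name content_images are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a blog page.
SAMPLE_PAGE = """<html><body>
  <div id="content">
    <p><img src="post-1.png" alt="First post"/></p>
    <img src="post-2.png" alt="Second post"/>
  </div>
  <div id="sidebar"><img src="avatar.png" alt="Me"/></div>
</body></html>"""


def content_images(page_source):
    """Return the src of every img at any depth inside div#content."""
    root = ET.fromstring(page_source)
    # ElementTree paths are relative, hence the leading "." before "//".
    images = root.findall(".//div[@id='content']//img")
    return [img.get("src") for img in images]


print(content_images(SAMPLE_PAGE))
```

The image in the sidebar is left out because the expression only descends into the div with the id content.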
The opportunities are endless, especially once you dive deeper into the YQL documentation and learn about joining queries.
Want more? Comment about your needs and wishes :)
Chris Heilmann
Yahoo Developer Network
Posted at December 10, 2008 3:53 PM | Permalink
Comments
Thanks for pointing out the "public" element. I've been playing around with the console but didn't find the non-authorized key.
The YQL console is also a quick way to do some research for data. Even if you are not building a web service with it. I wanted to quickly find some geolocation data for different cities. The console made this super easy.
Posted by: Ted Drake at December 12, 2008 9:22 AM
A question: how does Pipes handle load? Is there a request and a compilation being made each time the service is called, or does the system cache some of the more recurring requests?
Could this be used as part of a normal-size system without Pipes being the bottleneck?
Posted by: Kalle Hoppe at December 12, 2008 12:19 PM
Pipes caches at many many places in the architecture, and will only re-run a pipe after a certain period of time, or if the parameters to the pipe "change" (because you can have "inputs" to pipes that can be supplied on the GET that make the way the pipe runs different).
There are various limits on how many pipes can be run and how often a pipe can be run (quite a few things are monitored). However, if you are accessing the same pipe with only a few variations on input, it's not really "counted" as we're giving you a cached version (until we decide that the pipe needs to be refreshed and re-run).
There are a few other things we do to handle load but I think this is the information you're asking about. Please feel free to contact pipes-bd@yahoo-inc.com with any specific questions you might have.
Posted by: Jonathan Trevor at December 12, 2008 2:31 PM