Yahoo! Developer Network Blog
May 5, 2009
The Future of Vertical Search Engines
Vertical search engines are emerging every day, as people look for information that would otherwise be hidden in the enormous indexes of Yahoo!, Google, Microsoft, and other generic search engines. You can find search engines devoted to food, cars, people, environmentalism, and much more. Many of today's vertical search engines are pushing beyond the repackaging of existing search results and creating better results by focusing on their core audience.
I recently spoke at the www2009 conference in Madrid, Spain about this topic. The complete slides, The Future of Vertical Search Engines, can be found on slideshare.net.
The Past, Present, and Future of Building a Search Engine
Search engine construction has evolved from the early days of building from scratch to today's plethora of data APIs that make tomorrow's vertical search engines more powerful and easier to build.
Past
- Huge expenses to build the index, find the data, maintain the process.
- Majority of time spent on building relevancy and less on design and creating a unique experience.
Present
- Search APIs reduce the complexity of building an index.
- Vertical search engines still spend significant resources on creating unique data.
- More resources are spent on designing the best relevancy and a unique experience.
Future
- New search engines tap into huge amounts of distributed data.
- More time for developing unique approaches to presenting relevant information and creating a unique experience.
Relevancy
Vertical search engines have a distinct advantage over the general search engines. They already know what their users are interested in. A search for Jaguar in Yahoo! may return the automobile, the Mac OS, or the animal. However, vertical search engines that specialize in sports, autos, or animals would not have that problem. This assumption of user interest gives vertical search engines more flexibility in creating new models of relevancy ranking.
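As a rough illustration of how an assumed user interest can feed into ranking, here is a minimal sketch that moves results up when their text overlaps a vocabulary chosen for the vertical. The `AUTO_TERMS` vocabulary, the sample results, and the `rerank` function are all hypothetical and are not part of BOSS or of any site mentioned in this post.

```python
# Hypothetical vocabulary for an automotive vertical (an assumption for illustration).
AUTO_TERMS = {"car", "sedan", "engine", "dealer", "horsepower"}


def rerank(results, vertical_terms):
    """Order results by how many vertical terms appear in their text.

    sorted() is stable, so results with equal scores keep the
    order the upstream search engine gave them.
    """
    def score(result):
        words = set(result["abstract"].lower().split())
        return len(words & vertical_terms)

    return sorted(results, key=score, reverse=True)


# Made-up results for the ambiguous query "jaguar".
results = [
    {"title": "Jaguar (animal)", "abstract": "Large cat native to the Americas"},
    {"title": "Mac OS X Jaguar", "abstract": "A release of the Apple operating system"},
    {"title": "Jaguar XF", "abstract": "Luxury sedan with a supercharged engine"},
]

ranked = rerank(results, AUTO_TERMS)
print([r["title"] for r in ranked])
```

A real site would likely use a richer signal than word overlap, but the idea is the same: the vertical already knows which sense of the query its audience means.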
Let's look at some emerging trends among recent Yahoo! BOSS-powered sites and applications.
Location-based awareness
Yahoo! Fire Eagle is a location standardization and distribution platform that allows developers to use location-based services very easily. Location-based search is one of the most promising areas of future sites. This is especially true as mobile phones make it easier to determine location and display relevant information.
It's great to know the restaurants, shops, and friends that are around me right now. But it is even more interesting to know what I can find in the next block, mile, or town. This topic was discussed in the www2009 conference paper: Mining Interesting Locations and Travel Sequences from GPS Trajectories for Mobile Users by Yu Zheng, Lizhu Zhang, Xing Xie and Wei-Ying Ma.
FirePin, an iPhone application that lets you generate a sharable map, uses a combination of Fire Eagle and Google Maps to plot your route in real-time. This could easily be connected to a search engine that computes the probable next location and returns local businesses, census data, historical information, and availability of friends.
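One very simple way to estimate a probable next location is to extrapolate from the last two position fixes. The sketch below does only that. It is a naive stand-in, not the method from the paper above and not how FirePin or Fire Eagle work; the `predict_next` function and the sample route are assumptions made for illustration.

```python
def predict_next(fixes, seconds_ahead):
    """Linearly extrapolate from the last two (timestamp, lat, lon) fixes.

    Treats latitude and longitude as flat coordinates, which is only
    reasonable over short distances.
    """
    (t0, lat0, lon0), (t1, lat1, lon1) = fixes[-2], fixes[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    factor = seconds_ahead / dt
    return (lat1 + (lat1 - lat0) * factor, lon1 + (lon1 - lon0) * factor)


# Made-up route: (seconds, latitude, longitude)
route = [(0, 40.4100, -3.7000), (60, 40.4110, -3.7010)]
lat, lon = predict_next(route, seconds_ahead=120)
print(round(lat, 4), round(lon, 4))
```

The predicted point could then be passed to a local search query in place of the user's current position.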
Secondary Sources
With the vast amount of data available on the net, there is no reason to limit your site to just a search API. Many sites are using the user's query to trigger a series of APIs for related information about the subject. A search for Tiger Woods on a sports-related site could build modules based on data from Wikipedia, video from YouTube, latest tournament results, and even build a map of championship golf courses.
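Here is a minimal sketch of that fan-out pattern: one query is sent to several sources in parallel, and a source that fails simply produces an empty module instead of breaking the page. The three fetcher functions are stand-ins that make no network calls, and every name in the block is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in fetchers; a real site would call each provider's API here.
def fetch_encyclopedia(query):
    return f"Summary article for {query}"


def fetch_videos(query):
    return [f"{query} highlights"]


def fetch_results(query):
    raise RuntimeError("provider unavailable")  # simulate an outage


SOURCES = {
    "encyclopedia": fetch_encyclopedia,
    "videos": fetch_videos,
    "results": fetch_results,
}


def build_modules(query, sources, timeout=2.0):
    """Call every source in parallel; a failing source yields None."""
    modules = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                modules[name] = future.result(timeout=timeout)
            except Exception:
                modules[name] = None  # skip this module, keep the page
    return modules


modules = build_modules("Tiger Woods", SOURCES)
print(modules)
```

The `timeout` only bounds how long each result is waited on. The executor still waits for running calls when it shuts down, so real fetchers would need their own network timeouts as well.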
Some sites are also using secondary sources to enhance relevance of their search results. This topic was discussed in the www2009 conference paper: Understanding User's Query Intent with Wikipedia by Jian Hu, Gang Wang, Fred Lochovsky and Zheng Chen. DuckDuckGo is a new search engine that is combining these ideas. They use Wikipedia to help enhance relevance, as well as using multiple data sources to provide a more rounded experience.
The amount and variety of data available is rather surprising. The OpenData movement has made data sharing and discovery much more transparent and efficient. For more information on OpenData visit: DataMob.org, TheInfo.org, InfoChimps.org.
Internal and External Data Sources
Yahoo! BOSS offers a custom search experience for larger partners, such as TechCrunch. This allows BOSS to index proprietary data that is normally not available to search spiders. This data can be combined with whitelisted sources and feeds to create a unique set of expert sources. The custom approach also allows for structured search options, such as displaying only articles published within a certain time frame, by a particular author, and about a specific topic.
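To make the structured search idea concrete, the sketch below filters a small in-memory set of articles by author, topic, and publication date. It says nothing about how the BOSS custom offering is implemented; the `Article` type, the `structured_search` function, and the sample records are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    author: str
    topic: str
    published: date


def structured_search(articles, author=None, topic=None, start=None, end=None):
    """Return articles matching every filter that was supplied."""
    matches = []
    for a in articles:
        if author is not None and a.author != author:
            continue
        if topic is not None and a.topic != topic:
            continue
        if start is not None and a.published < start:
            continue
        if end is not None and a.published > end:
            continue
        matches.append(a)
    return matches


# Made-up records for illustration.
articles = [
    Article("Search APIs compared", "alice", "search", date(2009, 3, 2)),
    Article("Mobile maps roundup", "bob", "mobile", date(2009, 4, 10)),
    Article("Vertical search startups", "alice", "search", date(2008, 11, 20)),
]

hits = structured_search(articles, author="alice", topic="search",
                         start=date(2009, 1, 1), end=date(2009, 12, 31))
print([a.title for a in hits])
```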
Even without the BOSS custom approach, vertical search engines can develop their own unique data sets, whether it is the index of books in a library, the statistics generated by research, or other unique data for a subject.
For example, we could build an art search engine. A user searches for "Mona Lisa" and the Louvre's web site is returned as the first result. This could be combined with internal data to display additional information about the painting, Leonardo da Vinci, the Renaissance, or the Louvre Museum. Perhaps the site adds a list of related artists, such as Raphael and Michelangelo, to the result for further exploration.
Offline Analysis
Coloralo is an interesting search engine that uses offline analysis to produce specific results. The site was a product of necessity, as the engineer wanted to find new images for his children to color. Coloralo is an image search engine that only returns black and white line drawings for kids to color.
When a user searches for "horse" the site requests many images, caches them, and analyzes them for the number of colors and the distribution of blacks and whites. This analysis returns a smaller list of images that are appropriate.
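The kind of check described above might look something like the following sketch, which measures what share of an image's pixels are close to black or close to white. This is not Coloralo's actual algorithm. The thresholds, the `is_line_drawing` function, and the tiny sample images are assumptions chosen for illustration.

```python
def is_line_drawing(pixels, tolerance=40, min_bw_share=0.95, min_white_share=0.5):
    """Guess whether an image is a black and white line drawing.

    pixels is a flat list of (r, g, b) tuples with 0-255 channels.
    """
    if not pixels:
        return False
    black = sum(1 for p in pixels if max(p) <= tolerance)
    white = sum(1 for p in pixels if min(p) >= 255 - tolerance)
    total = len(pixels)
    mostly_bw = (black + white) / total >= min_bw_share
    # Coloring pages are mainly empty paper with thin dark outlines.
    mostly_white = white / total >= min_white_share
    return mostly_bw and mostly_white and black > 0


# Made-up images: 100 pixels each.
outline = [(255, 255, 255)] * 90 + [(0, 0, 0)] * 10
photo = [(200, 120, 40)] * 60 + [(30, 90, 160)] * 40

print(is_line_drawing(outline), is_line_drawing(photo))
```

Because the analysis runs offline against cached images, it can afford to be slower and stricter than anything done at query time.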
Vertical Focus
Truevert is an environmental vertical search engine that is going beyond the basic assumption of a niche user's intentions. They build a unique natural language dictionary to enhance relevancy. A search for "CFL" on a regular search engine could return "Canadian Football League" but Truevert recognizes this as the acronym for "Compact Fluorescent Lighting", a much more relevant term for environmental concerns.
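A much simpler version of the same idea is a lookup table that expands acronyms into the meaning the vertical's audience most likely intends. The sketch below is not Truevert's approach, which the post describes only in general terms; the `ENVIRONMENT_TERMS` dictionary and the `expand_query` function are hypothetical.

```python
# Hypothetical dictionary for an environmental vertical.
ENVIRONMENT_TERMS = {
    "cfl": "compact fluorescent lighting",
    "pv": "photovoltaic",
}


def expand_query(query, dictionary):
    """Replace known acronyms with the vertical's preferred meaning."""
    expanded = [dictionary.get(word.lower(), word) for word in query.split()]
    return " ".join(expanded)


print(expand_query("CFL recycling", ENVIRONMENT_TERMS))
```

The expanded query is what gets sent to the underlying search API, so the general index never sees the ambiguous acronym.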
Beyond Search as a Site or Function
Vik Singh, the architect of Yahoo! BOSS, described the Yahoo! BOSS API not as a search API but as a data API during www2009's Web Search APIs: The Next Generation panel. Singh suggests search is the best way to work with the wealth of data on the internet. This data doesn't have to be used to create a set of results on a search page.
Search as a Function
Chris Heilmann created Keyword Finder when BOSS began displaying the keyterms associated with a result. These keyterms are the words that have been associated with a web site inside the Yahoo! Search Index. Keyword Finder looks at the top results for a term and returns the keyterms that are the most effective for that term. This helps a site user plan their Search Engine Optimization strategy.
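The aggregation step can be sketched as a simple count of how many top results share each keyterm. The result structure below is a hypothetical stand-in, not the actual BOSS response format, and `top_keyterms` is not Keyword Finder's real code.

```python
from collections import Counter


def top_keyterms(results, limit=3):
    """Count how many results each keyterm appears in; return the most common."""
    counts = Counter()
    for result in results:
        # set() so a term repeated within one result is counted once
        counts.update(set(term.lower() for term in result["keyterms"]))
    return [term for term, _ in counts.most_common(limit)]


# Made-up top results for illustration.
results = [
    {"url": "http://example.com/a", "keyterms": ["hybrid cars", "fuel economy", "mpg"]},
    {"url": "http://example.com/b", "keyterms": ["fuel economy", "hybrid cars"]},
    {"url": "http://example.com/c", "keyterms": ["hybrid cars", "batteries"]},
]
print(top_keyterms(results, limit=2))
```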
Another site that replaces a list of results with a single answer is Bossy. This site analyzes the results and uses their consensus to decide which answer is correct. An example of Bossy would be: Q. Where is the Prado? A. Madrid.
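One way to picture a consensus step is as a vote: each result snippet votes for every candidate answer it mentions, and the candidate with the largest share wins. The sketch below assumes the candidate answers are already known, which is a big simplification. It is not Bossy's actual method, and the snippets and the `consensus_answer` function are made up for illustration.

```python
from collections import Counter


def consensus_answer(snippets, candidates, min_share=0.5):
    """Return the candidate mentioned in the largest share of snippets.

    Returns None when no candidate appears in at least min_share of them.
    """
    votes = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for candidate in candidates:
            if candidate.lower() in text:
                votes[candidate] += 1
    if not votes:
        return None
    answer, count = votes.most_common(1)[0]
    return answer if count / len(snippets) >= min_share else None


# Made-up snippets for the question "Where is the Prado?"
snippets = [
    "The Prado Museum is located in central Madrid.",
    "Visit the Museo del Prado, Madrid's most famous gallery.",
    "Prado is also a neighbourhood in Marseille.",
]
print(consensus_answer(snippets, ["Madrid", "Marseille", "Paris"]))
```

Returning `None` when agreement is weak lets the site fall back to an ordinary list of results instead of asserting a shaky answer.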
Beyond the Search Engine Web Site
Future search projects will also go beyond the basic browser. It's time to think about this data and new applications. Let's look at what we can do.
Search on the desktop
Xobni is a great search application for the desktop computer. Xobni extends Microsoft Outlook, providing much stronger search functionality as well as tying into social networks, such as LinkedIn.
Search as a tool
Zemanta is a search-based tool that discovers related content for people who write blog posts. Zemanta is a Firefox plugin that analyzes the context of what you are writing and searches for similar images, articles, and even products on Amazon.
Inquisitor is another tool that has taken search into the browser. Inquisitor replaces the browser's search interface with a much more powerful and faster search-suggestion generator.
Search as a module
The OpenSocial standard allows developers to build a single web application and have it appear on multiple social networks at the same time. For example, you build an application which finds daily statistics, gossip, and news about the players in a user's fantasy football league. This single set of code could be simultaneously used in Facebook, Yahoo!, MySpace, Bebo, and more.
Search outside a computer
Web-based applications are moving beyond basic computers. Yahoo! has recently announced partnerships with Intel and television manufacturers to allow applications that work alongside normal broadcast programming. Imagine searching the latest Twitter stream for opinions and statistics while watching the Super Bowl.
This new application standard may also be extended to web-enabled household appliances and automotive computers, as well as home entertainment systems.
Resources
- Yahoo! BOSS: Developer.Yahoo.Com/BOSS
- YQL: Developer.Yahoo.Com/YQL
- Fire Eagle: Developer.Yahoo.Com/FireEagle
- Google App Engine: AppEngine.Google.Com
- Amazon Web Services: AWS.Amazon.Com
- OAuth: OAuth.Net
- OpenSocial: OpenSocial.Org
- Open Data: TheInfo.Org
- Alt Search Engines: AltSearchEngines.Com
Ted Drake
Web Developer, Yahoo! Paris
Posted at May 5, 2009 7:42 AM
Comments
Searching external data sources sounds great in theory. But in practice, it depends on the implementation. I work with biological data. A while ago there was an initiative to search distributed databases for collection holdings. The idea was that if one search could return results from many institutions, the researchers would be able to more easily conduct their research. In practice, this works horribly. If one provider is down, the whole thing is down. If one provider is slow, the whole process is slow. In other words, the quality of service is bad.
The best way to solve this problem is to create a local index of the distributed databases and search the index. That way the search is fast. The problem this introduces is if the results point to a database that is currently down. Then the user is presented with relevant results to resources that are not available. That is not good.
These are troublesome problems. I would love to know how other people are approaching them.
Posted by: Brett at May 5, 2009 9:44 AM
Hi,
thank you for mentioning Zemanta! Indeed these new ways of approaching discovery are next generation of search, or more precisely search going into new places :)
I have to agree with previous commenter on external databases or indexes. We use them along internal index in Zemanta, but they bring a lot of pain:
- latencies
- you can never tune them the way you can tune your own search (quality of results)
- usually no flexibility in query creation (advanced syntax, weights...)
- you never get all the metadata you need to do what you want
- it's near impossible to combine rankings from heterogeneous federated search. Even when search engines are nice enough to tell you scores, they are useless across engines
- reliability
It is easy to create a prototype mashup or some proof of concept. But when you need a product with all the finishing touches, problems arise. The reasons above make it hard to create a superb user experience. But still you sometimes have to do it (to get to exclusive content or maybe because your index cannot be comprehensive enough).
Efforts like Lucene, Solr and Hadoop need more engagement to get going (and are immature in certain respects), but they can get you much further when you really need search tailored to your needs.
Yahoo's efforts to reinvent search are great. Not only BOSS, but Flickr search, SearchMonkey and support for Hadoop are all truly big contributions to the future of internet!
Anrdaz Tori, Zemanta
Posted by: Anrdaz Tori at May 5, 2009 2:42 PM
These vertical search engines, most of the time, provide better results than conventional search engines. http://aafter.com/ is one of them that guarantees better and simple web search, always.
It does not stick to any particular utility or field; rather it offers multifaceted utilities to the users. Besides providing conventional search-based results, it also provides cash back, coupons as well as reverse phone look up services, reverse address search services, URL shortening service, study tools for students, allergy information and lots more that you will just love to grab.
I would recommend all the viewers to take advantage of this high privacy search engine.
Regards,
SharonHill
Posted by: SharonHill at May 7, 2009 6:38 AM
http://developer.yahoo.net/blog/archives/2009/05/future_vertical_search.html
Yes, can we get permission to republish this post on AltSearchEngines - with full attribution?
Thanks,
Charles Knight, editor
AltSearchEngines.com
Posted by: Charles Knight at May 12, 2009 9:11 AM
Charles, Ted and I would be delighted to have you repost this with full author attribution, including a mention that this was originally published on Yahoo! Developer Network Blog. Thanks very much! -- Havi Hoffman, YDN Blog editor
Posted by: Havi Hoffman at May 22, 2009 7:59 AM