Yahoo! Developer Network Blog



May 5, 2009

The Future of Vertical Search Engines

Vertical search engines are emerging every day, as people look for information that would otherwise be hidden in the enormous indexes of Yahoo!, Google, Microsoft, and other generic search engines. You can find search engines devoted to food, cars, people, environmentalism, and much more. Many of today's vertical search engines go beyond repackaging existing search results and create better results by focusing on their core audience.

I recently spoke about this topic at the WWW2009 conference in Madrid, Spain. The complete slides, The Future of Vertical Search Engines, are available on slideshare.net.

The Past, Present, and Future of Building a Search Engine

Search engine construction has evolved from the early days of building from scratch to today's plethora of data APIs that make tomorrow's vertical search engines more powerful and easier to build.

Past

  • Huge expenses to build the index, find the data, maintain the process.
  • Most time spent on building relevancy, less on design and creating a unique experience.

Present

  • Search APIs reduce the complexity of building an index.
  • Vertical search engines still spend significant resources on creating unique data.
  • More resources go into tuning relevancy and designing a unique experience.

Future

  • New search engines tap into huge amounts of distributed data.
  • More time for developing unique approaches to presenting relevant information and creating a unique experience.

Relevancy

Vertical search engines have a distinct advantage over the general search engines. They already know what their users are interested in. A search for Jaguar in Yahoo! may return the automobile, the Mac OS, or the animal. However, vertical search engines that specialize in sports, autos, or animals would not have that problem. This assumption of user interest gives vertical search engines more flexibility in creating new models of relevancy ranking.
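To make the idea concrete, here is a minimal Python sketch of one way a vertical might exploit that assumption: rerank general-purpose results by how many words they share with the vertical's own vocabulary. The `AUTO_TERMS` vocabulary, the scoring rule, and the result dictionaries with `title` and `abstract` keys are all hypothetical, not how any particular engine works.

```python
# Hypothetical vocabulary for an autos vertical; a real site would build this from its own content.
AUTO_TERMS = {"car", "cars", "sedan", "coupe", "engine", "dealer", "horsepower", "xf"}


def vertical_score(text: str, vocabulary: set) -> int:
    """Count the words in `text` that belong to the vertical's vocabulary."""
    words = (w.strip(".,!?()").lower() for w in text.split())
    return sum(1 for w in words if w in vocabulary)


def rerank(results: list, vocabulary: set) -> list:
    """Move results with more in-domain words to the top.

    sorted() is stable (also with reverse=True), so results with equal
    scores keep the general engine's original order.
    """
    return sorted(
        results,
        key=lambda r: vertical_score(r["title"] + " " + r["abstract"], vocabulary),
        reverse=True,
    )


# Made-up results for an ambiguous "Jaguar" query.
results = [
    {"title": "Jaguar (animal)", "abstract": "The jaguar is a big cat of the Americas."},
    {"title": "Mac OS X Jaguar", "abstract": "A release of Apple's operating system."},
    {"title": "Jaguar XF review", "abstract": "A luxury sedan with a strong engine."},
]
for r in rerank(results, AUTO_TERMS):
    print(r["title"])
```

For an autos site, the car review moves to the top while the other two keep their original relative order.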

Let's look at some emerging trends among recent Yahoo! BOSS-powered sites and applications.

Location-based awareness

Yahoo! Fire Eagle is a location standardization and distribution platform that makes it easy for developers to build location-based services. Location-based search is one of the most promising areas for future sites, especially as mobile phones make it easier to determine a user's location and display relevant information.

It's great to know the restaurants, shops, and friends that are around me right now. But it is even more interesting to know what I can find in the next block, mile, or town. This topic was discussed in the WWW2009 conference paper Mining Interesting Locations and Travel Sequences from GPS Trajectories for Mobile Users by Yu Zheng, Lizhu Zhang, Xing Xie and Wei-Ying Ma.
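Answering "what is within the next mile?" starts with a distance calculation. Here is a minimal sketch, assuming places are stored as dictionaries with hypothetical `name`, `lat`, and `lon` keys: the haversine formula gives the great-circle distance, and a radius filter keeps the nearby places, nearest first. The coordinates are approximate and for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(here, places, radius_km):
    """Return the places within radius_km of `here`, nearest first."""
    lat, lon = here
    with_distance = [(haversine_km(lat, lon, p["lat"], p["lon"]), p) for p in places]
    with_distance.sort(key=lambda pair: pair[0])  # sort on distance only; dicts aren't comparable
    return [p for d, p in with_distance if d <= radius_km]


# Approximate coordinates, for illustration only.
PUERTA_DEL_SOL = (40.4169, -3.7035)
PLACES = [
    {"name": "Museo del Prado", "lat": 40.4138, "lon": -3.6921},
    {"name": "Musee du Louvre", "lat": 48.8606, "lon": 2.3376},
]
```

Widening `radius_km` from a block to a mile to a town is just a change of one parameter.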

FirePin, an iPhone application that lets you generate a sharable map, uses a combination of Fire Eagle and Google Maps to plot your route in real time. This could easily be connected to a search engine that computes your probable next location and returns local businesses, census data, historical information, and the availability of friends.
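There are many ways to compute a "probable next location", and the paper cited above describes a considerably more involved approach. As a far simpler stand-in, here is a sketch of a first-order Markov model: count which location tends to follow which in past trips, then predict the most frequent successor. The trips and location names are made up.

```python
from collections import Counter, defaultdict


def build_transitions(trajectories):
    """Count how often each location is immediately followed by each other location."""
    transitions = defaultdict(Counter)
    for trip in trajectories:
        for current, nxt in zip(trip, trip[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent next location after `current`, or None if never seen."""
    followers = transitions.get(current)  # .get() avoids creating empty entries
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical past trips for one user.
trips = [["home", "cafe", "office"], ["home", "cafe", "gym"], ["cafe", "office"]]
transitions = build_transitions(trips)
```

The predicted location could then be fed into a local search, for example to show businesses near the office before the user gets there.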

Secondary Sources

With the vast amount of data available on the net, there is no reason to limit your site to a single search API. Many sites use the user's query to trigger calls to a series of APIs for related information about the subject. A search for Tiger Woods on a sports-related site could build modules from Wikipedia data, YouTube videos, and the latest tournament results, and even build a map of championship golf courses.
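Fanning a query out to several sources works best in parallel and with a deadline, so that one slow or failing provider doesn't hold up the whole page. Below is a minimal sketch using Python's standard `concurrent.futures`. The `fetch_*` functions are hypothetical stand-ins for real services such as Wikipedia or YouTube, and the 0.5-second timeout is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


# Hypothetical stand-ins for real services.
def fetch_encyclopedia(query):
    return f"Encyclopedia summary for {query}"


def fetch_videos(query):
    return [f"{query} highlights"]


def fetch_tournament_results(query):
    time.sleep(1.0)  # simulate a provider that is too slow to wait for
    return [f"{query}: latest scores"]


def fetch_broken(query):
    raise ConnectionError("provider is down")  # simulate an outage


def gather_modules(query, sources, timeout_s=0.5):
    """Call every source in parallel and keep whatever succeeds before the timeout."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fetch, query): name for name, fetch in sources.items()}
    done, _ = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # return now instead of waiting for slow providers
    return {
        futures[f]: f.result()
        for f in done
        if f.exception() is None  # skip providers that raised an error
    }


SOURCES = {
    "encyclopedia": fetch_encyclopedia,
    "videos": fetch_videos,
    "tournaments": fetch_tournament_results,
    "broken": fetch_broken,
}
```

Sources that miss the deadline or raise an error are simply left out, so the page renders with whatever modules are ready.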

Some sites also use secondary sources to enhance the relevance of their search results. This topic was discussed in the WWW2009 conference paper Understanding User's Query Intent with Wikipedia by Jian Hu, Gang Wang, Fred Lochovsky and Zheng Chen. DuckDuckGo is a new search engine that combines these ideas. It uses Wikipedia to help enhance relevance, and draws on multiple data sources to provide a more rounded experience.
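The cited paper's method is considerably more involved; the sketch below only illustrates the general idea with a toy approach. It looks up the Wikipedia article matching the query, then picks the intent class whose keywords best match that article's categories. Both dictionaries are small, hand-built, hypothetical examples rather than real Wikipedia data.

```python
# Hypothetical excerpt: Wikipedia article title -> the article's categories.
ARTICLE_CATEGORIES = {
    "tiger woods": {"American male golfers", "Sportspeople from California"},
    "tiger": {"Big cats", "Mammals of Asia"},
    "golf": {"Golf", "Ball games"},
}

# Hypothetical intent classes and the category words that signal them.
INTENT_KEYWORDS = {
    "sports": {"golf", "golfers", "sportspeople", "ball"},
    "animals": {"cats", "mammals"},
}


def classify_intent(query):
    """Return the intent whose keywords overlap most with the query article's categories."""
    categories = ARTICLE_CATEGORIES.get(query.strip().lower(), set())
    words = {w.lower() for category in categories for w in category.split()}
    scores = {intent: len(words & keywords) for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: no evidence either way
```

A vertical could use the returned intent to decide which secondary sources to query, or to filter out off-topic results.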

The amount and variety of available data is rather surprising. The OpenData movement has made data sharing and discovery much more transparent and efficient. For more information on OpenData, visit DataMob.org, TheInfo.org, or InfoChimps.org.

Internal and External Data Sources

Yahoo! BOSS offers a custom search experience for larger partners, such as TechCrunch. This allows BOSS to index proprietary data that is normally not available to search spiders. This data can be combined with whitelisted sources and feeds to create a unique set of expert sources. The custom approach also allows for structured search options, such as displaying only articles published within a certain time frame, by a particular author, and about a specific topic.
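Structured options like these amount to filters applied to the matching articles. Here is a minimal local sketch of that idea. It is not the BOSS custom search interface, and the `Article` fields are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Article:
    title: str
    author: str
    published: date
    topics: frozenset = field(default_factory=frozenset)


def structured_search(articles, start: Optional[date] = None, end: Optional[date] = None,
                      author: Optional[str] = None, topic: Optional[str] = None):
    """Return articles passing every supplied filter, newest first; None means 'no filter'."""
    def keep(a: Article) -> bool:
        if start is not None and a.published < start:
            return False
        if end is not None and a.published > end:
            return False
        if author is not None and a.author != author:
            return False
        if topic is not None and topic not in a.topics:
            return False
        return True

    return sorted((a for a in articles if keep(a)), key=lambda a: a.published, reverse=True)
```

Because each filter is optional, the same function serves a plain listing as well as a narrow "this author, this topic, this month" query.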

Even without the BOSS custom approach, vertical search engines can develop their own unique data sets, whether it is the index of books in a library, the statistics generated by research, or other unique data for a subject.

For example, we could build an art search engine. A user searches for "Mona Lisa" and the Louvre's web site is returned as the first result. This could be combined with internal data to display additional information about the painting, Leonardo da Vinci, the Renaissance, or the Louvre Museum. Perhaps the site also adds a list of related artists, such as Raphael and Michelangelo, for further exploration.
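One way to sketch that enrichment step is to keep a small internal knowledge base keyed by artwork and attach its facts and related artists to whatever web results come back. The `ARTWORKS` and `RELATED_ARTISTS` tables are hypothetical examples of such internal data.

```python
# Hypothetical internal data for an art search engine.
ARTWORKS = {
    "mona lisa": {
        "artist": "Leonardo da Vinci",
        "period": "Renaissance",
        "museum": "Louvre Museum",
    },
}
RELATED_ARTISTS = {
    "Leonardo da Vinci": ["Raphael", "Michelangelo"],
}


def enrich(query, web_results):
    """Combine web search results with internal facts about the artwork, if we have any."""
    page = {"query": query, "results": web_results}
    facts = ARTWORKS.get(query.strip().lower())
    if facts is not None:
        page["facts"] = facts
        page["related_artists"] = RELATED_ARTISTS.get(facts["artist"], [])
    return page
```

Queries the internal data doesn't cover still get the plain web results, so the enrichment degrades gracefully.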

Offline Analysis

Coloralo is an interesting search engine that uses offline analysis to produce specific results. The site was born of necessity: its engineer wanted to find new images for his children to color. Coloralo is an image search engine that returns only black and white line drawings for kids to color in.

When a user searches for "horse", the site requests many images, caches them, and analyzes each one for its number of colors and its distribution of black and white. This analysis produces a smaller list of appropriate images.
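The color analysis step might look something like the heuristic below, which works on a list of RGB pixel tuples (in practice the images would first be decoded with an imaging library such as Pillow). The thresholds are guesses for illustration, not Coloralo's actual parameters.

```python
def looks_like_line_drawing(pixels, max_colors=16, tolerance=40,
                            min_bw_share=0.95, min_white_share=0.5):
    """Heuristic filter for black and white line art.

    pixels: a list of (r, g, b) tuples with values 0-255.
    """
    if not pixels:
        return False
    total = len(pixels)
    near_white = sum(1 for r, g, b in pixels if min(r, g, b) >= 255 - tolerance)
    near_black = sum(1 for r, g, b in pixels if max(r, g, b) <= tolerance)
    return (
        len(set(pixels)) <= max_colors                          # few distinct colors
        and (near_white + near_black) / total >= min_bw_share  # almost all black or white
        and near_white / total >= min_white_share              # mostly blank space to color in
    )
```

Because the check is cheap and its results can be cached per image, it can run offline and leave only the filtered list for query time.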

Vertical Focus

Truevert is an environmental vertical search engine that goes beyond the basic assumption of a niche user's intentions. It builds a unique natural language dictionary to enhance relevancy. A search for "CFL" on a regular search engine could return "Canadian Football League", but Truevert recognizes it as the acronym for "Compact Fluorescent Lighting", a much more relevant term for environmental concerns.
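A domain dictionary can be applied as a query-rewriting step before the search API is called. Here is a minimal sketch, with a tiny hypothetical `GREEN_ACRONYMS` dictionary standing in for a real domain dictionary:

```python
# Hypothetical domain dictionary for an environmental vertical.
GREEN_ACRONYMS = {
    "cfl": "compact fluorescent lighting",
    "pv": "photovoltaic",
    "ghg": "greenhouse gas",
}


def expand_query(query, dictionary):
    """Rewrite known acronyms into their in-domain meaning before searching."""
    return " ".join(dictionary.get(token.lower(), token) for token in query.split())
```

Replacing the acronym outright is the simplest choice; a real engine might instead search for both the acronym and its expansion to avoid losing recall.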

Beyond Search as a Site or Function

WWW2009 conference panel: Web Search APIs: The Next Generation

Vik Singh, the architect of Yahoo! BOSS, described the Yahoo! BOSS API not as a search API but as a data API during WWW2009's Web Search APIs: The Next Generation panel. Singh suggests that search is the best way to work with the wealth of data on the internet, and that this data doesn't have to be used to create a set of results on a search page.

Search as a Function

Chris Heilmann created Keyword Finder when BOSS began returning the keyterms associated with each result. These keyterms are the words that have been associated with a web site inside the Yahoo! Search index. Keyword Finder looks at the top results for a term and returns the keyterms that are most effective for that term. This helps its users plan their search engine optimization (SEO) strategy.
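The aggregation at the heart of such a tool can be sketched in a few lines: count how many of the top results share each keyterm and rank the terms by that count. The `keyterms` field on each result dictionary is an assumed shape, and this is not Keyword Finder's actual code.

```python
from collections import Counter


def top_keyterms(results, n=5):
    """Rank keyterms by how many of the top results they appear in."""
    counts = Counter()
    for result in results:
        # Assumed shape: each result dict has a "keyterms" list; count each term once per result.
        counts.update({term.lower() for term in result.get("keyterms", [])})
    return [term for term, _ in counts.most_common(n)]
```

Counting each term once per result rewards terms shared by many top-ranked sites rather than terms one site repeats.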

Another site that replaces a list of results with a single answer is Bossy. It analyzes the search results and picks the answer most of them agree on. For example: Q. Where is the Prado? A. Madrid.
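Consensus answering can be approximated by voting: for each candidate answer, count how many result snippets mention it and return the winner along with its share of the votes. This sketch assumes the candidate answers have already been extracted; it is not a description of how Bossy itself works.

```python
from collections import Counter


def consensus_answer(snippets, candidates):
    """Return (answer, share of snippets mentioning it), or (None, 0.0) if nothing matches."""
    votes = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for candidate in candidates:
            if candidate.lower() in text:
                votes[candidate] += 1
    if not votes:
        return None, 0.0
    answer, count = votes.most_common(1)[0]
    return answer, count / len(snippets)
```

The vote share gives a rough confidence signal: a site could fall back to a normal result list when no candidate wins a clear majority.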

Beyond the Search Engine Web Site

Future search projects will also go beyond the basic browser. It's time to think about new applications for all this data. Let's look at what we can do.

Search on the desktop

Xobni is a great search application for the desktop. Xobni extends Microsoft Outlook, providing much stronger search as well as ties into social networks such as LinkedIn.

Search as a tool

Zemanta is a search-based tool that discovers related content for people writing blog posts. Zemanta is a Firefox plugin that analyzes the context of what you are writing and searches for similar images, articles, and even products on Amazon.
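The first step for a tool like this is turning a draft into a query. Here is a minimal sketch, assuming plain term frequency with a small hand-picked stopword list; a real tool would use much better entity and phrase extraction.

```python
import re
from collections import Counter

# Small, hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "with", "that", "this", "was", "at", "by", "we"}


def draft_to_query(text, max_terms=3):
    """Pick the most frequent non-stopword terms in a draft to use as a search query."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return " ".join(w for w, _ in counts.most_common(max_terms))
```

The resulting query can then be sent to image, article, or product searches to find related content.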

Inquisitor is another tool that brings search into the browser. It replaces the browser's search interface with a much more powerful and faster search-suggestion generator.
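Search suggestions are commonly built on prefix lookups over a list of known queries. Here is a minimal sketch using binary search over a sorted list with Python's `bisect` module. It is not Inquisitor's implementation, and the query list is hypothetical.

```python
import bisect


class Suggester:
    """Prefix-based query suggestions over a sorted list of past queries."""

    def __init__(self, queries):
        # Lowercase, de-duplicate, and sort once so each lookup is a binary search.
        self.queries = sorted({q.lower() for q in queries})

    def suggest(self, prefix, limit=5):
        """Return up to `limit` known queries starting with `prefix`, alphabetically."""
        prefix = prefix.lower()
        i = bisect.bisect_left(self.queries, prefix)  # first entry >= prefix
        matches = []
        while i < len(self.queries) and len(matches) < limit and self.queries[i].startswith(prefix):
            matches.append(self.queries[i])
            i += 1
        return matches
```

A production suggester would also rank matches by popularity rather than alphabetically.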

Search as a module

The OpenSocial standard allows developers to build a single web application and have it appear on multiple social networks at the same time. For example, you could build an application that finds daily statistics, gossip, and news about the players in a user's fantasy football league. That single code base could run simultaneously on Yahoo!, MySpace, Bebo, and more.

Search outside a computer

Web-based applications are moving beyond basic computers. Yahoo! has recently announced partnerships with Intel and television manufacturers to allow applications that work alongside normal broadcast programming. Imagine searching the latest Twitter stream for opinions and statistics while watching the Super Bowl.

This new application standard may also be extended to web-enabled household appliances and automotive computers, as well as home entertainment systems.

Resources

Ted Drake
Web Developer, Yahoo! Paris

Posted at May 5, 2009 7:42 AM


Comments

Searching external data sources sounds great in theory. But in practice, it depends on the implementation. I work with biological data. A while ago there was an initiative to search distributed databases for collection holdings. The idea was that if one search could return results from many institutions, the researchers would be able to more easily conduct their research. In practice, this works horribly. If one provider is down, the whole thing is down. If one provider is slow, the whole process is slow. In other words, the quality of service is bad.

The best way to solve this problem is to create a local index of the distributed databases and search the index. That way the search is fast. The problem this introduces is that results may point to a database that is currently down. Then the user is presented with relevant results for resources that are not available. That is not good.

These are troublesome problems. I would love to know how other people are approaching them.

Posted by: Brett at May 5, 2009 9:44 AM

Hi,
thank you for mentioning Zemanta! Indeed, these new ways of approaching discovery are the next generation of search, or more precisely, search going into new places :)

I have to agree with the previous commenter on external databases and indexes. We use them alongside our internal index at Zemanta, but they bring a lot of pain:
- latencies
- you can never tune them the way you can tune your own search (quality of results)
- usually no flexibility in query creation (advanced syntax, weights...)
- you never get all the metadata you need to do what you want
- it's nearly impossible to combine rankings from heterogeneous federated search. Even when search engines are nice enough to tell you scores, the scores can't be compared across engines
- reliability

It is easy to create a prototype mashup or some proof of concept. But when you need a product with all the finishing touches, problems arise. The reasons above make it hard to create a superb user experience. But still you sometimes have to do it (to get to exclusive content or maybe because your index cannot be comprehensive enough).

Efforts like Lucene, Solr and Hadoop need more engagement to get going (and are immature in certain respects), but they can get you much further when you really need search tailored to your needs.

Yahoo's efforts to reinvent search are great. Not only BOSS, but also Flickr search, SearchMonkey, and support for Hadoop are all truly big contributions to the future of the internet!

Anrdaz Tori, Zemanta

Posted by: Anrdaz Tori at May 5, 2009 2:42 PM


http://developer.yahoo.net/blog/archives/2009/05/future_vertical_search.html

Yes, can we get permission to republish this post on AltSearchEngines - with full attribution?

Thanks,

Charles Knight, editor
AltSearchEngines.com

Posted by: Charles Knight at May 12, 2009 9:11 AM

Charles, Ted and I would be delighted to have you repost this with full author attribution, including a mention that this was originally published on Yahoo! Developer Network Blog. Thanks very much! -- Havi Hoffman, YDN Blog editor

Posted by: Havi Hoffman at May 22, 2009 7:59 AM

Copyright © 2010 Yahoo! Inc. All rights reserved.