Accessing SearchMonkey Structured Objects via BOSS
SearchMonkey and the structured Web
We’ve just announced an all-new Yahoo! Search experience, with many new features powered by SearchMonkey data. Since launching our open developer platform in May 2008, Yahoo! Search has enabled thousands of developers to shape the search experience for millions of Yahoo! users. If you are interested in building semantic applications similar to what we’ve come up with at Yahoo! Search, here are some details to get you started.
What structured objects are available?
All of the objects listed on the SearchMonkey homepage are available to you. With the new “object refiners” feature, users can now restrict search results to specific object types. Site owners contribute the data for these objects by marking up their pages with RDF or microformats, or by providing dataRSS feeds. If you’re interested in the actual data behind these objects, use the Yahoo! Search BOSS API to request the SearchMonkey data as part of the search request.
How can I access these structured objects?
The SearchMonkey team has been encouraging developers to use our structured data to build semantic Web applications ever since we partnered with BOSS. Using the BOSS API, you can access SearchMonkey structured objects.
To restrict the result set to pages with SearchMonkey objects, just add “searchmonkey:<objectType>” to your query. The result set from BOSS will only contain URLs that have objects of that type.
For example, the following query returns all of the documents in the Yahoo! Web index that contain the words “Sunnyvale” and “pizza” – about 3 million pages.
But if you only want pages with local business objects on them, you can add “searchmonkey:local” to the query:
This query returns about 25,000 pages.
Yes, we’ve just thrown out over 99 percent of the result set – but we are after the most relevant results, not simply the greatest number of results. Our new object refiners use SearchMonkey’s structured data to narrow your query from “pizza+Sunnyvale” to actual local business listings within those results. You can use BOSS to retrieve the same structured data and construct any presentation you like.
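As a rough illustration of the query restriction described above, here is a minimal Python sketch that appends a “searchmonkey:<objectType>” term to a keyword query and percent-encodes the result for use in a request URL. The function names are hypothetical and are not part of BOSS; the sketch deliberately stops short of building a full request, since the endpoint and its parameters are not covered in this post.

```python
from urllib.parse import quote


def restrict_to_object_type(query: str, object_type: str) -> str:
    """Append a searchmonkey:<objectType> term to a plain keyword query.

    Hypothetical helper for illustration; not part of any Yahoo! library.
    """
    return f"{query.strip()} searchmonkey:{object_type}"


def encode_for_url(query: str) -> str:
    """Percent-encode the whole query so it can be placed in a URL."""
    return quote(query, safe="")


restricted = restrict_to_object_type("Sunnyvale pizza", "local")
print(restricted)                  # Sunnyvale pizza searchmonkey:local
print(encode_for_url(restricted))  # Sunnyvale%20pizza%20searchmonkey%3Alocal
```

The encoded string is what you would send as the query portion of a BOSS request; consult the BOSS documentation for the exact URL layout.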
You can take it a step further and add any of these terms to the query:
- searchmonkey:video – restricts the result set to videos.
- searchmonkey:product – restricts the result set to products.
- searchmonkey:local – restricts the result set to local businesses.
- searchmonkey:event – restricts the result set to events.
- searchmonkey:document – restricts the result set to presentations, spreadsheets, and similar document formats.
- searchmonkey:discussion – restricts the result set to blogs and forums.
- searchmonkey:game – restricts the result set to Flash games.
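If you plan to try several of these terms, it can help to keep the list in one place and reject anything that is not on it. The sketch below is one possible way to do that in Python; the names `OBJECT_TYPES`, `restricted_query`, and `queries_for_all_types` are made up for this example, and the set of types is simply the list above, which may grow as more object types become available.

```python
# Object types taken from the list above, with a short description of each.
OBJECT_TYPES = {
    "video": "videos",
    "product": "products",
    "local": "local businesses",
    "event": "events",
    "document": "presentations, spreadsheets, and similar document formats",
    "discussion": "blogs and forums",
    "game": "Flash games",
}


def restricted_query(keywords: str, object_type: str) -> str:
    """Return the keywords plus a searchmonkey:<objectType> term.

    Raises ValueError for a type that is not in OBJECT_TYPES.
    """
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown SearchMonkey object type: {object_type!r}")
    return f"{keywords} searchmonkey:{object_type}"


def queries_for_all_types(keywords: str) -> dict:
    """Build one restricted query per known object type."""
    return {t: restricted_query(keywords, t) for t in OBJECT_TYPES}


for object_type, query in queries_for_all_types("Sunnyvale pizza").items():
    print(f"{object_type:<10} {query}")
```

Running each generated query through BOSS would let you compare how many results each object type yields for the same keywords.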
What don’t I get?
Not all structured data we’ve collected is part of the BOSS API. For example, some third parties who provide us with feeds have elected to keep that data outside of BOSS. Structured data annotations from technologies built by Yahoo! Research are also not available to third-party developers via BOSS. However, we aim to include all data we find embedded in web pages that deploy microformats or RDFa.
Our goal is a successful semantic Web where we extract the semantics as we process Web content. Every page marked up with semantic data makes it that much easier for us to extract meaning from that page. And it’s not just us! Google Video Search has recently adopted the same video markup (RDFa and Facebook Share) that SearchMonkey supports.
We will make many more object types available to you soon. In the meantime, you can learn more about SearchMonkey and how we acquire structured data annotations from this post on the YDN Blog.
Senior engineering manager, Yahoo! SearchMonkey