<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Yahoo! Search Blog &#187; Interviews</title>
	<atom:link href="http://www.ysearchblog.com/category/interviews/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ysearchblog.com</link>
	<description></description>
	<lastBuildDate>Thu, 19 Nov 2009 17:12:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>An Interview with Dr. Rudi Studer on Semantic Search Technologies</title>
		<link>http://www.ysearchblog.com/2008/12/16/an-interview-with-dr-rudi-studer-on-semantic-search-technologies/</link>
		<comments>http://www.ysearchblog.com/2008/12/16/an-interview-with-dr-rudi-studer-on-semantic-search-technologies/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 17:39:59 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Interviews]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://ysearchblog.com/?p=676</guid>
		<description><![CDATA[Dr. Rudi Studer is no stranger to the world of semantic search. A full professor in Applied Informatics at University of Karlsruhe, Dr. Studer is also director of the Karlsruhe Service Research Institute, an interdisciplinary center designed to spur new concepts and technologies for a services-based economy. His areas of research include ontology management, semantic [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.aifb.uni-karlsruhe.de/Staff/Personen/viewPersonenglish?id_db=57" target="_blank">Dr. Rudi Studer</a> is no stranger to the world of semantic search. A full professor in Applied Informatics at <a href="http://www.uni-karlsruhe.de/" target="_blank">University of Karlsruhe</a>, Dr. Studer is also director of the <a href="http://www.ksri.uni-karlsruhe.de/" target="_blank">Karlsruhe Service Research Institute</a>, an interdisciplinary center designed to spur new concepts and technologies for a services-based economy. His areas of research include <a href="http://www.aifb.uni-karlsruhe.de/Staff/Forschungsgebiete/viewForschungsgebietenglish?fgebiet_id=156" target="_blank">ontology management</a>, <a href="http://www.aifb.uni-karlsruhe.de/Staff/Forschungsgebiete/viewForschungsgebietenglish?fgebiet_id=119" target="_blank">semantic web services</a>, and <a href="http://www.aifb.uni-karlsruhe.de/Staff/Forschungsgebiete/viewForschungsgebietenglish?fgebiet_id=102" target="_blank">knowledge management</a>. He has been a past president of the <a href="http://www.iswsa.org/" target="_blank">Semantic Web Science Association</a> and has served as Editor-in-Chief of the journal <a href="http://www.websemanticsjournal.org/" target="_blank"><em>Web Semantics</em></a>.</p>
<p>In addition to his duties as director of the KSRI, Dr. Studer is a vice president for <a href="http://www.sti2.org/" target="_blank">Semantic Technologies Institute International</a> and helped found <a href="http://www.ontoprise.de/" target="_blank">ontoprise GmbH</a>, an enterprise software company built around deploying semantic technologies. Dr. Studer recently gave a talk at Yahoo! about semantic technologies, and he was kind enough to answer a set of follow-up questions about the future of semantic search.</p>
<p><strong>Yahoo! (Y!): </strong>Could you please tell us about your research on semantic search at the University of Karlsruhe?</p>
<p><strong>Rudi Studer (RS):</strong> We look at semantic search as a process of information access, where one or several activities can be supported by semantic technologies. These activities include preprocessing and extraction of information, the interpretation of user information needs, the actual query processing, the presentation of results, and finally, the processing of user feedback for subsequent queries and to generate improved refinements. In all of these steps, semantic technologies can be exploited. For example, with respect to interpreting user information needs, we work on techniques to automatically translate information needs, expressed in either natural language queries or keyword-based queries, into expressive queries that are specified in structured query languages, such as <a href="http://www.w3.org/TR/rdf-sparql-query/" target="_blank">SPARQL</a>.</p>
<p><strong>Y!:</strong> Early on, semantic technologies drew criticism for overestimating their own short-term impact and failing to embrace some of the realities of the Web. In what ways do you think the semantic web community has matured since then?</p>
<p><strong>RS:</strong> It’s true that in the Semantic Web community a lot of emphasis has been put on Semantics rather than on <em>Web</em> aspects. But, important to note, semantic technologies are not only about the Web. Many of these technologies, e.g. in the context of <a href="http://en.wikipedia.org/wiki/Enterprise_Information_Integration" target="_blank">Enterprise Information Integration</a>, were indeed successful in closed and controlled environments. Now, we’re beginning to see that these technologies are more and more applied to open Web environments, as well.</p>
<p>Of course there have also been many developments that focus on Web aspects in particular. In the context of combining Web 2.0 and Semantic Web technologies, we see that the Web is the central point. In terms of short term impact, Web 2.0 has clearly passed the Semantic Web, but in the long run there is a lot that Semantic Web technologies can contribute. We see especially promising advancements in developing and deploying lightweight semantic approaches.</p>
<p><strong>Y!:</strong> In principle, semantic technologies should be able to help search engines more precisely match the user&#8217;s intent with the content on the page. But again, this has proven to be harder to realize than originally expected. Are we getting closer to the solution?</p>
<p><strong>RS:</strong> No one ever said that it was going to be easy! But yes, we are getting closer. As I indicated before, many of the technologies today work well in closed environments (e.g. Enterprise scenarios), but do not necessarily scale to the Web (yet). But of course there is improvement on that side as well. <a href="http://www.powerset.com/" target="_blank">Powerset</a> (acquired by Microsoft this year), for example, is a good indicator of where we’re headed and certainly a proof point that we’re getting closer.</p>
<p><strong>Y!: </strong>The semantic web suffers from a chicken-and-egg problem, where developers are unwilling to create applications due to a lack of metadata, and publishers are unwilling to expose metadata due to a lack of applications. What are some of the ways to break out of this deadlock?</p>
<p><strong>RS:</strong> There are two solutions to this: First, we need to make it easier for publishers to produce semantic metadata and second, we need to make the benefits more obvious for the application developers.</p>
<p>With regard to the first aspect, a lot of the data is already available in structured form (e.g. in databases of the deep web), and is technically straightforward to expose in the form of RDF. The <a href="http://linkeddata.org/" target="_blank">Open Linked Data</a> Initiative is a good example of large numbers of data sources that have been published as RDF data. Then there is the unstructured data. Technologies like semantic wikis (e.g. the <a href="http://semantic-mediawiki.org/" target="_blank">Semantic MediaWiki</a>) allow the easy and seamless construction of semantic metadata as the content is produced.</p>
<p>The benefits of semantic metadata are becoming more and more obvious.  At <a href="http://iswc2008.semanticweb.org/" target="_blank">this year&#8217;s ISWC</a> the Billion Triple Challenge uncovered a number of useful applications that show the benefits of combining existing Semantic Web data sources in an intelligent way.</p>
<p><strong>Y!:</strong> How do you think major search engines supporting semantic technologies might contribute to the growth of the semantic web?</p>
<p><strong>RS: </strong>Once search engines index Semantic Web data, the benefits will be even more obvious and immediate to the end user. Yahoo!&#8217;s <a href="http://developer.yahoo.com/searchmonkey" target="_blank">SearchMonkey</a> is a good example of this. In turn, if there is a benefit for the end user, content providers will make their data available using Semantic Web standards.</p>
<p><strong>Y!:</strong> What do you think are some of the commercial opportunities left to be explored by semantic technologies?</p>
<p><strong>RS: </strong>So far, semantic technologies have been used in commercial products for data integration, enterprise semantic search and content management, etc. I expect this area to grow, but prospectively I see more and more potential for business opportunities in the combination of the social web and semantic technologies as well as in the context of mashups. An area that is also still largely unexplored is the area of advertisements in the context of semantic search.</p>
<p><strong>Y!:</strong> What are some of the pitfalls that developers run into when they first start investigating or deploying semantic metadata?</p>
<p><strong>RS:</strong> One problem in the early days was that the tool support was not as mature as for other technologies. This has changed over the years as we now have stable tooling infrastructure available. This also becomes apparent when looking at this year&#8217;s <a href="http://challenge.semanticweb.org/" target="_blank">Semantic Web Challenge</a>.</p>
<p>Another aspect is the complexity of some of the technologies. For example, understanding the foundation of languages such as <a href="http://www.w3.org/TR/owl-ref/" target="_blank">OWL</a> (being based on <a href="http://en.wikipedia.org/wiki/Description_logic" target="_blank">Description Logics</a>) is not trivial. At the same time, doing useful stuff does not require being an expert in Logics – many things can already be done exploiting only a small subset of all the language features.</p>
<p><strong>Y!:</strong> If you&#8217;re a front-end developer who&#8217;s interested in finding out more about semantic metadata, where should you get started?</p>
<p><strong>RS: </strong>There are now numerous books out there, e.g. <em><a href="http://www.amazon.com/Semantic-Primer-Cooperative-Information-Systems/dp/0262012103/" target="_blank">Antoniou/van Harmelen: A Semantic Web Primer</a></em>, <em><a href="http://www.amazon.com/Semantic-Web-Technologies-Research-Ontology-based/dp/0470025964/" target="_blank">Davies et al. (eds.): Semantic Web Technologies</a></em>, and <a href="http://www.amazon.com/Handbook-Ontologies-International-Handbooks-Information/dp/3540408347/" target="_blank"><em>Staab/Studer (eds.): Handbook on Ontologies</em></a>.  There is also a large collection of video lectures at videolectures.net.</p>
<p>Of course the W3C recommendations for <a href="http://www.w3.org/RDF/" target="_blank">RDF</a>, <a href="http://www.w3.org/TR/owl-ref/" target="_blank">OWL</a> and <a href="http://www.w3.org/TR/rdf-sparql-query/" target="_blank">SPARQL</a> are a useful reference. For inspiration, I recommend looking at some of the sites exploiting semantic technologies, e.g. <a href="http://semanticweb.org/" target="_blank">semanticweb.org</a>, <a href="http://twine.com/" target="_blank">Twine</a>, or <a href="http://www.freebase.com/" target="_blank">Freebase</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ysearchblog.com/2008/12/16/an-interview-with-dr-rudi-studer-on-semantic-search-technologies/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Yahoo! Chats with Semantic Web Expert, Ben Adida</title>
		<link>http://www.ysearchblog.com/2008/06/24/yahoo-chats-with-semantic-web-expert-ben-adida/</link>
		<comments>http://www.ysearchblog.com/2008/06/24/yahoo-chats-with-semantic-web-expert-ben-adida/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 16:30:00 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://ysearchblog.com/blog/2008/06/24/yahoo-chats-with-semantic-web-expert-ben-adida/</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Yahoo!&#8217;s plans to &#8220;open up&#8221; really started circulating at the beginning of this year. Not long after, Yahoo! Search <a href="http://www.ysearchblog.com/archives/000527.html">announced</a> its plans to support semantic mark-ups, specifically our crawler support for markups like <a href="http://en.wikipedia.org/wiki/RDFa" target="_blank">RDFa</a> and <a href="http://en.wikipedia.org/wiki/Embedded_RDF" target="_blank">eRDF</a>, as well as provided a glimpse into our open approach to search.</p>
<p>As Yahoo! prepares to support standards, like RDFa for example, we&#8217;ve continued to work closely with the best and brightest in the semantic markup community. We were thrilled to have <a href="http://ben.adida.net/" target="_blank">Ben Adida</a> visit the Sunnyvale campus a few weeks ago. Ben is a member of the Faculty at Harvard Medical School and at the Children&#8217;s Hospital Informatics Program, as well as a research fellow with the Center for Research on Computation and Society with the Harvard School of Engineering and Applied Sciences. He is also the Creative Commons representative to the <a href="http://www.w3.org/Consortium/Process/Process-19991111/tr.html" target="_blank">W3C</a> and chair of the <a href="http://www.w3.org/2001/sw/BestPractices/HTML/" target="_blank">RDF-in-HTML task force</a>, focusing on bridging the semantic and clickable webs.</p>
<p>Ben was kind enough to submit himself to a barrage of questions on RDFa, its development and the opportunities it provides. Take a look and feel free to drop questions you have in the comments. We&#8217;ll do our best to cycle them through to Ben.</p>
<p>Lawrence Kim, Yahoo! Search &#038;<br />
Peter Mika, Yahoo! Research</p>
<p><b>Yahoo! (Y!): RDFa has been long in the making&#8230; is it ready now?</b><br />
Ben Adida (BA): Indeed it has been long in the making, and for good reason. We had to make sure we didn&#8217;t step on other specifications&#8217; toes, that we respected existing design and uses of HTML, that we enabled the expression of enough flexible data to be useful in a number of current and future use cases, and that we had a valid processing model with test cases to help implementors.</p>
<p>We have all of that now. So yes, RDFa is ready. It has just been approved by the W3C as a Candidate Recommendation, with the specific text of the specification and a brand new Primer published on June 20th.</p>
<p><b>Y!: What can I do with RDFa?</b><br />
BA: You can tell the world what various components on your web page mean by marking up things like:</p>
<ul>
<li>The title of a photo</li>
<li>Your name and contact information</li>
<li>The license under which you&#8217;re distributing your latest MP3</li>
<li>The ingredients of a cooking recipe</li>
<li>The price of an item</li>
<li>A gene on which you recently wrote a paper</li>
<li>&#8230; Anything that you want to make more machine-readable</li>
</ul>
<p>With RDFa, you can reuse existing concepts, e.g. the title and price of an item, no matter what that item is. If there&#8217;s a field you need that doesn&#8217;t exist, you can create it.</p>
<p>This level of granularity encourages you to mark up your content as fully as possible, while letting applications consume only as much of the data as they need.</p>
<p><b>Y!: Who is supporting RDFa?</b><br />
BA: Creative Commons and Digg are two early adopters of RDFa, and there are a number of smaller web publishers who have begun adding RDFa markup to their pages. We&#8217;ve also just heard that the UK National Archives are committed to adopting RDFa.</p>
<p><b>Y!: What advantages does RDFa provide compared to microformats, eRDF and AB Meta?</b><br />
BA: Microformats, eRDF and RDFa share a common goal: to make it easy for HTML authors to add machine-readable tags to express the meaning of their web data. So before we get into a fight, it&#8217;s important to realize that all three share this important common goal.</p>
<p>Microformats work well for well-defined items, such as contact information (hCard) and calendar items (hCal). They tend to become more complicated when the data gets more varied. Fields can&#8217;t easily be shared across microformats, and all microformats must go through a centralized approval process to make sure no conflicts arise.</p>
<p>RDFa doesn&#8217;t have vocabulary conflicts: data fields, e.g. &#8220;title&#8221; can be reused by anyone, and there&#8217;s never any confusion as to what a given field means, since fields are, in fact, URLs. Entirely different types of data can share fields, which is exactly what applications need for extensibility. Multiple data items can be published on a single web page and, in contrast with microformats, relationships between the data items can be easily expressed.</p>
<p>eRDF has a similar vocabulary approach to RDFa, but it cannot express nearly as much data as RDFa. In particular, expressing relations between multiple items on a page is more complicated, and describing inline PDFs or images is not always possible. Also, eRDF is not quite as modular: vocabularies can only be imported in the HEAD of a document, so a widget-ized page would have an easier time using RDFa over eRDF.</p>
<p>AB Meta, which is new to me, appears to be a small subset of the intersection between RDFa and eRDF. Because it is a limited subset, it suffers a bit from the limitations of microformats: who gets to extend AB Meta? I would recommend sticking to the collaborative efforts such as RDFa and eRDF.</p>
<p>If you need more complete expressivity and the modularity required in a widget-ized web world, then you need RDFa.</p>
<p><b>Y!: What would you say to the critics who say that RDFa is too difficult to author?</b><br />
BA: It&#8217;s a matter of taste and finding the right compromise.</p>
<p>In my opinion, RDFa and eRDF have similar levels of complexity as far as authors are concerned. I prefer writing RDFa, and I&#8217;m sure <a href="http://iandavis.com/blog/about" target="_blank">Ian Davis</a> prefers writing eRDF. But I don&#8217;t think either one of us would seriously argue that one is much easier than the other.</p>
<p>It&#8217;s a little bit more complicated to write RDFa than it is to write microformats, but that&#8217;s not surprising given that microformats are more limited in scope, and there are notable extensibility costs to using microformats.</p>
<p>In general, we expect that web publishers will write RDFa in HTML templates, rather than every time they have an item to publish. Most microformat deployments work this way, too; few people write them by hand each time. So the increased complexity is negligible in the bigger picture.</p>
<p><b>Y!: Unlike microformats, RDFa depends on the availability of shared vocabularies (ontologies). Is that a problem?</b><br />
BA: A number of vocabularies are already available and particularly stable: <a href="http://en.wikipedia.org/wiki/Dublin_Core" target="_blank">Dublin Core</a> for documents, <a href="http://en.wikipedia.org/wiki/FOAF_(software)" target="_blank">FOAF</a> for people and their networks, <a href="http://en.wikipedia.org/wiki/Creative_Commons_licenses" target="_blank">Creative Commons</a> for document licensing, <a href="http://microformats.org/wiki/haudio" target="_blank">hAudio</a> and hVideo for online media. Then there are highly specialized vocabularies, like <a href="http://www.pir.uniprot.org/" target="_blank">Uniprot</a> and the <a href="http://en.wikipedia.org/wiki/Open_Biomedical_Ontologies" target="_blank">Open Biomedical Ontologies</a> (OBO) for the life sciences.</p>
<p>In my opinion, this is a huge win for RDFa. You really want vocabularies developed by experts in the appropriate field. Bio-informaticians develop vocabularies for biomedical research, musicians develop vocabularies for music, and lawyers develop vocabularies for copyright licensing.</p>
<p><b>Y!: What&#8217;s next for RDFa?</b><br />
BA: For the next few months, we&#8217;re going to focus on helping publishers produce RDFa and tool builders parse it correctly. Yahoo! is playing a pivotal role in this space with SearchMonkey. We hope to see Yahoo! properties publish RDFa soon!</p>
<p><b>Y!: Where can I learn more about RDFa?</b><br />
BA: Our wiki has all the relevant material: <a href="http://rdfa.info/wiki" target="_blank">http://rdfa.info/wiki</a></p>
<p>And you should join our brand new users&#8217; mailing list: <a href="http://lists.w3.org/Archives/Public/public-rdfa/" target="_blank">http://lists.w3.org/Archives/Public/public-rdfa/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ysearchblog.com/2008/06/24/yahoo-chats-with-semantic-web-expert-ben-adida/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A Chat with Yahoo! Research Director Ricardo Baeza-Yates</title>
		<link>http://www.ysearchblog.com/2006/10/16/a-chat-with-yahoo-research-director-ricardo-baeza-yates/</link>
		<comments>http://www.ysearchblog.com/2006/10/16/a-chat-with-yahoo-research-director-ricardo-baeza-yates/#comments</comments>
		<pubDate>Tue, 17 Oct 2006 06:38:56 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/10/16/a-chat-with-yahoo-research-director-ricardo-baeza-yates/</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><img alt="rbaeza-mayo-2006-3.jpg" src="http://www.ysearchblog.com/archives/rbaeza-mayo-2006-3.jpg" width="127" height="180" border="0" align="right" hspace="4" vspace="4"/><br />
<a href="http://www.baeza.cl/" target="_blank">Ricardo Baeza-Yates</a> is the Director of the Yahoo! Research Labs in Barcelona, Spain and Santiago, Chile.  Prior to joining Yahoo! Research, Ricardo was the Director of the Center for Web Research at the <a href="http://www.dcc.uchile.cl/" target="_blank">Department of Computer Science of the University of Chile</a>; and <a href="http://www.icrea.es/" target="_blank">ICREA</a> Professor at the <a href="http://www.upf.es/dtecn/" target="_blank">Department of Technology of the University Pompeu Fabra in Barcelona</a>. He maintains ties with both universities as a part-time professor, leveraging his affiliations with both to collaborate on joint research.  We sat down with Ricardo to discuss his role in starting Yahoo! Research Labs in Spain and Chile and his thoughts on today&#8217;s web search.</p>
<p><B>On Joining Yahoo!</B><br />
Q.  What is the most exciting part of expanding research offices in new regions for Yahoo!?<br />
A.  There are a couple of things that have made my role at Yahoo! especially exciting.  The first has been to build a lab from scratch &#8211; to start something from nothing.  The second part is building the lab with many different people from around the world that want to be a part of creating new ideas.  The research being explored in Europe and South America can be very different from that of the U.S. and it&#8217;s been impressive to see talent coming from different backgrounds and regions.</p>
<p>Q.  How does your affiliation with the <a href="http://www.cwr.cl/" target="_blank">Center for Web Research</a> complement your research endeavors at Yahoo!?<br />
A.  I founded the Center for Web Research almost five years ago at the University of Chile in Santiago through a large <a href="http://www.mideplan.cl/milenio/" target="_blank">Millennium Program</a> grant from the Chilean Planning Ministry, and served as the first director of the program.  I continue to be associated with the Center for Web Research to create a synergistic relationship with Yahoo! Research. Although both centers are completely separate ventures, we collaborate on joint research.  A similar symbiosis happens in Barcelona.</p>
<p>Q:  What are your top three goals for incorporating web search and web data mining into Yahoo!&#8217;s research?<br />
A:  Web search and web data mining are successfully practiced already among Yahoo! research experts in the U.S., but I hope to add new knowledge, particularly in the latter field.  The main three goals for me are to explore the potential of all web-related information &#8211; to improve current systems, find new ideas for products or services, and discover new ways to analyze information &#8211; for many, many different purposes.  Also, to leverage the different backgrounds and expertise here at the Yahoo! Research Center in order to obtain a fresh look, a new perspective and a different angle that will allow us to come up with new breakthroughs around existing problems.  And finally, I think utilizing our location as a tie-in to strengthen European search will be important &#8211; for example search in non-English languages.</p>
<p>Q:  Looking back over the last 9 months, what has been your most exciting professional challenge?<br />
A:  The most exciting and rewarding part of developing the Yahoo! Research Center in Spain and Chile has been establishing the research group and I think we have finished the initial stage.  We have various researchers that span across several different countries, including Belgium, Brazil, Chile, France, Greece, Italy, Spain, The Netherlands and U.K. These folks have an excellent research background and ultimately had an interest in participating in our research adventure.  I think we have a pretty good mix of people, not only technically, but also people who bring in a positive attitude and open minds.</p>
<p><B>On Today&#8217;s Web Search</B><br />
Q: Do you find search usage patterns different in the various parts of the world?<br />
A:  In my experience the main usage patterns within search are not really different.  The language changes, but the statistics are very similar.  The purpose of the search may be a little bit different but currently there are no studies of the categories of search &#8211; i.e. entertainment, e-commerce, etc.  Perhaps this is a study we look to do in the future.</p>
<p>I think usage patterns change according to the devices being used &#8211; going from a PC to a mobile device will change the patterns, and this could be influenced by regional locality or different cultural issues.</p>
<p>Q:  How is the growth of social media, such as blogs, vlogs and social networks, impacting and challenging web search?<br />
A: Social media implies user generated content &#8211; that is, people doing things like tagging content or media, commenting on pictures and text, etc.   However, it also could imply other user actions, like clicking on links or asking queries. This contributed explicit and implicit knowledge can be used, for example, to improve search.  The collective knowledge of all of these contributors is more than the knowledge of any expert on any topic.  It&#8217;s the collection that makes up what&#8217;s called the wisdom of the crowds.  Hence, social media provides the knowledge of many, many people that is encoded and we only have to decode it to be able to utilize the knowledge to strengthen search.  So the main challenge is basically how to decode this information to better understand the Web, not as individual users, but as a collective aggregation of all of them.</p>
<p><B>On Other Things</B><br />
Q: What Chilean dish should be added to URLs, the Yahoo! cafeteria in Sunnyvale?<br />
A:  Ah, two of my three favorite things to talk about &#8211; food and wine.  I don&#8217;t know if I can narrow it down to one favorite dish.  If it&#8217;s okay, I&#8217;ll tell you a few.  My favorites would probably be ceviche in either Peruvian or Chilean style, but don&#8217;t put ketchup on it like in North America&#8230; it ruins it!  I also favor Pastel de Choclo, which is like a meat, chicken and corn baked pie, and of course, Chilean Empanadas &#8211; probably filled with seafood is my favorite.</p>
<p>Q: Napa Valley wine or Chilean wine?<br />
A:  That&#8217;s an easy answer.  Chilean wine, specifically a <a href="http://www.winepros.org/wine101/grape_profiles/carmenere.htm" target="_blank">Carmenere grape</a>, which can only be found in Chile. You get one of the best price/quality ratios in the wine world.</p>
<p>Q: Anything else you&#8217;d like to share?<br />
A: Sure &#8211; I love old maps and in general geography. I love applied geography &#8211; traveling. In my office I have an upside-down map of the world to remind people that there is always another valid point of view, something that a researcher should never forget.</p>
<p>Ricardo will be traveling for invited talks to the Czech Republic in January for the <a href="http://www.cs.cas.cz/sofsem/07/" target="_blank">Current Trends in Theory and Practice of Computer Science Conference</a>, off to Istanbul, Turkey in April for the <a href="http://www.srdc.metu.edu.tr/webpage/icde/index.php" target="_blank">International Conference on Data Engineering</a>, and to Warsaw, Poland in September for the <a href="http://www.ecmlpkdd2007.org/" target="_blank">European Conference on Principles and Practice of Knowledge Discovery in Databases</a>. You can also catch him at the <a href="http://www2007.org/" target="_blank">World Wide Web Conference</a> (coming to Banff, Canada, in May) and the <a href="http://www.sigir2007.org/" target="_blank">ACM SIGIR Conference</a> (Amsterdam, The Netherlands, in July).</p>
<p>Thanks, Ricardo!</p>
<p>- Yahoo! Search blog team</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ysearchblog.com/2006/10/16/a-chat-with-yahoo-research-director-ricardo-baeza-yates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In the Lion&#8217;s Den</title>
		<link>http://www.ysearchblog.com/2006/08/10/in-the-lions-den/</link>
		<comments>http://www.ysearchblog.com/2006/08/10/in-the-lions-den/#comments</comments>
		<pubDate>Fri, 11 Aug 2006 01:20:34 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/08/10/in-the-lions-den/</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Today we&#8217;re pleased to bring you a post by Jan Pedersen, Chief Scientist for Yahoo!&#8217;s Search and Marketplace. Jan began his career at Xerox&#8217;s <a href="http://www.parc.com/" target="_blank">Palo Alto Research Center</a> (PARC) where he managed a research program on information access technologies, and then went on to work with Verity, Infoseek, and Alta Vista, purchased by Overture.</p>
<p>Today, Jan is a very familiar sight around our campus, often in the company of the search industry&#8217;s leading minds &#8211; folks that work at Yahoo!, like Andrei Broder, as well as many others from IBM, Google, Microsoft, and major academic institutions here and abroad. The role these scientists play in advancing the search industry and raising the game for all of us cannot be overstated. Jan is also on the board for the <a href="http://www.acm.org/" target="_blank">Association for Computing Machinery</a> special interest group for information retrieval, <a href="http://www.sigir2006.org/" target="_blank">SIGIR</a>.</p>
<p><HR NOSHADE SIZE="1"></p>
<p>The <a href="http://www.sigir2006.org/" target="_blank">29th annual meeting of the SIGIR</a> is currently taking place among the picturesque buildings of the University of Washington in Seattle. Though last year&#8217;s was in Brazil, this year&#8217;s conference is very well attended indeed; more than 700 academics, industry scientists and other search aficionados have gathered together to hear the thoughts of the very select 20% of submitters who cleared the tough referee bar this year.  The winning papers range in topic from web search (of special interest to us, although fairly new to this audience) to papers addressing the backbone issues in machine learning, efficiency and system evaluation.</p>
<p>My current favorite, which also happened to win the best paper award at the conference banquet last night (here are some <a href="http://www.flickr.com/photos/sigir2006" target="_blank">photos</a>), describes how through clever sampling techniques one can <a href="http://portal.acm.org/citation.cfm?id=1148219&#038;coll=portal&#038;dl=ACM&#038;CFID=3221603&#038;CFTOKEN=44408596" target="_blank">dramatically reduce the editorial cost of a comparative search engine evaluation</a>. Another interesting paper by the folks at Microsoft describes how they might incorporate user click behavior and other feedback into their search engine ranking.</p>
<p>Speaking of Microsoft, their presence at the conference is large and impressive, not only because of the conference&#8217;s proximity to the Redmond campus this year, but also because the various Microsoft research groups are hogging the limelight with twelve papers, around 17% of the total program, an unprecedented showing. Yahoo! is presenting three papers (all of extraordinary quality&#8230;) and Google is presenting two. But to twist the lion&#8217;s tail, too bad search share isn&#8217;t in the same order ;-)</p>
<p>Mixing with colleagues I haven&#8217;t seen for years is of course a key dimension of these events. Would you believe it &#8212; we search scientists know how to party! The conference banquet was quite an event, a luau-like salmon dinner with the required (but thankfully brief) display of dance performance, and the ferry ride across Puget Sound in the evening with the Seattle skyline laid out before us was extraordinarily beautiful. Of course the Yahoo! reception Tuesday night at the <a href="http://www.sfhomeworld.org/" target="_blank">Science Fiction History Museum</a> set exactly the right tone &#8212; Yahootinis were had by all.</p>
<p>Jan Pedersen<br />
Chief Scientist, Search and Marketplace</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ysearchblog.com/2006/08/10/in-the-lions-den/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A chat with Andrei Broder (Part II)</title>
		<link>http://www.ysearchblog.com/2006/03/09/a-chat-with-andrei-broder-part-ii/</link>
		<comments>http://www.ysearchblog.com/2006/03/09/a-chat-with-andrei-broder-part-ii/#comments</comments>
		<pubDate>Thu, 09 Mar 2006 17:00:10 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/03/09/a-chat-with-andrei-broder-part-ii/</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>Last week, we published the <a href="http://www.ysearchblog.com/archives/000257.html">first of a three-part interview</a> with Andrei Broder, Yahoo! Research Fellow and VP of emerging search technology for Yahoo! In today&#8217;s segment, we spend some time chatting with Andrei about what he means by &#8216;search without a box&#8217; and moving from information retrieval to information supply.</p>
<p><B>Where do you see web search being right now?</B></p>
<p>Some things still haven&#8217;t been solved. If you look back at papers in the WWW conferences from the mid-90s, about duplication, crawling strategies, web graph analysis, and so on, they are still relevant now. All the problems are still with us and plenty of improvements are possible. In the same vein, if you look at cars, you still have technology improvements in steel, engines, structural framing, but the focus of research is on hybrid cars and so on. For web search, I believe that the next stage for research is on the side of Information Supply and the integration of multiple sources.</p>
<p><B>Would you say we (as an industry) have made good steps since the beginning days of search?</B></p>
<p>Yes, absolutely. When AltaVista first came out, we needed 3 months to build a corpus of 30 million documents, and it had lots of duplicates and other problems. But a corpus of 50,000 documents was big in the early 90s. Then &#8216;big&#8217; meant millions, and now &#8216;big&#8217; is tens of billions. It wasn&#8217;t just quantitative, but qualitative, improvements that happened and made web search much better.</p>
<p><B>So, Andrei, where do we go from here?</B><br />
My paper on the <a href="http://portal.acm.org/citation.cfm?doid=792550.792552" target="_blank">taxonomy of Web search</a> talks about three generations of web search. I believe that we are now entering an entirely new phase. I call this next phase &#8216;search without a box&#8217;. Search today is confined to putting in something and getting something back, a pull model. The next stage is for information to come in a context without actively searching, a push model. My favorite example is GPS. Instead of looking up your way on a paper map, you are in your car, and your GPS navigator gives you directions, shows gas stations near you, and so on. A year or two from now perhaps it will show you where those gas stations are, but only when you are low on gas. So you get information on an &#8216;as needed, when needed&#8217; basis without explicitly asking for it. In the same vein, we will move from information retrieval to information supply.</p>
<p><B>Is RSS like that?</B></p>
<p>Alerts are an information supply that answers recurrent needs. What I&#8217;m talking about is more contextual. For example, advertising is a form of contextual information supply. The key is for the supply to be appropriate to the context. For instance, in a skiing magazine, ads for skis are a perfectly desirable form of content. Information supply as a science will continue to grow because of advertising.</p>
<p><B>And those are some of the things you are working on?</B><br />
Yes, I am trying to understand how the information supply will take shape &#8212; there is a fine line between annoying and useful. We also want the user to help define their role in this. You have to understand the context, the user, and the social effects. If we understand what other people like you are doing, we can sometimes move from information retrieval to information supply by understanding the class of equivalent users. But we still do not have a theory of information supply, or a definitive model. It&#8217;s a completely open area. It is not necessarily something we&#8217;ll see next year, but it&#8217;s the next stage.</p>
<p>In fact, we&#8217;re already pretty good in some contexts, commerce sites for example. You go to a travel site, you do a search, you find that the temperature is nice or stormy or whatever, and here are some hotels where you might like to stay, and here are some things you might want to do, etc. That&#8217;s already a case of information supply. But we have to come up with how to do it in less constrained contexts.</p>
<p>Essentially we&#8217;re going from 2.7 words per query to 0! How do we do that? There&#8217;s a funny Dilbert cartoon about buying things online: instead of 1-click shopping, you have 0-click shopping. If you don&#8217;t say no fast enough, Dogbert ships you something! (He laughs). It&#8217;s tricky; you need a lot of magic behind the curtains and good UI to hide it. It&#8217;s a good research direction.</p>
<p><B>Know where this cartoon is? Drop a comment below. Next week, in our third and final segment, Andrei fields several reader questions. Stay tuned!</B></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ysearchblog.com/2006/03/09/a-chat-with-andrei-broder-part-ii/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>&#8220;Search without a box&#8221; &#8211; A chat with Andrei Broder (Part 1)</title>
		<link>http://www.ysearchblog.com/2006/03/03/search-without-a-box-a-chat-with-andrei-broder-part-1/</link>
		<comments>http://www.ysearchblog.com/2006/03/03/search-without-a-box-a-chat-with-andrei-broder-part-1/#comments</comments>
		<pubDate>Fri, 03 Mar 2006 19:10:29 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/03/03/search-without-a-box-a-chat-with-andrei-broder-part-1/</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p>A while back, we spent an hour interviewing a new colleague of ours, Andrei Broder. Andrei joins our talented team here at Yahoo!, in the role of Yahoo! Research Fellow and Vice President of Emerging Search Technology. Andrei&#8217;s decades-long career in search includes his time at AltaVista as vice president for research and chief scientist, and as we noted before, Broder is co-winner of the Best Paper award at WWW6 for his work on <a href="http://www.std.org/%7Emsm/common/clustering.html" target="_blank">duplicate elimination of web pages</a> and at WWW9 for his work on <a href="http://www.almaden.ibm.com/cs/k53/www9.final/" target="_blank">mapping the web</a>.</p>
<p>In this first segment of a three-part interview, we asked Andrei about his decision to come to Yahoo!, and generally got out of the way as we listened in on Andrei&#8217;s extraordinary relationship with search. We have combined the normal Q&#038;A format with some audio for your listening pleasure.</p>
<p>Happy reading!<br />
Tim, Jeremy and Tara</p>
<p><HR NOSHADE SIZE="1"></p>
<p><img alt="broder.jpg" src="http://www.ysearchblog.com/archives/broder.jpg" width="150" height="200" border="0" /></p>
<p><B>When it was announced that you were joining Yahoo!, you mentioned in an interview that you knew you&#8217;d be disappointing 2/3rds of your friends. Why did you say that?</B></p>
<p>Well, the industry is pretty small, and I had offers from Yahoo! and the other big guys in search. I have many friends at all three, and no matter which one I chose, two-thirds of my friends would be unhappy that I didn&#8217;t choose them!</p>
<p><B>So, why did you choose Yahoo!?</B></p>
<p>My background is research. People often ask what is the difference between research and advanced development. It&#8217;s a very interesting question these days, because it used to be that research looks five years forward and advanced development is much shorter term. That&#8217;s not true any longer because the cycle has become so short.  Research and advanced development are beginning to sync up.</p>
<p>But there is a fundamental difference: The goal of research is to advance the state of the art in the world. The entire research community together advances the state of the art. Companies, such as IBM and Microsoft, support research because the pie gets larger and everyone benefits. Yahoo! intends to pursue a similar open approach to development, research and publishing, and the research environment and goals at Yahoo! are more compelling to me right now.</p>
<p><B>Where were you before?</B></p>
<p>I was in New York, but I am very glad to be back in California. I was working in Hawthorne, just outside of Manhattan, and lived in Riverdale; it was nice. There&#8217;s no place like New York, culturally. And by the way, we have offices in New York; Yahoo! Research has an outfit there in the old HotJobs office.</p>
<p><B>What do you do outside of work?</B></p>
<p>I ski. I broke my shoulder skiing four years ago, and now that I&#8217;ve moved back to California, I&#8217;m ready to go skiing again!</p>
<p><B>Have you ever had an epiphany about your research or work while you were skiing?</B></p>
<p>Ha! Not while skiing, but while I was at AltaVista, I traveled a lot. On a trip from Rome to Zurich, I was writing email and doing other things you normally do on a business trip, and seated next to me was a Korean-American girl, 9 years old, very talkative. She was asking me lots of questions: what do you do, what kind of computer is that. And I was telling her I work at AltaVista, and she said, &#8216;Oh, I know that: it&#8217;s a search engine! But we are not allowed to use it.&#8217; So a precocious 9-year-old knows what I am working on. And that was pretty amazing. If I had said Digital or Compaq, she would have no idea what I was talking about. That&#8217;s the magic of the web.</p>
<p><B>At what point did you decide to get into search?</B></p>
<p><I>In this audio segment, Andrei talks about his graduate student roots, his advisor Don Knuth&#8217;s impact on his future, and one of his earliest and best-known papers, on near duplicates.</I></p>
<p><a href="http://ysearchblog.com/files/andrei.mp3" target="_blank">Download</a></p>
<p><B>That&#8217;s it for today. Next week, Andrei talks about moving from information retrieval to information supply, and &#8216;search without a box.&#8217;</B></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ysearchblog.com/2006/03/03/search-without-a-box-a-chat-with-andrei-broder-part-1/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
