An Interview with Tim Converse
Tim Converse isn't your average search geek or, for that matter, your average guy.
Though he's ostensibly shy and unassuming, he has a definite adventurous streak. Tim has tried everything from skydiving to African safaris; he likes rap (but only old school), and he once played keyboard in a punk rock band. But what really sets Tim apart is his knowledge of the inner workings of search. Unlike most of us, who may have a decent understanding of the search world (or those of us novices who know just enough to type in a query and hit the "search" button), Tim understands the mysteries behind what makes it all work and how to make it better.
Tim is an engineering manager in Yahoo!'s Content group, where he and his team help make search results more relevant.
Jeremy and I spoke with Tim last week. Here's what we learned about content classification, what Tim likes to do for fun, and some little-known facts about Yahoo!'s obsession with foosball.
JQ: You've said that your group is charged with content classification. What exactly is content classification and why is it important to search?
A: Well, the more we know about documents the better. So part of what the Classification group does is label web pages and sites, or put them into categories. And while I can't get into specifics about the categories we use, a big part of this is trying to detect who's spamming us--or trying to trick us into ranking their sites higher in our search results.
Our classification code gets deployed in the Content system, which does the crawling and indexing to build the search indexes that we end up serving queries from. That's mainly for our own group, YST [Yahoo! Search Technology], which handles the back end of web search, but we also provide data to other groups, including Image Search.
My group also writes tools to interact with the Content system. We can query it in all sorts of ways to find out what's happening with particular sites or URLs. This is a challenge because the Content system is very distributed and heterogeneous.
YQ: It seems to me that if you're writing code for something, at some point, you've written it and it's done...
A: Well, we're never really "done." A few years ago my cousin asked me what I was working on and I told him "Excite's web search engine." He said, "so that would imply it's not done? Or it needs work or what?" (laughs) And so yeah, especially with how competitive the market is, these things are always under development. There's far more ways we can think of to make it better than we have engineers to do it. So even just with our list of ideas right now, we could be going for five years and there are always new ideas.
I should point out that although we're focused on deploying code for YST, there's a lot of expertise in the company and several different groups of scientists focused on classification. A lot of the challenge for me is just managing to benefit from that expertise for YST...
JQ: And connecting those dots in the company?
A: Yeah. It's kind of a cool job because I'm sort of in between scientists and programmers, and there's such a spectrum of roles and responsibilities. We have people all the way from kernel hackers to linguists and, needless to say, a kernel hacker can't really talk intelligently to a linguist or vice versa, but you have to have this long chain of people who can really talk to each other. So I've kind of got scientists on one side and programmers on the other.
YQ: What has been the biggest change in the way you approach writing the code and how you approach content classification?
A: I don't think the way we think about writing the code has changed. The way we're approaching search itself has changed a lot.
For instance, comprehensiveness is a much bigger deal these days. In the Inktomi days we wanted just one copy of anything that was good because serving documents costs so much. Now we'd really like to have everything.
So then the challenge is ranking everything appropriately. You really want to put everything out there but then...
JQ: ...that assumes there'll be a lot of junk?
A: Right and so then the challenge is identifying and appropriately ranking it all.
The big things for us are "relevance," "comprehensiveness," "freshness," and "presentation." That's "RCFP" and it's kind of our mantra. I'm much more focused on the "R" part of the relevance, although we have a whole group of scientists and modelers who are totally devoted to relevance too. My buddies in my group who work on crawling and indexing are focused on comprehensiveness and freshness as well.
YQ: Switching gears for a moment, Tim. What do you do for fun? What kinds of things are you into?
A: I like games. All kinds. Strategy games, pool, 8-ball, 9-ball, billiards. I'm a little worried about Yahoo!'s growth plans, because I think our pool table may not be scaling with the hiring we're doing. Is anyone looking into that? We had a nice one at Inktomi, but I think it's in storage somewhere.
I guess foosball is a Yahoo! game of choice, so I'm trying to catch up on it. It's a little-known fact that one of the game's experts, Phu Hoang, is here at Yahoo!, and the game was named after him. I'm also interested in music and I just recently bought a piano. I hadn't played piano in a long, long time.
JQ: When you're hiring someone for your team, what are you looking for?
A: We look for pretty senior engineers and, like I said before, it takes a lot of types of expertise to make a web search engine. In terms of skills, we're looking for C++ coders, strong problem solvers, and people who understand CS algorithms. Obviously there are particular roles that require some particular expertise, like experience in classification and textual analysis.
Right now, we're hiring pretty aggressively.
JQ: When it comes to fighting spam, there's all kinds of software and many people trying to stop spam attempts. With all of us trying to detect this, is there a way to tell the search engines about it?
A: We get a lot of that data on our own. We have a pretty large view and we're approaching the spam problem from a lot of different directions. But nobody should expect to see any sudden change in spam just yet.
Take weblog comment spam, for example. Two things will have to happen for comment spam attempts to decrease: one is that spamming will have to not work for search engines, and the second is that comment spammers will have to realize it. (laughs) There could be a long lag there where, even if every search engine totally nailed them, spammers could still operate under the belief that it worked. What we can do from the search engine point of view is make spam not help.
Next week Tim talks about redirects, indexable pages, and why he doesn't listen to music while he's programming.
Yvette Irvin
Y! Profiler


Comments
Interesting, listen music don't distract me when i am programming, is the firs person in my life who say that i think :D
Posted by: Carlos | December 2, 2004 03:01 PM
first, person, sorry.
for me yahoo is moving good the pieces, good luck.
nothing more to say.
Thanks for the info :)
Posted by: Carlos | December 2, 2004 03:06 PM
Hello,
It is good to read the emphasis placed on the "R" of the "RCFP" priorities.
I would be curious to hear Tim's take on the value of paid submission to Yahoo's directory since it appears that the vast majority of searching that goes on at Yahoo is in the search engine.
Thanks,
Ben
Posted by: Ben | December 3, 2004 04:27 AM
Interview with Tim Converse: The Yahoo! Search Blog is running an interview with engineer Tim Converse, who discusses the goals of the search team. “The big things for us are “relevance,” “comprehensiveness,” “freshness,” and “presentation.” That’s “RCFP” and it’s kind of our mantra. I’m much more focused on the “R” part of the relevance, although we have a whole group of scientists and modelers who are totally devoted to relevance too. My buddies in my group who work on crawling and indexing are focused on comprehensiveness and freshness as well.”
Posted by: promsale | March 6, 2005 10:38 PM