Thursday, July 10, 2008

Enough ants may bite an elephant to death

A traditional Chinese idiom says that enough ants may bite an elephant to death (Chinese: 蚁多咬死象). The idiom perfectly describes the strategy behind Yahoo's newest Web search service, BOSS (Build your Own Search Service). Google is facing the most severe challenge in its history.

As we all know, Google has so far looked unbeatable thanks to its dominance in Web search. Neither Yahoo nor Microsoft, nor even "Microhoo", can compete with Google head-to-head. Google is a huge elephant, and nobody can even shake it.

Even a leopard cannot fight an elephant, but a leopard that raises a huge army of ants may see them bite the elephant to death. Yahoo thinks so, and I agree. Google can easily defeat any individual competitor, but Google cannot beat the united force of all its competitors. It is the power of the collective. Brilliant strategy, Yahoo!

On the other hand, this move is a double-edged sword. If the ants are numerous enough to bite an elephant to death, they can certainly kill a leopard too. Therefore, Yahoo search itself might be swept out of the market even before Google search declines.

Some readers may wonder why Yahoo search might be hurt. Isn't Yahoo the service provider of BOSS? If Amazon can make a great deal of profit from its public Web services, how could Yahoo's public service eventually hurt Yahoo itself? To answer this question, we must take a close look at the difference between Yahoo's service product and Amazon's.

Amazon produces a large amount of data itself and has made itself a huge data center. Amazon Web services allow users to access and use Amazon's data for varied purposes, including business purposes. In the end, however, Amazon controls the source of the data. Hence Amazon has control over all of its service users.

Now let's turn to Yahoo. Surely Yahoo also produces data, but original data production is a secondary task at Yahoo. Primarily, Yahoo indexes the Web. In contrast to Amazon as a data-resource producer, Yahoo is primarily a link-resource producer. The problem for a link-resource producer is that it does not actually control the linked data. In short, Yahoo knows the links, but Yahoo does not own the data they point to. For this reason, any niche search engine may use the Yahoo infrastructure to build up its own index of links in its niche domain while it is in stealth mode. After the niche search engine goes public, it can work primarily on its own index and use the Yahoo infrastructure only as a reference checker. The key point here is that links are normally not the end data users are looking for.
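To make the scenario above concrete, here is a minimal sketch in Python of a niche engine that seeds its own link index from an upstream search service and afterwards answers from that index, going back upstream only for queries it has never seen. Everything in it is an assumption made for illustration: `fake_upstream` is a toy stand-in and not the BOSS API, and `NicheIndex`, `seed`, and `search` are hypothetical names. Whether the real service exposes enough data to do this in practice is a separate question.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical type for an upstream search service: query in, links out.
# This is an illustrative stand-in, NOT the real BOSS API.
UpstreamSearch = Callable[[str], List[str]]


def fake_upstream(query: str) -> List[str]:
    """Toy upstream that returns links from a fixed table (assumption)."""
    table = {
        "jazz": ["http://example.com/jazz-history", "http://example.com/jazz-clubs"],
        "blues": ["http://example.com/blues-guitar"],
    }
    return table.get(query, [])


class NicheIndex:
    """A niche engine's own index of links, seeded from an upstream service."""

    def __init__(self, upstream: UpstreamSearch) -> None:
        self.upstream = upstream
        self.links: Dict[str, Set[str]] = defaultdict(set)
        self.upstream_calls = 0  # how often we still depend on the upstream

    def _ask_upstream(self, query: str) -> List[str]:
        self.upstream_calls += 1
        return self.upstream(query)

    def seed(self, queries: Iterable[str]) -> None:
        """Stealth-mode phase: copy upstream links into the local index."""
        for query in queries:
            self.links[query].update(self._ask_upstream(query))

    def search(self, query: str) -> List[str]:
        """Public phase: answer locally, use the upstream only on a miss."""
        if query in self.links:
            return sorted(self.links[query])
        found = self._ask_upstream(query)
        self.links[query].update(found)  # remember the answer for next time
        return sorted(found)


index = NicheIndex(fake_upstream)
index.seed(["jazz"])
print(index.search("jazz"), index.upstream_calls)
```

In this toy version, once a query has been seen, the niche engine no longer needs the upstream for it. That is the sense in which a link-resource producer has less hold over its users than a data-resource producer. Keeping such an index fresh as sites change is the hard part, and the sketch does not attempt it.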

Does this mean that Yahoo has done something that only hurts others without benefiting itself? Not at all. The key point is that Yahoo must never try to charge for the search traffic that flows through its opened infrastructure, even if some of that traffic is for commercial purposes. Yahoo should not imitate Amazon Web services, because the two are totally different (though they look similar on the surface, they differ completely in essence).

Yahoo should make itself the biggest player on its own free and open search infrastructure, rather than the leader of a search infrastructure that is neither totally free nor totally open. In other words, Yahoo should make itself the largest ant instead of a leopard. If Yahoo keeps to this strategy, it will have a chance not only to compete with Google, but also to defeat Google in the end.

Keep up the great work! Jerry, your passion truly deserves respect!

For readers who want to learn more about BOSS, here are a few resources to start with.

  • Read/WriteWeb: Search War: Yahoo! Opens Its Search Engine to Attack Google With An Army of Verticals
  • Yahoo! Search Blog: BOSS -- The Next Step in our Open Search Ecosystem
  • Between the Lines: Yahoo’s desperate search times call for open source
  • VentureBeat: Yahoo opens up its search platform to third parties, Me.dium takes the plunge
  • GigaOM: Yahoo, Now Offering Search as a Web Service
  • TechCrunch: Yahoo Radically Opens Web Search With BOSS
  • zooie’s blog (a BOSS team insider): Yahoo! Boss - An Insider View


Zemantic dreams said...

Have you read just the PR, or have you taken a close look at the API?

No way to get useful data for building your own index from there.

There might be other differences between Amazon and Yahoo, but this is not it.

Andraz Tori, Zemanta

Yihong Ding said...


I have not had the chance to look deeply into the API yet. So far I have just read several introductions to the service on the Yahoo BOSS site.

You are probably right. My point, however, is that Yahoo's link resources are different from Amazon's data resources. In the end, Yahoo does not own data, only "links" to data. This difference makes Yahoo's service fundamentally different from Amazon's.

It might not be straightforward for users to dig out a hidden niche network. But it must be achievable; otherwise the Yahoo infrastructure itself would be a joke.

Amazon updates its data itself, so its users cannot survive without staying continuously hooked to Amazon. Yahoo, however, cannot update the data on the sites it links to. Once a niche search engine obtains the linked network, it can build more efficient ways to dig into that specific network. So you see, Yahoo does not have the same degree of control over its users as Amazon does, because the types of their resources differ from the start.


Zemantic dreams said...

Have you ever written a crawler or aggregator that needs to keep its data in shape over time while sites, URLs, and content change? This is a non-trivial thing.

I don't think Yahoo is cannibalizing itself with this.

The real difference between Amazon data services and BOSS is that Amazon is a destination site, whereas Yahoo BOSS does not bring Yahoo's destinations into the game.

But they require you to display Yahoo ads alongside search results, so they don't really care anyway.

Quoting the FAQ:

"After the ad infrastructure is ready it will be a requirement to publish Yahoo! Sponsored Search ads as part of search applications that exceed a set QPD (Queries Per Day) level."

Yihong Ding said...


I have experience coding not just Web crawlers but also text categorization, Web data extraction, and data mapping over the obtained Web pages.

Certainly I know that none of these things is trivial. Otherwise, why would these software engineers be paid high salaries?

But here is the thing: it is non-trivial, but achievable. Of course, if Yahoo is willing to be more open in the future (and I hope it will be, because being more open will serve it better), Yahoo may even make the process trivial. But that is another story.

I am unsure what you mean by "destination". But it is fair for Yahoo to ask infrastructure users to put ads in their services if those services are built on BOSS. As long as there is no extra charge (someone else has suggested that Yahoo may charge in the future), this requirement is fair given the current Web ecosystem.


Zemantic dreams said...

I believe we agree on most of the points then :).

I completely agree that Yahoo's offer is fair to both sides! :)

The only point I was disputing was the thesis that this could somehow be seen as Yahoo cannibalizing itself.


Yihong Ding said...

Thank you, Andraz. The discussion was my pleasure. Debates always improve thinking.