Task force/China/Public recognition for top websites in China

The Chinese Internet market differs from the US market, so it helps to have a good understanding of it. To that end, we surveyed the public recognition of the top websites in China.

Survey method

The method we used is relatively cheap and easy: we use the number of pages in the Google index that mention a brand as a measure of that brand's popularity.

The brands we considered come from the Alexa top 60 websites (some websites were merged or filtered out because they are obvious duplicates or because Alexa's data for them is biased). The brands are: 百度, QQ, 谷歌, 新浪, 淘宝, 网易, 搜狐, 开心网, 优酷, 土豆网, soso, 雅虎, 天涯, 人人网, 搜房, 凤凰网, MSN, 迅雷, 搜狗, 猫扑, 我乐网, 新华网, 阿里巴巴, hao123, tom, 豆瓣, 我要啦, 人民网, 和讯网, 东方财富, 北青网, 天极网, 有道, IT168, VeryCD, CSDN, 51job, 维基百科, 百度百科, 互动百科.

We then send a request to Google for each brand and limit the query to Tianya, the largest online forum in China. The number of indexed pages returned is taken to reflect the popularity of that brand.

The program that performs the queries is as follows:

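require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'cgi'

# Brand names to look up, taken from the filtered Alexa list above.
brands = ['百度', 'QQ', '谷歌', '新浪', '淘宝', '网易', '搜狐', '开心网', '优酷', '土豆网', 'soso',
          '雅虎', '天涯', '人人网', '搜房', '凤凰网', 'MSN', '迅雷', '搜狗', '猫扑', '我乐网', '新华网', '阿里巴巴',
          'hao123', 'tom', '豆瓣', '我要啦', '人民网', '和讯网', '东方财富', '北青网', '天极网', '有道',
          'it168', 'verycd', 'csdn', '51job', '维基百科', '百度百科', '互动百科']

# Restrict the search to Tianya's public forum; the brand is appended as a quoted phrase.
base_url = "http://www.google.com/search?q=site%3Awww.tianya.cn+inurl%3Ahttp%3A%2F%2Fwww.tianya.cn%2Fpublicforum%2F+"

puts "=================================="
brands.each do |b|
  query = CGI.escape(b)
  # On Ruby 3.0 and later, use URI.open instead of open.
  doc = open(base_url + "%22" + query + "%22") { |f| Hpricot(f) }
  # The third <b> element inside p#resultStats holds the total result count.
  stats = doc.search("//p[@id='resultStats']")[0]
  result = stats ? stats.search("b") : []
  if result.size > 2
    puts b + ": " + /(\d|,)+/.match(result[2].to_html)[0]
  else
    puts b + ": 0"
  end
  # Pause 15 to 24 seconds between requests to keep the request rate low.
  sleep(15 + rand(10))
end
puts "=================================="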
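
The program prints each count as a comma-grouped string, so the counts have to be converted to integers before the brands can be ranked. The following is a minimal sketch of that step, assuming the output has been collected as pairs of brand and count string. `parse_count` and `rank_brands` are hypothetical helper names that are not part of the original program, and the `example-brand` entry is an illustrative value, not survey data.

```ruby
# Hypothetical helpers (not part of the original script).

# Convert a count string such as "4,620,000" to an Integer.
# Returns 0 when the string contains no digits.
def parse_count(text)
  digits = text.to_s.delete("^0-9")
  digits.empty? ? 0 : digits.to_i
end

# Sort [brand, count_string] pairs by count, descending, and keep
# only the brands whose count is above the threshold.
def rank_brands(pairs, threshold = 10_000)
  pairs.map { |brand, text| [brand, parse_count(text)] }
       .select { |_, count| count > threshold }
       .sort_by { |_, count| -count }
end

# "example-brand" is a made-up entry that falls below the threshold.
sample = [["QQ", "3,960,000"], ["MSN", "4,620,000"], ["example-brand", "9,000"]]
rank_brands(sample).each_with_index do |(brand, count), i|
  puts "#{i + 1}. #{brand} #{count}"
end
```

Keeping the parsing separate from the ranking makes it possible to check the conversion without sending any request to Google.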

Result

The brands with more than 10,000 indexed pages are listed below (because the count for Tianya's own name is biased on Tianya itself, the result for Tianya has been removed):

Nov 8, 2009

  1. MSN 4620000
  2. QQ 3960000
  3. 百度 457000
  4. 淘宝 324000
  5. 新浪 200000
  6. 搜狐 171000
  7. 网易 104000
  8. 迅雷 85300
  9. 新华网 79000
  10. 土豆(tudou) 65500
  11. tom 61700
  12. 人民网 50300
  13. 谷歌 50100
  14. 猫扑 49300
  15. 阿里巴巴 32800
  16. 豆瓣 30300
  17. 雅虎 29500
  18. 优酷 27400
  19. 凤凰网 19100
  20. 搜狗 18700
  21. 百度百科 18700
  22. verycd 13300
  23. 开心网 13100
  24. 维基百科 11400
  25. 搜房 10800

April 7, 2010

  1. QQ 7490000
  2. 百度 2830000
  3. MSN 2190000
  4. 淘宝 1750000
  5. 谷歌 1520000
  6. 新浪 1140000
  7. 猫扑 795000
  8. verycd 764000
  9. 新华网 755000
  10. 人民网 703000
  11. 网易 499000
  12. 搜房 475000
  13. 阿里巴巴 461000
  14. 搜狐 428000
  15. 凤凰网 410000
  16. 百度百科 365000
  17. Tom 271000
  18. 迅雷 244000
  19. 开心网 221000
  20. 雅虎 148000
  21. 土豆 117000
  22. 豆瓣 74100
  23. 优酷 56900
  24. 和讯网 56600
  25. 搜狗 39700
  26. 维基百科 37700
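
To see how positions moved between the two snapshots, the rank of each brand in one list can be compared with its rank in the other. The following is a minimal sketch that uses only the top five entries of each list above. `ranks` and `rank_changes` are hypothetical helper names introduced for illustration.

```ruby
# Top five entries of each snapshot, copied from the lists above.
nov_2009 = { "MSN" => 4_620_000, "QQ" => 3_960_000, "百度" => 457_000,
             "淘宝" => 324_000, "新浪" => 200_000 }
apr_2010 = { "QQ" => 7_490_000, "百度" => 2_830_000, "MSN" => 2_190_000,
             "淘宝" => 1_750_000, "谷歌" => 1_520_000 }

# Hypothetical helper: map each brand to its 1-based rank by count.
def ranks(counts)
  counts.sort_by { |_, n| -n }
        .each_with_index
        .map { |(brand, _), i| [brand, i + 1] }
        .to_h
end

# Hypothetical helper: for brands present in both snapshots, return
# old rank minus new rank (positive means the brand moved up).
def rank_changes(before, after)
  old_ranks = ranks(before)
  new_ranks = ranks(after)
  (old_ranks.keys & new_ranks.keys).map { |b| [b, old_ranks[b] - new_ranks[b]] }.to_h
end

rank_changes(nov_2009, apr_2010).each do |brand, delta|
  puts "#{brand}: #{format('%+d', delta)}"
end
```

Brands that appear in only one of the two inputs are skipped, since no rank change can be computed for them from these lists alone.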