Task force/China/Public recognition for top websites in China
The Chinese Internet market differs from the US market, so it helps to understand it well. To that end, we surveyed the public recognition of the top websites in China.
Survey method
The method we use is relatively cheap and easy: we take the number of pages in the Google index that mention a brand as a measure of that brand's popularity.
The brands we consider come from the Alexa top 60 websites, with some sites merged or filtered out because they were obvious duplicates or reflected biased Alexa data. The brands are: 百度, QQ, 谷歌, 新浪, 淘宝, 网易, 搜狐, 开心网, 优酷, 土豆网, soso, 雅虎, 天涯, 人人网, 搜房, 凤凰网, MSN, 迅雷, 搜狗, 猫扑, 我乐网, 新华网, 阿里巴巴, hao123, tom, 豆瓣, 我要啦, 人民网, 和讯网, 东方财富, 北青网, 天极网, 有道, IT168, VeryCD, CSDN, 51job, 维基百科, 百度百科, 互动百科.
We then send a request to Google for each brand, restricting the query to the public forum pages of Tianya, the largest online forum in China. The number of indexed pages returned reflects the popularity of that brand.
The program that performs the query is shown below:
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'cgi'
brands = ['百度','QQ','谷歌','新浪','淘宝','网易','搜狐','开心网','优酷','土豆网','soso',
          '雅虎','天涯','人人网','搜房','凤凰网','MSN','迅雷','搜狗','猫扑','我乐网','新华网','阿里巴巴',
          'hao123','tom','豆瓣','我要啦','人民网','和讯网','东方财富','北青网','天极网','有道',
          'it168','verycd','csdn','51job','维基百科','百度百科','互动百科']
puts "=================================="
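brands.each do |b|
  query = CGI::escape(b)
  # Restrict the search to Tianya public forum pages and match the brand as an exact phrase.
  doc = open("http://www.google.com/search?q=site%3Awww.tianya.cn+inurl%3Ahttp%3A%2F%2Fwww.tianya.cn%2Fpublicforum%2F+%22" + query + "%22") { |f| Hpricot(f) }
  # The total count is read from the third <b> element inside the resultStats paragraph.
  stats = doc.search("//p[@id='resultStats']")[0]
  result = stats ? stats.search("b") : []
  if result.size > 2
    puts b + ": " + /(\d|,)+/.match(result[2].to_html)[0]
  else
    # No usable statistics on the page: report zero.
    puts b + ": 0"
  end
  # Pause 15 to 24 seconds between requests.
  sleep(15 + rand(10))
end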
puts "=================================="
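The query construction and the count extraction used by the program can also be tried in isolation, without sending anything to Google. The following is a minimal sketch, not part of the original program: the helper names tianya_query_url and parse_count are hypothetical, and the sample string passed to parse_count is made up for illustration.

```ruby
require 'cgi'

# Hypothetical helper: builds the same search URL as the program above,
# escaping the whole query string in one step.
def tianya_query_url(brand)
  terms = 'site:www.tianya.cn inurl:http://www.tianya.cn/publicforum/ "' + brand + '"'
  'http://www.google.com/search?q=' + CGI.escape(terms)
end

# Hypothetical helper: takes the first run of digits and commas in a string
# and returns it as an Integer, or 0 when the string contains no digits.
def parse_count(text)
  m = /\d[\d,]*/.match(text)
  m ? m[0].delete(',').to_i : 0
end

puts tianya_query_url('QQ')
puts parse_count('<b>4,620,000</b>')
```

Returning an Integer rather than the matched text makes the counts directly sortable and comparable across runs.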
Result
The brands with more than 10,000 indexed pages are listed below. Because a search within Tianya is biased toward Tianya itself, I removed the Tianya result.
November 8, 2009
- MSN 4620000
- QQ 3960000
- 百度 457000
- 淘宝 324000
- 新浪 200000
- 搜狐 171000
- 网易 104000
- 迅雷 85300
- 新华网 79000
- 土豆(tudou) 65500
- tom 61700
- 人民网 50300
- 谷歌 50100
- 猫扑 49300
- 阿里巴巴 32800
- 豆瓣 30300
- 雅虎 29500
- 优酷 27400
- 凤凰网 19100
- 搜狗 18700
- 百度百科 18700
- verycd 13300
- 开心网 13100
- 维基百科 11400
- 搜房 10800
April 7, 2010
- QQ 7490000
- 百度 2830000
- MSN 2190000
- 淘宝 1750000
- 谷歌 1520000
- 新浪 1140000
- 猫扑 795000
- verycd 764000
- 新华网 755000
- 人民网 703000
- 网易 499000
- 搜房 475000
- 阿里巴巴 461000
- 搜狐 428000
- 凤凰网 410000
- 百度百科 365000
- tom 271000
- 迅雷 244000
- 开心网 221000
- 雅虎 148000
- 土豆 117000
- 豆瓣 74100
- 优酷 56900
- 和讯网 56600
- 搜狗 39700
- 维基百科 37700
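One way to compare the two snapshots is to compute, for each brand that appears in both lists, the ratio of the later count to the earlier one. The sketch below is only an illustration: the helper growth_ratios is hypothetical, the five brands are an arbitrary subset, and the counts are copied from the lists above. The ratios describe indexed page counts only.

```ruby
# Counts copied from the two lists above; the choice of brands is illustrative.
nov_2009 = { 'QQ' => 3_960_000, '百度' => 457_000, 'MSN' => 4_620_000,
             '淘宝' => 324_000, '谷歌' => 50_100 }
apr_2010 = { 'QQ' => 7_490_000, '百度' => 2_830_000, 'MSN' => 2_190_000,
             '淘宝' => 1_750_000, '谷歌' => 1_520_000 }

# Hypothetical helper: for every brand present in both snapshots, returns
# the later count divided by the earlier count, rounded to one decimal.
def growth_ratios(before, after)
  common = before.keys & after.keys
  common.each_with_object({}) do |brand, ratios|
    ratios[brand] = (after[brand].to_f / before[brand]).round(1)
  end
end

growth = growth_ratios(nov_2009, apr_2010)

# Print brands from the largest ratio to the smallest.
growth.sort_by { |_, ratio| -ratio }.each do |brand, ratio|
  puts "#{brand}: x#{ratio}"
end
```

A ratio below 1 means the brand had fewer indexed pages in the later snapshot than in the earlier one.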