Shouldn't search engines give some thought to bandwidth?


02/14/06. Copyright cathayan.org. All rights reserved. Please keep this notice when reposting. No commercial reposting.

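Last night I looked at this site's cPanel (the hosting control panel): this month's traffic has already passed 10 GB. My monthly allowance is 15 GB and the month is only half over, which gave me quite a scare, since a whole month used to come to only 6-7 GB. When Virushuo brought this up last time, I didn't take it seriously.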

The Analog statistics in cPanel look like this:

#reqs   %bytes  organization
2818691 24.36% 61.135
1954294 11.50% 202.160
759987 7.48% 220.181

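Checking the raw log: the 61.135 block is Baiduspider+, which mainly uses two IPs (61.135.145.204 and 202.108.250.196); the latter alone eats another 2.2% of the traffic. The 202.160 block is Yahoo. Yahoo seems to have many IPs, but this block hits the site hardest, mostly 202.160.180.132/37/70/63 and so on; there are also the 72.33.177., 68.142.249/250. and 66.196.90/91. ranges, among others. 220.181 is a newcomer called sogou spider (was it previously called sohu agent?). I have only observed one exact IP for it: 220.181.19.95.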
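A breakdown like this can be reproduced from the raw log. The Python sketch below is a minimal version under a few assumptions:

- the log is in Apache's combined format;
- clients are grouped by the first two octets of their IP, which is what the Analog table appears to do;
- the helpers `bytes_by_prefix` and `shares`, the sample lines and the TEST-NET visitor address 192.0.2.10 are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Apache "combined" log format (assumed): host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def bytes_by_prefix(lines):
    """Sum response bytes per first-two-octet prefix and remember the user agents seen."""
    totals = Counter()
    agents = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        prefix = ".".join(m["ip"].split(".")[:2])
        totals[prefix] += 0 if m["size"] == "-" else int(m["size"])
        agents[prefix].add(m["agent"])
    return totals, agents

def shares(totals):
    """Percentage of all bytes served to each prefix."""
    grand_total = sum(totals.values())
    return {prefix: 100 * size / grand_total for prefix, size in totals.items()}

# Hypothetical sample lines built from IPs and user agents quoted in the post
SAMPLE = [
    '61.135.145.204 - - [14/Feb/2006:01:00:00 -0800] "GET / HTTP/1.1" 200 5000 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '202.160.180.132 - - [14/Feb/2006:01:00:05 -0800] "GET /index.xml HTTP/1.0" 200 3000 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp China;http://misc.yahoo.com.cn/help.html)"',
    '192.0.2.10 - - [14/Feb/2006:01:00:09 -0800] "GET / HTTP/1.1" 200 2000 "-" "Mozilla/4.0"',
]

totals, agents = bytes_by_prefix(SAMPLE)
for prefix, pct in sorted(shares(totals).items(), key=lambda kv: -kv[1]):
    print(f"{pct:6.2f}%  {prefix:<8} {sorted(agents[prefix])}")
```

Listing the user agents next to each prefix is what ties an anonymous block such as 61.135 to a named crawler.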

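I've heard that robots.txt should work, but also that it may take 2-4 weeks to take effect; the crawlers have their own technical constraints, and there's nothing to be done about that. Fortunately cPanel also offers an IP Deny feature. I've added all of the IPs above and hope it works; in theory it should save 43.34% of the traffic (24.36% + 11.50% + 7.48%).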
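Conceptually, an IP deny list just checks each client address against the listed entries. The Python sketch below uses the standard `ipaddress` module. It is not cPanel's implementation, and the `is_denied` helper is hypothetical. The list holds only addresses written out in full in the post, so the Yahoo shorthand is not expanded.

It also shows why the saving is only "in theory". Denying single addresses saves the full 43.34% only if those addresses produced all of the traffic in their blocks.

```python
import ipaddress
from typing import Iterable

def is_denied(client_ip: str, deny_entries: Iterable[str]) -> bool:
    """True if client_ip equals a listed address or falls inside a listed CIDR block."""
    addr = ipaddress.ip_address(client_ip)
    # ip_network("1.2.3.4") is a /32, so single addresses and blocks are handled alike
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in deny_entries)

# Hypothetical deny list: addresses written out in full in the post
DENY = ["61.135.145.204", "202.108.250.196", "202.160.180.132", "220.181.19.95"]

# Upper bound on the saving: the three prefixes' byte shares from the Analog table
theoretical_saving = round(24.36 + 11.50 + 7.48, 2)

print(is_denied("61.135.145.204", DENY))               # True: listed address
print(is_denied("61.135.145.205", DENY))               # False: same block, not listed
print(is_denied("61.135.145.205", ["61.135.0.0/16"]))  # True: whole block denied
print(theoretical_saving)                              # 43.34
```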

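A request for help: the user agents of these three crawlers seem to be the ones below. How should robots.txt be written for them? Yahoo! in particular: I can hardly put Mozilla in there, can I? It seems it should be Slurp?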

“Baiduspider+(+http://www.baidu.com/search/spider.htm)”

“Mozilla/5.0 (compatible; Yahoo! Slurp China;http://misc.yahoo.com.cn/help.html)”

“Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)”

“sogou spider”
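One way to answer this is to put one group in robots.txt that lists each crawler's name token, then test it locally. The sketch below uses Python's standard `urllib.robotparser`. It rests on these assumptions:

- the tokens `Baiduspider`, `Slurp` and `sogou spider` are taken from the user-agent strings above, and are guesses about what each crawler matches on;
- `example.com` stands in for the site.

It only simulates the matching; each crawler still has to fetch and honor the real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group shuts out the three crawlers, everyone else is allowed.
# Keep the User-agent lines together: Python's parser discards agents that are
# followed by a blank line before any Disallow line.
ROBOTS_TXT = """\
User-agent: Baiduspider
User-agent: Slurp
User-agent: sogou spider
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, url: str = "http://example.com/blog/") -> bool:
    """Would a crawler identifying as `agent` be allowed to fetch `url`?"""
    return parser.can_fetch(agent, url)

for agent in ["Baiduspider+(+http://www.baidu.com/search/spider.htm)",
              "Yahoo! Slurp China", "sogou spider", "Googlebot"]:
    print(f"{agent}: {'allowed' if allowed(agent) else 'blocked'}")
```

Python's parser compares only the part of the name before the first `/`. The full Yahoo string `Mozilla/5.0 (...)` is therefore reduced to `Mozilla` and would not match. That is a quirk of this parser, which is why the sketch passes `Yahoo! Slurp China` instead.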

Posted by cathayan on 02/14/06 10:32:33. Category: Computers

Comments

14 comments

cheeky:

????, ???robots.txt????????, ????*?????????, ?????????. ???????????robots???????

Posted 02/14/06 11:04:12

cathayan:

?robots.txt????????????????????????????????????????????IP??????????IP??IP????????????????????

????????????????????????????????????????????

Posted 02/14/06 11:14:23

qyt:

??????????????
??blog???Google???blog???????

Posted 02/14/06 12:14:54  http://qiuyingtao.blogchina.com

??:

?????????????????? ?????

http://www.chedong.com/cgi-...

????? : ?????? ??? 3:2???

??

Posted 02/14/06 12:22:37  http://www.chedong.com/blog/

cathayan:

????Cpanel???????????????????

????????????50?????????????????

Posted 02/14/06 12:42:49

windtear:

??????????
?????????
?? ?????

????????
cft

Posted 02/14/06 15:51:18  http://windtear.net/

cathayan:

???????

??Log???Baiduspider+?403??? :P

Posted 02/14/06 16:16:54

wanderor:

????????Baidu???MSN Space?????

Posted 02/14/06 17:12:08  http://spaces.msn.com/members/richardfang/

digg china:

?IP???????????????.htaccess?????.htaccess??????????????????yahoo?

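RewriteEngine on
RewriteBase /
# Match the three crawlers by User-Agent, case-insensitively ([NC])
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
# Yahoo's UA begins with "Mozilla/5.0", so look for "Slurp" anywhere, not at the start
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
# Escape the space, or Apache reads "spider" as a separate argument; no [OR] on the last condition
RewriteCond %{HTTP_USER_AGENT} ^sogou\ spider [NC]
# Answer matching requests with 403 Forbidden
RewriteRule ^.* - [F]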

Posted 02/14/06 17:45:22  http://www.livedigg.com/
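Which user agents a condition catches can be checked with any regular-expression library. The Python sketch below uses `re.search`. For patterns this simple it behaves like Apache's regex matching: a pattern matches anywhere in the string unless it starts with `^`.

The sketch compares a chain that uses an anchored `^Slurp` with one that uses an unanchored `Slurp`. The `blocked` helper is hypothetical and models conditions joined by `[OR]`. The Firefox string is a sample browser user agent.

```python
import re

# Two versions of the condition chain: anchored ^Slurp vs. Slurp anywhere
ANCHORED_SLURP = [r"^Baiduspider", r"^Slurp", r"^sogou spider"]
UNANCHORED_SLURP = [r"^Baiduspider", r"Slurp", r"^sogou spider"]

AGENTS = {
    "baidu": "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
    "yahoo_cn": "Mozilla/5.0 (compatible; Yahoo! Slurp China;http://misc.yahoo.com.cn/help.html)",
    "yahoo": "Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)",
    "sogou": "sogou spider",
    # Sample browser UA, to check that ordinary visitors are not caught
    "firefox": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5",
}

def blocked(agent: str, patterns: list) -> bool:
    """Conditions chained with [OR]: the rule fires if any pattern matches the User-Agent."""
    return any(re.search(p, agent) for p in patterns)

for name, agent in AGENTS.items():
    print(f"{name:<8} ^Slurp: {blocked(agent, ANCHORED_SLURP)!s:<5}  "
          f"Slurp: {blocked(agent, UNANCHORED_SLURP)}")
```

With `^Slurp`, neither Yahoo string is caught, because both start with `Mozilla/5.0`. The unanchored version catches both and still lets the browser through.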

digg china:

Yahoo ????? robot ?? .htaccess ????? Mozilla/5.0 ??? USer Agent, firefox ???????????? Slurp ??Yahoo ????????

Posted 02/14/06 17:49:36  http://www.livedigg.com/

??:

??????????????.htaccess?????
order allow,deny
deny from 222.181.89.109
deny from 61.135.145.219
deny from xd-22-85-a8.bta.net.cn
deny from agava.net
deny from 222.181.86.168
deny from tpiol.tpiol.com
deny from xd-23-81-a8.bta.net.cn
allow from all
?????baidu?ip?????????????????
?????????????

Posted 02/15/06 05:09:51  http://yanfeng.org/blog
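The directives in this comment use Apache's older access-control syntax (`Order`, `Allow`, `Deny`, as in Apache 2.2). With `Order allow,deny`, a request must match at least one `Allow` and no `Deny`; anything matching neither is refused. A minimal Python model of that rule follows. It is a simplification with these assumptions:

- `entry_matches` and `order_allow_deny` are hypothetical helpers;
- partial IPs are compared by whole leading octets;
- for host names, the client's name is assumed to be already verified (Apache does this with a double reverse DNS lookup).

```python
from typing import Optional, Sequence

def entry_matches(entry: str, ip: str, hostname: Optional[str]) -> bool:
    """Rough model of one Allow/Deny argument: 'all', a full or partial IPv4 address, or a domain."""
    if entry.lower() == "all":
        return True
    octets = entry.rstrip(".").split(".")
    if all(part.isdigit() for part in octets):
        # Full or partial IP: the listed octets must equal the client's leading octets
        prefix = ".".join(octets)
        return ip == prefix or ip.startswith(prefix + ".")
    if hostname is None:
        return False  # no verified host name, so a domain entry cannot match
    host, domain = hostname.lower(), entry.lower()
    # "agava.net" matches "agava.net" and "x.agava.net", but not "notagava.net"
    return host == domain or host.endswith("." + domain)

def order_allow_deny(allow: Sequence[str], deny: Sequence[str],
                     ip: str, hostname: Optional[str] = None) -> bool:
    """Order allow,deny: allowed only if some Allow matches and no Deny matches."""
    allowed = any(entry_matches(e, ip, hostname) for e in allow)
    denied = any(entry_matches(e, ip, hostname) for e in deny)
    return allowed and not denied

# Deny entries copied from the comment above
DENY = ["222.181.89.109", "61.135.145.219", "xd-22-85-a8.bta.net.cn", "agava.net",
        "222.181.86.168", "tpiol.tpiol.com", "xd-23-81-a8.bta.net.cn"]

print(order_allow_deny(["all"], DENY, "61.135.145.219"))                   # False: denied IP
print(order_allow_deny(["all"], DENY, "61.135.145.204"))                   # True: not listed
print(order_allow_deny(["all"], DENY, "192.0.2.7", "crawler.agava.net"))  # False: denied domain
```

Because of `allow from all`, every address not named in a `deny from` line still gets through. That includes the other Baidu address, 61.135.145.204, from the post.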

shunz:

Robot                                   Hits (+robots.txt)  Bandwidth  Last visit
Googlebot                               43145+41            304.17 MB  16 Feb 2006 21:44
BaiDuSpider                             34386+14            1.25 GB    16 Feb 2006 22:18
Yahoo Slurp                             25040+1450          263.37 MB  16 Feb 2006 22:20
Unknown robot (identified by 'spider')  11818+2             234.42 MB  16 Feb 2006 22:17
larbin                                  11493+257           211.97 MB  16 Feb 2006 20:01
Google AdSense                          8692+16             111.52 MB  16 Feb 2006 22:15
MSNBot                                  6806+235            226.19 MB  16 Feb 2006 22:19

????blog?2?????yahoo????????????baidu??????????????????????

Posted 02/16/06 22:34:08  http://www.shunz.net/
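Dividing bandwidth by hits in shunz's table gives a rough per-request size for each robot. That shows whether a crawler costs traffic through the number of requests it makes or through the size of what it downloads. A minimal Python sketch follows. It ignores the `+N` robots.txt hits and assumes 1 GB = 1024 MB and 1 MB = 1024 KB.

```python
# Hits (without the "+N" robots.txt hits) and bandwidth in KB, copied from shunz's table.
# Assumed units: 1 MB = 1024 KB, 1 GB = 1024 MB.
MB, GB = 1024, 1024 * 1024
STATS = {
    "Googlebot": (43145, 304.17 * MB),
    "BaiDuSpider": (34386, 1.25 * GB),
    "Yahoo Slurp": (25040, 263.37 * MB),
    "Unknown robot (identified by 'spider')": (11818, 234.42 * MB),
    "larbin": (11493, 211.97 * MB),
    "Google AdSense": (8692, 111.52 * MB),
    "MSNBot": (6806, 226.19 * MB),
}

def kb_per_hit(stats):
    """Average kilobytes transferred per hit for each robot."""
    return {name: kb / hits for name, (hits, kb) in stats.items()}

averages = kb_per_hit(STATS)
for name, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<40} {avg:6.1f} KB/hit")
```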

cathayan:

???Slurp China????Slurp????

Posted 02/16/06 22:44:00

????:

??.htaccess???????????????????????
???????????agent

Posted 08/24/07 18:09:48  http://www.83blog.com
