Filtering invalid IPs and robots out of access logs
Script to periodically record the company's IP (so it can be filtered later):
#!/bin/sh
# Periodically record the company's public IP, to be filtered from the logs
# author: Felix Zhang
# date: 2012-12-29
filedir=/opt/logdata/companyip
adate=$(date -d "today" +"%Y%m%d")
filename="${filedir}/ip.${adate}"
# Resolve the company's dynamic-DNS hostname to its current IP
ip=$(/usr/bin/host yourcompany.3322.org | awk '{print $4}')
# If today's file already contains this IP, there is nothing to do
if grep -q "$ip" "${filename}" 2>/dev/null; then
    exit 0
fi
echo "$ip" >> "${filename}"
# Set how long (in days) you want to keep the IP files
save_days=30
# Delete IP files older than ${save_days} days
find ${filedir} -mtime +${save_days} -exec rm -rf {} \;
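The append-only-if-missing check above can be exercised on its own; a minimal sketch, where `demo_file` and the `203.0.113.7` address are hypothetical values used purely for illustration:

```shell
#!/bin/sh
# Demonstrate the idempotent append used by the update script:
# running the same check twice leaves exactly one copy of the IP on file.
demo_file="/tmp/ip.demo.$$"   # hypothetical scratch file, illustration only
ip="203.0.113.7"              # example IP from the TEST-NET-3 documentation range

: > "${demo_file}"            # start from an empty file

for run in 1 2; do
    # Same test as the update script: append only when the IP is absent
    if ! grep -q "$ip" "${demo_file}" 2>/dev/null; then
        echo "$ip" >> "${demo_file}"
    fi
done

count=$(wc -l < "${demo_file}")
echo "lines: ${count}"
rm -f "${demo_file}"
```

The second loop iteration is a no-op, so the file ends with a single line; that is what makes the script safe to run from cron as often as you like.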
Script to analyze the log:
#!/bin/sh
ipdir=/opt/logdata/companyip
adate=$(date -d "today" +"%Y%m%d")
ipfile="${ipdir}/ip.${adate}"
ipreg="127.0.0.1"
if [ -e "${ipfile}" ]; then
    # Join the newline-separated IP list into one alternation regex: ip1|ip2|ip3
    ipreg=$(sed ':a N;s/\n/|/;ta' "${ipfile}")
    echo "1"    # debug: today's IP file was found
fi
if [ "${ipreg}" = "" ]; then
    ipreg="127.0.0.1"
    echo "2"    # debug: IP file was empty, fall back to localhost
fi
echo "${ipreg}"
#cat ip.test | grep -E -v '127.0.0.1|126.23.23.44'
fileName=$1
echo "Analyzing file ${fileName}"
# Drop requests from company IPs, then print the request path (7th field)
grep -E -v "${ipreg}" "${fileName}" | awk '{print $7}'
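The `sed ':a N;s/\n/|/;ta'` one-liner above is what turns the IP file into a pattern `grep -E` can use. It loops: label `a`, append the next line (`N`), replace the embedded newline with `|`, and branch back (`ta`) while substitutions succeed. A standalone sketch with made-up addresses:

```shell
#!/bin/sh
# Turn a newline-separated list into a grep -E alternation pattern.
ips="10.0.0.1
10.0.0.2
10.0.0.3"

# :a  define label a;  N  append next line to pattern space;
# s/\n/|/  replace the newline with |;  ta  loop while a substitution happened
pattern=$(printf '%s\n' "$ips" | sed ':a N;s/\n/|/;ta')
echo "$pattern"
```

Running it prints `10.0.0.1|10.0.0.2|10.0.0.3`, ready to feed to `grep -E -v`. Note the dots are regex metacharacters, so `10.0.0.1` also matches `10.0A0.1`; for log filtering this looseness is usually harmless, but escaping the dots would make the match exact.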
With this in place, your own company's IPs are filtered out when analyzing the log. Robots can be filtered the same way, based on their User-Agent signatures; rather than elaborate here, a few robot patterns are given below:
grep -E -v "${ipreg}" "${logfile}" | grep -E -v "DNSPod-monitor|bot.htm|spider.htm|webmasters.htm" > "${cleanlogfile}"
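Both filters compose as a pipeline: the first `grep -E -v` drops company traffic, the second drops robots. A self-contained sketch; the three log lines and the `1.2.3.4` "company IP" are fabricated for illustration:

```shell
#!/bin/sh
# Sketch of the combined IP + robot filter on made-up access-log lines.
log='1.2.3.4 - - [29/Dec/2012] "GET / HTTP/1.1" 200 "Mozilla/5.0"
5.6.7.8 - - [29/Dec/2012] "GET /index.html HTTP/1.1" 200 "Mozilla/5.0"
9.9.9.9 - - [29/Dec/2012] "GET /ping HTTP/1.1" 200 "DNSPod-monitor"'

ipreg="1.2.3.4"   # pretend this came from today's company IP file

# First stage removes company requests, second removes known robots
clean=$(printf '%s\n' "$log" \
    | grep -E -v "${ipreg}" \
    | grep -E -v "DNSPod-monitor|bot.htm|spider.htm|webmasters.htm")
echo "$clean"
```

Only the `5.6.7.8` request survives: the first line is the company's own IP and the third is the DNSPod monitoring robot.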