Learning Python: the urllib and urllib2 modules

I. The urllib module

The urllib module can open arbitrary URLs.

1. urlopen() opens a URL and returns a file-like object, which supports the usual file operations.

In [308]: import urllib

In [309]: file=urllib.urlopen('

In [310]: file.readline()

Out[310]: '<!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8

The returned object supports read(), readlines(), fileno() and close(). It also provides info(), getcode() and geturl():

In [337]: file.info()

Out[337]: <httplib.HTTPMessage instance at 0x2394a70>

In [338]: file.getcode()

Out[338]: 200

In [339]: file.geturl()

Out[339]: 'http://www.baidu.com/'
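The file-like behaviour can be tried without network access. The following is only a sketch under stated assumptions, not part of the original session: it reads a local temporary file through a file:// URL (POSIX-style paths are assumed), and the try/except import lets it also run on Python 3, where urlopen lives in urllib.request. The file name demo.txt is hypothetical.

```python
import contextlib
import os
import tempfile

try:
    from urllib import urlopen            # Python 2, as used in this article
except ImportError:
    from urllib.request import urlopen    # Python 3 location of the same function

# Hypothetical local file standing in for a web page (POSIX paths assumed).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')
with open(path, 'wb') as fh:
    fh.write(b'first line\nsecond line\n')

# closing() makes sure close() is called even if reading fails.
with contextlib.closing(urlopen('file://' + path)) as page:
    first_line = page.readline()     # one line, as with a regular file
    remaining = page.readlines()     # the rest, as a list of lines
```

The content comes back as byte strings, which is why the literals above carry a b prefix.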

2. urlretrieve() saves the page behind a URL to a local file and returns a (filename, headers) tuple.

In [404]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')

In [405]: type (filename)

Out[405]: <type 'tuple'>

In [406]: filename[0]

Out[406]: '/tmp/baidu.html'

In [407]: filename

Out[407]: ('/tmp/baidu.html', <httplib.HTTPMessage instance at 0x23ba878>)

In [408]: filename[1]

Out[408]: <httplib.HTTPMessage instance at 0x23ba878>

3. urlcleanup() clears the cache left behind by urlretrieve().

In [454]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')

In [455]: urllib.urlcleanup()
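The (filename, headers) return value of urlretrieve() can be checked offline as well. This is a rough sketch, assuming POSIX-style paths and a local file:// URL in place of a real web page; source.html and copy.html are hypothetical names, and the import fallback covers Python 3, where both functions live in urllib.request.

```python
import os
import tempfile

try:
    from urllib import urlretrieve, urlcleanup            # Python 2
except ImportError:
    from urllib.request import urlretrieve, urlcleanup    # Python 3

tmp_dir = tempfile.mkdtemp()
source = os.path.join(tmp_dir, 'source.html')   # hypothetical stand-in for a remote page
target = os.path.join(tmp_dir, 'copy.html')     # hypothetical destination file
with open(source, 'wb') as fh:
    fh.write(b'<html>hello</html>\n')

# urlretrieve() returns a (local_filename, headers) tuple.
saved_path, headers = urlretrieve('file://' + source, filename=target)

with open(saved_path, 'rb') as fh:
    saved_bytes = fh.read()

# Clean up temporary files left behind by earlier urlretrieve() calls.
# The explicitly named copy above is not a temporary file and is kept.
urlcleanup()
still_there = os.path.exists(target)
```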

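4. urllib.quote() and urllib.quote_plus() percent-encode a URL string. quote() leaves '/' unescaped by default, while quote_plus() also escapes '/' and turns spaces into '+'.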

In [483]: urllib.quote('http://www.baidu.com')

Out[483]: 'http%3A//www.baidu.com'

In [484]: urllib.quote_plus('http://www.baidu.com')

Out[484]: 'http%3A%2F%2Fwww.baidu.com'

5. urllib.unquote() and urllib.unquote_plus() decode a percent-encoded URL string.

In [514]: urllib.unquote('http%3A//www.baidu.com')

Out[514]: 'http://www.baidu.com'

In [515]: urllib.unquote_plus('http%3A%2F%2Fwww.baidu.com')

Out[515]: 'http://www.baidu.com'
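The difference between the plain and the _plus variants shows up more clearly with a string that contains a space. A small sketch follows; the sample string is made up for illustration, and the import fallback is only there so the code also runs on Python 3, where these functions live in urllib.parse.

```python
try:
    from urllib import quote, quote_plus, unquote, unquote_plus        # Python 2
except ImportError:
    from urllib.parse import quote, quote_plus, unquote, unquote_plus  # Python 3

raw = 'a b/c?d=e&f'   # hypothetical string with a space and reserved characters

encoded = quote(raw)                   # space -> %20, '/' is left alone
encoded_plus = quote_plus(raw)         # space -> '+', '/' -> %2F
encoded_strict = quote(raw, safe='')   # no character is exempt from escaping

# Each decoder reverses its matching encoder.
round_trip = unquote(encoded)
round_trip_plus = unquote_plus(encoded_plus)

# unquote() does not treat '+' as a space, so the pairs should not be mixed.
mixed = unquote(encoded_plus)
```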

6. urllib.urlencode() turns key/value pairs into a query string joined with '&'. Combined with urlopen(), it can be used for both GET and POST requests.

In [560]: import urllib

In [561]: params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})

In [562]: f=urllib.urlopen("http://python.org/query?%s" %params)

In [563]: f.readline()

Out[563]: '<!doctype html>\n'

In [564]: f.readlines()

Out[564]:

['\n',

'\n',

'\n',

'<html class="no-js" lang="en" dir="ltr"> \n',

'\n',
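What urlencode() produces can be inspected without sending a request. The sketch below is illustrative only: the 'q' and 'tag' parameters are invented, parse_qs is used merely to decode the result again, and the import fallback covers Python 3 (urllib.parse).

```python
try:
    from urllib import urlencode                     # Python 2
    from urlparse import parse_qs
except ImportError:
    from urllib.parse import urlencode, parse_qs     # Python 3

# A list of pairs keeps the parameter order fixed; a dict works too.
params = urlencode([('spam', 1), ('eggs', 2), ('bacon', 0)])

# Values are escaped in the quote_plus() style.
escaped = urlencode([('q', 'a b&c')])

# doseq=True expands a sequence value into repeated keys.
repeated = urlencode([('tag', ['x', 'y'])], doseq=True)

# parse_qs() reverses the encoding; every value comes back as a list of strings.
decoded = parse_qs(params)
```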

II. The urllib2 module

urllib2 offers more than urllib, for example basic authentication, redirect handling and cookie support.

https://docs.python.org/2/library/urllib2.html

https://docs.python.org/2/howto/urllib2.html

In [566]: import urllib2

In [567]: f=urllib2.urlopen('http://www.python.org/')

In [568]: print f.read(100)

--------> print(f.read(100))

<!doctype html>

This opens the Python website and reads the first 100 bytes of the response.

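HTTP is based on requests and responses: the client sends a request and the server answers it. urllib2 represents the request with a Request object, and passing that Request to urlopen() returns a response object. The response is a file-like object and can be handled like a file.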

In [630]: import urllib2

In [631]: req=urllib2.Request('http://www.baidu.com')

In [632]: response=urllib2.urlopen(req)

In [633]: the_page=response.read()

In [634]: the_page

Out[634]: '<!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.

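Often you need to send data to a URL with a POST request. Encode the data with urllib.urlencode() and pass it to Request as the second argument.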

In [763]: import urllib

In [764]: import urllib2

In [765]: url='http://xxxxxx/login.php'

In [766]: values={'ver' : '1.7.1', 'email' : 'xxxxx', 'password' : 'xxxx', 'mac' : '111111111111'}

In [767]: data=urllib.urlencode(values)

In [768]: req=urllib2.Request(url,data)

In [769]: response=urllib2.urlopen(req)

In [770]: the_page=response.read()

In [771]: the_page

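If no data argument is passed to urllib2.Request(), urllib2 issues a GET request. One difference between GET and POST is that POST requests often have side effects: they change the state of the system in some way. Data can also be sent with a GET request by encoding it into the URL.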

In [55]: import urllib2

In [56]: import urllib

In [57]: url='http://xxx/login.php'

In [58]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxx'}

In [59]: data=urllib.urlencode(values)

In [60]: full_url=url + '?' + data

In [61]: the_page=urllib2.urlopen(full_url)

In [63]: the_page.read()

Out[63]: '{"result":0,"data":0}'
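How the data argument decides between GET and POST can be seen on the Request object itself, before anything is sent. A minimal sketch, assuming a placeholder endpoint (www.example.com) and made-up form fields; the import fallback and the .encode('ascii') call are there so it also runs on Python 3, which expects the request body as bytes.

```python
try:
    from urllib import urlencode                    # Python 2
    from urllib2 import Request
except ImportError:
    from urllib.parse import urlencode              # Python 3
    from urllib.request import Request

url = 'http://www.example.com/login.php'                      # hypothetical endpoint
values = [('email', 'user@example.com'), ('ver', '1.7.1')]    # hypothetical fields
encoded = urlencode(values)

# With a data argument the request is sent as POST, data in the body.
post_req = Request(url, encoded.encode('ascii'))
post_method = post_req.get_method()

# Without one it is a GET; the data travels in the query string instead.
get_req = Request(url + '?' + encoded)
get_method = get_req.get_method()
get_url = get_req.get_full_url()
```

Nothing is transmitted until one of the Request objects is handed to urlopen().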

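By default urllib2 identifies itself with a User-Agent such as Python-urllib/2.6. To present a different browser identity, add a User-Agent HTTP header to the request.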

In [107]: import urllib

In [108]: import urllib2

In [109]: url='http://xxx/login.php'

In [110]: user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

In [111]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxxx'}

In [112]: headers={'User-Agent' : user_agent}

In [114]: data=urllib.urlencode(values)

In [115]: req=urllib2.Request(url,data,headers)

In [116]: response=urllib2.urlopen(req)

In [117]: the_page=response.read()

In [118]: the_page
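The headers attached to a Request can be checked before it is sent. A short sketch with a placeholder URL and an extra Accept header that is not in the original example; note that Request stores header names in capitalized form, so the lookup key is 'User-agent'. The import fallback covers Python 3 (urllib.request).

```python
try:
    from urllib2 import Request             # Python 2
except ImportError:
    from urllib.request import Request      # Python 3

url = 'http://www.example.com/'             # hypothetical URL
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Headers can be passed at construction time...
req = Request(url, headers={'User-Agent': user_agent})
# ...or added afterwards (the Accept header is an illustrative extra).
req.add_header('Accept', 'text/html')

# Names are stored via str.capitalize(): 'User-Agent' becomes 'User-agent'.
stored_names = sorted(name for name, _ in req.header_items())
stored_agent = req.get_header('User-agent')
missed_lookup = req.get_header('User-Agent')   # original spelling is not found
```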

When the given URL cannot be reached, urlopen() raises URLError. When the server is reached but cannot deliver the requested content, urlopen() raises HTTPError.

#!/usr/bin/python
# Handle HTTPError and URLError in separate except clauses (Python 2).

from urllib2 import Request, urlopen, URLError, HTTPError

req = Request('http://10.10.41.42/index.html')

try:
    response = urlopen(req)
except HTTPError as e:
    # The server answered, but with an error status code.
    print 'The server couldn\'t fulfill the request.'
    print 'Error code:', e.code
except URLError as e:
    # The server could not be reached at all.
    print 'We failed to reach a server.'
    print 'Reason:', e.reason
else:
    print "Everything is fine"

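Note that the HTTPError handler must be written before the URLError handler: HTTPError is a subclass of URLError, so an except URLError clause placed first would catch HTTPError as well.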
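The ordering rule can be checked without a network by raising the two exceptions by hand. This is only a sketch: classify(), raise_http_error() and raise_url_error() are hypothetical helpers, the 404 response is simulated, and the import fallback covers Python 3, where both classes live in urllib.error.

```python
import io

try:
    from urllib2 import URLError, HTTPError          # Python 2
except ImportError:
    from urllib.error import URLError, HTTPError     # Python 3


def classify(action):
    """Run action() and report which except clause handled its exception."""
    try:
        action()
    except HTTPError as e:      # the more specific class has to come first
        return 'HTTPError %d' % e.code
    except URLError as e:       # would also match HTTPError if listed first
        return 'URLError %s' % e.reason
    else:
        return 'ok'


def raise_http_error():
    # Simulated 404 response; URL and message are made up.
    raise HTTPError('http://www.example.com/missing', 404, 'Not Found',
                    {}, io.BytesIO())


def raise_url_error():
    # Simulated connection failure.
    raise URLError('connection refused')


is_subclass = issubclass(HTTPError, URLError)
http_result = classify(raise_http_error)
url_result = classify(raise_url_error)
no_error = classify(lambda: None)
```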

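#!/usr/bin/python
# Handle both error types in a single except clause (Python 2).

from urllib2 import Request, urlopen, URLError, HTTPError

req = Request('http://10.10.41.42')

try:
    response = urlopen(req)
except URLError as e:
    # Test for 'code' first: only HTTPError has it, whereas newer Python
    # versions give HTTPError a 'reason' attribute as well.
    if hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code:', e.code
    elif hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason:', e.reason
else:
    print "Everything is fine"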

The hasattr() function checks whether an object has the given attribute.
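As a quick illustration of hasattr() on the exception type used above, here is a sketch with a hand-made URLError; the error message is invented, and the import fallback covers Python 3 (urllib.error).

```python
try:
    from urllib2 import URLError          # Python 2
except ImportError:
    from urllib.error import URLError     # Python 3

err = URLError('connection refused')      # hypothetical failure reason

has_reason = hasattr(err, 'reason')       # a plain URLError carries a reason
has_code = hasattr(err, 'code')           # but no HTTP status code

# getattr() with a default is a compact alternative to the hasattr() test.
code_or_none = getattr(err, 'code', None)
```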
