Beginner question: scraping posters from 58pic (千图网) with urllib returns empty images

Code:

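import re
import urllib.request

url = "http://www.58pic.com/"
# The "…" in the User-Agent is elided in the post; use your browser's full string
header = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/58.0")
opener = urllib.request.build_opener()
# addheaders (not add_handler) is the opener attribute that holds default request headers
opener.addheaders = [header]
urllib.request.install_opener(opener)
rep = urllib.request.urlopen(url).read().decode('utf-8', "ignore")
# Capture the value of every src="..." attribute in the page
pat = 'src="(.*?)"'
img = re.compile(pat).findall(rep)
print(img)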

Output:

C:\Users\Tan\PycharmProjects\scrapy_learn\venv\Scripts\python.exe C:/Users/Tan/PycharmProjects/scrapy_learn/com/dangdang.py
['//icon.qiantucdn.com/static/images/header_v1.0/logo1.2.png', 'http://icon.qiantucdn.com/images/icont/icon-billboard.gif', 'http://pic.qiantucdn.com/yang/img/icon4.png', 'http://pic.qiantucdn.com/images/banner/5a90e2774ad2b.jpg', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 
'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 
'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.58pic.com/static/images/0.gif', 'http://icon.qiantucdn.com/img/searchnew/wechat-g.png', '', '', 'http://icon.qiantucdn.com/img/searchnew/wechat-g.png', 'https://a.gdt.qq.com/pixel?user_action_set_id=1106614899&action_type=PAGE_VIEW&noscript=1', '//icon.qiantucdn.com/static/js/qt-ui_3c90aaf0c8c5fbbe.js', '//icon.qiantucdn.com/static/js/index/addon_1d5867f8b6e1d2b6.js', '//icon.qiantucdn.com/static/js/index/index_bd263a5a971107dd.js']

Process finished with exit code 0

[Screenshot: 20180225204158.png]

As the screenshot above shows, most of the images are empty.

一只写程序的猿 - "The mark of a mature paladin is that he no longer explains sunlight to the blind." WeChat official account: Python攻城狮 (answered 2018-02-27)

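What your regex matches in rep is exactly this link: http://icon.58pic.com/static/images/0.gif, and the image at that link is itself empty.
Print rep and look at the raw HTML. To get the poster links, read the data-url attribute of the tags instead of src.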
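A minimal sketch of the data-url approach suggested above. It runs against a small hard-coded HTML sample rather than the live site, so the markup in SAMPLE_HTML, the example poster paths, and the helper name extract_poster_urls are all assumptions for illustration; the real page structure may differ, so check the printed rep first.

```python
import re
from urllib.parse import urljoin

# Hypothetical markup imitating the pattern described in the answer:
# src holds the empty 0.gif, data-url holds the real image link.
# The poster paths below are made up for this example.
SAMPLE_HTML = '''
<img src="http://icon.58pic.com/static/images/0.gif" data-url="//pic.qiantucdn.com/example/poster1.jpg">
<img src="http://icon.58pic.com/static/images/0.gif" data-url="http://pic.qiantucdn.com/example/poster2.jpg">
<img src="http://icon.58pic.com/static/images/0.gif" data-url="">
'''


def extract_poster_urls(html, base_url="http://www.58pic.com/"):
    """Return the non-empty data-url values, resolved against base_url."""
    raw = re.findall(r'data-url="(.*?)"', html)
    # urljoin turns protocol-relative links ("//host/path") into full URLs
    return [urljoin(base_url, u) for u in raw if u]


if __name__ == "__main__":
    print(extract_poster_urls(SAMPLE_HTML))
```

To try it on the real page, pass the rep string from the question in place of SAMPLE_HTML. If the list comes back empty, the attribute on the live page probably has a different name than data-url.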
