0
Recommendations
1156
Reads
Crawler Notes 2: Scraping maoyan with the re regex module
import re
import requests

for i in range(10):
    # The board lists 10 films per page, so the offset advances in steps of 10.
    url = 'https://maoyan.com/board/4?offset={}'.format(i * 10)
    content = requests.get(url).text
    pattern = re.compile('<dl.*?board-wrapper.*?href="https://ask.hellobi.com/(.*?)".*?title="(.*?)".*?movie-item-info.*?star">(.*?)</p>.*?releasetime">(.*?)</p>.*...
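Since the excerpt above is cut off mid-pattern, here is a minimal offline sketch of the same idea: one non-greedy regex compiled with `re.S`, applied with `findall` to pull out several fields per film. The `SAMPLE_HTML` markup, the `ITEM_PATTERN` name, and the `parse_board` helper are all hypothetical; the markup only borrows the class names visible in the truncated pattern and is not guaranteed to match the real maoyan page.

```python
import re

# Hypothetical markup: loosely modelled on the class names that appear in the
# truncated pattern (board-wrapper, movie-item-info, star, releasetime).
SAMPLE_HTML = """
<dl class="board-wrapper">
  <dd>
    <a href="/films/1" title="Film A" class="image-link"></a>
    <div class="movie-item-info">
      <p class="star">Starring: Actor One, Actor Two</p>
      <p class="releasetime">Released: 1993-01-01</p>
    </div>
  </dd>
  <dd>
    <a href="/films/2" title="Film B" class="image-link"></a>
    <div class="movie-item-info">
      <p class="star">
        Starring: Actor Three
      </p>
      <p class="releasetime">Released: 1994-09-10</p>
    </div>
  </dd>
</dl>
"""

# re.S lets "." match newlines, so one pattern can span a whole <dd> entry.
# Every ".*?" is non-greedy, which stops each group at the nearest delimiter.
ITEM_PATTERN = re.compile(
    r'<dd>.*?href="(.*?)".*?title="(.*?)"'
    r'.*?class="star">(.*?)</p>'
    r'.*?class="releasetime">(.*?)</p>',
    re.S,
)


def parse_board(html):
    """Return one dict per film found in the given HTML string."""
    films = []
    for link, title, star, release in ITEM_PATTERN.findall(html):
        films.append({
            "link": link,
            "title": title,
            "star": star.strip(),        # drop whitespace around the text
            "release": release.strip(),
        })
    return films


if __name__ == "__main__":
    for film in parse_board(SAMPLE_HTML):
        print(film)
```

In a real run, the string passed to `parse_board` would be the `content` fetched with `requests.get(url).text`, and the pattern would probably need adjusting to whatever markup the site actually serves.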
1
Recommendations
1484
Reads
Crawler Notes 1: Scraping maoyan with requests + xlwt + lxml
```
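import json
from lxml import etree
import requests
import xlwt


class MaoyanSpider:  # Implemented as a class with several methods
    # Scrapes the Maoyan movie top 100 with lxml and XPath
    # and saves the results to a TXT file and an Excel spreadsheet

    # Initialize the URL and headers
    def __init__(self):
        self.start_url = 'https://maoyan.com/bo...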