Python Web Crawlers (6) - The Scrapy Framework

Published: 2017-10-16


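1. Scrapy

About Scrapy
- A crawler framework written in pure Python
- An application framework for crawling websites and extracting structured data
- Network communication is handled by the Twisted asynchronous networking framework
- Extensible and high-performance; its concurrency comes from Twisted's asynchronous I/O rather than from multithreading, and distributed crawling requires an add-on such as scrapy-redis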

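Scrapy architecture

Scrapy Engine:

Coordinates the Spider, Item Pipeline, Downloader, and Scheduler: it schedules their work and passes messages and data between them.

Scheduler:

Receives requests from the engine, queues them according to its rules, and hands them back to the engine when the engine asks for the next request.

Downloader:

Downloads every Request the engine passes to it and returns the server's response to the engine.

Spider:

Processes every Response and extracts Item data from it.
If the data leads to follow-up requests, they are handed back to the engine.

Item Pipeline:

Processes (analyzes, filters, stores) the Items produced by the Spiders.

The Scrapy Engine controls the flow of data between all components. Spiders issue Requests, which the engine passes to the Scheduler. The Downloader receives Requests from the Scheduler (via the engine) and downloads the data from the network. The Downloader's Responses are then passed back to the Spiders for parsing. The Spiders extract Items as required and hand them to the Item Pipeline for processing and storage. The Spiders and the Item Pipeline are the parts that users write for their own needs. In addition there are two kinds of middleware, Downloader Middlewares and Spider Middlewares, which make it convenient to extend Scrapy by inserting custom code, for example for deduplication.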
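The data flow described above can be imitated in a few lines of plain Python. This is only a toy sketch, not how Scrapy is implemented: the in-memory `FAKE_SITE`, the function names, and the use of a bare URL string as a "request" are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical in-memory "website": URL -> (page text, links found on the page).
FAKE_SITE = {
    "page1": ("job A", ["page2"]),
    "page2": ("job B", []),
}


def download(url):
    """Downloader: turn a request (here just a URL) into a response."""
    return FAKE_SITE[url]


def parse(response):
    """Spider: yield one item plus any follow-up requests."""
    text, links = response
    yield {"title": text}      # an item
    for link in links:
        yield link             # a follow-up request


def run_engine(start_urls):
    """Engine: move requests and items between the other components."""
    scheduler = deque(start_urls)   # Scheduler: FIFO queue of pending requests
    seen = set(start_urls)          # URLs that were already scheduled
    stored = []                     # stands in for the Item Pipeline's storage
    while scheduler:
        url = scheduler.popleft()
        for result in parse(download(url)):
            if isinstance(result, dict):
                stored.append(result)      # item -> pipeline
            elif result not in seen:
                seen.add(result)
                scheduler.append(result)   # request -> scheduler
    return stored


items = run_engine(["page1"])
print(items)
```

The real engine does the same routing asynchronously, so many downloads can be in flight at once.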

Common commands

startproject: create a new project
genspider: generate a new spider from a template
crawl: run a spider
shell: start the interactive scraping console

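2. Installation and configuration

My system is Windows 7, so only the Windows installation is covered in detail. First you need Python; I have versions 2.7.7 and 3.5 installed side by side.

Official documentation: http://doc.scrapy.org/en/latest/intro/install.html
Chinese documentation

A brief aside: official documentation is not always hard to read. If an English site feels difficult, that is often just a reflexive aversion to English pages. Read on slowly and look up unfamiliar words in the Youdao dictionary, and you will find that all-English sites are not as hard as you imagined. Back to the topic: here is a brief look at installing Scrapy on Ubuntu and macOS.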

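Installing on Ubuntu

apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libssl-dev

pip install scrapy

Installing on macOS

The official documentation advises against using the Python that ships with the system.

To install, follow the official documentation.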

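1. Installing on Windows

In a command prompt, run:

pip install scrapy

When the installation finishes, run scrapy

If Scrapy prints its usage and list of commands, the installation succeeded.

You also need pywin32, which provides win32api. Download it from https://sourceforge.net/projects/pywin32/files/

Click pywin32,

then click the latest build,

and pick the installer that matches your Python version; I use Python 2.7.

The download is an .exe file; double-click it to install, then click Next.

The second step shows your Python installation directory. If the installer cannot detect it, you have most likely downloaded the wrong pywin32 build; download the right one and try again. Click Next.

When the final screen appears, the installation is complete.

In Python, import win32com as a test; if no error is raised, the installation succeeded.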
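The import test can also be scripted. The sketch below uses only the standard library; the helper name `is_installed` is made up for this example, and it only checks that a module can be located, not that it works.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two packages this section installs.
for name in ("scrapy", "win32com"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

A "missing" result for win32com on Windows suggests the pywin32 installer was run against a different Python installation.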

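3. Common installation errors

If the installation fails because pip is out of date, upgrade pip.

In a command prompt, run:

pip install -U pip

The upgrade then completes successfully.

If the error is ReadTimeout, the download timed out; simply run the installation again.
For other errors, see the guide "python+scrapy安装教程" and go through it step by step to find out which step fails.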

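4. Hands-on: creating a Scrapy project

Workflow:

Create a Scrapy project;
Define the Items to extract;
Write a spider that crawls the site and extracts the Items;
Write an Item Pipeline that stores the extracted Items (the data).

1. Crawling the Python search results on Zhaopin (智联招聘)

Plan:
(1) analyze how Zhaopin's URLs are built;
(2) inspect the page structure and work out the matching XPath expressions;
(3) write the page to an HTML file.

Analysis:

Use the browser's element inspector to find the real URL that the search page requests.

Work out the XPath for each piece of data on the page: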

# All job postings on the current page
//div[@id="newlist_list_div"]//table

# Job title
//div[@id="newlist_list_div"]//table//td[1]//a

# Feedback rate
//div[@id="newlist_list_div"]//table//td[2]//span

# Company that posted the job
//div[@id="newlist_list_div"]//table//td[3]//a/text()

# Monthly salary
//div[@id="newlist_list_div"]//table//td[4]/text()
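To see how these expressions pick cells out of a table, here is a small sketch that runs similar paths against a hand-written sample. It uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors; ElementTree supports only a subset of XPath (no `text()`, paths must be relative), so `findtext` is used instead. The sample markup and the function name `extract_jobs` are assumptions, not the real Zhaopin page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample that mimics the layout described above.
SAMPLE = """
<html><body>
<div id="newlist_list_div">
  <table><tr>
    <td><a>Python Engineer</a></td>
    <td><span>62%</span></td>
    <td><a>Example Co.</a></td>
    <td>10000-15000</td>
    <td>Shanghai</td>
  </tr></table>
</div>
</body></html>
"""


def extract_jobs(html_text):
    """Return one dict per job table found inside the result container."""
    root = ET.fromstring(html_text.strip())
    jobs = []
    for table in root.findall(".//div[@id='newlist_list_div']//table"):
        jobs.append({
            "name": table.findtext(".//td[1]//a"),
            "percent": table.findtext(".//td[2]//span"),
            "company": table.findtext(".//td[3]//a"),
            "salary": table.findtext(".//td[4]"),
        })
    return jobs


jobs = extract_jobs(SAMPLE)
print(jobs)
```

In a spider the same selections are made with `response.xpath(...)`, which accepts the full XPath expressions listed above.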

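Create the first Scrapy project
- In a command prompt, run

scrapy startproject firPro

This creates a firPro folder with the following structure:

|-- firPro/                     # project folder
    |-- scrapy.cfg              # project deployment configuration
    |-- firPro/                 # project module; holds the actual crawler code
        |-- __init__.py         # module description file
        |-- items.py            # defines the model of the fields to scrape
        |-- pipelines.py        # item pipeline definitions
        |-- settings.py         # global project settings, such as the user agent and crawl delay
        |-- spiders/            # spider module (where you write your spiders)
            |-- __init__.py     # module description file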

1. Code in `items.py`

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class FirproItem(scrapy.Item):
    # job title
    name = scrapy.Field()
    # feedback rate
    percent = scrapy.Field()
    # company that posted the job
    company = scrapy.Field()
    # monthly salary
    salary = scrapy.Field()
    # work location
    position = scrapy.Field()

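2. Create `fir_spider.py` in the spiders directory

# -*- coding: utf-8 -*-
import scrapy


# A custom spider class must inherit from scrapy.Spider
class Firspider(scrapy.Spider):
    # Spider name, used to start the crawl
    name = 'firspider'
    # Domains the spider may crawl: a list of bare domain names, without the scheme
    allowed_domains = ['sou.zhaopin.com']
    # List/tuple of the URLs the crawl starts from
    start_urls = ('http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E4%B8%8A%E6%B5%B7&kw=python&sm=0&p=1&source=0',)

    # Callback that handles the downloaded response;
    # response holds the data fetched by the crawler
    def parse(self, response):
        # response.body is bytes, so open the file in binary mode
        with open(u'智联.html', 'wb') as f:
            f.write(response.body)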
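A note on the domain restriction: Scrapy reads it from an attribute named `allowed_domains`, which should be a list of bare domain names. The sketch below only approximates the idea behind Scrapy's offsite filtering (the host must equal a listed domain or be a subdomain of it); the function `is_allowed` is written for this example and is not Scrapy's own code.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Approximate offsite check: host equals a domain or is a subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


url = "http://sou.zhaopin.com/jobs/searchresult.ashx?kw=python&p=2"
print(is_allowed(url, ["zhaopin.com"]))             # True
print(is_allowed(url, ["http://sou.zhaopin.com"]))  # False: a full URL is not a domain name
```

An entry that includes the scheme never matches a host name, which is why the domain is written without `http://`.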

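3. Open a command prompt in the project folder

Run the spider:

# The name used here is the spider name defined in fir_spider.py
scrapy crawl firspider

This saves the HTML of the whole page; from it we can use XPath to pick out the data we want.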

4. Extract the data we want

# -*- coding: utf-8 -*-
import scrapy
from firPro.items import FirproItem


# A custom spider class must inherit from scrapy.Spider
class Firspider(scrapy.Spider):
    # Spider name, used to start the crawl
    name = 'firspider'
    # Domains the spider may crawl: a list of bare domain names, without the scheme
    allowed_domains = ['sou.zhaopin.com']
    # List/tuple of the URLs the crawl starts from
    start_urls = ('http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E4%B8%8A%E6%B5%B7&kw=python&sm=0&p=1&source=0',)

    # Previous version of the callback, which saved the raw page:
    # def parse(self, response):
    #     with open(u'智联.html', 'wb') as f:
    #         f.write(response.body)

    def parse(self, response):
        print(response.body)
        # Select every job posting on the page
        job_list = response.xpath('//div[@id="newlist_list_div"]//table')

        # Collects the extracted job data
        job_lists = []

        for job in job_list:
            # Create an Item to hold the matched data
            item = FirproItem()

            # extract() turns the selection into a list of strings
            item["name"] = job.xpath(".//td[1]//a/text()[1]").extract()
            item["percent"] = job.xpath(".//td[2]//span/text()").extract()
            item["company"] = job.xpath(".//td[3]//a/text()").extract()
            item["salary"] = job.xpath(".//td[4]/text()").extract()
            item["position"] = job.xpath(".//td[5]/text()").extract()

            # Keep a copy of the data
            job_lists.append(item)

            # Hand the item over to the pipelines module
            yield item

settings.py also needs browser-like request headers

DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36',
}

# Uncomment ITEM_PIPELINES
ITEM_PIPELINES = {
   'firPro.pipelines.FirproPipeline': 300,
}

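About settings.py
- ROBOTSTXT_OBEY = True: whether to obey robots.txt; generated automatically, and turned off for the crawls in this article
- CONCURRENT_REQUESTS = 16: maximum number of concurrent requests (not threads); the default is 16
- AUTOTHROTTLE_START_DELAY = 3: initial download delay, in seconds, when AutoThrottle is enabled
- AUTOTHROTTLE_MAX_DELAY = 60: maximum download delay, in seconds, that AutoThrottle applies under high latency
- BOT_NAME: generated automatically; the name of the project
- SPIDER_MODULES: generated automatically; the modules in which Scrapy looks for spiders
- NEWSPIDER_MODULE: generated automatically; the module in which genspider creates new spiders
- ITEM_PIPELINES: the item pipelines to enable
- IMAGES_STORE: root directory in which downloaded images are stored
- COOKIES_ENABLED: whether cookies are enabled; disabled here
- DOWNLOAD_DELAY: delay between requests, in seconds; Scrapy's built-in default is 0, and the value 3 comes from the commented-out example in the generated settings.py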
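Put together, a settings.py fragment using the options listed above might look like the following. The setting names are Scrapy's own; the values are illustrative assumptions rather than recommendations, and `AUTOTHROTTLE_ENABLED` is added here because the two AutoThrottle delays only take effect when it is on.

```python
# settings.py (fragment) with illustrative values
ROBOTSTXT_OBEY = False         # this article's crawls do not obey robots.txt
CONCURRENT_REQUESTS = 16       # maximum number of concurrent requests
DOWNLOAD_DELAY = 3             # seconds between requests to the same site
COOKIES_ENABLED = False        # do not send or store cookies
AUTOTHROTTLE_ENABLED = True    # let Scrapy adapt the delay to server latency
AUTOTHROTTLE_START_DELAY = 3   # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60    # upper bound on the delay in seconds
```

Whether ignoring robots.txt is acceptable depends on the target site's terms, so that line deserves a deliberate decision per project.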

See also: "Python yield 使用浅析" (an introduction to using Python's yield)
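Since spider callbacks rely on yield, a short standalone sketch may help. The generator below hands out items one at a time and then a follow-up request, the way a `parse` callback does; the tuple `("REQUEST", ...)` is a stand-in for `scrapy.Request`, and the `page` dictionary is made up for the example.

```python
def parse(page):
    """Yield each item, then a follow-up request if there is a next page."""
    for row in page["rows"]:
        yield {"name": row}                # produced lazily, one at a time
    if page["next"] is not None:
        yield ("REQUEST", page["next"])    # stand-in for scrapy.Request


page = {"rows": ["job A", "job B"], "next": "page2"}

gen = parse(page)
first = next(gen)      # runs the function body only up to the first yield
rest = list(gen)       # resumes after that yield and drains the generator
print(first, rest)
```

Because nothing runs until a value is requested, the consumer can process each item as soon as it is produced instead of waiting for a complete list.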

This is only a simple crawler. Next, we save the data we want.

items.py

The item definition is the same as in step 1: FirproItem with the fields name, percent, company, salary, and position.

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import json


class FirproPipeline(object):
    def __init__(self):
        # Python 3: open in text mode with an explicit encoding
        self.file = open('zhilian.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text readable in the output
        text = json.dumps(dict(item), ensure_ascii=False)
        # Write one JSON object per line
        self.file.write(text + '\n')
        print('-----------------')
        # Return the item so that any later pipeline can process it
        return item

    def close_spider(self, spider):
        self.file.close()
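Outside Scrapy, the storage step of such a pipeline comes down to writing one JSON object per line. The sketch below shows that format with the standard library only; the sample items and the file name `demo.jsonl` are assumptions for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample items, shaped like the ones the spider yields.
items = [
    {"name": "python开发工程师", "salary": "6000-8000"},
    {"name": "Python讲师", "salary": "3000-6000"},
]

path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")

# Write one JSON object per line; ensure_ascii=False keeps Chinese readable.
with open(path, "w", encoding="utf-8") as f:
    for item in items:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")

# Each line can be parsed back on its own.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(loaded == items)  # True
```

Objects written back to back without a separator cannot be read with a single `json.load` call, whereas a file with one object per line can be read line by line.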

fir_spider.py

# -*- coding: utf-8 -*-
import scrapy
from firPro.items import FirproItem
import re


# A custom spider class must inherit from scrapy.Spider
class Firspider(scrapy.Spider):
    # Matches runs of whitespace so they can be removed from extracted text
    reg = re.compile(r'\s+')
    # Spider name, used to start the crawl
    name = 'firspider'
    # Domains the spider may crawl: a list of bare domain names, without the scheme
    allowed_domains = ['sou.zhaopin.com']
    # Base URL; the page number is appended to build each page's address
    url = 'http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%E4%B8%8A%E6%B5%B7&kw=python&sm=0&source=0&sg=b8e8fb4080fa47afa69cd683dfbfccf9&p='
    p = 1
    start_urls = [url + str(p)]

    def parse(self, response):
        # print(response.body)
        # Select the job postings, skipping the first two tables
        job_list = response.xpath('//div[@id="newlist_list_div"]//table')[2:]

        for job in job_list:
            # Create an Item to hold the matched data
            item = FirproItem()
            name = job.xpath(".//tr[1]//td[1]//a")

            # string(.) joins all text inside the link; the regex strips whitespace
            item["name"] = self.reg.sub('', name.xpath("string(.)").extract()[0])

            item["percent"] = job.xpath(".//td[2]//span[1]/text()").extract()
            item["company"] = job.xpath(".//td[3]//a/text()").extract()
            item["salary"] = job.xpath(".//td[4]/text()").extract()
            item["position"] = job.xpath(".//td[5]/text()").extract()
            # Hand the item over to the pipelines module
            yield item

        # Request the next page only while the page limit has not been reached
        if self.p <= 10:
            self.p += 1
            yield scrapy.Request(self.url + str(self.p), callback=self.parse)
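The paging logic is easy to get wrong, so here it is isolated from Scrapy. This is a minimal sketch under assumptions: the helper `next_page_url`, the shortened base URL, and the limit of 10 pages are chosen for illustration.

```python
def next_page_url(base_url, current_page, last_page=10):
    """Return the URL of the next page, or None once last_page has been reached."""
    if current_page >= last_page:
        return None
    return base_url + str(current_page + 1)


# Shortened, hypothetical base URL; the page number is appended to it.
base = "http://sou.zhaopin.com/jobs/searchresult.ashx?kw=python&p="

page = 1
visited = [base + str(page)]
while True:
    url = next_page_url(base, page)
    if url is None:
        break          # stop instead of requesting the last page again
    visited.append(url)
    page += 1

print(len(visited))  # 10
```

In a spider, the same check decides whether another `scrapy.Request` is yielded at the end of `parse`.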

settings.py needs the same browser-like DEFAULT_REQUEST_HEADERS and the same uncommented ITEM_PIPELINES entry ('firPro.pipelines.FirproPipeline': 300) as shown earlier.

The crawled data is written to zhilian.json.

2. Crawling the Python search results on ChinaHR (中华英才网)

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class ZhycItem(scrapy.Item):
    # Fields to collect for each job posting
    name = scrapy.Field()
    publish = scrapy.Field()
    company = scrapy.Field()
    require = scrapy.Field()
    salary = scrapy.Field()
    desc = scrapy.Field()

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import json


class ZhycPipeline(object):
    def __init__(self):
        # Python 3: open in text mode with an explicit encoding
        self.file = open("zhonghuayingcai.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        text = json.dumps(dict(item), ensure_ascii=False)
        # Write one JSON object per line
        self.file.write(text + "\n")
        print("*****************************************")
        # Return the item so that any later pipeline can process it
        return item

    def close_spider(self, spider):
        self.file.close()

zhycspider.py

# -*- coding: utf-8 -*-
import scrapy
import re
from zhyc.items import ZhycItem


class ZhycspiderSpider(scrapy.Spider):
    # Matches runs of whitespace so they can be removed from extracted text
    reg = re.compile(r"\s+")
    name = 'zhycspider'
    allowed_domains = ['www.chinahr.com']

    # Base URL; the page number is appended to build each page's address
    url = "http://www.chinahr.com/sou/?orderField=relate&keyword=python&city=36,400&page="
    page = 1
    start_urls = [url + str(page)]

    def parse(self, response):
        job_list_xpath = response.xpath('//div[@class="jobList"]')

        for jobitem in job_list_xpath:
            item = ZhycItem()

            name = jobitem.xpath(".//li[1]//span[1]//a")
            item["name"] = self.reg.sub("", name.xpath("string(.)").extract()[0])

            item["publish"] = self.reg.sub("", jobitem.xpath(".//li[1]//span[2]/text()").extract()[0])
            item["company"] = self.reg.sub("", jobitem.xpath(".//li[1]//span[3]//a/text()").extract()[0])
            item["require"] = self.reg.sub("", jobitem.xpath(".//li[2]//span[1]//text()").extract()[0])
            item["salary"] = self.reg.sub("", jobitem.xpath(".//li[2]//span[2]//text()").extract()[0])
            desc = jobitem.xpath(".//li[2]//span[3]")
            item["desc"] = self.reg.sub("", desc.xpath("string(.)").extract()[0])

            yield item

        # Request the next page only while the page limit has not been reached
        if self.page <= 10:
            self.page += 1
            yield scrapy.Request(self.url + str(self.page), callback=self.parse)

settings.py also needs browser-like request headers

DEFAULT_REQUEST_HEADERS = {
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en',
  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36',
}

# Uncomment ITEM_PIPELINES; the path must point at this project's pipeline class
ITEM_PIPELINES = {
   'zhyc.pipelines.ZhycPipeline': 300,
}

Crawled data file zhonghuayingcai.json

{"salary": "8000-15000", "name": "python测试工程师", "company": "Fonrich", "publish": "今天", "require": "[上海市/闵行]应届生/本科", "desc": "电子/半导体/集成电路|民营/私企|51－100人"}
{"salary": "7000-10000", "name": "风险软件工程师(Python方向)", "company": "中银消费金融有限公司", "publish": "今天", "require": "[上海市/黄浦]2年/本科", "desc": "证券|民营/私企|101－300人"}
{"salary": "8000-15000", "name": "Python爬虫开发工程师", "company": "维赛特财经", "publish": "今天", "require": "[上海市/虹口]1年/大专", "desc": "计算机软件|民营/私企|101－300人"}
{"salary": "8000-16000", "name": "python爬虫开发工程师", "company": "上海时来", "publish": "今天", "require": "[上海市/长宁]应届生/大专", "desc": "数据服务|民营/私企|21－50人"}
{"salary": "3000-6000", "name": "Python讲师-上海", "company": "伊屋装饰", "publish": "8-11", "require": "[上海市/黄浦]2年/大专", "desc": "移动互联网|民营/私企|20人以下"}
{"salary": "6000-8000", "name": "python开发工程师", "company": "华住酒店管理有限公司", "publish": "7-27", "require": "[上海市/闵行]应届生/本科", "desc": "酒店|外商独资|500人以上"}
{"salary": "15000-25000", "name": "赴日Python工程师", "company": "SunWell", "publish": "昨天", "require": "[海外/海外/]4年/本科", "desc": "人才服务|民营/私企|101－300人"}
.........

5. Advanced Scrapy: deep crawling

Crawling Python job postings on Zhaopin

items.py

# -*- coding: utf-8 -*-
import scrapy


class ZlItem(scrapy.Item):
    # job title
    name = scrapy.Field()
    # feedback rate
    percent = scrapy.Field()
    # company name
    company = scrapy.Field()
    # monthly salary
    salary = scrapy.Field()
    # work location
    position = scrapy.Field()

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import json


class ZlPipeline(object):
    def __init__(self):
        # Python 3: open in text mode with an explicit encoding
        self.file = open("sdzp.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        text = json.dumps(dict(item), ensure_ascii=False)
        # Write one JSON object per line
        self.file.write(text + "\n")
        # Return the item so that any later pipeline can process it
        return item

    def close_spider(self, spider):
        self.file.close()

zlzp.py

# -*- coding: utf-8 -*-
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from zl.items import ZlItem


class ZlzpSpider(CrawlSpider):
    name = 'sdzpspider'
    allowed_domains = ['zhaopin.com']
    start_urls = ['http://sou.zhaopin.com/jobs/searchresult.ashx?jl=%e4%b8%8a%e6%b5%b7&kw=python&sm=0&source=0&sg=936e2219abfb4f07a17009a930d54a37&p=1']

    # Link extraction rule: match the pagination links (raw string keeps \d intact)
    page_link = LinkExtractor(allow=(r'&sg=936e2219abfb4f07a17009a930d54a37&p=\d+',))

    # Crawl rules: which links to follow and which callback handles each page
    rules = [
        Rule(page_link, callback='parse_content', follow=True)
    ]

    # Callback that processes each followed page
    def parse_content(self, response):
        # Select the region that contains the data we need
        job_list = response.xpath('//div[@id="newlist_list_content_table"]//table//tr[1]')

        for job in job_list:
            # Item that holds the extracted data
            item = ZlItem()
            name = job.xpath(".//td[1]//a")
            if len(name) > 0:
                item['name'] = name.xpath('string(.)').extract()[0]

            percent = job.xpath('.//td[2]//span/text()')
            if len(percent) > 0:
                item['percent'] = percent.extract()[0]

            company = job.xpath(".//td[3]//a[1]/text()")
            if len(company) > 0:
                item["company"] = company.extract()[0]

            salary = job.xpath(".//td[4]/text()")
            if len(salary) > 0:
                item["salary"] = salary.extract()[0]

            position = job.xpath(".//td[5]/text()")
            if len(position) > 0:
                item["position"] = position.extract()[0]

            # The item is yielded even when no field matched
            yield item
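The `allow` argument of `LinkExtractor` is a regular expression that is searched for in each link's URL. The sketch below reproduces only that filtering idea with the standard `re` module; the shortened pattern and the sample links (including the detail-page URL) are assumptions made for illustration.

```python
import re

# Same idea as LinkExtractor(allow=...): keep links whose URL matches the pattern.
PAGE_LINK = re.compile(r"&p=\d+")

# Hypothetical links as they might be found on a result page.
links = [
    "http://sou.zhaopin.com/jobs/searchresult.ashx?kw=python&p=2",
    "http://sou.zhaopin.com/jobs/searchresult.ashx?kw=python&p=3",
    "http://jobs.zhaopin.com/000000000000000.htm",
]

followed = [u for u in links if PAGE_LINK.search(u)]
print(followed)
```

Writing the pattern as a raw string (`r"..."`) keeps `\d` from being treated as a string escape.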

Crawl results (an empty {} is an item from a row in which none of the fields matched):

{}
{"salary": "15000-25000", "position": "上海", "company": "Aon Hewitt 怡安翰威特", "name": "Senior Web Developer (Python)"}
{}
{}
{"salary": "20001-30000", "position": "上海", "company": "上海英方软件股份有限公司", "name": "PHP/Python资深研发工程师"}
{"salary": "10000-20000", "position": "上海", "company": "上海英方软件股份有限公司", "name": "PHP/Python高级研发工程师："}
{"salary": "15000-30000", "position": "上海-长宁区", "company": "携程计算机技术(上海)有限公司", "name": "大数据产品开发"}
{"salary": "面议", "position": "上海", "company": "Michelin China 米其林中国", "name": "DevOps Expert"}
{"salary": "10001-15000", "position": "上海", "company": "中兴通讯股份有限公司", "name": "高级软件工程师J11015"}
{"salary": "10000-20000", "position": "上海", "company": "上海微创软件股份有限公司", "name": "高级系统运维工程师（赴迪卡侬）"}
{"salary": "10000-15000", "position": "上海-浦东新区", "company": "北京尚学堂科技有限公司", "name": "Python讲师（Web方向）"}
{}
{"salary": "30000-50000", "position": "上海", "company": "上海复星高科技（集团）有限公司", "name": "系统架构负责人"}
{"salary": "面议", "position": "上海-长宁区", "company": "美团点评", "name": "前端开发工程师"}
{"salary": "12000-18000", "position": "上海", "company": "上海微创软件股份有限公司", "name": "Web前端工程师"}
{"salary": "10000-13000", "position": "上海", "company": "上海微创软件股份有限公司", "name": "测试工程师（Test Engineer）（赴诺亚财富）"}
{"salary": "10000-20000", "position": "上海-浦东新区", "company": "上海洞识信息科技有限公司", "name": "高级python研发人员"}
{"salary": "6001-8000", "position": "上海-徐汇区", "company": "上海域鸣网络科技有限公司", "name": "Python软件开发"}
{"salary": "15000-25000", "position": "上海-浦东新区", "company": "中移德电网络科技有限公司", "percent": "62%", "name": "大数据架构师"}
{"salary": "18000-22000", "position": "上海-浦东新区", "company": "北京中亦安图科技股份有限公司", "name": "大数据开发工程师"}
......
