我不知道代码错在哪里,请老师指导!
0
import scrapy from scrapy.http
import Request,FormRequest
import urllib.request
class DoubanwangSpider(scrapy.Spider):
name = 'doubanwang'
allowed_domains = ['douban.com']
header= ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/59.0")
def start_request(self):
return [Request("http://ocsp.pki.goog/GTSGIAG3",callback=self.parse ,meta={"cookiejar":1})]
def parse(self, response):
captcha=response.xpath('//img[@id="captcha_image"]/@src').extract()
url="http://ocsp.pki.goog/GTSGIAG3"
if len(captcha)>0:
print("存在验证码")
localpath="e:/验证码/captcha.jpg"
urllib.request.urlretrieve(captcha[0],filename=localpath)
print("请查看本地验证码,并输入验证码")
captcha_value=input()
data={
"form_email":"2382633756@qq.com",
"form_password":"doubanzhanghao1",
"captcha-solution": captcha_value,
"redir":"https://www.douban.com/people/177130502" }
else:
print("现在没有验证码!")
data={
"form_email":"2382633756@qq.com",
"form_password":"doubanzhanghao1",
"redir":"https://www.douban.com/people/177130502/" }
print("登录中...")
return [FormRequest.from_response(response,
meta={"cookiejar":response.meta["cookiejar"]},
headers=self.header,
formdata=data,
callback=self.next)]
def next(self,response):
print("登录成功,并爬取了个人中心的信息。")
title=response.xpath("/html/head/title/text()").extract()
note=response.xpath("//div[@class='note']/text()").extract()
print(title[0])
print(note[0])
下面是错误的信息:
2018-04-11 14:25:25 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: douban)
2018-04-11 14:25:25 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'douban', 'NEWSPIDER_MODULE': 'douban.spiders', 'SPIDER_MODULES': ['douban.spiders']}
2018-04-11 14:25:25 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-04-11 14:25:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-04-11 14:25:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-04-11 14:25:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-04-11 14:25:26 [scrapy.core.engine] INFO: Spider opened
2018-04-11 14:25:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-04-11 14:25:26 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-04-11 14:25:26 [scrapy.core.engine] INFO: Closing spider (finished)
2018-04-11 14:25:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 4, 11, 6, 25, 26, 91503),
'log_count/DEBUG': 1,
'log_count/INFO': 7,
'start_time': datetime.datetime(2018, 4, 11, 6, 25, 26, 75903)}
2018-04-11 14:25:26 [scrapy.core.engine] INFO: Spider closed (finished)
没有找到相关结果
重要提示:提问者不能发表回复,可以通过评论与回答者沟通,沟通后可以通过编辑功能完善问题描述,以便后续其他人能够更容易理解问题.
0 个回复