Python web scraping question
The code below collects the internal links on a page:
import re

def getInternalLinks(bsObj, includeUrl):
    internalLinks = []
    # Find all links whose href begins with "/" or contains includeUrl
    for link in bsObj.findAll("a", href=re.compile("^(/|.*"+includeUrl+")")):
        if link.attrs['href'] is not None:
            if link.attrs['href'] not in internalLinks:
                internalLinks.append(link.attrs['href'])
    return internalLinks
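But in the statement for link in bsObj.findAll("a", href=re.compile("^(/|.*"+includeUrl+")")): does the regular expression's match result depend on includeUrl at all? The links it matches for me look like '/ua/toLogin.shtml', and they do not contain includeUrl.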
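One way to check this is to run the pattern on a few sample href values outside of BeautifulSoup. The sketch below is only an illustration: the domain example.com, the sample hrefs, and the helper name build_internal_link_pattern are assumptions, not part of the original code, and re.escape is an addition so that the dots in the domain are matched literally. The pattern is an alternation, so an href matches if it begins with "/" or if it contains includeUrl. A relative link such as '/ua/toLogin.shtml' matches through the first branch alone, which would explain why includeUrl does not appear in it.

```python
import re

def build_internal_link_pattern(include_url):
    # Same shape as the pattern in the question: "^(/|.*<includeUrl>)".
    # re.escape is an addition here, not in the original code.
    return re.compile("^(/|.*" + re.escape(include_url) + ")")

# "example.com" is a hypothetical domain used only for this illustration.
pattern = build_internal_link_pattern("example.com")

hrefs = [
    "/ua/toLogin.shtml",         # relative link: matched by the "/" branch
    "http://example.com/about",  # absolute link: matched by the includeUrl branch
    "http://other.org/page",     # external link: matched by neither branch
    "#top",                      # fragment: matched by neither branch
]

# Keep only the hrefs that the pattern accepts.
matches = [h for h in hrefs if pattern.search(h)]
print(matches)
```

So includeUrl only matters for absolute links that spell out the domain. Note that a protocol-relative link such as "//other.org/x" also begins with "/", so this pattern would accept it even though it points to another site.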