First I used

page = urllib2.urlopen(url).read()

but the HTML it returns differs from the page source I see in the browser. Searching (Baidu) turned up two explanations: urllib fetches only the initial HTML, so content added later by asynchronous (AJAX) requests is never loaded, or the site is blocking scrapers. So I tried simulating a browser instead: routing the request through a proxy and sending browser-like request headers. But what I get back is still exactly the same as urllib2.urlopen(url).read().
Why is that, and how can I fix it?
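The first explanation can be checked without any network access: an HTTP client downloads the page text but never executes its scripts, so anything JavaScript injects into the DOM is simply absent from what read() returns. A minimal, self-contained Python 3 sketch (the page snippet is made up for illustration):

```python
from html.parser import HTMLParser

# A made-up "initial" page: the job list is empty in the HTML the
# server sends; a script fills it in after the browser loads the page.
initial_html = """
<div id="job-list"></div>
<script>
  document.getElementById('job-list').innerHTML = '<li>python dev</li>';
</script>
"""

class TagCollector(HTMLParser):
    """Record every element start tag that actually exists in the markup."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

parser = TagCollector()
parser.feed(initial_html)

# Only 'div' and 'script' are real elements; the <li> exists solely as
# text inside the script, which urllib2/urllib will never execute.
print(parser.tags)          # ['div', 'script']
print('li' in parser.tags)  # False
```

To capture the script-generated rows you would need something that actually runs the JavaScript (a real browser), or you would have to find and call the AJAX endpoint the script uses.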
import urllib2

# Target search page on 51job
url = 'http://search.51job.com/jobsearch/search_result.php?fromJs=1&jobarea=000000%2C00&district=000000&funtype=0000&industrytype=00&issuedate=9&providesalary=04%2C05%2C06%2C07&keyword=python&keywordtype=2&curr_page=1&lang=c&stype=2&postchannel=0000&workyear=99&cotype=99&degreefrom=04%2C05%2C06&jobterm=99&companysize=03%2C04%2C05%2C06%2C07&lonlat=0%2C0&radius=-1&ord_field=0&list_type=0&dibiaoid=0&confirmdate=9'

# Browser-like request headers
req_header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36',
    'Referer': 'http://search.51job.com'
}

# Route the request through an HTTP proxy; debuglevel=1 prints the raw
# HTTP exchange so the actual request/response can be inspected.
opener = urllib2.build_opener(
    urllib2.ProxyHandler({'http': '14.18.252.61:80'}),
    urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)

req = urllib2.Request(url, headers=req_header)
resp = urllib2.urlopen(req)
html = resp.read()
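One pitfall worth checking when a URL is copied out of rendered HTML: `&deg` is a legacy HTML entity (the degree sign), so entity decoding can silently corrupt a raw `&` in a query string, turning `&degreefrom=` into `°reefrom=` and destroying the parameter name. A small Python 3 sketch of the effect (the query string here is illustrative):

```python
import html

# A query string pasted from rendered HTML. "&deg" is a legacy HTML
# entity, so entity decoding eats the start of "degreefrom".
raw = 'workyear=99&degreefrom=04'
print(html.unescape(raw))   # 'workyear=99°reefrom=04' -- parameter name destroyed

# When a URL is embedded in HTML, "&" must be written as "&amp;"
# so that decoding round-trips back to the original query string.
embedded = raw.replace('&', '&amp;')
print(html.unescape(embedded))  # 'workyear=99&degreefrom=04'
```

If a request keeps returning unexpected results, it is worth confirming the query string the server actually receives has not been mangled this way.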