I. Download modules
In Python 3, the Python 2 modules urllib2 and urlparse (together with the old urllib and robotparser) were merged into the urllib package.
For example:
(1) urlparse.urlparse(url) becomes urllib.parse.urlparse(url)
(2) urllib2.Request(url, headers=headers) becomes urllib.request.Request(url, headers=headers)
(3) urllib2.urlopen(url).read() becomes urllib.request.urlopen(url).read()
(4) urllib2.URLError becomes urllib.error.URLError
(5) urllib.urlencode() becomes urllib.parse.urlencode()
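The renamed calls above can be combined into a small download helper. The following is only a minimal sketch: the function names build_request and fetch, the example.com URL, and the User-Agent value are assumptions made for illustration and do not come from the original crawler.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_request(base_url, params, user_agent="Mozilla/5.0"):
    """Build a GET request object without sending it (hypothetical helper)."""
    query = urllib.parse.urlencode(params)   # was urllib.urlencode() in Python 2
    url = base_url + "?" + query
    # was urllib2.Request(...) in Python 2
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(request, timeout=10):
    """Return the response body as bytes, or None if the download fails."""
    try:
        # was urllib2.urlopen(...) in Python 2
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:       # was urllib2.URLError in Python 2
        print("download failed:", e.reason)
        return None


# Build a request and inspect its URL; nothing is sent over the network here.
req = build_request("http://example.com/search", {"q": "python 3", "page": 1})
parts = urllib.parse.urlparse(req.full_url)  # was urlparse.urlparse() in Python 2
print(parts.netloc, parts.path, parts.query)
```

Note that read() returns bytes in Python 3, so the result of fetch() usually has to be decoded before it is handled as text.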
Exception syntax:
Python 2.x: except Exception, e: print (e)
Python 3.x: except Exception as e: print (e) (the "as" form also works in Python 2.6 and later)
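A short sketch of the Python 3 form in context. The function safe_divide is a hypothetical example chosen only to trigger an exception; it is not part of the original code.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero (hypothetical example)."""
    try:
        return a / b
    except ZeroDivisionError as e:   # Python 2 style "except ZeroDivisionError, e:" is a SyntaxError here
        print(e)
        return None


result_ok = safe_divide(6, 3)
result_failed = safe_divide(1, 0)
```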
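II. Encoding issues
Reference: https://www.cnblogs.com/chownjy/p/6625299.html
Python 3 strictly distinguishes two types: str (Unicode text) and bytes (raw binary data, for example text encoded as UTF-8).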
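Errors encountered:
(1) UnicodeEncodeError: 'gbk' codec can't encode character '\u200b' in position 0: illegal multibyte sequence
This is raised when a str containing a character that GBK cannot represent is encoded as GBK.
(2) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 265: invalid start byte
This is raised when bytes that are not valid UTF-8 are decoded as UTF-8.
Reference: http://blog.csdn.net/qingyuanluofeng/article/details/45190867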
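Both errors can be reproduced and worked around in a few lines. This is a minimal sketch under assumed inputs: the sample strings are made up, and dropping unencodable characters with errors="ignore" is only one possible strategy, not necessarily the one used in the referenced posts.

```python
text = "\u200bhello"   # str: a zero-width space followed by ASCII text

# GBK has no mapping for U+200B, so a strict encode raises UnicodeEncodeError.
try:
    text.encode("gbk")
except UnicodeEncodeError as e:
    print("encode failed:", e.reason)

# One option: drop the characters the target codec cannot represent.
cleaned = text.encode("gbk", errors="ignore").decode("gbk")

# Bytes produced by one codec must be decoded with the same codec.
raw = "。".encode("gbk")   # bytes holding a GBK-encoded full stop
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("decode failed:", e.reason)

decoded = raw.decode("gbk")
```

For downloaded pages, the practical consequence is that the bytes returned by read() should be decoded with the encoding the page actually uses rather than assuming UTF-8.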
III. Installing packages
(1) BeautifulSoup: conda install BeautifulSoup4
Reference: http://blog.csdn.net/fu_shuwu/article/details/53164561
(2) readability, a module for extracting the main content of a web page: pip install readability-lxml
Reference: https://seofangfa.com/python-note/content-keywords.html
(3) html2text: pip install html2text
IV. Parsing XML
Reference: https://www.cnblogs.com/CheeseZH/p/4026686.html
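As a starting point, XML can be parsed with the standard library module xml.etree.ElementTree. The sketch below is an assumption for illustration: the sample document, its tag names, and the helper parse_items are invented here, and the referenced post may use a different approach.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; the tag and attribute names are assumptions.
SAMPLE = """<feed>
  <item id="1"><title>First</title><link>http://example.com/1</link></item>
  <item id="2"><title>Second</title><link>http://example.com/2</link></item>
</feed>"""


def parse_items(xml_text):
    """Return one dict per <item> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.findall("item"):        # direct children named "item"
        items.append({
            "id": node.get("id"),            # attribute value
            "title": node.findtext("title"), # text of the child element
            "link": node.findtext("link"),
        })
    return items


items = parse_items(SAMPLE)
```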
V. Email notifications
Reference: http://blog.csdn.net/bmxwm/article/details/79007871
Sending email with Django: http://blog.csdn.net/ypq5566/article/details/24293147
http://blog.csdn.net/tanggangzuiniu12/article/details/79015409
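Outside Django, a notification can be sent with the standard library alone. The following is a minimal sketch, not the method from the referenced posts: the helper names, the addresses, and the choice of SMTP over SSL are all assumptions, and the host, port, and credentials must come from your own mail provider.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Create a plain-text email message (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host, port, user, password):
    """Send msg through an SMTP server over SSL; all arguments are supplied by the caller."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)


# Only the message is built here; send_message() is not called, so no connection is made.
msg = build_message(
    "crawler@example.com",   # placeholder sender address
    "me@example.com",        # placeholder recipient address
    "Crawl finished",
    "The crawl has finished.",
)
```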