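Today, while writing a crawler with bs4 (scraping Baidu search result URLs), I ran into Unicode text that would not display as normal characters. After repeated debugging, I found the cause was the way the print statement was written in Python 2.x: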
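def get_maximum_pages(soup_content, link_re):
    # Walk down to the pagination block: #wrapper > #wrapper_wrapper > #container > #page
    div = soup_content.find('div', id='wrapper').find('div', id='wrapper_wrapper').find('div', id='container').find('div', id='page')
    a_list = div.find_all('a')
    next_page = a_list[-1]  # the last link in the block is the "next page" link
    next_page_text = next_page.get_text()
    print("next_text:", next_page_text)  # no matter how I debugged it, this always printed ('next_text:', u'\u4e0b\u4e00\u9875>')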
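1. Correct ways to write it in Python 2.x:
print "next_text:", next_page_text
or
print("next_text:" + next_page_text)
Incorrect form: print("next_text:", next_page_text)
In Python 2.x, print is a statement, not a function. With a comma inside the parentheses, ("next_text:", next_page_text) is a tuple, so print outputs the tuple, and printing a tuple uses the repr of each element. The repr of a unicode string shows non-ASCII characters as \u escapes, which is why the output was u'\u4e0b\u4e00\u9875>' (that is, u'下一页>', "Next page>"). In the first correct form, print outputs the two values separated by a space. In the second, the parentheses only wrap a single string expression, so the concatenated text itself is printed.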
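If the script also has to run under Python 3, a commonly used alternative (not the approach taken in this post) is `from __future__ import print_function`, available since Python 2.6. It turns print into a function, so the comma form passes two arguments instead of building a tuple. The sketch below assumes that import; `build_log_line` is a hypothetical helper added only for illustration, and the sample string is the value from the debug output above.

```python
# -*- coding: utf-8 -*-
# Sketch (assumption: Python 2.6+ or Python 3): make print() behave the same on both.
from __future__ import print_function, unicode_literals


def build_log_line(label, text):
    """Hypothetical helper: join label and text into one unicode string, not a tuple."""
    return "{0} {1}".format(label, text)


# Sample value taken from the debug output in this post: u'\u4e0b\u4e00\u9875>'
next_page_text = "\u4e0b\u4e00\u9875>"

# With print_function, the comma separates arguments; nothing is wrapped in a
# tuple, so the text is printed instead of its repr.
print("next_text:", next_page_text)

# Equivalent single-string form, handy when passing the line to a logger.
print(build_log_line("next_text:", next_page_text))
```

Both print calls write the same line, `next_text:` followed by a space and the page text, on Python 2.6+ and Python 3.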