I'm having problems dealing with Unicode characters in text fetched from different web pages (on different sites). I am using BeautifulSoup.
The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, yet I have not found anything that works consistently without throwing some kind of Unicode-related error.
One of the sections of code that is causing problems is shown below:
    agent_telno = agent.find('div', 'agent_contact_number')
    agent_telno = '' if agent_telno is None else agent_telno.contents[0]
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
Here is a stack trace produced on SOME strings when the snippet above is run:
    Traceback (most recent call last):
      File "foobar.py", line 792, in <module>
        p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
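For context, this failure class is easy to reproduce in isolation (the literal text below is illustrative, not taken from the actual pages). In Python 2, calling str() on a unicode value implicitly encodes it with the ascii codec; Python 3's str() no longer encodes, but the same error appears whenever the ascii codec meets a non-ASCII character such as U+00A0, the non-breaking space that is common in scraped HTML:

```python
# Minimal sketch reproducing the error (illustrative data, not from the
# actual pages). U+00A0 is a non-breaking space, common in scraped HTML.
text = 'Contact:\xa0020 7946 0958'

try:
    # In Python 2, str(unicode_value) performs this ascii encode implicitly.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc)
```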
I suspect that this is because some pages (or more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded. All the sites are based in the UK and provide data meant for UK consumption, so there are no issues relating to internationalization or dealing with text written in anything other than English.
Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?
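One commonly suggested direction (a hedged sketch, not the asker's code; the values of agent_contact and agent_telno below are hypothetical) is to keep everything as Unicode text throughout and only normalize or strip non-ASCII characters at the last moment, rather than letting an implicit ascii encode happen inside str():

```python
# Sketch of one common remedy: normalize to a compatible form, then
# encode with errors='ignore' so stray non-ASCII characters are dropped
# instead of raising. The sample values are hypothetical.
import unicodedata

def to_ascii(text):
    # NFKD maps compatibility characters (e.g. U+00A0 non-breaking space)
    # to their plain equivalents; 'ignore' drops anything still non-ASCII.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')

agent_contact = 'Jane\xa0Smith'          # hypothetical scraped value
agent_telno = '020\xa07946\xa00958'      # hypothetical scraped value
p_agent_info = to_ascii(agent_contact + ' ' + agent_telno).strip()
print(p_agent_info)
```

Whether dropping characters is acceptable depends on the data; the safer long-term fix is usually to stop converting to byte strings at all and handle the text as Unicode end to end.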