Writing CSV file with umlauts causing "UnicodeEncodeError: 'ascii' codec can't encode character"



2023-02-28 01:13 | Source: collected from the web | Views: 265

How can I write data with umlaut characters to a CSV file in Python 3?

This solution works with Python 3 on my OS X but gives the error with Python 2.7. The error occurs on the line where data is set, so before the call to open() and the specification of the encoding. – terence hill Feb 6, 2017 at 12:01

Add an encoding parameter to the open() call and set it to 'utf8'.

```
import csv

data = "ääÖ"
with open("test.csv", 'w', encoding='utf8') as fp:
    a = csv.writer(fp, delimiter=";")
    a.writerows(data)
```
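One caveat the answer doesn't mention (my note, not part of the original): `csv.writer.writerows()` expects an iterable of rows, so passing the string `"ääÖ"` directly writes one character per row. If a single row was intended, `writerow([data])` does that. A sketch contrasting the two:

```python
import csv
import os
import tempfile

data = "ääÖ"
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# writerows(data) iterates the string, emitting one row per character.
with open(path, "w", encoding="utf-8", newline="") as fp:
    csv.writer(fp, delimiter=";").writerows(data)
with open(path, encoding="utf-8") as fp:
    per_char = fp.read().splitlines()

# writerow([data]) writes the whole string as a single field in one row.
with open(path, "w", encoding="utf-8", newline="") as fp:
    csv.writer(fp, delimiter=";").writerow([data])
with open(path, encoding="utf-8") as fp:
    single_row = fp.read().splitlines()

print(per_char)    # ['ä', 'ä', 'Ö']
print(single_row)  # ['ääÖ']
```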

This solution should work on both Python 2 and 3 (the coding declaration is not needed in Python 3):

```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv

data = "ääÖ"
with open("test.csv", "w") as fp:
    a = csv.writer(fp, delimiter=";")
    a.writerows(data)
```

Suggestion : 2

UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 2: ordinal not in range(128)

The .csv file might not be encoded as UTF-8.

I don't understand why an ASCII codec is at work here; after all, I'm using a UTF-8 encoded input file, and I'm asking PsychoPy to use UTF-8 encoding when reading the file.

If you are using Windows, try opening the .csv file in Notepad and saving (a copy of) it with encoding UTF-8.
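Not part of the thread, but one quick way to test the suspicion that the file isn't really UTF-8 is to try decoding its raw bytes; a sketch with a hypothetical helper:

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 'Ü' is the two bytes C3 9C in UTF-8, but the single byte DC in
# Latin-1 / cp1252 -- the same u'\xdc' seen in the error message above.
print(looks_like_utf8("Überholen".encode("utf-8")))    # True
print(looks_like_utf8("Überholen".encode("latin-1")))  # False
```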

My input file is this one: data.csv (82 Bytes)

And it looks something like this:

```
1 Auffahrunfall 1
2 Überholen 1
3 Balkon 1
4 Traktor 1
```

My current attempt at reading in this file is the following:

```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs

f = codecs.open('data.csv', encoding='utf-8')
for line in f:
    print str(line)
```

However, the program gets only as far as outputting

1, Auffahrunfall, 1

Moreover, I can't seem to find a solution to the problem. I thought perhaps adding

errors = 'ignore'

to the codecs.open() call might help.

Sorry to butt in, but I think the problem is actually caused by calling str().

Python 2 was created with some design mistakes in how it treats strings and Unicode characters. If I'm remembering correctly, a string in Python 2 is in fact a byte sequence in disguise. A "unicode" object in Python 2 is actually the object you need to deal with when using non-ASCII characters, not a string. If you were to change:

str(line)

to:

unicode(line)

Suggestion : 3

For a current project I need to migrate large volumes of CSV data into a relational database management system. The Python-driven csvkit is the Swiss army knife of CSV tools and very handy for this purpose. However, a few CSV files caused trouble when I tried to pipe the SQL CREATE statements generated with the csvsql tool into a file. Although the input file is already in UTF-8, the output is written as ASCII, which raises an error if umlauts are included:

```
csvsql -i sqlite -d ';' -e 'utf8' --db-schema test_schema --table test_table inputfile.csv > output.sql

Traceback (most recent call last):
  File "/usr/local/bin/csvsql", line 9, in <module>
    load_entry_point('csvkit==0.9.1', 'console_scripts', 'csvsql')()
  File "/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py", line 161, in launch_new_instance
    utility.main()
  File "/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py", line 134, in main
    self.output_file.write('%s\n' % sql.make_create_table_statement(sql_table, dialect=self.args.dialect))
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 46: ordinal not in range(128)
```
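For context (my note, not from the original post): the u'\ufeff' in that traceback is a byte-order mark (BOM), which some Windows tools prepend to UTF-8 files. Python's 'utf-8-sig' codec strips a leading BOM on decode, which sidesteps the problem without touching the default encoding. A sketch:

```python
# A UTF-8 file saved by some Windows tools starts with the BOM bytes
# EF BB BF, which plain 'utf-8' decodes to the invisible U+FEFF.
raw = b"\xef\xbb\xbfid;name\n1;\xc3\x9cberholen\n"

plain = raw.decode("utf-8")
sig = raw.decode("utf-8-sig")    # strips the BOM if present

print(repr(plain[:3]))   # '\ufeffid'
print(repr(sig[:3]))     # 'id;'
```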

Adding the --verbose flag gives more clarity. Adding the following code after the import statements at the top of the csvsql.py file sets the correct encoding for the output file in this Python 2 script:

```
reload(sys)
sys.setdefaultencoding('utf-8')
```

Suggestion : 4


```
import re
import csv
#import sys

with open('input.csv', 'r', encoding='UTF-8') as fi, open('output_data.csv', 'w', encoding='UTF-8') as fo:
    reader = csv.reader(fi, delimiter=';')
    DESCRIPT1 = []
    ID1 = []
    ASSIGNMENT_NAME1 = []
    TER1 = []
    INFO1 = []
    for i, row in enumerate(reader):
        DESCRIPT1.append(row[0])
        ID1.append(row[1])
        ASSIGNMENT_NAME1.append(row[2])
        TER1.append(row[3])
        INFO1.append(row[4])
        row[4] = re.sub()  # pattern and replacement were left out of the original post
        fo.write(';'.join(row) + '\n')
```

Suggestion : 5


I've been trying to write a script that scrapes the list of usernames from the comments section of a given YouTube video and writes those usernames to a .csv file.

Here's the script:

```
from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as soup

driver = webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=VIDEOURL')
time.sleep(5)
driver.execute_script("window.scrollTo(0, 500)")
time.sleep(3)
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
time.sleep(5)
scroll_time = 40
for num in range(0, scroll_time):
    html.send_keys(Keys.PAGE_DOWN)
for elem in driver.find_elements_by_xpath('//span[@class="style-scope ytd-comment-renderer"]'):
    print(elem.text)
    with open('usernames.csv', 'w') as f:
        p = csv.writer(f)
        p.writerows(str(elem.text))
```

It keeps throwing this error for line 19:

```
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u30b9' in position 0: character maps to <undefined>
```

Python 3 str values need to be encoded as bytes when written to disk. If no encoding is specified for the file, Python will use the platform default. In this case, the default encoding is unable to encode '\u0389', and so raises a UnicodeEncodeError.
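To see which default is meant here (my illustration, not part of the original answer): when open() gets no encoding= argument it falls back to the locale's preferred encoding, and the choice is exposed on the file object:

```python
import locale
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "probe.txt")

# With no encoding= argument, open() uses the locale's preferred
# encoding; on many Windows setups that is cp1252, which cannot
# represent characters like '\u0389' and raises UnicodeEncodeError.
with open(path, "w") as f:
    print(f.encoding, locale.getpreferredencoding(False))

# Passing encoding='utf-8' makes the behaviour platform-independent.
with open(path, "w", encoding="utf-8") as f:
    print(f.encoding)  # utf-8
```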

The solution is to specify the encoding as UTF-8 when opening the file:

```
with open('usernames.csv', 'w', encoding='utf-8') as f:
    p = csv.writer(f)
    ...
```

Since UTF-8 isn't your platform's default encoding, you'll also need to specify the encoding whenever the file is opened again later, whether in Python code or in applications like Excel.
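One more detail worth adding (mine, based on the csv module documentation): in Python 3 the file handed to csv.writer should be opened with newline='', so that the '\r\n' line endings the writer emits aren't translated a second time by the text layer. A sketch combining both parameters:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "usernames.csv")

# newline='' stops the text layer from re-translating the '\r\n'
# that csv.writer emits; encoding='utf-8' handles non-ASCII names.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["ユーザー", "Überholer"])

with open(path, "rb") as f:
    data = f.read()

print(data.endswith(b"\r\n"))  # True: exactly one CRLF per row
```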

Windows supports a modified version of UTF-8, named "utf-8-sig" in Python. This encoding inserts a three-byte marker at the start of a file to identify the file's encoding to Windows applications, which might otherwise attempt to decode it using an 8-bit encoding. If the file will be used exclusively on Windows machines, it may be worth using this encoding instead.

```
with open('usernames.csv', 'w', encoding='utf-8-sig') as f:
    p = csv.writer(f)
    ...
```

Suggestion : 6

That's why the code written to work with "unicode" is most often incorrect. Its authors simply slather "unicode" stuff around without thinking for a second about encodings, combining characters or locales, and hope that it works.

```
$ LANG= python2 -c 'print(u"euro sign: \u20ac")'
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 11: ordinal not in range(128)
$ LANG= python3.7 -c 'print(u"euro sign: \u20ac")'  # enjoy PEP 538+PEP 540
euro sign: €
```

> It doesn't matter whether the developer is using Python 2.7, Python 3 or another language: their code must be tested to ensure that it actually works. If it is not tested then the business has fucked up and this needs to be rectified.

This is a complete non-answer. Python 3 makes it easy to write code that is mostly correct, under the assumption that it is ALWAYS correct.

```
$ python3 -c 'print("ℙƴ☂ℌøἤ")'
ℙƴ☂ℌøἤ
```

Perhaps I'm missing something, but this does not appear to be the case:

```
$ echo "Is this correct?" | hd
00000000  49 73 20 74 68 69 73 20  63 6f 72 72 65 63 74 3f  |Is this correct?|
00000010  0a                                                |.|
00000011
$ python3
Python 3.4.3 (default, Nov 28 2017, 16:41:13)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> len('Length')
6
>>> len(' Len ')
5
>>> len('👦🏼👦🏼👦🏼')
6
```
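The point of the last example is that Python's len counts code points, not user-perceived characters: each 👦🏼 is a base emoji plus a skin-tone modifier, i.e. two code points. A small sketch of my own illustrating this:

```python
import unicodedata

boy = "👦🏼"                     # emoji + skin-tone modifier
print(len(boy))                 # 2 code points
print([unicodedata.name(c) for c in boy])
# ['BOY', 'EMOJI MODIFIER FITZPATRICK TYPE-3']

# The same happens with combining accents:
e_combining = "e\u0301"         # 'e' + COMBINING ACUTE ACCENT
e_precomposed = unicodedata.normalize("NFC", e_combining)
print(len(e_combining), len(e_precomposed))  # 2 1
```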

Posted Dec 20, 2017 17:17 UTC (Wed) by brouhaha (subscriber, #1698) [Link]

```
#!/usr/bin/env python3
import sys

def is_valid_unicode(b):
    try:
        s = b.decode('utf-8')
    except:
        return False
    return True

b = bytes([int(x, 16) for x in sys.argv[1:]])
print(is_valid_unicode(b))
```


```
$ ./validutf8.py ce bc e0 b8 99 f0 90 8e b7 e2 a1 8d 0a
True
$ ./validutf8.py 2d 66 5b 1a f7 53 e3 f6 fd 47 a2 07 fc
False
```

