python 3.x - numpy.loadtxt returns string repr of bytestring instead of string


I'm having trouble reading a data file containing mixed strings and floats with numpy.loadtxt in Python 3. Python 2 works fine, but I want my code to work in Py3.

A simplified example:

```python
import numpy as n

strings = ['str1', 'str2']
parsed = n.loadtxt(strings, dtype='str')
print('result:', parsed)
```

Which, when executed, gives different results in Py2 and Py3:

```
$> python2 mwe.py
result: ['str1' 'str2']
$> python3 mwe.py
result: ["b'str1'" "b'str2'"]
```

Python 2 gives strings as expected, while Python 3 gives strings containing the string representation of bytestrings.
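The Python 3 output is exactly what `str()` produces when applied to a bytes object, as opposed to decoding it. A plain-Python sketch of that difference (no NumPy involved, UTF-8 assumed), consistent with the symptom above:

```python
# Sample values standing in for what loadtxt reads internally.
raw = [b'str1', b'str2']

# Calling str() on bytes gives their repr, including the b'' wrapper.
repr_like = [str(b) for b in raw]

# Decoding the bytes gives the plain text that was actually wanted.
decoded = [b.decode('utf-8') for b in raw]

print(repr_like)  # ["b'str1'", "b'str2'"]
print(decoded)    # ['str1', 'str2']
```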

How can I get plain strings out of this mess in Python 3?

loadtxt passes the input strings through an asbytes function before parsing (it reads files as bytestrings). But the way it then converts them to unicode is buggy.
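On NumPy 1.14 or later (an assumption about the installed version), loadtxt accepts an `encoding` argument, and passing a real text encoding avoids the bytes round-trip. A minimal sketch for the mixed strings-and-floats case from the question; the sample rows and the two-field dtype (`name`, `value`) are hypothetical:

```python
import numpy as np

# Made-up rows mixing a string column and a float column.
lines = ['alpha 1.5', 'beta 2.25']

# Hypothetical structured dtype: one unicode field, one float field.
row_type = np.dtype([('name', 'U10'), ('value', 'f8')])

# Assumes NumPy >= 1.14: an explicit encoding makes loadtxt decode the text.
data = np.loadtxt(lines, dtype=row_type, encoding='utf-8')

print(data['name'].tolist())   # ['alpha', 'beta']
print(data['value'].tolist())  # [1.5, 2.25]
```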

genfromtxt appears to handle this better:

```
In [241]: np.genfromtxt([b'str1', b'str2'], dtype='str')
Out[241]:
array(['str1', 'str2'],
      dtype='<U4')
```

but it complains if you don't give it bytestrings:

```
In [242]: np.genfromtxt(['str1', 'str2'], dtype='str')
TypeError: Can't convert 'bytes' object to str implicitly
```
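Assuming NumPy 1.14 or later, genfromtxt also takes an `encoding` argument; with `dtype=None` it infers a type per column, so string columns come back as unicode and numeric columns as floats. A sketch with made-up sample rows (the field names `f0`/`f1` are genfromtxt's defaults when no names are given):

```python
import numpy as np

# Made-up rows: a string column followed by a float column.
lines = ['alpha 1.5', 'beta 2.25']

# dtype=None: infer each column's type; encoding: decode input as text.
data = np.genfromtxt(lines, dtype=None, encoding='utf-8')

print(data.dtype.names)     # ('f0', 'f1')
print(data['f0'].tolist())  # ['alpha', 'beta']
print(data['f1'].tolist())  # [1.5, 2.25]
```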

Loading as 'S4' and converting to unicode afterwards is another option:

```
In [244]: np.genfromtxt([b'str1', b'str2'], dtype='S4').astype('str')
Out[244]:
array(['str1', 'str2'],
      dtype='<U4')

In [245]: np.loadtxt([b'str1', b'str2'], dtype='S4').astype('str')
Out[245]:
array(['str1', 'str2'],
      dtype='<U4')

In [246]: np.loadtxt(['str1', 'str2'], dtype='S4').astype('str')
Out[246]:
array(['str1', 'str2'],
      dtype='<U4')
```
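`astype('str')` gives no way to name the encoding. If the file may contain non-ASCII text, `np.char.decode` lets you choose one explicitly. A sketch of a hypothetical helper `to_unicode`, using a hand-built bytes array as a stand-in for the result of loading with an `'S'` dtype:

```python
import numpy as np

def to_unicode(arr, encoding='utf-8'):
    """Hypothetical helper: decode a bytes ('S') array into a unicode ('U') array."""
    if arr.dtype.kind == 'S':
        # Decode element-wise with the chosen encoding.
        return np.char.decode(arr, encoding)
    # Already unicode (or not a string array): return unchanged.
    return arr

# Stand-in for loadtxt/genfromtxt output with an 'S' dtype; the second
# value is the UTF-8 encoding of 'cafe' with an accented e (5 bytes).
raw = np.array([b'str1', 'caf\u00e9'.encode('utf-8')])

text = to_unicode(raw)
print(text.dtype.kind)  # U
print(len(text[1]))     # 4  (4 characters, decoded from 5 bytes)
```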

Another workaround is a converter:

```
In [250]: np.loadtxt(['str1', 'str2'], dtype='str', converters={0: lambda x: x.decode()})
Out[250]:
array(['str1', 'str2'],
      dtype='<U4')
```
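Depending on the NumPy version and the `encoding` argument, loadtxt may hand converters either bytes or str, so a bare `x.decode()` can fail with `AttributeError` on versions that pass str. A more defensive converter is sketched below; `decode_if_bytes` is a hypothetical name, and UTF-8 is an assumed encoding:

```python
import numpy as np

def decode_if_bytes(field, encoding='utf-8'):
    """Hypothetical converter: decode bytes, pass str through unchanged."""
    if isinstance(field, bytes):
        return field.decode(encoding)
    return field

# Same input as the example above, with the defensive converter on column 0.
parsed = np.loadtxt(['str1', 'str2'], dtype=str,
                    converters={0: decode_if_bytes})
print(parsed.tolist())  # ['str1', 'str2']
```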
