Python 3.x - numpy.loadtxt returns string repr of bytestring instead of string
I'm having trouble reading a data file containing mixed strings and floats with numpy.loadtxt in Python 3. Python 2 works fine, but I want my code to work in Py3.
A simplified example:
    import numpy as n

    strings = ['str1', 'str2']
    parsed = n.loadtxt(strings, dtype='str')
    print('result:', parsed)
which, when executed, gives different results for py2 and py3.
    $> python2 mwe.py
    result: ['str1' 'str2']
    $> python3 mwe.py
    result: ["b'str1'" "b'str2'"]
Python 2 gives strings as expected, while Python 3 gives strings containing the string representation of bytestrings.
How can I get plain strings out of this mess in Python 3?
loadtxt has passed your input strings through an asbytes function before parsing (it normally reads files as bytestrings). But how it then converts those to unicode is buggy.
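The `b'...'` wrapper in the question's output is what Python 3 produces when `str()` is applied to a bytes object instead of decoding it. A minimal sketch of that difference, independent of NumPy (the variable names are illustrative, and this only shows the kind of conversion that appears to be at fault, not loadtxt's actual code):

```python
raw = b'str1'                  # a bytestring, as held after the asbytes step

as_repr = str(raw)             # str() on bytes returns the repr, prefix and quotes included
as_text = raw.decode('ascii')  # decoding returns the plain text

print(as_repr)  # b'str1'
print(as_text)  # str1
```

The first result matches the unwanted Python 3 output above; the second is what the question is after.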
genfromtxt appears to handle this better:
    In [241]: np.genfromtxt([b'str1', b'str2'], dtype='str')
    Out[241]: array(['str1', 'str2'], dtype='<U4')
but it complains if I don't give it bytestrings:
    In [242]: np.genfromtxt(['str1', 'str2'], dtype='str')
    TypeError: Can't convert 'bytes' object to str implicitly
Loading as S4, and converting to unicode afterwards, is an option:
    In [244]: np.genfromtxt([b'str1', b'str2'], dtype='S4').astype('str')
    Out[244]: array(['str1', 'str2'], dtype='<U4')

    In [245]: np.loadtxt([b'str1', b'str2'], dtype='S4').astype('str')
    Out[245]: array(['str1', 'str2'], dtype='<U4')

    In [246]: np.loadtxt(['str1', 'str2'], dtype='S4').astype('str')
    Out[246]: array(['str1', 'str2'], dtype='<U4')
Another workaround is a converter:
    In [250]: np.loadtxt(['str1', 'str2'], dtype='str', converters={0: lambda x: x.decode()})
    Out[250]: array(['str1', 'str2'], dtype='<U4')
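The lambda assumes the converter is always handed a bytestring. As a hedged sketch, a slightly more defensive converter accepts either bytes or str, which may matter if the type loadtxt passes to converters differs between NumPy versions. The name `to_text` and its `encoding` parameter are hypothetical, not part of NumPy:

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is a bytestring."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: pass it through unchanged.
    return value


print(to_text(b'str1'))
print(to_text('str2'))
```

It could be passed in place of the lambda, for example `converters={0: to_text}`.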