Python 3.x - numpy.loadtxt returns string repr of bytestring instead of string


I'm having trouble reading a data file containing mixed strings and floats with numpy.loadtxt in Python 3. Python 2 works fine, but I want my code to work in Python 3.

A simplified example:

    import numpy as n

    strings = ['str1', 'str2']
    parsed = n.loadtxt(strings, dtype='str')
    print('result:', parsed)

which, when executed, gives different results in Python 2 and Python 3.

    $> python2 mwe.py
    result: ['str1' 'str2']
    $> python3 mwe.py
    result: ["b'str1'" "b'str2'"]

Python 2 gives strings as expected, while Python 3 gives strings containing the string representation of bytestrings.

How can I get plain strings out of this mess in Python 3?

loadtxt passes the input strings through an asbytes function before parsing (it reads files as bytestrings). How it then converts them to unicode is buggy.
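The doubled quoting in the Python 3 output is consistent with str() being applied to a bytes object somewhere along the way, rather than a proper decode. The following is a plain-Python sketch of that difference, not the actual loadtxt internals; the variable names are illustrative only.

```python
# Illustration only: this is not loadtxt's code, just the str/bytes
# behaviour that would produce the output seen in the question.
raw = b'str1'

as_repr = str(raw)      # repr-style conversion keeps the b'...' wrapper
as_text = raw.decode()  # decoding (UTF-8 by default) yields the plain text

print(repr(as_repr))
print(repr(as_text))
```

The first value carries the literal characters b, quote, str1, quote, which matches what the question shows inside the Python 3 array.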

genfromtxt appears to handle this better:

    In [241]: np.genfromtxt([b'str1', b'str2'], dtype='str')
    Out[241]:
    array(['str1', 'str2'],
          dtype='<U4')

but it complains if I don't give it bytestrings:

    In [242]: np.genfromtxt(['str1', 'str2'], dtype='str')
    TypeError: Can't convert 'bytes' object to str implicitly

Loading as S4 and converting to unicode afterwards is an option:

    In [244]: np.genfromtxt([b'str1', b'str2'], dtype='S4').astype('str')
    Out[244]:
    array(['str1', 'str2'],
          dtype='<U4')

    In [245]: np.loadtxt([b'str1', b'str2'], dtype='S4').astype('str')
    Out[245]:
    array(['str1', 'str2'],
          dtype='<U4')

    In [246]: np.loadtxt(['str1', 'str2'], dtype='S4').astype('str')
    Out[246]:
    array(['str1', 'str2'],
          dtype='<U4')

Another workaround is a converter:

    In [250]: np.loadtxt(['str1', 'str2'], dtype='str', converters={0: lambda x: x.decode()})
    Out[250]:
    array(['str1', 'str2'],
          dtype='<U4')
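Whether a converter receives bytes or str can depend on the NumPy version and on how the input is decoded, so a slightly more defensive variant of the converter may be useful. This is a minimal sketch: `to_text` is a hypothetical helper name, not part of NumPy, and UTF-8 is an assumed encoding.

```python
import numpy as np

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return str whether given bytes or str."""
    if isinstance(value, bytes):
        # Assumption: the data is UTF-8 encoded text.
        return value.decode(encoding)
    return value

# Same call as the converter workaround, but tolerant of either input type.
parsed = np.loadtxt(['str1', 'str2'], dtype='str', converters={0: to_text})
print('result:', parsed)
```

Newer NumPy releases also accept an encoding argument on loadtxt and genfromtxt, which may make the converter unnecessary; it is worth checking the documentation for the installed version.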
