UnicodeDecodeError in serializer.to_XML() #29

nemobis · 2018-02-15T11:08:25Z

I read some data from a CSV file, write it into an empty MARCXMLRecord, and then write the record out as XML. Is it normal for a record with non-ASCII characters, like this one, to fail?

OrderedDict([('040', [{'a': ['IT-MiFBE'], 'ind1': u' ', 'b': ['ita'], 'e': ['reicat'], 'ind2': u' '}]), ('041', [{'a': ['ita'], 'ind1': '0', 'ind2': u' '}]), ('044', [{'a': ['ita'], 'ind1': u' ', 'ind2': u' '}]), ('100', [{'a': ['Medici, Mario'], 'ind2': u' ', 'ind1': '1', 'd': ['1899-1979'], '4': ['aut']}]), ('240', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe']}]), ('245', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe'], 'ind1': '1', 'ind2': '0'}]), ('260', [{'a': ['Venezia'], 'c': ['1934-1935'], 'b': ['Presso la sede del Reale Istituto Veneto'], 'ind1': u' ', 'ind2': u' '}]), ('300', [{'a': ['183-190 p.'], 'ind1': u' ', 'ind2': u' '}]), ('362', [{'a': ['Tomo 94. (1934-1935)'], 'ind1': '0', 'ind2': u' '}]), ('524', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe / Mario Medici. In: Atti. Parte seconda, Scienze matematiche e naturali. - Tomo 94. (1934-1935). - Venezia : Presso la sede del Reale Istituto Veneto, 1934-1935. - 183-190 p.'], 'ind1': '8', 'ind2': u' '}]), ('690', [{'a': ['Atti di accademie italiane'], 'ind1': '0', 'ind2': u' '}, {'a': ['Istituto veneto di scienze, lettere ed arti'], 'ind1': '0', 'ind2': u' '}]), ('773', [{'ind1': '0', 't': ['Atti. Tomo 94., Parte 2., Dispense 1.-4. (1934-1935). Parte seconda, Scienze matematiche e naturali'], 'w': ['ISS-IVSLA100130'], 'ind2': u' '}]), ('856', [{'ind1': '4', 'u': ['http://atena.beic.it/webclient/DeliveryManager?pid=8028533&custom_att_2=simple_viewer&search_terms=DTL23&pds_handle='], 'ind2': u' '}]), ('887', [{'a': ['In: Atti. Parte seconda, Scienze matematiche e naturali. - Tomo 94. (1934-1935). - Venezia : Presso la sede del Reale Istituto Veneto, 1934-1935. - 183-190 p.'], '2': ['local'], 'ind1': u' ', 'ind2': u' '}])])

The exception is:

Traceback (most recent call last):
  File ..., in <module>
    xmlout.write(currentrecord.to_XML())
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 147, in to_XML
    DATA_FIELDS=self._serialize_data_fields().strip()
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 96, in _serialize_data_fields
    CONTENT=self._serialize_data_subfields(dict_field)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 71: ordinal not in range(128)

Reading the Template definition in serializer.py, I wonder if the template string should be unicode. I found some other people for whom that was the culprit: https://stackoverflow.com/a/6038077/1333493
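For context, the record dump above mixes unicode values (the `u' '` indicators) with UTF-8 byte strings (the titles containing `\xc3\xa0`). Under Python 2, combining the two makes the interpreter decode the byte string with the ASCII codec, which produces exactly this kind of error. The sketch below reproduces that decoding step explicitly and shows that substitution succeeds once the value is decoded as UTF-8. The template text is a made-up stand-in, not the actual template from serializer.py, and whether this is what happens inside the library is an assumption.

```python
# -*- coding: utf-8 -*-
from string import Template

# UTF-8 bytes, as they would arrive from a CSV file; contains 0xc3 0xa0.
raw = u'funzionalit\xe0'.encode('utf-8')

# Python 2 performs this ASCII decode implicitly when a byte str is
# combined with a unicode string. Doing it explicitly shows the failure.
try:
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the real encoding first avoids the problem.
text = raw.decode('utf-8')

# Hypothetical template, NOT the one defined in serializer.py.
template = Template(u'<subfield code="a">$CONTENT</subfield>')
rendered = template.substitute(CONTENT=text)
```

Here `error` holds the same `'ascii' codec can't decode byte 0xc3` failure as in the traceback, while `rendered` is a unicode string with the accented character intact.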


nemobis · 2018-02-19T15:37:57Z

Minimal (I think) test case:

>>> from marcxml_parser import MARCXMLRecord
>>> r = MARCXMLRecord('<record><controlfield tag="001">test</controlfield></record>')
>>> r.add_data_field( "240", " ", " ", { 'a': u'blà'.encode('utf8')} )
>>> r.to_XML()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 147, in to_XML
    DATA_FIELDS=self._serialize_data_fields().strip()
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 96, in _serialize_data_fields
    CONTENT=self._serialize_data_subfields(dict_field)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 61: ordinal not in range(128)
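A possible workaround on the caller's side, sketched under the assumption that the failure comes from passing UTF-8 byte strings as subfield values: decode every value to unicode before handing it to the record. The helpers `to_text` and `normalize_subfields` are hypothetical names introduced here, not part of marcxml_parser, and this thread does not confirm that `add_data_field` behaves correctly with unicode input.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Return value as a text string, decoding byte strings with `encoding`."""
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


def normalize_subfields(subfields, encoding='utf-8'):
    """Decode every subfield value in a {code: value} dict to text."""
    return dict((code, to_text(val, encoding)) for code, val in subfields.items())


# Same subfield as in the test case above, decoded before use.
subfields = normalize_subfields({'a': u'bl\xe0'.encode('utf8')})

# Untested assumption: the decoded dict could then be passed along as
#   r.add_data_field("240", " ", " ", subfields)
```

Values that are already unicode pass through unchanged, so the helper can be applied to every row read from the CSV regardless of how it was decoded.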
