UnicodeDecodeError in serializer.to_XML() #29

Open
nemobis opened this issue Feb 15, 2018 · 1 comment

nemobis commented Feb 15, 2018

I am reading some data from a CSV file, writing it into an empty MARCXMLRecord, and then writing the record out as XML. Is it expected for a record with non-ASCII characters (here "à", stored as the UTF-8 bytes \xc3\xa0), such as the one below, to fail?

OrderedDict([('040', [{'a': ['IT-MiFBE'], 'ind1': u' ', 'b': ['ita'], 'e': ['reicat'], 'ind2': u' '}]), ('041', [{'a': ['ita'], 'ind1': '0', 'ind2': u' '}]), ('044', [{'a': ['ita'], 'ind1': u' ', 'ind2': u' '}]), ('100', [{'a': ['Medici, Mario'], 'ind2': u' ', 'ind1': '1', 'd': ['1899-1979'], '4': ['aut']}]), ('240', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe']}]), ('245', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe'], 'ind1': '1', 'ind2': '0'}]), ('260', [{'a': ['Venezia'], 'c': ['1934-1935'], 'b': ['Presso la sede del Reale Istituto Veneto'], 'ind1': u' ', 'ind2': u' '}]), ('300', [{'a': ['183-190 p.'], 'ind1': u' ', 'ind2': u' '}]), ('362', [{'a': ['Tomo 94. (1934-1935)'], 'ind1': '0', 'ind2': u' '}]), ('524', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe / Mario Medici. In: Atti. Parte seconda, Scienze matematiche e naturali. - Tomo 94. (1934-1935). - Venezia : Presso la sede del Reale Istituto Veneto, 1934-1935. - 183-190 p.'], 'ind1': '8', 'ind2': u' '}]), ('690', [{'a': ['Atti di accademie italiane'], 'ind1': '0', 'ind2': u' '}, {'a': ['Istituto veneto di scienze, lettere ed arti'], 'ind1': '0', 'ind2': u' '}]), ('773', [{'ind1': '0', 't': ['Atti. Tomo 94., Parte 2., Dispense 1.-4. (1934-1935). Parte seconda, Scienze matematiche e naturali'], 'w': ['ISS-IVSLA100130'], 'ind2': u' '}]), ('856', [{'ind1': '4', 'u': ['http://atena.beic.it/webclient/DeliveryManager?pid=8028533&custom_att_2=simple_viewer&search_terms=DTL23&pds_handle='], 'ind2': u' '}]), ('887', [{'a': ['In: Atti. Parte seconda, Scienze matematiche e naturali. - Tomo 94. (1934-1935). - Venezia : Presso la sede del Reale Istituto Veneto, 1934-1935. - 183-190 p.'], '2': ['local'], 'ind1': u' ', 'ind2': u' '}])])

The exception is:

Traceback (most recent call last):
  File ..., in <module>
    xmlout.write(currentrecord.to_XML())
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 147, in to_XML
    DATA_FIELDS=self._serialize_data_fields().strip()
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 96, in _serialize_data_fields
    CONTENT=self._serialize_data_subfields(dict_field)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 71: ordinal not in range(128)

Reading the Template definition in serializer.py, I wonder if the template string should be unicode. Other people have found that to be the culprit in similar cases: https://stackoverflow.com/a/6038077/1333493
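
For reference, the error message is the one Python produces when UTF-8 bytes are decoded with the ASCII codec, which Python 2 does implicitly whenever a byte string is combined with a unicode string. The sketch below is not the library's code; it only makes that implicit step explicit, on the assumption that something similar happens inside the serializer:

```python
# -*- coding: utf-8 -*-
# Explicit version of the decode that Python 2 performs implicitly when a
# byte string (str) is mixed with a unicode string.
raw = u'bl\xe0'.encode('utf-8')  # the UTF-8 bytes 'bl\xc3\xa0'

try:
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# The message names the same codec and the same byte (0xc3) as the traceback.
print(message)
```

Only the position differs from the traceback above, because here the byte string contains nothing but the subfield value.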

nemobis commented Feb 19, 2018

Minimal (I think) test case:

>>> from marcxml_parser import MARCXMLRecord
>>> r = MARCXMLRecord('<record><controlfield tag="001">test</controlfield></record>')
>>> r.add_data_field( "240", " ", " ", { 'a': u'blà'.encode('utf8')} )
>>> r.to_XML()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 147, in to_XML
    DATA_FIELDS=self._serialize_data_fields().strip()
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 96, in _serialize_data_fields
    CONTENT=self._serialize_data_subfields(dict_field)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 61: ordinal not in range(128)
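
A possible workaround, not tested against marcxml_parser, would be to decode the CSV values to unicode before they reach the record, so that no byte strings are mixed with unicode later on. The helper name `to_text` is hypothetical, and whether `add_data_field()` and `to_XML()` then behave correctly with unicode values is an assumption:

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return value as a text string, decoding it first if it is a byte string.

    Hypothetical helper, not part of marcxml_parser.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


# Same subfield as in the test case above, as UTF-8 bytes.
subfields = {'a': u'bl\xe0'.encode('utf-8')}

# Decode every subfield value before it would be passed to the record.
decoded = {code: to_text(val) for code, val in subfields.items()}
```

If this avoids the exception, the unicode result of `to_XML()` would probably still need an explicit `.encode('utf-8')` before being written to a file opened with plain `open()` on Python 2.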
