'utf-8' codec can't decode byte 0xbf in position 5511: invalid start byte (which file should the encoding be applied to?)

Asked 2 years ago, Updated 2 years ago, 77 views

Processing LABEVENTS table:   0%|                                                          | 0/27854056 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/c/dev/mimic3-benchmarks/mimic3benchmark/scripts/extract_subjects.py", line 81, in <module>
    read_events_table_and_break_up_by_subject(args.mimic3_path, table, args.output_path, items_to_keep=items_to_keep,
  File "/mnt/c/dev/mimic3-benchmarks/mimic3benchmark/mimic3csv.py", line 177, in read_events_table_and_break_up_by_subject
    for row, row_no, _ in tqdm(read_events_table_by_row(mimic3_path, table), total=nb_rows,
  File "/usr/lib/python3/dist-packages/tqdm/_tqdm.py", line 1000, in __iter__
    for obj in iterable:
  File "/mnt/c/dev/mimic3-benchmarks/mimic3benchmark/mimic3csv.py", line 49, in read_events_table_by_row
    for i, row in enumerate(reader):
  File "/usr/lib/python3.8/csv.py", line 111, in __next__
    row = next(self.reader)
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 5511: invalid start byte

I searched for the error above and found that you can pass euc-kr or cp949 as the encoding argument.

The problem is that I don't know which file to apply it to.

The file paths are in the traceback above; where should I apply it?

python unicodedecodeerror

2022-09-20 11:11

1 Answer

def read_events_table_by_row(mimic3_path, table):
    nb_rows = {'chartevents': 330712484, 'labevents': 27854056, 'outputevents': 4349219}
    reader = csv.DictReader(open(os.path.join(mimic3_path, table.upper() + '.csv'), 'r'))
    for i, row in enumerate(reader):
        if 'ICUSTAY_ID' not in row:
            row['ICUSTAY_ID'] = ''
        yield row, i, nb_rows[table.lower()]

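This is the part where the CSV data is read: read_events_table_by_row in mimic3benchmark/mimic3csv.py, which the traceback points to at line 49.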

You can set the encoding argument in the open call here:

open(os.path.join(mimic3_path, table.upper() + '.csv'), 'r', encoding=...)
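As a minimal sketch of that change, assuming you are free to edit your local copy of mimic3csv.py: the version below adds encoding and errors parameters to the function (both are additions for illustration, not part of the original code) and passes them to open. The helper first_decodable_encoding is also hypothetical; it only reports which candidate reads the whole file without raising. Which encoding is actually right for your LABEVENTS.csv is not known here; cp949 and euc-kr are simply the candidates mentioned in the question.

```python
import csv
import os


def read_events_table_by_row(mimic3_path, table, encoding=None, errors=None):
    """Yield (row, row_index, expected_row_count) for one events table.

    `encoding` and `errors` are assumptions added for this sketch. With the
    defaults (None) open() behaves as in the original code and uses the
    platform default encoding.
    """
    nb_rows = {'chartevents': 330712484, 'labevents': 27854056, 'outputevents': 4349219}
    path = os.path.join(mimic3_path, table.upper() + '.csv')
    # newline='' is the setting the csv module documentation recommends
    # for file objects handed to a csv reader.
    with open(path, 'r', encoding=encoding, errors=errors, newline='') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader):
            if 'ICUSTAY_ID' not in row:
                row['ICUSTAY_ID'] = ''
            yield row, i, nb_rows[table.lower()]


def first_decodable_encoding(path, candidates=('utf-8', 'cp949', 'euc-kr')):
    """Return the first candidate that decodes the whole file, else None.

    Hypothetical helper: a clean decode does not prove the encoding is the
    correct one, it only rules out the candidates that fail.
    """
    for enc in candidates:
        try:
            with open(path, 'r', encoding=enc, newline='') as f:
                for _ in f:  # read line by line so large files are not loaded at once
                    pass
            return enc
        except UnicodeDecodeError:
            continue
    return None
```

For example, read_events_table_by_row(mimic3_path, 'labevents', encoding='cp949') would read the table as cp949. Because a wrong encoding can still decode without an error and silently produce wrong characters, it is worth spot-checking a few decoded values before running the full extraction.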


2022-09-20 11:11
