Processing LABEVENTS table: 0%| | 0/27854056 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/c/dev/mimic3-benchmarks/mimic3benchmark/scripts/extract_subjects.py", line 81, in <module>
    read_events_table_and_break_up_by_subject(args.mimic3_path, table, args.output_path, items_to_keep=items_to_keep,
  File "/mnt/c/dev/mimic3-benchmarks/mimic3benchmark/mimic3csv.py", line 177, in read_events_table_and_break_up_by_subject
    for row, row_no, _ in tqdm(read_events_table_by_row(mimic3_path, table), total=nb_rows,
  File "/usr/lib/python3/dist-packages/tqdm/_tqdm.py", line 1000, in __iter__
    for obj in iterable:
  File "/mnt/c/dev/mimic3-benchmarks/mimic3benchmark/mimic3csv.py", line 49, in read_events_table_by_row
    for i, row in enumerate(reader):
  File "/usr/lib/python3.8/csv.py", line 111, in __next__
    row = next(self.reader)
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 5511: invalid start byte
I searched for the error above and found that you can pass euc-kr or cp949 as the encoding argument.
The problem is that I don't know which file to apply it to.
The file paths are in the traceback above. Where should I apply the change?
python unicodedecodeerror
def read_events_table_by_row(mimic3_path, table):
    nb_rows = {'chartevents': 330712484, 'labevents': 27854056, 'outputevents': 4349219}
    reader = csv.DictReader(open(os.path.join(mimic3_path, table.upper() + '.csv'), 'r'))
    for i, row in enumerate(reader):
        if 'ICUSTAY_ID' not in row:
            row['ICUSTAY_ID'] = ''
        yield row, i, nb_rows[table.lower()]
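The change goes in the part where the CSV data is read: read_events_table_by_row in mimic3benchmark/mimic3csv.py, the last frame of your own code in the traceback.
open(os.path.join(mimic3_path, table.upper() + '.csv'), 'r', encoding=...)
You can pass the encoding argument in this open() call.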
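As a minimal sketch of that change, the function from the question could take the encoding as a parameter and pass it through to open(). The `encoding` parameter and its `cp949` default are assumptions added for illustration; the question only lists euc-kr and cp949 as candidates, and the actual encoding of your CSV file is not confirmed here.

```python
import csv
import os


def read_events_table_by_row(mimic3_path, table, encoding="cp949"):
    """Yield (row, row_index, expected_row_count) for one events table.

    Hypothetical variant of the function in the question: the only
    functional change is the explicit `encoding` passed to open().
    """
    nb_rows = {'chartevents': 330712484, 'labevents': 27854056, 'outputevents': 4349219}
    path = os.path.join(mimic3_path, table.upper() + '.csv')
    # newline='' is what the csv module documentation recommends for files
    # handed to csv readers; the with-block closes the file when done.
    with open(path, 'r', encoding=encoding, newline='') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader):
            if 'ICUSTAY_ID' not in row:
                row['ICUSTAY_ID'] = ''
            yield row, i, nb_rows[table.lower()]
```

If neither candidate encoding decodes the file cleanly, open() also accepts errors='replace', which substitutes U+FFFD for undecodable bytes instead of raising, at the cost of losing those characters.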