Unable to load a CSV file in R

Asked 2 years ago, Updated 2 years ago, 88 views

R reports a problem when I try to load a CSV file.
Specifically, when I run

> data <- read.csv("sample1.csv", fileEncoding="utf-8")

the following warning messages appear:

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'sample1.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'sample1.csv'

The CSV file itself contains only

1,1

I made the file as simple as possible, but it still does not work (I saved it as CSV from Excel).

r csv

2022-09-30 16:43

2 Answers

I think the warnings appear because the file contains a BOM, the last line has no trailing newline, and there is no header row. Try adding a newline at the end of the CSV file and then running the following:

data <- read.csv("sample1.csv", fileEncoding="UTF-8-BOM", header=FALSE)
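
As a rough illustration of this suggestion, the sketch below recreates a file similar to the one in the question in a temporary location (a UTF-8 BOM followed by 1,1, with the trailing newline already added) and reads it back. The temporary path and the exact bytes are assumptions for demonstration only; the real sample1.csv may differ.

```r
# Hypothetical stand-in for sample1.csv: UTF-8 BOM + "1,1" + trailing newline.
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("1,1\n")), path)

# "UTF-8-BOM" tells R to skip the byte order mark;
# header = FALSE treats the single row as data, not as column names.
data <- read.csv(path, fileEncoding = "UTF-8-BOM", header = FALSE)
print(data)
```

With header = FALSE, R assigns the default column names V1 and V2.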


2022-09-30 16:43

I suspect this is a .csv file created on a Japanese version of Windows.
The fix depends on the environment and on the file, so you have to inspect the actual file.
R alone is often not enough to handle it.
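
One way to inspect the actual file from within R is to look at its raw bytes. The following is a minimal sketch, assuming a small file; the temporary file and its contents are made up for illustration, and has_bom and ends_with_newline are hypothetical variable names.

```r
# Hypothetical sample: UTF-8 BOM + "1,1" with no trailing newline.
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("1,1")), path)

# Read the whole file as raw bytes (fine for small files).
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)

# Does the file start with the UTF-8 byte order mark EF BB BF?
has_bom <- length(bytes) >= 3 &&
  identical(bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))

# Does the file end with a line feed (0x0A)?
ends_with_newline <- length(bytes) > 0 &&
  bytes[length(bytes)] == as.raw(0x0A)

cat("BOM:", has_bom, " trailing newline:", ends_with_newline, "\n")
```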

What I have been recommending to many people lately is this: whenever data containing Japanese text arrives, always run it through a program that reliably converts any character encoding to UTF-8.

First, install the nkf program.
On Ubuntu:

 sudo apt install nkf

Once nkf is available, run the following from R:

system(glue("find {conv2utf8_dir} -name '*.tsv' -print0 | xargs -0 nkf -Lu -w --overwrite"))

(I am using the glue package. Replace conv2utf8_dir with the path of the directory containing the files to process.)

The command above does the following:
·Converts every character encoding to UTF-8
·Converts every line ending to the Unix standard (LF)
·Overwrites the original file

After that, the read_delim function of the readr package can open the files with zero additional options.
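
For readers without the glue package, the same command string can be assembled in base R. This is only a sketch: the directory path is a hypothetical placeholder, shQuote is an addition here so that paths containing spaces survive the shell, and the system() call is left commented out because the command overwrites files in place.

```r
# Hypothetical directory holding the incoming files; replace with your own.
conv2utf8_dir <- "/path/to/incoming data"

# Build the find | xargs | nkf pipeline as a single shell command string.
# shQuote() wraps the directory in single quotes for the Bourne shell.
cmd <- sprintf(
  "find %s -name '*.tsv' -print0 | xargs -0 nkf -Lu -w --overwrite",
  shQuote(conv2utf8_dir, type = "sh")
)

# Inspect the command before running it.
cat(cmd, "\n")

# Uncomment to actually run it (requires find, xargs and nkf on the PATH).
# system(cmd)
```

Printing the command first makes it easy to check which directory and file pattern will be touched before anything is overwritten.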

Here is the reasoning.
Files stored in anything other than UTF-8, the global standard encoding, become a major technical liability in the long run.
(Most cloud services do not support other encodings, and there appear to be no plans to support them in the future.)

The one-liner above takes care of all of this, so
I strongly recommend converting your data as soon as it comes in and never keeping a file in the "sjis/euc" encodings.

(If you let your guard down even a little, encoding contamination is more persistent than cockroaches... Deal with it firmly at the point of entry!)


2022-09-30 16:43


© 2024 OneMinuteCode. All rights reserved.