How to Compare Files on the Command Line and Extract Unmatched Records

[Contents]
Using commands, I want to compare
a file listing IDs with a CSV file that contains IDs,
and extract the CSV records whose IDs are not listed in the ID list file.

[Example]
ID list file (hogehoge_list)

AAAAAA
XXXXXX
ZZZZZZ

CSV file (foofoo.csv) with various information, including the ID

AAAAAA, title, URL
BBBBBB, title, URL
XXXXXX, title, URL
ZZZZZZ, title, URL

I want to compare "hogehoge_list" with "foofoo.csv" and get the "BBBBBB" line of "foofoo.csv", whose ID is not in "hogehoge_list":

BBBBBB, title, URL

Thank you for your cooperation.

linux bash shellscript shell grep

2022-09-30 16:21

2 Answers

join -t, -v2 hogehoge_list foofoo.csv

With the -v2 option, the join command prints only the lines of the second file (foofoo.csv) that could not be joined on the first column. Both files must be sorted on the join column.
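To see what -v2 selects, here is a minimal sketch that recreates the question's sample data in a temporary directory and runs the command on it. The temporary directory and the result variable are only for illustration; the sample data is already sorted, so no sort step is needed here.

```shell
#!/bin/sh
# Sketch only: work in a throwaway directory so no real files are touched.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data from the question (already sorted on the first column).
printf '%s\n' AAAAAA XXXXXX ZZZZZZ > hogehoge_list
printf '%s\n' \
    'AAAAAA, title, URL' \
    'BBBBBB, title, URL' \
    'XXXXXX, title, URL' \
    'ZZZZZZ, title, URL' > foofoo.csv

# -t,  : fields are separated by commas
# -v2  : print only the lines of the second file that have no match in the first
result=$(LC_ALL=C join -t, -v2 hogehoge_list foofoo.csv)
printf '%s\n' "$result"
```

With this data the only line printed should be the BBBBBB record, since it is the only ID in foofoo.csv that does not appear in hogehoge_list.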

For example, a script that includes the sorting step looks like this:

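#!/bin/sh

# -e: exit immediately on error; -C: prevent > (redirect) from overwriting an existing file
set -eC

# join requires both inputs to be sorted on the join field (the first column),
# so sort on that field only (-k1,1) using the same separator and locale as join
LC_ALL=C sort -t, -k1,1 hogehoge_list > delete_ids.txt
LC_ALL=C sort -t, -k1,1 foofoo.csv > all.csv

# Print only the lines of all.csv whose ID is not in delete_ids.txt
LC_ALL=C join -t, -v2 delete_ids.txt all.csv > out.csv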

2022-09-30 16:21

If you don't mind AWK, how about the next one-liner?

awk -F, 'FNR==NR {a[$1]++; next} !a[$1]' hogehoge_list foofoo.csv

-F, sets the field separator to a comma (,).
While FNR equals NR, that is, while the first file hogehoge_list is being read, the action in {...} runs: each ID in hogehoge_list is stored as a key of the associative array a, and next skips to the next input line.
While the second file foofoo.csv is being read, the pattern !a[$1] is evaluated. It has no action, so the default action (print) applies, and a line is printed only when its ID is not in hogehoge_list.
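The steps above can be tried on the question's sample data with the sketch below. It is a commented variant of the one-liner, not a replacement for it: the array name seen, the temporary directory, and the result variable are illustrative, and the membership test ($1 in seen) is swapped in for !a[$1] so that lookups do not add empty entries to the array. The input is deliberately unsorted to show that, unlike join, this approach does not need sorted files.

```shell
#!/bin/sh
# Sketch only: work in a throwaway directory so no real files are touched.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data from the question, deliberately left unsorted.
printf '%s\n' ZZZZZZ AAAAAA XXXXXX > hogehoge_list
printf '%s\n' \
    'XXXXXX, title, URL' \
    'BBBBBB, title, URL' \
    'AAAAAA, title, URL' \
    'ZZZZZZ, title, URL' > foofoo.csv

result=$(awk -F, '
    FNR == NR { seen[$1] = 1; next }  # first file: remember each ID
    !($1 in seen)                     # second file: print lines whose ID was not seen
' hogehoge_list foofoo.csv)
printf '%s\n' "$result"
```

One caveat with the FNR==NR idiom: if hogehoge_list is empty, FNR equals NR throughout foofoo.csv as well, so nothing is printed.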

This is a standard AWK idiom for processing multiple files. I am posting it just for your reference.

2022-09-30 16:21
