[Contents]
Using a command, I'd like to compare a file that lists IDs with a CSV file that contains IDs,
and extract the CSV records whose IDs are not listed in the "file listing IDs".
[Example]
File listing IDs (hogehoge_list)
AAAAAA
XXXXXX
ZZZZZZ
CSV containing the ID and various other information (foofoo.csv)
AAAAAA,title,URL
BBBBBB,title,URL
XXXXXX,title,URL
ZZZZZZ,title,URL
I want to compare "hogehoge_list" with "foofoo.csv" and get the "BBBBBB" line of "foofoo.csv", whose ID is not in "hogehoge_list".
BBBBBB,title,URL
Thank you for your cooperation.
linux bash shellscript shell grep
join -t',' -v2 hogehoge_list foofoo.csv
With the -v2 option of the join command, only the lines of foofoo.csv (the second file) whose first column was not joined are printed. Both files must be sorted on the column of interest.
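Since join depends on sorted input, it may help to check the order before joining. The following is only a sketch: the file names sorted.csv and unsorted.csv, their contents, and the helper function is_sorted are made up for illustration. It relies on sort -c, which checks the order without producing sorted output.

```shell
#!/bin/sh
# Hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'AAAAAA,title,URL' 'ZZZZZZ,title,URL' > "$tmpdir/sorted.csv"
printf '%s\n' 'ZZZZZZ,title,URL' 'AAAAAA,title,URL' > "$tmpdir/unsorted.csv"

# is_sorted (hypothetical helper): succeeds when the first
# comma-separated field is in C-locale order.
# sort -c only checks; exit status 0 means sorted, non-zero means not.
is_sorted() {
    LC_ALL=C sort -c -t, -k1,1 "$1" 2>/dev/null
}

if is_sorted "$tmpdir/sorted.csv"; then sorted_status=ok; else sorted_status=unsorted; fi
if is_sorted "$tmpdir/unsorted.csv"; then unsorted_status=ok; else unsorted_status=unsorted; fi

echo "sorted.csv: $sorted_status"
echo "unsorted.csv: $unsorted_status"

rm -rf "$tmpdir"
```

The check uses the same locale (LC_ALL=C) and the same key as the join, because a file that counts as sorted in one locale may not count as sorted in another.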
For example, with the sorting included, the script looks like this:
#!/bin/sh
# -e: exit immediately on error; -C: prevent > (redirection) from overwriting existing files
set -eC
# join requires both inputs to be sorted on at least the first column
LC_ALL=C sort -t',' -k1,1 hogehoge_list > delete_ids.txt
LC_ALL=C sort -t',' -k1,1 foofoo.csv > all.csv
LC_ALL=C join -t',' -v2 delete_ids.txt all.csv > out.csv
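To try the sort-then-join approach without touching real files, something like the following could be used. It is a sketch with made-up sample data, deliberately out of order and written to a temporary directory; the variable names tmpdir and result are only for illustration.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Hypothetical sample data, deliberately unsorted.
printf '%s\n' ZZZZZZ AAAAAA XXXXXX > "$tmpdir/hogehoge_list"
printf '%s\n' 'XXXXXX,title,URL' 'BBBBBB,title,URL' \
              'ZZZZZZ,title,URL' 'AAAAAA,title,URL' > "$tmpdir/foofoo.csv"

# Sort both inputs on the first comma-separated field in the C locale.
LC_ALL=C sort -t, -k1,1 "$tmpdir/hogehoge_list" > "$tmpdir/delete_ids.txt"
LC_ALL=C sort -t, -k1,1 "$tmpdir/foofoo.csv" > "$tmpdir/all.csv"

# -v2: keep only the CSV lines whose ID has no partner in the ID list.
result=$(LC_ALL=C join -t, -v2 "$tmpdir/delete_ids.txt" "$tmpdir/all.csv")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With this sample data the only line printed is BBBBBB,title,URL. The output follows the sorted order, not the original order of foofoo.csv.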
If you don't mind AWK, how about the following one-liner?
awk -F, 'FNR==NR {a[$1]++; next} !a[$1]' hogehoge_list foofoo.csv
-F, sets the field separator to a comma. While FNR equals NR, that is, while the first file hogehoge_list is being read, the action in {...} is performed: the IDs in hogehoge_list are stored in the associative array a. While foofoo.csv is being read, the following !a[$1] is evaluated instead. That is, the default action, print, runs only when the ID is not included in hogehoge_list.
This is a standard AWK idiom for processing multiple files; I offer it just for your information.
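As a quick check of the AWK one-liner, here is a sketch that runs it on made-up sample data. The data is unsorted on purpose, and the ID CCCCCC is a hypothetical addition so that two lines are missing from the list.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Hypothetical sample data; no sorting is needed for the AWK approach.
printf '%s\n' ZZZZZZ AAAAAA XXXXXX > "$tmpdir/hogehoge_list"
printf '%s\n' 'XXXXXX,title,URL' 'CCCCCC,title,URL' \
              'AAAAAA,title,URL' 'BBBBBB,title,URL' > "$tmpdir/foofoo.csv"

# First file: remember every ID in the array a.
# Second file: print lines whose first field was never seen.
result=$(awk -F, 'FNR==NR {a[$1]++; next} !a[$1]' \
    "$tmpdir/hogehoge_list" "$tmpdir/foofoo.csv")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Here the CCCCCC line is printed first and the BBBBBB line second, so the original order of foofoo.csv is preserved. One caveat: if hogehoge_list is empty, FNR==NR is also true while foofoo.csv is being read, so nothing is printed.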