How to Compare Files with Commands and Extract Mismatched Records

Asked 2 years ago, Updated 2 years ago, 158 views

[Contents]
Using shell commands, I want to compare a file that lists IDs against a CSV that contains IDs,
and extract from the CSV the records whose IDs are not listed in the ID file.

[Example]
File listing IDs (hogehoge_list)

AAAAAA
XXXXXX
ZZZZZZ

CSV containing the ID and various other information (foofoo.csv)

AAAAAA, title, URL
BBBBBB, title, URL
XXXXXX, title, URL
ZZZZZZ, title, URL

I want to compare "hogehoge_list" with "foofoo.csv" and get the "foofoo.csv" line for "BBBBBB", the ID that is not in "hogehoge_list".

BBBBBB, title, URL

Thank you for your cooperation.

linux bash shellscript shell grep

2022-09-30 16:21

2 Answers

join -t',' -v2 hogehoge_list foofoo.csv

With the -v2 option of the join command, only the foofoo.csv lines whose first column found no match in hogehoge_list are printed. Both files must be sorted on the join column.

For example, a script that includes the sorting looks like this:

#!/bin/sh

# -e: exit immediately on error; -C: keep > (redirect) from overwriting existing files
set -eC

# join requires both inputs to be sorted on the join field (the first column)
LC_ALL=C sort -t',' -k1,1 hogehoge_list > delete_ids.txt
LC_ALL=C sort -t',' -k1,1 foofoo.csv > all.csv

LC_ALL=C join -t',' -v2 delete_ids.txt all.csv > out.csv
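
To see the join approach end to end, here is a minimal sketch that rebuilds the question's sample data and runs the same steps. The scratch directory from mktemp -d and the variable names workdir and result are assumptions made for this illustration, and the inputs are deliberately shuffled so that the sort step matters. The sort key -k1,1 is also a choice of this sketch, so that ordering depends only on the ID column.

```shell
#!/bin/sh
# Illustrative demo only: workdir and result are hypothetical names.
set -e

workdir=$(mktemp -d)

# Sample inputs from the question, written out of order on purpose
printf '%s\n' ZZZZZZ AAAAAA XXXXXX > "$workdir/hogehoge_list"
printf '%s\n' \
  'XXXXXX, title, URL' \
  'BBBBBB, title, URL' \
  'AAAAAA, title, URL' \
  'ZZZZZZ, title, URL' > "$workdir/foofoo.csv"

# Sort both files on the first comma-separated field, as join requires
LC_ALL=C sort -t',' -k1,1 "$workdir/hogehoge_list" > "$workdir/delete_ids.txt"
LC_ALL=C sort -t',' -k1,1 "$workdir/foofoo.csv" > "$workdir/all.csv"

# -v2 prints only the lines of the second file that have no partner in the first
result=$(LC_ALL=C join -t',' -v2 "$workdir/delete_ids.txt" "$workdir/all.csv")
printf '%s\n' "$result"
# prints: BBBBBB, title, URL
```

Note that the output follows the sorted order of all.csv, not the original order of foofoo.csv.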


2022-09-30 16:21

If you don't mind AWK, how about the following one-liner?

awk -F, 'FNR==NR {a[$1]++; next} !a[$1]' hogehoge_list foofoo.csv
  • -F, sets the field separator to ","
  • While FNR equals NR, that is, while the first file hogehoge_list is being read, the action in {...} runs. It stores each ID from hogehoge_list in the associative array a, and next moves on to the next line.
  • While the second file foofoo.csv is being read, the pattern !a[$1] is evaluated. It is true only when the ID is not in hogehoge_list, and because no action is given, the default action (print) outputs the line.
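
The two passes described above can be tried on the question's sample data with a small sketch like the following. The scratch directory from mktemp -d and the variable names workdir and result are assumptions made for this illustration.

```shell
#!/bin/sh
# Illustrative demo only: workdir and result are hypothetical names.
set -e

workdir=$(mktemp -d)

# Sample inputs from the question
printf '%s\n' AAAAAA XXXXXX ZZZZZZ > "$workdir/hogehoge_list"
printf '%s\n' \
  'AAAAAA, title, URL' \
  'BBBBBB, title, URL' \
  'XXXXXX, title, URL' \
  'ZZZZZZ, title, URL' > "$workdir/foofoo.csv"

# Pass 1 (FNR==NR): remember every ID listed in hogehoge_list.
# Pass 2: print each foofoo.csv line whose first field was never stored.
result=$(awk -F, 'FNR==NR {a[$1]++; next} !a[$1]' \
  "$workdir/hogehoge_list" "$workdir/foofoo.csv")
printf '%s\n' "$result"
# prints: BBBBBB, title, URL
```

Unlike join, this needs no sorting and keeps the lines in their original foofoo.csv order. One caveat: if hogehoge_list is empty, FNR==NR stays true while foofoo.csv is read, so nothing is printed.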

This is a standard AWK idiom for processing multiple files; I offer it just for your reference.


2022-09-30 16:21


