Best language for extracting the same line from multiple tab-separated files and combining them into a new file


I would like to take a specific line from each of multiple tab-separated files (hereinafter "dat files") in a directory, each of which stores matrix data, and then arrange the extracted lines in the order they were taken out to create a new dat file.
Should I use Python, a shell script, or C? The dat files are in a directory on Linux.

Conditions for the "best" approach: it should run in a directory on Linux and be easy to deploy (e.g. Python; should I just write a Python script?), the program should be simple, and processing should be fast.
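Taking the question literally (one specific line from each file), a minimal Python sketch with only the standard library could look like this. The function name `extract_line`, the 1-based line numbering (header counted as line 1), processing the files in file-name order, and the output name `new.dat` are all assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path


def extract_line(directory: Path, line_no: int, out_name: str = "new.dat") -> list[str]:
    """Hypothetical helper: take line `line_no` (1-based, header = line 1)
    from every *.dat file in `directory` and write the picked lines, in
    file-name order, to `out_name` in the same directory."""
    out_path = directory / out_name
    picked = []
    for path in sorted(directory.glob("*.dat")):
        if path == out_path:
            continue  # don't read our own output back in on a rerun
        lines = path.read_text().splitlines()
        if line_no <= len(lines):  # files that are too short are skipped
            picked.append(lines[line_no - 1])
    out_path.write_text("".join(line + "\n" for line in picked))
    return picked
```

For example, `extract_line(Path("."), 2)` would collect the first data row (line 2, right after the header) of every dat file in the current directory.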

Each file is a text file of tab-separated decimal matrix data. The first row is a header of alphabetic or symbolic column labels, and under each label there are hundreds of rows of decimal numbers.

# x y z
1.0 2.1 -5.4 8.2
0.0 23.4 4.4 3.4
... (hundreds more lines)

Additional note (June 30, 2018): the columns are actually aligned with runs of spaces rather than tabs:

# [18 spaces] x [19 spaces] y [19 spaces] z
[6 spaces] 1.0 [5 spaces] 2.1 [5 spaces] -5.4 [5 spaces] 8.2
[6 spaces] 0.0 [5 spaces] 23.4 [5 spaces] 4.4 [5 spaces] 3.4
... (hundreds of lines)
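The layout in the additional note matters for the choice of tool: `cut -f` splits on single tab characters, so it would not see the columns of a space-padded file, whereas Python's `str.split()` with no argument treats any run of spaces or tabs as one separator and ignores leading blanks. The sketch below (the helpers `parse_dat` and `column` are hypothetical names) reads either layout. Note that splitting the header `# x y z` this way keeps `#` as the first column name, so four names line up with the four values per row; which label really belongs to which column is an assumption you should check against your data.

```python
def parse_dat(text):
    """Return (column names, rows of floats) for the contents of one .dat file.

    str.split() without arguments treats any run of spaces and/or tabs as a
    single separator and ignores leading blanks, so the same code reads both
    the tab-separated layout and the space-aligned layout."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()  # "# x y z" -> ["#", "x", "y", "z"]
    rows = [[float(v) for v in line.split()] for line in lines[1:]]
    return names, rows


def column(text, name):
    """All values of the column whose header token is `name`."""
    names, rows = parse_dat(text)
    i = names.index(name)  # raises ValueError if the header has no such name
    return [row[i] for row in rows]
```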


2022-09-30 16:51

3 Answers

If the dat files are tab-separated, a shell script will do the job.

For example, with tab-delimited files such as org.dat, the following shell command extracts a column and sorts it (here, the third column):

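# Strip the header line from each .dat file under . (the current directory) and concatenate them,
# extract the 3rd column and sort it numerically,
# and save the result to new.dat (excluded from the search so it is not read back in)
find . -type f -name "*.dat" ! -name new.dat -exec tail -n +2 {} \; | cut -f3 | sort -n > new.dat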
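For comparison, here is roughly the same pipeline written in Python with only the standard library (a sketch; `third_column_sorted` is a hypothetical name). Like `cut -f3` it splits on tabs, so it assumes genuinely tab-separated files, and like `find` it searches subdirectories recursively.

```python
from pathlib import Path


def third_column_sorted(directory, out_name="new.dat"):
    """Rough standard-library equivalent of the find | tail | cut | sort pipeline."""
    out_path = directory / out_name
    values = []
    for path in directory.rglob("*.dat"):  # recursive, like find
        if path == out_path:
            continue  # don't read the output file back in
        with path.open() as f:
            next(f, None)  # skip the header line, like tail -n +2
            for line in f:
                fields = line.rstrip("\n").split("\t")  # tab-separated, like cut
                if len(fields) >= 3 and fields[2].strip():
                    values.append(fields[2])  # 3rd field, like cut -f3
    values.sort(key=float)  # numeric order, like sort -n
    out_path.write_text("".join(v + "\n" for v in values))
    return values
```

Unlike `cut`, lines without a third field are skipped instead of being passed through; the values are written back in their original text form.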



Generally speaking, ease of development and processing speed trade off against each other, so if you want something easy, start with a shell script.

However, if you have tens of thousands of dat files, or the files are very large, other requirements may come up, such as "I want to log the processing."

To be prepared for such unexpected requirements, I think a programming language such as Python is the better choice.
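As an illustration of such an extra requirement, here is a sketch that logs how many values were taken from each file and skips unreadable files with a warning instead of aborting. The logger name `datmerge`, the function `merge_column`, the 0-based column index and the output name `merged.dat` are assumptions; values are kept in the order they are read, with files processed in name order.

```python
import logging
from pathlib import Path

log = logging.getLogger("datmerge")


def merge_column(directory, column_index, out_name="merged.dat"):
    """Collect one whitespace-separated column (0-based index) from every
    *.dat file in `directory`, in file-name order, logging progress."""
    out_path = directory / out_name
    values = []
    for path in sorted(directory.glob("*.dat")):
        if path == out_path:
            continue  # don't read the output file back in
        picked = []
        try:
            with path.open() as f:
                next(f, None)  # skip the header line
                for line in f:
                    fields = line.split()
                    if len(fields) > column_index:
                        picked.append(fields[column_index])
        except (OSError, UnicodeDecodeError) as exc:
            log.warning("skipped %s: %s", path.name, exc)
            continue
        values.extend(picked)  # only add a file's values once it was read fully
        log.info("%s: %d value(s)", path.name, len(picked))
    out_path.write_text("".join(v + "\n" for v in values))
    log.info("wrote %d value(s) to %s", len(values), out_path)
    return values
```

To see the messages on the console, call `logging.basicConfig(level=logging.INFO)` before running, e.g., `merge_column(Path("."), 1)`.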

If productivity matters, I can't think of a reason to go out of your way to choose C, but if speed and memory efficiency matter most, C is the better choice.



Whether you use Python, a shell script, or C is just a matter of preference. With Python 3 (and pandas), you can write it like this:

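from pathlib import Path

import pandas as pd

# Read column 'x' from every whitespace-separated .dat file in the current directory
# (first line = header), skipping the output file so a rerun does not read it back in
dfs = (pd.read_csv(f, sep=r'\s+')['x'] for f in Path('.').glob('*.dat') if f.name != 'all.dat')
# Concatenate, sort the values and save them as all.dat
pd.concat(dfs, ignore_index=True).sort_values().to_csv('all.dat', sep='\t', index=False)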
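If installing pandas is inconvenient on the target machine, the same result can be sketched with the standard library alone. This is not the answerer's code: `merge_column_by_name` is a hypothetical helper that aims to mirror the pandas version (whitespace-separated columns, header on the first line, values sorted numerically, a header line `x` in the output) and skips `all.dat` so that rerunning it does not read its own output.

```python
from pathlib import Path


def merge_column_by_name(directory, name="x", out_name="all.dat"):
    """Concatenate column `name` from every *.dat file in `directory`,
    sort it numerically and write it (with a one-line header) to `out_name`."""
    out_path = directory / out_name
    values = []
    for path in directory.glob("*.dat"):
        if path == out_path:
            continue  # don't read the output file back in
        lines = path.read_text().splitlines()
        if not lines:
            continue  # empty file: nothing to take
        header = lines[0].split()  # "# x y z" -> ["#", "x", "y", "z"]
        idx = header.index(name)  # raises ValueError if the column is missing
        for line in lines[1:]:
            fields = line.split()  # any run of spaces/tabs separates fields
            if len(fields) > idx:
                values.append(fields[idx])
    values.sort(key=float)  # numeric sort, keeping the original text form
    out_path.write_text(name + "\n" + "".join(v + "\n" for v in values))
    return values
```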





© 2024 OneMinuteCode. All rights reserved.