If you can use gawk 4.0 or later, gawk's FPAT should be able to handle this. However, it does not work when a field contains a line break or a double quote.
$ gawk -v OFS=, -v FPAT='([^,]*)|("[^"]+")' '{for(i=1;i<=NF;i++){if($i~/^[ \t]*"/){gsub(",","",$i)}};print}'
Following "Defining Fields by Content" in the gawk manual and metropolis's comment, I changed ([^,]+) to ([^,]*) so that zero-length (empty) fields are supported.
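As a rough illustration of what "defining fields by content" means, here is a minimal C sketch that scans a line the way the FPAT pattern above describes: a field is either a quoted run or a run of non-comma characters, whichever is longer. The names field_length and strip_commas_in_quoted_fields are hypothetical helpers made up for this example, and the sketch does not try to reproduce every detail of gawk's matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the field starting at s, modeled on
 * FPAT = ([^,]*)|("[^"]+"). The longer of the two alternatives wins. */
size_t field_length(const char *s)
{
    size_t plain = strcspn(s, ",");          /* ([^,]*) */
    size_t quoted = 0;
    if (s[0] == '"') {
        const char *end = strchr(s + 1, '"');
        if (end != NULL && end > s + 1)      /* "[^"]+" needs one char inside */
            quoted = (size_t)(end - s) + 1;
    }
    return quoted > plain ? quoted : plain;
}

/* Copies line to out, dropping commas inside fields that begin with a
 * double quote. out must have room for strlen(line) + 1 bytes. */
void strip_commas_in_quoted_fields(const char *line, char *out)
{
    assert(line != NULL && out != NULL);
    const char *p = line;
    for (;;) {
        size_t len = field_length(p);
        int quoted = (p[0] == '"');
        for (size_t i = 0; i < len; i++) {
            if (!(quoted && p[i] == ','))
                *out++ = p[i];
        }
        p += len;
        while (*p != '\0' && *p != ',')      /* stray text after a closing quote */
            *out++ = *p++;
        if (*p == '\0')
            break;                           /* end of line */
        *out++ = *p++;                       /* copy the separating comma */
    }
    *out = '\0';
}
```

Called on B,"1,000","1,000",Z,"1,000" this writes B,"1000","1000",Z,"1000" to out, and an input with empty fields such as a,,b is copied unchanged.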
If you only have a gawk older than 4.0, I think this is easier to write in C than in awk.
This version also handles line breaks and double quotes inside quoted fields.
#include <stdio.h>

/* Processes the rest of a quoted field; the opening " was already read. */
static int parse_element_quoted(void)
{
    int c;
    while ((c = getchar()) != EOF) {
        switch (c) {
        case '"':
            c = getchar();
            if (c == '"') {
                // Two consecutive "" stand for one " in the original data.
                // putchar('"'); // not needed when converting "" to a single "
                putchar('"');
                break;
            }
            else {
                putchar('"'); // not needed if the " should be removed (A)
                if (c == EOF) // the closing " was the last character of the input
                    return 1;
                putchar(c);
                return 0;
            }
        case '\r':
        case '\n':
        // case '\\':
        case ',':
            // ignore (remove) them
            break;
        default:
            putchar(c);
        }
    }
    return 1;
}

/* Processes one field; returns 1 at end of input, otherwise 0. */
static int parse_element(void)
{
    int c = getchar();
    if (c == EOF)
        return 1;
    else if (c == '"') {
        putchar('"'); // not needed if the " should be removed (B)
        return parse_element_quoted();
    }
    else {
        putchar(c);
        if (c == '\n' || c == ',')
            return 0;
        while ((c = getchar()) != EOF) {
            switch (c) {
            case ',':
            case '\n':
                putchar(c);
                return 0;
            case '\r':
                break;
            default:
                putchar(c);
            }
        }
        return 1;
    }
}

int main(void)
{
    while (parse_element() == 0)
        ;
    return 0;
}
Personally, I think it would be easier to remove both the commas and the double quotes.
In that case, comment out the lines marked (A) and (B) in the code above.
With sed, it looks like this. It assumes that the original text contains no tab characters.
sed -e 's/"\([^"]*\)"/\tA\1\tB/g;:loop;s/\(\tA[^\t]*\),/\1/g;t loop;s/\t./"/g'
It uses the tab character as a sentinel and replaces the opening and closing quotation marks with \tA and \tB.
The loop removes the commas in each string that starts with \tA, one at a time.
When there are no more commas to remove, \tA and \tB are turned back into the original quotation marks.
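For comparison, the same result can be had in a single pass by remembering whether the current position is inside a pair of quotes. The following is only a sketch of that idea in C, not a translation of the sed script; remove_quoted_commas is a hypothetical name, and like the sed version it assumes the quotes on each line are balanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: removes, in place, every comma that lies between
 * a pair of double quotes. Assumes the quotes in s are balanced. */
void remove_quoted_commas(char *s)
{
    assert(s != NULL);
    if (strchr(s, '"') == NULL)
        return;                          /* no quotes, nothing to remove */

    bool in_quotes = false;
    char *dst = s;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src == '"')
            in_quotes = !in_quotes;      /* opening or closing quote */
        if (in_quotes && *src == ',')
            continue;                    /* drop commas inside quotes */
        *dst++ = *src;
    }
    *dst = '\0';
}
```

Because the state is just one flag, no sentinel character is needed, so tab characters in the data are not a problem here.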
For a one-liner, replace the line break with a semicolon.
awk 'BEGIN {FS="\"";OFS="\""};{gsub(",","",$2);print}' data.csv
More simply, how about writing it like this?
cat data.csv | awk 'match($0, /".*?"/) {tmp0=substr($0, RSTART, RLENGTH); tmp1=tmp0; tmp2=gsub(/,/, "", tmp0); sub(tmp1, tmp0, $0); print $0}'
I think the result will be wrong if a line contains more than one double-quoted field, but I couldn't immediately think of how to loop in awk, so I wrote it in PHP below.
<?php
$csv_data = '
A,"100",Z
B,"1,000","1,000",Z,"1,000"
C,"1,000,000",Z
';
$array0 = array();
$end_flag = false;
do {
    $end_flag = false;
    // Split off everything before the next opening quote.
    $array = explode('"', $csv_data, 2);
    if (count($array) > 1) {
        $array0[] = $array[0].'"';
        // Split the rest at the closing quote.
        $array2 = explode('"', $array[1], 2);
        if (count($array2) > 1) {
            // Remove the commas inside the quoted part.
            $array0[] = str_replace(',', '', $array2[0]).'"';
            $csv_data = $array2[1];
            $end_flag = true;
        } else {
            $array0[] = $array2[0];
        }
    } else {
        $array0[] = $array[0];
    }
} while ($end_flag);
echo implode('', $array0);
I tried using the double quote as the field separator in awk.
Executable file jikko.bat
awk -f src.txt < data.csv
Source file src.txt
BEGIN {FS="\"";OFS="\""}
{gsub(",","",$2); print}
Please tell me how to write this as a one-liner.