If you can use gawk 4.0 or later, gawk's FPAT seems to handle this. However, it does not cover fields that contain line breaks or escaped double quotes.
$ gawk -v OFS=, -v FPAT='([^,]*)|("[^"]+")' '{for(i=1;i<=NF;i++){if($i~/^[ \t]*"/){gsub(",","",$i)}};print}'
Following "Defining Fields by Content" in the gawk manual and metropolis' comment, I chose ([^,]*) where the manual has ([^,]+), so that zero-character (empty) fields are supported.
If you can only use a gawk older than 4.0, I think this is easier to write in C than in awk.
The C version below also handles line breaks and double quotes inside quoted fields.
#include <stdio.h>

/* Copies the rest of a quoted field, dropping commas and line breaks.
   Returns 1 at end of input, 0 otherwise. */
static int parse_element_quoted(void)
{
    int c;
    while ((c = getchar()) != EOF) {
        switch (c) {
        case '"':
            c = getchar();
            if (c == '"') {
                // Two consecutive quotes ("") stand for one " in the original data.
                // putchar('"'); this line is not required when converting "" to "
                putchar('"');
                break;
            }
            else {
                putchar('"'); // not needed if the " should be removed (A)
                if (c == EOF)
                    return 1; // input ended right after the closing quote
                putchar(c);
                return 0;
            }
        case '\r':
        case '\n':
        // case '\\':
        case ',':
            // ignore (remove) them
            break;
        default:
            putchar(c);
        }
    }
    return 1;
}

/* Copies one field (and the separator that follows it) to stdout.
   Returns 1 at end of input, 0 otherwise. */
static int parse_element(void)
{
    int c = getchar();
    if (c == EOF)
        return 1;
    else if (c == '"') {
        putchar('"'); // not needed if the " should be removed (B)
        return parse_element_quoted();
    }
    else {
        putchar(c);
        if (c == '\n' || c == ',')
            return 0;
        while ((c = getchar()) != EOF) {
            switch (c) {
            case ',':
            case '\n':
                putchar(c);
                return 0;
            case '\r':
                break;
            default:
                putchar(c);
            }
        }
        return 1;
    }
}

int main(void)
{
    while (parse_element() == 0)
        ;
    return 0;
}
Personally, I think it would be simpler to remove the double quotes together with the commas.
To do that, comment out the lines marked (A) and (B) in the code above.
With sed, it looks like this. However, it assumes that the original string contains no tab characters.
sed -e 's/"\([^"]*\)"/\tA\1\tB/g;:loop;s/\(\tA[^\t]*\),/\1/g;t loop;s/\t./"/g'
Tab characters serve as markers: the opening and closing quotation marks are replaced with \tA and \tB.
Commas in strings starting with \tA are removed one at a time in the loop.
When there is nothing left to remove, \tA and \tB are put back to the original quotation marks.
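The marker technique described above can be tried on a sample line as follows. This is only a sketch under a few assumptions: the function name remove_commas_sed is made up for illustration, the input contains no tab characters and no escaped quotes, and a literal tab is passed in through a shell variable with each command in its own -e, since \t in a pattern and ; after a label are accepted by GNU sed but may not work in other seds.

```shell
# remove_commas_sed is a hypothetical helper name, not part of any tool.
remove_commas_sed() {
  tab=$(printf '\t')   # literal tab, used as the marker prefix
  sed -e "s/\"\([^\"]*\)\"/${tab}A\1${tab}B/g" \
      -e ':loop' \
      -e "s/\(${tab}A[^${tab}]*\),/\1/g" \
      -e 't loop' \
      -e "s/${tab}[AB]/\"/g"
}

# Commas outside the quotes are kept; commas inside are removed.
printf '%s\n' 'B,"1,000",Z,"1,000,000"' | remove_commas_sed
```

The first expression marks each quoted string, the loop deletes one comma per marked string per pass until no substitution happens, and the last expression restores the quotation marks.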
As a one-liner, the line breaks become semicolons.
awk 'BEGIN{FS="\"";OFS="\""};{gsub(",","",$2);print}' data.csv
More simply, how about writing it like this?
cat data.csv | awk 'match($0, /"[^"]*"/) {tmp0=substr($0,RSTART,RLENGTH); tmp1=tmp0; tmp2=gsub(/,/,"",tmp0); sub(tmp1,tmp0,$0); print $0}'
I think this will misbehave when a line contains more than one double-quoted field, but I can't immediately think of how to repeat the step in awk, so I'll write it in PHP below.
<?php
$csv_data = '
A,"100",Z
B,"1,000","1,000",Z,"1,000"
C,"1,000,000",Z
';
$array0 = array();
$end_flag = false;
do {
    $end_flag = false;
    // Split off everything before the next opening quote.
    $array = explode('"', $csv_data, 2);
    if (count($array) > 1) {
        $array0[] = $array[0].'"';
        // Split the remainder at the closing quote.
        $array2 = explode('"', $array[1], 2);
        if (count($array2) > 1) {
            // $array2[0] is the quoted content: strip its commas.
            $array0[] = str_replace(',', '', $array2[0]).'"';
            $csv_data = $array2[1];
            $end_flag = true;
        } else {
            $array0[] = $array2[0];
        }
    } else {
        $array0[] = $array[0];
    }
} while ($end_flag);
echo implode('', $array0);
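For comparison, the repetition that the PHP loop performs can also be sketched in awk. With the double quote as the field separator, every even-numbered field is the content of a quoted string, so a for loop can strip the commas from each of them. This is only a sketch, assuming fields contain no escaped quotes or line breaks; the function name strip_quoted_commas is made up for illustration.

```shell
# strip_quoted_commas is a hypothetical helper name, not part of awk.
strip_quoted_commas() {
  awk 'BEGIN { FS = "\""; OFS = "\"" }
       {
         # Splitting on the quote character puts quoted contents
         # in fields 2, 4, 6, ... so only those are touched.
         for (i = 2; i <= NF; i += 2) gsub(/,/, "", $i)
         print
       }'
}

printf '%s\n' 'B,"1,000","1,000",Z,"1,000"' | strip_quoted_commas
```

Lines without any double quote have a single field, so the loop body never runs and they are printed unchanged.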
I tried using the double quote as the field separator in awk.
Batch file: jikko.bat
awk -f src.txt < data.csv
Source file: src.txt
BEGIN {FS="\"";OFS="\""}
{gsub(",","",$2); print}
Please tell me how to write this as a one-liner.