I wrote the following parser in PEG.js:
per.pegjs
Start
  = c:(Content+) EOL {
      return c;
    }
Content
  = openId:OpenTag c:Content+ closeId:CloseTag {
      if (openId !== closeId) {
        throw new Error("expect </" + openId + "> but </" + closeId + ">");
      }
      return { type: 'element', id: openId, content: c };
    }
  / txt:ContentText {
      return { type: 'txt', content: txt.trim() };
    }
ContentText = [^<>\n]+ { return text(); }
OpenTag = "<" id:[0-9]+ ">" { return parseInt(id.join(''), 10); }
CloseTag = "</" id:[0-9]+ ">" { return parseInt(id.join(''), 10); }
EOL = [\n]*
It takes input like the following and outputs JSON:
<1>abc</1><2>def<3>ghi</3></2>
Output:
[
  {
    "type": "element",
    "id": 1,
    "content": [
      {
        "type": "txt",
        "content": "abc"
      }
    ]
  },
  {
    "type": "element",
    "id": 2,
    "content": [
      {
        "type": "txt",
        "content": "def"
      },
      {
        "type": "element",
        "id": 3,
        "content": [
          {
            "type": "txt",
            "content": "ghi"
          }
        ]
      }
    ]
  }
]
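To make the limitation concrete, here is a hypothetical hand-written JavaScript equivalent of the grammar above. The function name parseOriginal and the regex constants are illustrative assumptions, not part of PEG.js, and this is not what PEG.js actually generates; it is only a sketch that mirrors each rule with a sticky regex and ordered-choice backtracking.

```javascript
// Hypothetical hand-written equivalent of the first per.pegjs grammar.
// Not generated by PEG.js; it only mirrors each rule for illustration.
function parseOriginal(input) {
  let pos = 0;

  // Try a sticky regex at the current position; advance on success.
  function match(re) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m !== null) pos = re.lastIndex;
    return m;
  }

  const OPEN_TAG = /<([0-9]+)>/y;     // OpenTag  = "<" [0-9]+ ">"
  const CLOSE_TAG = /<\/([0-9]+)>/y;  // CloseTag = "</" [0-9]+ ">"
  const CONTENT_TEXT = /[^<>\n]+/y;   // ContentText: "<" and ">" never allowed

  // Content = OpenTag Content+ CloseTag / ContentText
  function parseContent() {
    const start = pos;
    const open = match(OPEN_TAG);
    if (open !== null) {
      const children = parseContents();
      const close = match(CLOSE_TAG);
      if (children.length > 0 && close !== null) {
        const openId = parseInt(open[1], 10);
        const closeId = parseInt(close[1], 10);
        if (openId !== closeId) {
          throw new Error(`expect </${openId}> but </${closeId}>`);
        }
        return { type: "element", id: openId, content: children };
      }
      pos = start; // backtrack and try the second alternative
    }
    const txt = match(CONTENT_TEXT);
    return txt === null ? null : { type: "txt", content: txt[0].trim() };
  }

  // Collect as many Content items as possible (the caller checks for "+").
  function parseContents() {
    const items = [];
    let item;
    while ((item = parseContent()) !== null) items.push(item);
    return items;
  }

  // Start = Content+ EOL, with EOL = [\n]*; the whole input must be consumed.
  const result = parseContents();
  while (input[pos] === "\n") pos++;
  if (result.length === 0 || pos !== input.length) {
    throw new Error(`Parse error at offset ${pos}`);
  }
  return result;
}
```

For the sample input, parseOriginal returns the same structure as the output above, while an input such as <1>a<b</1> throws, because CONTENT_TEXT (like ContentText) stops at the first <.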
This parser does not allow < or > inside ContentText.
If I want text to be able to contain these characters, how should I write the parser?
(Posting self-answer from questioner as wiki)
I solved it myself; the corrected grammar is summarized below.
Please let me know if there is a smarter way.
Start = c:(Content+) EOL {
  return c;
}
Content = open:OpenTag c:Content+ close:CloseTag {
    return { type: 'element', id: open, content: c };
  }
  / txt:Text {
    return { type: 'txt', content: txt.trim() };
  }
Text = txt:(NotOpenTag / NotCloseTag / NotTagNotEOL)+ { return txt.join('').trim(); }
NotTagNotEOL = [^<>\n] { return text(); }
NotOpenTag = "<" !Digit !"/" { return text(); }
  / !Digit ">" { return text(); }
NotCloseTag = "</" !Digit { return text(); }
OpenTag = "<" id:Digit ">" { return id; }
CloseTag = "</" id:Digit ">" { return id; }
Digit = [0-9]+ { return parseInt(text(), 10); }
EOL = [\n]*
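A hypothetical hand-written JavaScript equivalent of the revised grammar may help show what the negative predicates do. The function name parseWithLookahead and the constants are illustrative assumptions, not PEG.js output. Each !Digit or !"/" predicate is translated into a regex negative lookahead (?!...), which, like a PEG predicate, checks the next character without consuming it.

```javascript
// Hypothetical hand-written equivalent of the revised grammar (not PEG.js output).
function parseWithLookahead(input) {
  let pos = 0;

  // Try a sticky regex at the current position; advance on success.
  function match(re) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m !== null) pos = re.lastIndex;
    return m;
  }

  const OPEN_TAG = /<([0-9]+)>/y;     // OpenTag  = "<" Digit ">"
  const CLOSE_TAG = /<\/([0-9]+)>/y;  // CloseTag = "</" Digit ">"
  // One Text piece, alternatives in the same order as the grammar:
  //   <(?![0-9\/])   NotOpenTag:   "<" !Digit !"/"
  //   >              NotOpenTag:   !Digit ">"  (">" is never a digit)
  //   <\/(?![0-9])   NotCloseTag:  "</" !Digit
  //   [^<>\n]        NotTagNotEOL
  const TEXT_PIECE = /<(?![0-9\/])|>|<\/(?![0-9])|[^<>\n]/y;

  // Text = (NotOpenTag / NotCloseTag / NotTagNotEOL)+
  function parseText() {
    let txt = "";
    let m;
    while ((m = match(TEXT_PIECE)) !== null) txt += m[0];
    return txt === "" ? null : { type: "txt", content: txt.trim() };
  }

  // Content = OpenTag Content+ CloseTag / Text
  // (open and close ids are not compared, matching the revised grammar)
  function parseContent() {
    const start = pos;
    const open = match(OPEN_TAG);
    if (open !== null) {
      const children = parseContents();
      if (children.length > 0 && match(CLOSE_TAG) !== null) {
        return { type: "element", id: parseInt(open[1], 10), content: children };
      }
      pos = start; // backtrack and try Text instead
    }
    return parseText();
  }

  function parseContents() {
    const items = [];
    let item;
    while ((item = parseContent()) !== null) items.push(item);
    return items;
  }

  // Start = Content+ EOL; the whole input must be consumed.
  const result = parseContents();
  while (input[pos] === "\n") pos++; // EOL = [\n]*
  if (result.length === 0 || pos !== input.length) {
    throw new Error(`Parse error at offset ${pos}`);
  }
  return result;
}
```

With these rules, <1>a<b>c</1> yields a single txt node whose content is "a<b>c". As in the grammar, a < immediately followed by a digit is still treated as the start of a tag, so text such as 1<2 is rejected. The revised grammar also no longer compares the open and close ids; the openId !== closeId check from the first version could be put back into the Content action if that validation is still wanted.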