I am looking for a way to split the input into individual keywords when one or more keywords are entered for a full-text search, as in Google Search.
I would like to split the following string on half-width and full-width spaces, except where the space is inside double quotes.
I don't know what regular expression to use.
North A+!^"*-=" P"Rain""Snow" "Only" abc Time123-"Gate Limit" "Don't break
As a Java string literal:
String moto = "North A+!^\"*-=\" P\"Rain\"\"Snow\" \"Only\" abc Time123-\"Gate Limit\" \"Don't break";
Expected result:
North
A+!^"*-="
P"Rain""Snow"
"Only"
abc
Time123-"Gate Limit"
"Don't break
(I intentionally left out the last closing double quote.)
Could a regular expression like the one below do this? It might, but when I write it in Java there is a problem before the code even runs: Eclipse marks the string literal in red as an error.
(I probably need to escape with \, but no matter how many \ I put before \ or ", the string is still marked in red as an error.)
Pattern p=Pattern.compile("\s+(?"|[^"])*"?|[^"]\S*"));
String[] result = p.split(moto);
for (int i = 0; i < result.length; i++) {
    System.out.println("[" + result[i] + "]");
}
Is the regular expression correct?
And when writing it in Java, how should I add \ and so on so that Eclipse stops reporting the string as invalid?
Here are some of the patterns I have tried.
Pattern p=Pattern.compile("\\s+(\"(?\"|[^\"])*\"?|[^\"]\S*");
When I add \ as shown below, Eclipse no longer marks the string in red.
String[] result = test.split("\\s+(\"(?\"|[^\"])*\"?|[^\"]\\S*)");
for (int i = 0; i < result.length; i++) {
    out.print("[" + result[i] + "]");
}
However, the regular expression itself seems to be wrong, because the string could not be split as shown below.
String test = "Rain \"Only\"";
↓
After splitting:
Rain
"Only"
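On the escaping part of the question, here is a minimal sketch of the rule, assuming ordinary Java string-literal behavior: write the regular expression as it would appear on paper, then double every backslash and put one backslash before every double quote. The class name EscapeDemo, the helper method, and the sample pattern are made up for illustration; this is not the pattern from the question.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Regex on paper:          \s+"[^"]*"
    // As a Java string literal: every \ becomes \\ and every " becomes \"
    static final Pattern QUOTED_AFTER_SPACE = Pattern.compile("\\s+\"[^\"]*\"");

    // True if the text contains whitespace followed by a quoted run.
    static boolean containsQuotedAfterSpace(String text) {
        return QUOTED_AFTER_SPACE.matcher(text).find();
    }

    public static void main(String[] args) {
        // pattern() returns the regex as the regex engine sees it,
        // which is a quick way to check the escaping.
        System.out.println(QUOTED_AFTER_SPACE.pattern());
        System.out.println(containsQuotedAfterSpace("Rain \"Snow\""));
    }
}
```

Printing pattern() is a handy check: if the printed text matches what you intended to write on paper, the escaping in the source is right.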
You seem to be close to a solution already, but here is an alternative: if the specification is as follows (the correct answer for the second token being [A+!^"*-=P"]), it can be done with a regular expression.
String moto = "Northern A+!^\"*-=P\"Rain\"\"\"Snow\"\"Misutani\"abc Time 123-\"Gate`Limit\"\"\"\"Don't Break";
// \u3000 (full-width space) is listed explicitly because Java's \s does not match it by default
Pattern p = Pattern.compile("((?:[^\\s\u3000\"\\\\]|\\\\.|\"(?:\\\\.|[^\\\\\"])*(?:\"|$))+)");
Matcher m = p.matcher(moto);
while (m.find()) {
    System.out.println("[" + m.group(1) + "]");
}
}
Output:
[Northern]
[A+!^"*-=P"]
[Rain]
[""]
[Snow]
["Only"]
[Still]
[Still]
["Abc Time 123 - "Gate Limit"] [Don't break it]
The result looks quite different from the expected result in the question, but given that the question's first A+!^"*-= becomes A+!^"*-=P" here, and that quoted ranges alternate from one " to the next, this is the result it should produce.
My idea is to treat each of the following as one chunk:
an ordinary character (anything other than whitespace, " and \)
\ + one character
from " to the next " or to the end of the string
and then to extract each run of consecutive chunks with the longest match (the default behavior of regular expressions).
If you have time, please try it.
(As for code like the one I just wrote, if I were told that its behavior was wrong, I would struggle to work out where to fix it. After all, it may be better to prioritize readability.)
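The chunk idea above can be sketched as a small helper that returns the tokens as a list. This is only an illustration under some assumptions: the class name ChunkTokenizer, the method tokenize, and the sample input are hypothetical, and the full-width space is written out explicitly because Java's \s does not cover it by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChunkTokenizer {
    // One chunk is any of:
    //   an ordinary character (not whitespace, not U+3000, not " and not a backslash)
    //   a backslash plus the one character it escapes
    //   a quoted run, closed by " or by the end of the input
    // A token is the longest possible run of consecutive chunks.
    private static final Pattern TOKEN = Pattern.compile(
            "(?:[^\\s\u3000\"\\\\]|\\\\.|\"(?:\\\\.|[^\\\\\"])*(?:\"|$))+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical input: a quoted phrase, an escaped space, an unclosed quote
        String input = "abc \"Gate Limit\" say\\ hi \"Don't break";
        for (String t : tokenize(input)) {
            System.out.println("[" + t + "]");
        }
    }
}
```

With this input the escaped space keeps say\ hi together as one token, and the unclosed quote at the end runs to the end of the string.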
It seems difficult to do with regular expressions.
I am not sure whether this matches your intent, but how about something like this?
import java.util.Iterator;

public class Tokenizer implements Iterator<String> {
    private static final char DELIMITER_SPACE = ' ';
    // Full-width (ideographic) space
    private static final char DELIMITER_SPACE_JP = '\u3000';
    private static final char DELIMITER_DOUBLE_QUOTE = '"';
    private String nextToken;
    private String target;
    // Delimiter that ends the current token; null means "any delimiter"
    private Character delimiter = null;
    private int pos = 0;
    private int start = 0;

    public Tokenizer(String target) {
        this.target = target;
        this.nextToken = getNextToken();
    }

    @Override
    public boolean hasNext() {
        return nextToken != null;
    }

    @Override
    public String next() {
        String next = this.nextToken;
        this.nextToken = getNextToken();
        return next;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    private String getNextToken() {
        int size = target.length();
        while (pos < size) {
            char c = target.charAt(pos);
            pos++;
            int length = pos - start;
            if (isDelimiter(c, delimiter)) {
                if (length > 1) {
                    boolean isDoubleQuote = isDoubleQuote(delimiter);
                    String token = getToken(isDoubleQuote, start, length, false);
                    if (isDoubleQuote) {
                        delimiter = null;
                    }
                    start = pos;
                    return token;
                } else {
                    delimiter = c;
                    start = pos;
                }
            } else if (c == DELIMITER_DOUBLE_QUOTE && length <= 1) {
                delimiter = c;
                start = pos;
            } else if ((c == DELIMITER_SPACE || c == DELIMITER_SPACE_JP) && length <= 1) {
                if (delimiter == null || delimiter != DELIMITER_DOUBLE_QUOTE) {
                    delimiter = c;
                    start = pos;
                }
            }
        }
        // Whatever is left after the last delimiter is the final token
        int length = pos - start;
        if (length > 0) {
            String token = getToken(isDoubleQuote(delimiter), start, length, true);
            start = pos;
            return token;
        }
        return null;
    }

    private boolean isDoubleQuote(Character delimiter) {
        // && rather than &, so a null delimiter is never unboxed
        return delimiter != null && delimiter == DELIMITER_DOUBLE_QUOTE;
    }

    private boolean isDelimiter(char c, Character delimiter) {
        if (delimiter == null) {
            return c == DELIMITER_SPACE || c == DELIMITER_SPACE_JP || c == DELIMITER_DOUBLE_QUOTE;
        } else {
            return c == delimiter;
        }
    }

    private String getToken(boolean isDoubleQuote, int start, int length, boolean isLast) {
        if (isDoubleQuote) {
            // Include the opening quote that precedes start
            return target.substring(start - 1, start + length);
        } else {
            return target.substring(start, start + length - (isLast ? 0 : 1));
        }
    }
}
Please use it like this.
String moto = "North A+!^\"*-= P\"Rain\"\"Snow\" \"Only\" abc Time123-\"Gate Limit\" \"Don't break";
Tokenizer tokenizer = new Tokenizer(moto);
while (tokenizer.hasNext()) {
    System.out.println("[" + tokenizer.next() + "]");
}
Results
[North]
[A+!^"*-=]
[P"Rain""Snow"]
["Only"]
[abc]
[Time123-"Gate Limit"]
["Don't break]
UPDATE1
The intent of the program is to:
UPDATE2
String moto = "Northern A+!^\"*-=P\"Rain\"\"\"Snow\"\"Misutani\"abc Time 123-\"Gate`Limit\"\"\"\"Don't Break";
Running the program on this input produces the following results:
[Northern]
[A+!^"*-=]
[P"]
[Rain]
["Snow"]
["Only"]
[abc]
[Time 123 - "Limited Gate"]
["Don't break it]
This is Le Pered'OO's answer with the handling of \ removed.
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String moto = "North A+!^\"*-=\" P\"Rain\"\"Snow\" \"Only\" abc Time123-\"Gate Limit\" \"Don't break";
        // \u3000 (full-width space) is listed explicitly because Java's \s does not match it by default
        Pattern p = Pattern.compile("((?:[^\\s\u3000\"]|\"[^\"]*(?:\"|$))+)");
        Matcher m = p.matcher(moto);
        while (m.find()) {
            System.out.println("[" + m.group(1) + "]");
        }
    }
}
((?:[^\\s\u3000\"]|\"[^\"]*(?:\"|$))+)
may be a little hard to read, but you can add indentation:
Pattern p = Pattern.compile(
    "(" +
        "(?:" +
            "[^\\s\u3000\"]|" +
            "\"[^\"]*(?:\"|$)" +
        ")+" +
    ")"
);
or build it up part by part:
// Ordinary character: anything other than whitespace, a full-width space, or a double quote
String normal_char = "[^\\s\u3000\"]";
// Quoted string: a double quote, zero or more non-quote characters, then a closing double quote or the end of the line
String quoted_str = "\"[^\"]*(?:\"|$)";
// Token: one or more ordinary characters or quoted strings
String token = "(?:" + normal_char + "|" + quoted_str + ")+";
// Capture the part that matches a token as a group
Pattern p = Pattern.compile("(" + token + ")");
Either approach works.
Splitting a string while respecting quotes is a common need, so a search will turn up many approaches.
https://stackoverflow.com/a/7804472/4368502
https://stackoverflow.com/a/3366634/4368502
They differ a little from this case in that they also split at the quote positions, but please use them as a reference.
Those seem closer to how Google Search behaves.
The specification in the question is closer to quoting in a shell (sh, bash, etc.) than to the Google search box.
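To make that difference concrete, here is a rough sketch that runs both styles on the same input. The class QuoteStyles, both method names, and the sample input are hypothetical, and the search-box-style pattern is just one common way to write it, not necessarily what the linked answers use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteStyles {
    // Shell-like: a quoted run is glued to the text around it.
    private static final Pattern SHELL_LIKE =
            Pattern.compile("(?:[^\\s\u3000\"]|\"[^\"]*(?:\"|$))+");
    // Search-box-like: a quoted phrase is always its own token (group 1, quotes stripped).
    private static final Pattern SEARCH_LIKE =
            Pattern.compile("\"([^\"]*)(?:\"|$)|[^\\s\u3000\"]+");

    static List<String> shellLike(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = SHELL_LIKE.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    static List<String> searchLike(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = SEARCH_LIKE.matcher(input);
        while (m.find()) {
            // group(1) is non-null only when the quoted alternative matched
            String phrase = m.group(1);
            String token = (phrase != null) ? phrase : m.group();
            if (!token.isEmpty()) {   // skip empty quotes such as ""
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "P\"Rain\"\"Snow\" abc \"Gate Limit\"";
        System.out.println(shellLike(input));
        System.out.println(searchLike(input));
    }
}
```

The shell-like version keeps P"Rain""Snow" as a single token, while the search-box-like version splits it into P, Rain and Snow. Both keep the space inside "Gate Limit".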
I thought a hand-written loop might be simpler than a regular expression, so I wrote one, but since there is no ready-made split that works this way, it ended up longer.
I think it would get longer still if it were written to be easy to understand and extend.
I don't think long code is bad, but at this level I think a regular expression is fine.
public class Main {
    public static void main(String[] args) {
        String moto = "North A+!^\"*-=\" P\"Rain\"\"Snow\" \"Only\" abc Time123-\"Gate Limit\" \"Don't break";
        State state = State.DELIMITER;
        for (int i = 0, start = 0, length = moto.length(); i < length; i++) {
            // Separator characters:
            // space, tab, full-width space, line feed, form feed,
            // carriage return, vertical tab
            final String delimiters = " \t\u3000\n\f\r\u000b";
            final String c = moto.substring(i, i + 1);
            if (delimiters.contains(c)) {
                // A separator ends a token only outside quotes
                if (state == State.UNQUOTED) {
                    out(moto.substring(start, i));
                    state = State.DELIMITER;
                }
            }
            else {
                // The first non-separator character starts a new token
                if (state == State.DELIMITER) {
                    start = i;
                    state = State.UNQUOTED;
                }
                // Each double quote toggles between quoted and unquoted
                if (c.equals("\"")) {
                    state = (state == State.QUOTED) ?
                        State.UNQUOTED : State.QUOTED;
                }
            }
            // Flush the last token, even if its quote was never closed
            if (i == length - 1 && state != State.DELIMITER) {
                out(moto.substring(start, i + 1));
            }
        }
    }

    private enum State { DELIMITER, QUOTED, UNQUOTED }

    private static void out(String s) {
        if (s.length() == 0) { return; }
        System.out.println("[" + s + "]");
    }
}