I want to search an array using a regular expression.

Asked 2 years ago, Updated 2 years ago, 74 views

I apologize if my description is insufficient.


What I want to do

I'd like to search with a regular expression and, among the strings that match, get the one with the highest index (that is, the last match).


Current state

Given a string (inData) and a delimiter (delimiter, which may be a regular expression), the function does the following:

· Splits inData using Regex.Split
· Assigns the part before the delimiter to front
· Assigns the text that matched the delimiter to delimiter
· Assigns the part after the delimiter to behind
· Returns true if inData was split (that is, the delimiter matched at least one part of inData), and false otherwise

I created a function (GetSplitData) to do the above.

The following example shows the behavior when only one part of inData matches the delimiter.
This works as expected.

Example: GetSplitData("AAA BBB", "\s+", "", "", "") // inData, delimiter, front, behind
Result → inData = "AAA BBB"
         front = "AAA"
         delimiter = " "
         behind = "BBB"
         returns true

What worries me this time is the behavior when more than one part matches the delimiter.
The desired result is as follows:

Example: GetSplitData("AAA00BBB0000CCC", "[0-9]+", "", "", "")
Result → inData = "AAA00BBB0000CCC"
         front = "AAA00BBB"
         delimiter = "0000"
         behind = "CCC"
         returns true


Because I use Regex.Split, the string array currently contains
"AAA", "00", "BBB", "0000", and "CCC".
I can't think of a way to get the last "0000" from there.
I thought about scanning the array again with the delimiter when there is more than one match, but
when I searched, I couldn't find how to scan an array with a regular expression.
Also, even once I have found the last delimiter,
I can't think of a way to get the parts before and after it.
(I thought I could use LastIndexOf, but it doesn't take a regular expression. If I use it, won't it just search for the literal string "[0-9]+"?)
(Or is it possible to use a regular expression with LastIndexOf, and I simply failed to find out how?)
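A minimal sketch of the three behaviors touched on above, assuming only System.Text.RegularExpressions from the .NET base library. The class SearchDemo and its method names are hypothetical, for illustration only, and the code sticks to C# 5 syntax to match the environment listed below. It shows that Regex.Split returns the delimiters in the array only when the pattern contains a capturing group, that string.LastIndexOf treats "[0-9]+" as literal text, and that RegexOptions.RightToLeft makes Regex.Match report the rightmost match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating the behaviors discussed in the question.
public static class SearchDemo
{
    // Regex.Split only puts the matched delimiters into the result array
    // when the pattern has a capturing group, so the pattern is wrapped in (...).
    public static string[] SplitWithDelimiters(string inData, string delimiter)
    {
        return Regex.Split(inData, "(" + delimiter + ")");
    }

    // string.LastIndexOf searches for the literal text, so "[0-9]+"
    // is NOT treated as a pattern here.
    public static int LiteralLastIndexOf(string inData, string delimiter)
    {
        return inData.LastIndexOf(delimiter, StringComparison.Ordinal);
    }

    // With RegexOptions.RightToLeft the engine scans from the end of the
    // input, so the first match it reports is the rightmost one.
    public static Match LastMatch(string inData, string delimiter)
    {
        return Regex.Match(inData, delimiter, RegexOptions.RightToLeft);
    }
}
```

Because RightToLeft changes how the engine scans, its result is not guaranteed to equal the last element of Regex.Matches for every pattern; for a simple pattern like [0-9]+ the two agree.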

Please let me know if there are any methods, algorithms, or ideas I could use.
I would appreciate your advice.
Thank you in advance for your help.


Environment

· Microsoft Visual Studio 2013 Express for Windows
· .NET Framework 4.5.1
· C# 5.0


Addendum

I apologize for the confusion.
The criterion for choosing the delimiter is not string length but position: the match closest to the end is used.
If inData is "AAA0000BBB00CCC" and delimiter is "[0-9]+",
"00" is used as the delimiter because it is the last match.
I had mistaken the x in array[x] for the number of elements.
I am very sorry for the confusion caused by not using the terms correctly.


Addendum 2: Background to this question

public bool GetSplitData(string inData, string delimiter, ref string front, ref string behind)
{
    // Search for the delimiter starting from the end (literal text, not a regular expression)
    int index = inData.LastIndexOf(delimiter, StringComparison.Ordinal);
    // Delimiter not found: the string cannot be split
    if (index == -1) return false;

    // Text before the delimiter
    front = inData.Substring(0, index);
    // Text after the delimiter
    behind = inData.Substring(index + delimiter.Length);
    return true;
}


c# .net regular-expression

2022-09-30 14:20

2 Answers

Dear Sayuri and pgrho, thank you for your replies.
This time, I will mark Sayuri's answer, which came first, as the best answer.
pgrho's approach using lambda expressions and LINQ was also very instructive, and I will refer to it elsewhere.

Here is the program that actually worked.


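// requires: using System.Text.RegularExpressions;
public bool GetSplitData(string inData, ref string delimiter, ref string front, ref string behind)
{
    // Build "(delimiter)(?!.*delimiter)": the negative lookahead only accepts
    // a match that is not followed by another match, i.e. the last one.
    // (?:...) keeps alternations such as "a|b" grouped correctly.
    string pattern = String.Format("((?:{0})(?!.*(?:{0})))", delimiter);
    Regex splitRegex = new Regex(pattern);
    // Split inData; because the pattern is a capturing group,
    // the matched delimiter is also returned as an array element
    string[] split = splitRegex.Split(inData);
    // Expect exactly three parts: front, delimiter, behind
    // (a delimiter with its own capturing groups would add extra elements)
    if (split.Length == 3)
    {
        front = split[0];
        delimiter = split[1];
        behind = split[2];
        return true;
    }
    return false;
}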


2022-09-30 14:20

I'm not sure whether you want to search from the end, like LastIndexOf, or pick by length, but why not call Regex.Matches and use the longest or the last Match?

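// requires: using System.Text.RegularExpressions;
var ms = Regex.Matches(inData, delimiter);
if (ms.Count > 0)
{
    // use the last match (the one closest to the end of inData)
    var m = ms[ms.Count - 1];

    front = inData.Substring(0, m.Index);          // text before the match
    delimiter = m.Value;                           // the text that actually matched
    behind = inData.Substring(m.Index + m.Length); // text after the match

    return true;
}
// no match: the string cannot be split
return false;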
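For checking, the fragment above can be wrapped into a complete method. This is a sketch under a few assumptions: a static class named SplitHelper (a hypothetical name), C# 5 syntax, and the four-parameter signature with ref delimiter taken from the program the asker posted. It always takes the last match, following the asker's addendum.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the Regex.Matches approach.
public static class SplitHelper
{
    // Splits inData at the LAST match of the regular expression in delimiter.
    // On success, front/behind receive the text before/after that match and
    // delimiter is overwritten with the text that actually matched.
    public static bool GetSplitData(string inData, ref string delimiter,
                                    ref string front, ref string behind)
    {
        MatchCollection ms = Regex.Matches(inData, delimiter);
        if (ms.Count == 0)
        {
            return false; // no match: the string cannot be split
        }

        Match m = ms[ms.Count - 1]; // the last (rightmost) match
        front = inData.Substring(0, m.Index);
        delimiter = m.Value;
        behind = inData.Substring(m.Index + m.Length);
        return true;
    }
}
```

Because the pattern itself is not rewritten, the candidates are the ordinary left-to-right matches that Regex.Matches reports; the only added rule is "use the last one".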

When a regular expression can match in more than one place, whether it matches at a given position affects which later matches are found. For that reason, I'd avoid rewriting the regular expression itself (for example, by appending a negative lookahead).

Example

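If you match the string aaba against the pattern (a+|aba), there are five candidate matches:

1. a (character 1)
2. aa (characters 1-2)
3. a (character 2)
4. aba (characters 2-4)
5. a (character 4)

However, Regex.Matches returns only aa (2.) and a (5.).
This is because, once aa is matched, 1. (the shorter match at the same position) is skipped, and 3. and 4., which begin in the middle of aa, are excluded.

On the other hand, if you change the pattern to (a+|aba)(?!.*(a+|aba)), 1. and 2. no longer match because of the a in the fourth character, so the engine moves on to the second character; 3. fails for the same reason, and the longer aba (4.) matches.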
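The aaba example can be checked directly. A small sketch (AabaDemo and MatchValues are hypothetical helper names) that lists every match Regex.Matches reports for the plain pattern and for the pattern with the negative lookahead:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two patterns on "aaba".
public static class AabaDemo
{
    // Returns the values of all left-to-right matches of pattern in input.
    // Cast<Match>() is used because MatchCollection in .NET Framework 4.5
    // is only a non-generic IEnumerable.
    public static string[] MatchValues(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With the plain pattern the matches are aa and a, so the last match (and hence the delimiter) is the final a; with the lookahead the only match is aba.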

Of course, you could simply accept it on the grounds that a and aba end at the same position, but it is hard to explain to someone who doesn't know how the GetSplitData method is implemented why aba, and not a, becomes the delimiter.
Simply using Regex.Matches follows the ordinary rules of regular expression matching, so I recommend it because it is easier for others to understand.

There are also cases where the effect of the negative lookahead is easier to see, so I will just list the results. The cause is the same as explained above.

  • Regex.Split("a==b==c", "==") -> "a", "b", "c"
  • Regex.Split("a====c", "==") -> "a", "", "c"
  • Regex.Split("a====c", "==(?!.*==)") -> "a=", "=c"
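The three Regex.Split results above can be reproduced with the sketch below (SplitDemo, Show, and SplitAtLastMatch are hypothetical names). Show joins the pieces with | so that empty strings stay visible, and SplitAtLastMatch splits at the last element of Regex.Matches, to contrast with the lookahead version on "a====c".

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for comparing Regex.Split with and without lookahead.
public static class SplitDemo
{
    // Joins the pieces with '|' so that empty strings remain visible.
    public static string Show(string input, string pattern)
    {
        return string.Join("|", Regex.Split(input, pattern));
    }

    // Splits at the last left-to-right match reported by Regex.Matches
    // and returns "front|delimiter|behind" (or the input if nothing matched).
    public static string SplitAtLastMatch(string input, string pattern)
    {
        MatchCollection ms = Regex.Matches(input, pattern);
        if (ms.Count == 0) return input;
        Match m = ms[ms.Count - 1];
        return input.Substring(0, m.Index) + "|" + m.Value + "|"
             + input.Substring(m.Index + m.Length);
    }
}
```

On "a====c" the Regex.Matches approach splits as a== | == | c, while the capturing lookahead pattern (==)(?!.*==) splits as a= | == | =c.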


2022-09-30 14:20



© 2024 OneMinuteCode. All rights reserved.