re.sub question

Asked 2 years ago, Updated 2 years ago, 42 views

import glob
import shutil
import os
import re

dirs = ['DataGathering']

for dir in dirs:
    file_names = glob.glob(os.path.join(dir, '*'))
    file_names = [os.path.basename(name) for name in file_names]
    if not os.path.exists(dir + '_renamed'):
        os.makedirs(dir + '_renamed')
        #os.makedirs(dir+'_renamed/csv')
    root = dir + '_renamed'
    for name in file_names:
        if name.endswith('csv'):
            continue
        print(name)
        rename = re.sub("[\(\[].*?[\)\]]", "", name)

The pattern passed to re.sub looks very complicated. Why are all those symbols there? It seems to remove the square brackets together with the letters inside them. I'd appreciate it if you could tell me how it works. String 1 below is converted into string 2.

1. [Arr]2_[Crd]C1_[Pkg]CAPACITOR-X7R-104K-50V-1608_[PkgType]1_[Conf]_[AF]1_[RBF]1_[CH]1111111111111111_[Type]16_Ang.bmp
2.                2_C1_CAPACITOR-X7R-104K-50V-1608_1__1_1_1111111111111111_16_Ang.bmp

python re.sub

2022-09-21 20:54

1 Answer

As you probably know, re.sub() replaces the parts of a string that match a pattern.

rename = re.sub("[\(\[].*?[\)\]]", "",name)
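Before getting to the ?, it may help to see the pattern split into its three pieces. This is only a sketch: the re.VERBOSE layout, the name BRACKETED and the sample file name are made up here for illustration and are not part of the original script.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece can carry a comment.
BRACKETED = re.compile(r"""
    [\(\[]   # one opening character: ( or [
    .*?      # any characters, as few as possible
    [\)\]]   # one closing character: ) or ]
""", re.VERBOSE)

sample = "[Arr]2_(tmp)C1.bmp"   # hypothetical file name
cleaned = BRACKETED.sub("", sample)
print(cleaned)   # 2_C1.bmp
```

The outer [ ] in each piece is a character class ("one of these characters"), and the backslashes make the parentheses and brackets inside it literal.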

You need to know the difference between greedy and lazy matching here.

The ? that follows a quantifier does not have its usual wildcard meaning of "match the preceding character 0 or 1 times".
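The two roles of ? can be checked side by side. The strings and variable names below are invented for this sketch.

```python
import re

# "?" used as a quantifier by itself: the "u" may appear 0 or 1 times.
optional_u = re.findall(r"colou?r", "color colour")

# "?" placed after another quantifier: "+" becomes lazy and takes as little as it can.
greedy_digits = re.match(r"\d+", "12345").group()
lazy_digits = re.match(r"\d+?", "12345").group()

print(optional_u)      # ['color', 'colour']
print(greedy_digits)   # 12345
print(lazy_digits)     # 1
```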

By default, the quantifiers (*, +, and ?) used on their own match greedily. Literally greedy: anything in the middle that could also match is swallowed, because the quantifier keeps matching from where the match starts up to the last place in the string where the terminating condition is satisfied.

So if you remove the ? from the regular expression and use [\(\[].*[\)\]] instead, re.sub() turns the first string above into this:

16_Ang.bmp
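To see why so little survives, you can print what the greedy pattern actually matched. A small sketch using the file name from the question (the variable names are chosen here for illustration):

```python
import re

name = ("[Arr]2_[Crd]C1_[Pkg]CAPACITOR-X7R-104K-50V-1608_[PkgType]1_"
        "[Conf]_[AF]1_[RBF]1_[CH]1111111111111111_[Type]16_Ang.bmp")

greedy = r"[\(\[].*[\)\]]"   # no "?" after the "*"

match = re.search(greedy, name)
matched_text = match.group()              # one single match
greedy_result = re.sub(greedy, "", name)

print(matched_text)    # runs from the first "[" to the "]" of "[Type]"
print(greedy_result)   # 16_Ang.bmp
```

There is only one match, and it stretches from the very first opening bracket to the very last closing bracket.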

However, appending ? to the quantifier makes it lazy. It does not look at everything that follows and instead stops at the closest closing bracket, so each pair of square brackets and its contents is matched separately and replaced with an empty string.

Ironically, the substituted result looks like the product of very diligent work, but from the quantifier's point of view it was being lazy: it only dealt with what was right in front of it. :)
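The lazy behaviour can be checked the same way. This sketch uses re.findall to list every separate match in the file name from the question (again, the variable names are only for illustration):

```python
import re

name = ("[Arr]2_[Crd]C1_[Pkg]CAPACITOR-X7R-104K-50V-1608_[PkgType]1_"
        "[Conf]_[AF]1_[RBF]1_[CH]1111111111111111_[Type]16_Ang.bmp")

lazy = r"[\(\[].*?[\)\]]"   # "?" after the "*" makes it lazy

groups = re.findall(lazy, name)       # each bracket pair is its own match
lazy_result = re.sub(lazy, "", name)

print(groups)
# ['[Arr]', '[Crd]', '[Pkg]', '[PkgType]', '[Conf]', '[AF]', '[RBF]', '[CH]', '[Type]']
print(lazy_result)
# 2_C1_CAPACITOR-X7R-104K-50V-1608_1__1_1_1111111111111111_16_Ang.bmp
```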


2022-09-21 20:54
