I want to replace parts of a sequence with regular expressions.

Asked 2 years ago, Updated 2 years ago, 99 views

For example, consider the following protein, i.e., an arbitrary sequence over the 20 amino acids (one character each):

MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD

In practice, protein sequences can contain specific amino-acid patterns of arbitrary length. Such a pattern is called a motif and is written as a regular expression. Here are some examples (there are actually more than 1000):

N[^P][ST][^P]
[RK]{2}.[ST]
[ST].[RK]
[ST].{2}[DE]
.G[RK][RK]
C.[DN].{4}[FY].C.C
RGD

Here, I would like to convert the protein above from a sequence of plain amino acids into one where each part that matches a motif is replaced by that motif's regular expression, while the non-motif parts stay as the original amino acids. I would also like the resulting representation of the protein to be as short as possible.

What kind of Python code should I write to do this?
Thank you in advance.

python regular-expression

2022-09-30 19:21

1 Answer

My understanding is that you want to replace each matched part of the sequence with the regular expression that matches it:

import re

motifs=['N[^P][ST][^P]',
          '[RK]{2}.[ST]',
          '[ST].[RK]',
          '[ST].{2}[DE]',
          '.G[RK][RK]',
          'C.[DN].{4}[FY].C.C',
          "RGD"]

amino="MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD"
for motif in motifs:
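    # Pass a function as the replacement so the motif text is inserted literally
    # (a plain replacement string would be scanned for backslash escapes).
    amino = re.sub(motif, lambda m: motif, amino)
print(amino)
# MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAK[ST].{2}[DE]KAIRNTNVI[ST].{2}[DE]AEKLFNQDVDAAVRILRNAKLKPVYD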

Code like the above should do it.
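To check which motifs actually occur in the sequence and where, a small helper can list every hit before anything is replaced. This is only a sketch: the name `find_motif_hits` is made up for illustration, and wrapping each motif in a lookahead is one way (assumed here, not taken from the question) to also report overlapping occurrences.

```python
import re

def find_motif_hits(seq, motifs):
    """Return sorted (start, end, motif, matched_text) tuples for all hits."""
    hits = []
    for motif in motifs:
        # A lookahead consumes no characters, so finditer tries every
        # position and overlapping occurrences are reported as well.
        pattern = re.compile(f"(?=({motif}))")
        for m in pattern.finditer(seq):
            hits.append((m.start(1), m.end(1), motif, m.group(1)))
    return sorted(hits)

motifs = ['N[^P][ST][^P]', '[RK]{2}.[ST]', '[ST].[RK]', '[ST].{2}[DE]',
          '.G[RK][RK]', 'C.[DN].{4}[FY].C.C', 'RGD']
amino = "MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD"

for start, end, motif, text in find_motif_hits(amino, motifs):
    print(start, end, motif, text)
```

Positions are 0-based, and `end` is exclusive, as with Python slices.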

In the given example, only one motif ([ST].{2}[DE]) matches, in two places, and the resulting string is longer than the original rather than shorter (79 characters become 95).
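If "as short as possible" is read as "as few symbols as possible", with each motif counted as one symbol and each remaining amino acid as one symbol, the choice of which matches to replace can be made with dynamic programming over the original sequence. That reading is an assumption, not something the question states. The names `shortest_encoding` and `motif_cost` below are hypothetical, and the sketch only considers the single match the regex engine returns at each start position, which is enough for fixed-length motifs like the ones listed.

```python
import re

def shortest_encoding(seq, motifs, motif_cost=lambda motif: 1):
    """Return (cost, tokens) for a cheapest motif/residue encoding of seq."""
    compiled = [(text, re.compile(text)) for text in motifs]
    n = len(seq)
    # best[i] = (cost, token, next_index) for the cheapest encoding of seq[i:]
    best = [None] * (n + 1)
    best[n] = (0, None, n)
    for i in range(n - 1, -1, -1):
        # Option 1: keep the residue at position i as a literal character.
        choice = (1 + best[i + 1][0], seq[i], i + 1)
        # Option 2: cover seq[i:end] with a motif that matches starting at i.
        for text, pattern in compiled:
            m = pattern.match(seq, i)
            if m and m.end() > i:
                cost = motif_cost(text) + best[m.end()][0]
                if cost < choice[0]:
                    choice = (cost, text, m.end())
        best[i] = choice
    # Walk forward through the stored choices to rebuild the token list.
    tokens, i = [], 0
    while i < n:
        _, token, i = best[i]
        tokens.append(token)
    return best[0][0], tokens

motifs = ['N[^P][ST][^P]', '[RK]{2}.[ST]', '[ST].[RK]', '[ST].{2}[DE]',
          '.G[RK][RK]', 'C.[DN].{4}[FY].C.C', 'RGD']
amino = "MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD"

cost, tokens = shortest_encoding(amino, motifs)
print(cost)
print("".join(tokens))
```

Because every decision is made against the original sequence, a motif can never match inside regex text that was inserted earlier, and overlapping candidate matches are resolved by cost. Passing `motif_cost=len` counts characters instead of symbols; with the motifs listed here no replacement pays off under that measure, since each motif is at least as long as the text it matches, so the sequence comes back unchanged.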


2022-09-30 19:21
