For example, consider the following protein, i.e., an arbitrary sequence built from the 20 amino acids (one character per amino acid):
MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD
In fact, protein sequences can contain specific patterns of amino acids of arbitrary length. Such a pattern is called a motif and is written as a regular expression. Here are some examples (there are actually more than 1000):
N[^P][ST][^P]
[RK]{2}.[ST]
[ST].[RK]
[ST].{2}[DE]
.G[RK][RK]
C.[DN].{4}[FY].C.C
RGD
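Before deciding how to rewrite the protein, it can help to list where each motif actually occurs. The sketch below is one way to do that with the standard re module; the names MOTIFS, PROTEIN and find_motif_hits are introduced here for illustration only. Each pattern is wrapped in a zero-width lookahead so that overlapping occurrences are reported too, which a plain re.finditer on the bare pattern would skip.

```python
import re

# The motifs from the question, written without the stray spaces.
MOTIFS = [
    "N[^P][ST][^P]",
    "[RK]{2}.[ST]",
    "[ST].[RK]",
    "[ST].{2}[DE]",
    ".G[RK][RK]",
    "C.[DN].{4}[FY].C.C",
    "RGD",
]

PROTEIN = "MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD"


def find_motif_hits(seq, motifs):
    """Return {motif: [0-based start positions]} for each motif found in seq."""
    hits = {}
    for motif in motifs:
        # A zero-width lookahead consumes nothing, so overlapping
        # occurrences are reported as well.
        starts = [m.start() for m in re.finditer(f"(?={motif})", seq)]
        if starts:
            hits[motif] = starts
    return hits


print(find_motif_hits(PROTEIN, MOTIFS))
# {'[ST].{2}[DE]': [36, 49]}
```

For this protein only [ST].{2}[DE] occurs, at positions 36 (SELD) and 49 (TKDE).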
I would like to rewrite the protein above so that each part that matches a motif is written as that motif's regular expression, while the parts that match no motif stay as the original amino acids. I also want the resulting representation to be as short as possible.
How can I do this in Python?
Thanks in advance.
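If I understand correctly, you want to replace each part of the sequence that matches a motif with that motif's regular expression, and keep everything else as the original amino acids.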
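import re

motifs = [
    "N[^P][ST][^P]",
    "[RK]{2}.[ST]",
    "[ST].[RK]",
    "[ST].{2}[DE]",
    ".G[RK][RK]",
    "C.[DN].{4}[FY].C.C",
    "RGD",
]
amino = "MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD"

# Replace every match of each motif with the motif's own pattern text.
# A function is used as the replacement so the pattern is inserted literally
# (a replacement string would have its backslashes interpreted).
for motif in motifs:
    amino = re.sub(motif, lambda m: motif, amino)

print(amino)
# MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAK[ST].{2}[DE]KAIRNTNVI[ST].{2}[DE]AEKLFNQDVDAAVRILRNAKLKPVYD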
Something like the code above should do it.
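One caveat with that loop: each re.sub runs on the already-rewritten string, so with a long motif list a later motif can match characters of a pattern inserted by an earlier one. A possible alternative, sketched below under the assumption that no motif contains capturing groups of its own, joins all motifs into one alternation and scans the original sequence only once. The function name replace_motifs_one_pass is hypothetical.

```python
import re

MOTIFS = [
    "N[^P][ST][^P]",
    "[RK]{2}.[ST]",
    "[ST].[RK]",
    "[ST].{2}[DE]",
    ".G[RK][RK]",
    "C.[DN].{4}[FY].C.C",
    "RGD",
]

PROTEIN = "MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD"


def replace_motifs_one_pass(seq, motifs):
    """Replace motif matches by the motif text in a single left-to-right scan.

    Each motif gets its own capturing group in one big alternation, so
    m.lastindex tells which motif matched. Assumes the motifs contain no
    capturing groups themselves. At a given position, the first motif in
    the list that matches wins.
    """
    combined = re.compile("|".join(f"({motif})" for motif in motifs))
    # A replacement function inserts the motif text literally.
    return combined.sub(lambda m: motifs[m.lastindex - 1], seq)


print(replace_motifs_one_pass(PROTEIN, MOTIFS))
# MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAK[ST].{2}[DE]KAIRNTNVI[ST].{2}[DE]AEKLFNQDVDAAVRILRNAKLKPVYD
```

For this protein the result is the same as the loop's. The difference shows with, e.g., motifs ["Q", "[A-Z]"] on "XQ": the loop gives "[A-Z][A-Z]" because the second motif also rewrites the Q, while the one-pass version gives "[A-Z]Q".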
In your example only one motif, [ST].{2}[DE], matches (at two places), and the result is not shorter: it is actually longer than the original (95 characters versus 79).
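The question does not say how "short" is measured. One possible reading, assumed here and not confirmed by the question: each motif occurrence counts as a single symbol and each remaining residue as one symbol. Under that assumption, deciding which (possibly overlapping) occurrences to replace can be solved by dynamic programming over sequence positions. The sketch below does this; shortest_encoding and its cost parameter are hypothetical names. It also assumes each motif matches a fixed number of residues, which holds for the motifs listed.

```python
import re

MOTIFS = [
    "N[^P][ST][^P]",
    "[RK]{2}.[ST]",
    "[ST].[RK]",
    "[ST].{2}[DE]",
    ".G[RK][RK]",
    "C.[DN].{4}[FY].C.C",
    "RGD",
]

PROTEIN = "MNIFEMLRIDELRLKIYKDTEYTIIHLLTKSLNAAKSELDKAIRNTNVITKDEAEKLFNQDVDAAVRILRNAKLKPVYD"


def shortest_encoding(seq, motifs, cost=lambda motif: 1):
    """Pick non-overlapping motif occurrences that minimise the total cost.

    Assumed cost model: every residue left as-is costs 1 and every motif
    occurrence costs cost(motif). Also assumes each motif matches a fixed
    number of residues, because Pattern.match returns only one match per
    start position.
    Returns (total_cost, tokens), where tokens is a list of residues and motifs.
    """
    compiled = [(motif, re.compile(motif)) for motif in motifs]
    n = len(seq)
    best = [0] * (n + 1)       # best[i]: minimal cost of encoding seq[i:]
    step = [None] * (n + 1)    # step[i]: (motif or None, residues consumed)
    for i in range(n - 1, -1, -1):
        # Option 1: keep residue seq[i] unchanged.
        best[i], step[i] = 1 + best[i + 1], (None, 1)
        # Option 2: replace a motif occurrence starting at i.
        for motif, pattern in compiled:
            m = pattern.match(seq, i)
            if m and m.end() > i:
                candidate = cost(motif) + best[m.end()]
                if candidate < best[i]:
                    best[i], step[i] = candidate, (motif, m.end() - i)
    # Walk forward along the chosen steps to rebuild the encoding.
    tokens, i = [], 0
    while i < n:
        motif, length = step[i]
        tokens.append(seq[i] if motif is None else motif)
        i += length
    return best[0], tokens


total, tokens = shortest_encoding(PROTEIN, MOTIFS)
print(total)  # 73 symbols instead of 79 residues
print("".join(tokens))
```

A DP is used rather than taking the leftmost match each time because the leftmost choice is not always best: for "ABCD" with motifs ["AB", "BCD"], taking AB first leaves 3 symbols, while the DP finds "A" + "BCD" (2 symbols). Passing cost=len counts characters instead; with the motifs above no pattern is shorter than the residues it matches, so that setting keeps the original sequence unchanged.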
© 2024 OneMinuteCode. All rights reserved.