I want this code to work without using global variables

For each line starting with >YP in short.sequ.txt, I print the line and then count the characters on the following lines until a blank line is reached.
I would like to repeat this for every entry and finally print the >YP lines with the maximum and minimum number of characters.
I want to update max_len and max_protein whenever I find a new maximum (and likewise min_len and min_protein for the minimum).

I wrote it using global variables, but if possible I would like to write this kind of code without them.
I don't know what to replace the global variables with, so please let me know.
Thank you for your cooperation.

with open("short.sequ.txt") as f:

    max_len = 0
    max_protein=""

    min_len=1000000000000000
    min_protein=""

    def change(protein, seq):
        # Update the module-level running maximum and minimum.
        global max_len, max_protein, min_len, min_protein
        seq_len = len(seq)
        if seq_len == 0:
            return

        else:
            print([seq_len])

        if seq_len > max_len:
            max_len, max_protein = seq_len, protein

        if seq_len < min_len:
            min_len, min_protein = seq_len, protein


    a_line=""
    a_seq=""

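    for line in f:
        strip_line = line.rstrip()
        if strip_line.startswith(">"):
            # A new header ends the previous record, so score that record first.
            change(a_line, a_seq)
            a_line = strip_line
            a_seq = ""
            print(strip_line + '\n')
        else:
            # A = [len(a_seq+trip_line)]
            a_seq += strip_line
    # Score the final record, which has no following header.
    change(a_line, a_seq)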
msg1 = "Maximum amino acid entry:"
msg2 = "Length:"
msg3 = "Minimum amino acid entry:"
print(msg1,max_protein,msg2,max_len)
print(msg3,min_protein,msg2,min_len)

short.sequ.txt

>YP_009518834.1 passive uncharacterized protein YjiT [Escherichia coli str. K-12 substr. MG1655]
MGQSEYISWVKCTSWLSNFVNLRGRQPDGRPLYYHATNDEYTQLTQLLRAVGQSQSNICNRDFAACFV
LFCSEWYRRDYERQCGWTWDPIYKKIGISFTATELGTIVPKGMEDYWLRPIRFYESERRNFLGTLFSEGG
LPFRLKESDSRFLAVFSRILGQYEQAKQSGFSALSLAVIEKSALPTVFSEDTSVELISHMADNLNSL
VLTHNLINHKEPVQQLEKVHPTWRSEFPIPLDETGTHFLNGLLCAASVEAKPRLQKNKSTRCQFYWSEK
HPDELRVIVSLPDEVSFPVTSEPSTTRFELAICEDGEEVSGLGPAYLENRQATVRLRKSEVRFGRQNP
SAGLSLVARAGGMIVGSIKLDSEIAIGEVPLTFIVDADQWLLQGQASCSVRSSDVLIVLPRDNSNVAGF
DGQSRAVNVLGLKALPVKGCQDVTVTANETYRRITGREQISIGRFAALNGKRASWVCHPDETFIGVPKVIS
TLPDIQSIDVTRYTC

>YP_009518833.1 uncharacterized protein YtiA [Escherichia coli str. K-12 substr. MG1655]
MKEFLFLFHSTVGVIQTRKALQAAGMTFRVSDIPRDLRGGCGLCIWLTCPGEEIQWVIPGLTESIYCQQ
DGVWRCIAHYGVSPR

>YP_009518832.1 iraD leader peptide [Escherichia coli str. K-12 substr. MG1655]
MENEHQYSGARCSGQAAYVAKRQECAK

>YP_009518831.1 protein YtiD [Escherichia coli str. K-12 substr. MG1655]
MADYAEINNPPELSSSGDKYFHLRNYSEYSEYTSGFFLSLMIFIKS

>YP_009518830.1 protein YtiC [Escherichia coli str. K-12 substr. MG1655]
MPVNGIFDVFDMLSIYIIYKLIVSNNNTWLIMRK

>YP_009518829.1 passive YjfA [Escherichia coli str. K-12 substr. MG1655]
MHMVTYPCLTSRRFQLALIHRVDKRTSMHSRTASESTGARIHRPWCARHQVRPAWRCQYDKLHRVPR
SPELRLDSGPGYTTGSYRY

python

2022-09-30 13:57

1 Answer

The code below splits the contents of short.sequ.txt on empty lines (\n\n) into one block per entry, then sorts the entries by the total length of each protein sequence. No global variables are needed.

import re

msg1 = "Maximum amino acid entry:"
msg2 = "Length:"
msg3 = "Minimum amino acid entry:"

with open("short.sequ.txt") as f:
  assoc = {}  # header line -> total sequence length
  for p in f.read().split("\n\n"):
    if not re.match('^>YP_', p): continue
    arr = p.split("\n")
    assoc[arr[0]] = sum(map(len, arr[1:]))
  # Sort (header, length) pairs by length: shortest first, longest last.
  sa = sorted(assoc.items(), key=lambda x: x[1])

  print('{} {}\n{} {}'.format(msg1, sa[-1][0], msg2, sa[-1][1]))
  print('{} {}\n{} {}'.format(msg3, sa[0][0], msg2, sa[0][1]))

=>
Maximum amino acid entry: >YP_009518834.1 passive uncharacterized protein YjiT [Escherichia coli str. K-12 substr. MG1655]
Length: 505
Minimum amino acid entry: >YP_009518832.1 iraD leader peptide [Escherichia coli str. K-12 substr. MG1655]
Length: 27
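
If you would rather keep reading the file line by line, as in the question, another way to avoid global is to have functions return their results instead of assigning to module-level names. The following is only a rough sketch under some assumptions: read_records and longest_and_shortest are hypothetical helper names, every entry is assumed to start with a ">" header line, and the sample list stands in for the file using two entries from short.sequ.txt with shortened headers.

```python
from operator import itemgetter


def read_records(lines):
    """Yield (header, sequence_length) for each entry in the given lines."""
    header, length = None, 0
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield header, length
            header, length = line, 0
        elif header is not None:
            # Sequence lines (and blank lines, which add 0) extend the entry.
            length += len(line)
    # The last entry has no following header, so emit it here.
    if header is not None:
        yield header, length


def longest_and_shortest(lines):
    """Return ((max_header, max_len), (min_header, min_len))."""
    records = list(read_records(lines))
    if not records:
        raise ValueError("no entries found")
    by_length = itemgetter(1)
    return max(records, key=by_length), min(records, key=by_length)


# Stand-in for the file contents; headers are shortened for illustration.
sample = [
    ">YP_009518832.1 iraD leader peptide",
    "MENEHQYSGARCSGQAAYVAKRQECAK",
    "",
    ">YP_009518830.1 protein YtiC",
    "MPVNGIFDVFDMLSIYIIYKLIVSNNNTWLIMRK",
]

(max_protein, max_len), (min_protein, min_len) = longest_and_shortest(sample)
print("Maximum amino acid entry:", max_protein, "Length:", max_len)
print("Minimum amino acid entry:", min_protein, "Length:", min_len)
```

Because an open file is also an iterable of lines, longest_and_shortest(f) should work the same way inside with open("short.sequ.txt") as f:. All state lives in local variables and return values, so no global statement is needed.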

2022-09-30 13:57
