Error when trying to run TSSfinder

Asked 1 year ago, Updated 1 year ago, 374 views

TSSfinder does not work as distributed, so I have been rewriting the script myself, but the errors keep coming. This is as far as I have gotten. Please tell me what is wrong with the script below.

Running the script produces the following behavior:

It prints out the sequences (arrays), finally prints a mysterious "1", and then stops with the error below.

error messages:

Traceback (most recent call last):
  File "./new_tssfinder_2.py", line 115, in <module>
    predict()
  File "/home/iceplant4561/anaconda3/envs/tssfinder/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/iceplant4561/anaconda3/envs/tssfinder/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/iceplant4561/anaconda3/envs/tssfinder/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/iceplant4561/anaconda3/envs/tssfinder/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "./new_tssfinder_2.py", line 88, in predict
    chrm[print(seq_r.id)] = print(seq_r.seq)

Current code:

#!/usr/bin/env python

import sys
import pandas as pd
from Bio import SeqIO
from subprocess import Popen, PIPE, STDOUT
import os
import click
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


CURR_DIR=os.path.dirname(os.path.realpath(__file__))

MYOP_PROM_BIN=os.path.join(CURR_DIR, "/home/iceplant4561/Agarie_group/ice_plant_genome_from_GSA/TSSFinder/training_sets/new_tssfinder_2.py")

def rev(seq):
    rev_fasta = []
    for i in reversed(seq):
        if i.upper() == 'A':
            rev_fasta.append('T')
        elif i.upper() == 'C':
            rev_fasta.append('G')
        elif i.upper() == 'G':
            rev_fasta.append('C')
        elif i.upper() == 'T':
            rev_fasta.append('A')
        else:
            rev_fasta.append(i.upper())
        return ''.join(rev_fasta)

def extract_fasta_to_predict(chrm, start1, max_size):

    dists = []
    for i in range(50, 601, 50):
        dists += [i] * 50
    dists = ['600'] * (max_size - len(dists)) + list(reversed(dists))
    return dists

    for row in start1.iterrows():

        if row['strand'] == '+':
            if row['begin'] - max_size + 1 < 0:
                a = 0
            else:
                a = row['begin'] - max_size + 1
            seq = list(zip(chrm[str(row['chr'])][a:row['begin'] + 1], dists))
        else:
            if row['begin'] + max_size > len(chrm[str(row['chr'])]):
                b = len(chrm[str(row['chr'])])
            else:
                b = row['begin'] + max_size
            seq = list(zip(rev(chrm[str(row['chr'])][row['begin']:b]), dists))

    seq[0] = ('NPROMOTER', 'NPROMOTER')
    seq[-1] = ('NPROMOTER', 'NPROMOTER')
    return row, seq

def find_features (prediction):
    # print (prediction)
    try:
        tss_pos=prediction.index("TSS-0")
    except:
        tss_pos=-1

    try:
        tata_pos=prediction.index("TATA-0")
    except:
        tata_pos=-1

    return tss_pos, tata_pos

@click.command()
@click.option('--model', type=click.Path(exists=True), help='model directory')
@click.option('--start', type=click.File('rt'), help='start codons BED file')
@click.option('--genome', type=click.File('rt'), help='genome FASTA file')
@click.option('--output', type=click.Path(exists=True), help='output directory')
@click.option('--max_seq_size', type=int, default=1500, help='maximum sequence size to be analyzed')
def predict (model, start, genome, output, max_seq_size):
    start_file=start
    fasta_file=genome
    outdir=output

    start1=pd.read_csv(start_file, sep="\t", names=['chr', 'begin', 'end', 'gene_name', 'score', 'strand'])

    chrm = []
    for seq_r in SeqIO.parse(open("athaliana/genome.fasta"), 'fasta'):
        chrm[print(seq_r.id)] = print(seq_r.seq)

    tss_file = open(os.path.join(outdir, 'out.tss.bed'), "w")
    tata_file = open(os.path.join(outdir, 'out.tata.bed'), "w")

    for gene, fasta in extract_fasta_to_predict(chrm, start1, max_size=max_seq_size):
        p = Popen("{}w{}".format(MYOP_PROM_BIN, model).split(), stdout=PIPE, stdin=PIPE)
        for n, d in fasta:
            p.stdin.write("{}\t{}\n".format(n, d).encode("ascii"))
        tss_pos, tata_pos = find_features(p.communicate()[0].decode().split("\n"))
        if tss_pos > 0:
            tss_pos=len(fasta)-tsss_pos
            if gene ['strand'] == "+":
                tss_file.write("{}\t{}\t{}\t{}\t{}\t1\t{}\n".format(gene['chr'], int(gene['begin']) - tss_pos, int(gene['begin']) - tss_pos + 1, gene['gene_name'], gene['stand']))
            else:
                tss_file.write("{}\t{}\t{}\t{}\t{}\t1\t{}\n".format(gene['chr'], int(gene['begin']) + tss_pos + 1, int(gene['begin']) + tss_pos + 2, gene['gene_name'], gene['stand']))
        if tata_pos > 0:
            tata_pos = len(fasta) - tata_pos
            if gene['strand'] == "+":
                tata_file.write("{}\t{}\t{}\t{}\t{}\t1\t{}\n".format(gene['chr'], int(gene['begin']) - tata_pos, int(gene['begin']) - tata_pos + 1, gene['gene_name'], gene['stand']))
            else:
                tata_file.write("{}\t{}\t{}\t{}\t{}\t1\t{}\n".format(gene['chr'], int(gene['begin']) + tata_pos + 1, int(gene['begin']) + tata_pos + 2, gene['gene_name'], gene['stand']))
        return fasta,seq
    tss_file.close()
    tata_file.close()

if __name__ == '__main__':
    predict()

python python3

2022-11-26 07:45

1 Answer

Assuming this is the same overall problem as in the note article "Bioinformatics warrior training #4 ~TSSfinder~", and in the hope that it helps someone: is the error you are currently hitting the following, with its last line left out of the question?

File "./new_tssfinder_2.py", line 88, in predict
    chrm[print(seq_r.id)] = print(seq_r.seq)
TypeError: list indices must be integers or slices, not NoneType

As some note articles point out, print(...) returns None, so its result cannot be used as a meaningful value.
The array output you describe is therefore probably just what print(seq_r.id) and print(seq_r.seq) write to the screen. That said, the statement should fail on the very first pass through the loop, so it is puzzling if more than one sequence was printed.
(Or is some function or method that displays data being called that was not mentioned in the question or the note article?) Finally, the mysterious "1" is probably the output of print(seq_r.id) on that first pass: Python evaluates the right-hand side print(seq_r.seq) first, then the index, and the error occurs when the resulting None is used as the index in chrm[print(seq_r.id)] = ...
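The behavior can be reproduced without Biopython or a genome file. The following is only a minimal sketch: FakeRecord is a hypothetical stand-in for a Biopython SeqRecord, and the id "1" and the short sequence are made-up values. It shows that print() returns None, that the right-hand side is printed before the index expression, and that the assignment into a list then fails.

```python
import io
from collections import namedtuple
from contextlib import redirect_stdout

# Hypothetical stand-in for a Biopython SeqRecord (only .id and .seq are used).
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])
seq_r = FakeRecord(id="1", seq="CCCTAAACCC")

# print() writes its argument to stdout and returns None.
result = print(seq_r.id)

chrm = []  # a list, as in the question
captured = io.StringIO()
error = None
try:
    with redirect_stdout(captured):
        # The right-hand side is evaluated first, then the index expression,
        # so the sequence is printed before the id; only then does the
        # assignment with a None index fail.
        chrm[print(seq_r.id)] = print(seq_r.seq)
except TypeError as exc:
    error = str(exc)

printed = captured.getvalue().splitlines()
print(printed)  # the sequence first, then the id
print(error)    # the TypeError message
```

If this matches what you see, the long output followed by "1" comes from a single pass through the loop, not from several records.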

Compare this with the corresponding part of the note article, reproduced below (my transcription may be slightly off):

start = pd.read_csv(start_file, sep="\t", names=['chr', 'begin', 'end', 'gene_name', 'score', 'strand'])

chrm = {}
for f in SeqIO.parse(open(fasta_file), 'fasta'):
    name = print(f.id)
    chrm[name] = str(f.seq)

The corresponding part of your code is:

    chrm = []
    for seq_r in SeqIO.parse(open("athaliana/genome.fasta"), 'fasta'):
        chrm[print(seq_r.id)] = print(seq_r.seq)

Why don't you change it like this and try it?

    chrm = {}  ### initialize as a dictionary instead of a list
    for seq_r in SeqIO.parse(open("athaliana/genome.fasta"), 'fasta'):
        chrm[str(seq_r.id)] = str(seq_r.seq)  ### use str() instead of print()

There seem to be plenty of other problems in the script, but this change should at least get you past the current error.
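The dictionary approach can be tried in isolation. The following is a minimal sketch under stated assumptions: read_fasta is a hypothetical helper that stands in for SeqIO.parse so the example runs without Biopython, and the two-record FASTA text is made up for illustration. Neither is part of the original script.

```python
import io

def read_fasta(handle):
    """Hypothetical stand-in for SeqIO.parse: yields (id, sequence) pairs."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The record id is the first whitespace-separated token after '>'.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

# Made-up two-record FASTA text, used only for illustration.
fasta_text = ">1 Chr1 example\nACGTAC\nGT\n>2 Chr2 example\nTTGGCC\n"

chrm = {}  # dictionary: sequence id (str) -> sequence (str)
for seq_id, seq in read_fasta(io.StringIO(fasta_text)):
    chrm[str(seq_id)] = str(seq)

# Look up a chromosome by name, then slice it, which is the access
# pattern chrm[str(row['chr'])][a:b] used in extract_fasta_to_predict.
print(chrm["1"][2:6])
```

A dictionary is used because the records are looked up by chromosome name; a list only accepts integer positions, which is why the list version fails as soon as anything other than an integer is used as the index.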


2022-11-26 09:47
