Reading fasta file with multiple sequences
Riot

Hello all,

I'm trying to read a FASTA file that has over 5000 sequences. The plan is to create a vector that holds all the sequences, translate them into protein, and then carry them over to Bio Linux. I've done this, but only with one sequence at a time (and I'm still new to RStudio). The code I'm using is below. Can someone please tell me where I'm going wrong?

> contigs = read.fasta("contigs.fasta", seqtype = "DNA")

> contigsdnaseq = contigs[[1]]   # I think this is where things go wrong: I'm not sure what code to use so that R works on all 5000+ sequences.

> getTrans(contigsdnaseq, sens = "F", NAstring = "X", ambiguous = FALSE, frame = 0, numcode = 1)

> contigs_aa= getTrans(contigsdnaseq,sens = "F")    

> write.fasta(contigs_aa,contigs_aa,file.out = "contigs_aa.fasta")

> contigsaafile = read.fasta("contigs_aa.fasta", seqtype = "AA")

> getAnnot(contigsaafile)


Tags: multiple sequences, bioconductor, rstudio, seqinr

seqinr is a CRAN package, so you'd have to ask elsewhere for help with it.
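That said, the step flagged in the question, contigs[[1]], selects only the first record of the list returned by read.fasta(). A minimal seqinr sketch, assuming the seqinr package is installed, applies getTrans() to every element with lapply() instead. The two-record input file is a hypothetical stand-in written to a temporary file so the example runs on its own; with real data, read contigs.fasta directly. It also passes names(contigs) as the FASTA headers, whereas the write.fasta() call in the question passes the translated sequences as their own names.

```r
library(seqinr)

# Hypothetical stand-in for contigs.fasta so the sketch is self-contained.
fasta_in <- tempfile(fileext = ".fasta")
writeLines(c(">contig1", "ATGGCCAAATAA", ">contig2", "ATGTTTGGGTGA"), fasta_in)

# read.fasta() returns a list with one element per sequence.
contigs <- read.fasta(fasta_in, seqtype = "DNA")

# Translate every element of the list, not just contigs[[1]].
contigs_aa <- lapply(contigs, getTrans, sens = "F", NAstring = "X",
                     ambiguous = FALSE, frame = 0, numcode = 1)

# Use the original sequence names as headers in the protein FASTA file.
fasta_out <- tempfile(fileext = ".fasta")
write.fasta(sequences = contigs_aa, names = names(contigs), file.out = fasta_out)

# Read the protein sequences back in.
contigsaafile <- read.fasta(fasta_out, seqtype = "AA")
```

Because lapply() keeps the list structure and names, the same code handles 2 or 5000+ records without any change.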

In Bioconductor, you'd use the Biostrings package:

library(Biostrings)
dna = readDNAStringSet("your.fasta")
aa = translate(dna)
writeXStringSet(aa, "aa.fasta")

This processes all of the sequences in your FASTA file in one go; there is no need to iterate.
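One practical detail with assembled contigs: they often contain N, and by default translate() stops with an error at the first codon containing such an ambiguity code. The sketch below, assuming Biostrings is installed and using two hypothetical records written to a temporary file, translates the whole set in one call and uses if.fuzzy.codon = "X" so that ambiguous codons become X instead.

```r
library(Biostrings)

# Hypothetical two-record input; contig2 contains an N, as contigs often do.
fasta_in <- tempfile(fileext = ".fasta")
writeLines(c(">contig1", "ATGGCCAAATAA", ">contig2", "ATGNTTGGGTGA"), fasta_in)

dna <- readDNAStringSet(fasta_in)

# One vectorised call over the whole DNAStringSet. With the default
# if.fuzzy.codon = "error", the codon "NTT" would raise an error;
# "X" translates every codon containing an ambiguity code to X.
aa <- translate(dna, if.fuzzy.codon = "X")

fasta_out <- tempfile(fileext = ".fasta")
writeXStringSet(aa, fasta_out)
```

If you would rather keep codons whose ambiguity does not change the amino acid, if.fuzzy.codon = "solve" translates those and uses X only for genuinely ambiguous ones.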

I'm not really sure what getAnnot() retrieves for amino acid sequences; it seems to be just the identifier, which in Biostrings is names(aa). If you need more annotation than that, you would use one of the Bioconductor 'org' packages or biomaRt; see the AnnotationDbi vignette "AnnotationDbi: Introduction To Bioconductor Annotation Packages", or the biomaRt vignettes, for more.
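As a small illustration of where the identifiers live in Biostrings, the sketch below (assuming Biostrings is installed; the header lines with length and coverage fields are hypothetical) shows that names() holds each full FASTA header line without the leading ">", and that a plain regular expression can reduce it to the identifier before the first whitespace.

```r
library(Biostrings)

# Hypothetical headers: an identifier followed by free-text description.
fasta_in <- tempfile(fileext = ".fasta")
writeLines(c(">contig1 len=12 cov=5.0", "ATGGCCAAATAA",
             ">contig2 len=12 cov=3.2", "ATGTTTGGGTGA"), fasta_in)

aa <- translate(readDNAStringSet(fasta_in))

# Full header lines, carried over from the DNA records to the protein set.
headers <- names(aa)

# Keep only the identifier, i.e. the text before the first whitespace.
ids <- sub("[[:space:]].*$", "", headers)
```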

