Reading fasta file with multiple sequences
Riot • 0
@riot-12299

Hello all,

I'm trying to read a FASTA file that contains over 5000 sequences. The plan is to build a vector holding all of the sequences, translate them into protein, and then carry the protein sequences over to Bio Linux. I've managed this, but only for one sequence at a time (and I'm still new to RStudio). The code I'm using is below. Can someone please tell me where I'm going wrong?

> contigs= read.fasta("contigs.fasta", seqtype = "DNA")

> contigsdnaseq= contigs[[1]]   (I think this is the part where things go wrong. I'm not sure what code to use in order for the program to recognize the 5000+ sequences.)

> getTrans(contigsdnaseq, sens = "F", NAstring = "X", ambiguous = FALSE, frame = 0, numcode = 1)

> contigs_aa= getTrans(contigsdnaseq,sens = "F")    

> write.fasta(contigs_aa,contigs_aa,file.out = "contigs_aa.fasta")

> contigsaafile = read.fasta("contigs_aa.fasta", seqtype = "AA")

> getAnnot(contigsaafile)

 

@martin-morgan-1513

seqinr is a CRAN package so you'd have to ask elsewhere for help.

In Bioconductor, you'd use

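library(Biostrings)
# read every record in the FASTA file into a single DNAStringSet
dna = readDNAStringSet("your.fasta")
# translate all sequences at once (forward strand, first reading frame)
aa = translate(dna)
# write the resulting protein sequences back out as FASTA
writeXStringSet(aa, "aa.fasta")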

This would process all of your fasta sequences in one go, no need to iterate.
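
To make the "one go" point concrete, here is a minimal sketch that uses two made-up in-memory sequences instead of a file. The names contig1 and contig2 and the sequences themselves are assumptions for illustration only; both lengths are multiples of three so that every codon is complete.

```r
library(Biostrings)

# Two toy sequences standing in for the contigs (made-up data).
dna <- DNAStringSet(c(contig1 = "ATGGCCTAA",
                      contig2 = "ATGAAATTTTGA"))

# translate() works on the whole set at once; by default it reads the
# forward strand from the first base using the standard genetic code.
aa <- translate(dna)

# One protein sequence per input sequence; stop codons appear as "*".
as.character(aa)
```

The same call works unchanged on a set of 5000+ sequences returned by readDNAStringSet().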

I'm not really sure what getAnnot() retrieves for amino acid sequences; it seems to be just the identifier, which here is names(aa). If you need more than that, use one of the Bioconductor 'org' packages (e.g., org.Hs.eg.db) or biomaRt; see the vignette "AnnotationDbi: Introduction To Bioconductor Annotation Packages" in the AnnotationDbi package, or the biomaRt vignettes, for more.
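
As a rough sketch of where the identifiers end up, the following writes a small made-up FASTA file to a temporary location, reads and translates it, and inspects names(). The file contents and header text are assumptions for illustration; the point is only that the header line of each record travels with the sequence through translate() and writeXStringSet().

```r
library(Biostrings)

# Write a tiny made-up FASTA file to a temporary location.
fa <- tempfile(fileext = ".fasta")
writeLines(c(">contig1 sample description", "ATGGCCTAA",
             ">contig2", "ATGAAATTTTGA"), fa)

dna <- readDNAStringSet(fa)
aa <- translate(dna)

# The FASTA header of each record (without the leading ">") is kept as its name.
names(aa)

# Round trip: the names are written out as the FASTA headers again.
out <- tempfile(fileext = ".fasta")
writeXStringSet(aa, out)
names(readAAStringSet(out))
```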
