Hi, I want to use GenomicFeatures to extract upstream sequences from a genome using gene IDs.
My organism (Anopheles melas) has an assembled genome (fasta) and a gtf file.
It seems like these are all the files I should need, but I am not finding it easy to figure out how to load these two data types for GenomicFeatures to use. So far all I have figured out is that I need to make a TxDb from the GTF; I have no idea what to do with the FASTA file.
Could someone (BRIEFLY) tell me which packages and objects I need to use and how they all mesh together?
I have read over the manual as well as the Rsamtools manual but am more confused than anything.
Thank you
Just realizing now that extractUpstreamSeqs() doesn't take a DNAStringSet object, sorry. We should add this at some point. So for now you could either use FaFile, as Martin suggested, or convert your FASTA file to 2-bit format.
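Something along these lines should cover both routes. This is only a rough sketch, not tested on your data: get_upstream() and all the file names and gene IDs are made up for illustration, and I'm assuming the sequence names in the FASTA headers (up to the first whitespace) match the seqnames used in the GTF. Also note that in recent Bioconductor releases makeTxDbFromGFF() lives in the txdbmaker package, so you may need to load that instead.

```r
suppressPackageStartupMessages({
  library(GenomicFeatures)  # makeTxDbFromGFF(), genes(), extractUpstreamSeqs()
  library(Rsamtools)        # FaFile(), indexFa()
  library(Biostrings)       # readDNAStringSet()
  library(rtracklayer)      # export(), TwoBitFile()
})

# Hypothetical helper: returns the upstream sequences for the given gene IDs.
get_upstream <- function(fasta_path, gtf_path, gene_ids, width = 1000,
                         via_2bit = FALSE) {
  # Gene models from the GTF; genes() returns a GRanges named by gene ID.
  txdb <- makeTxDbFromGFF(gtf_path, format = "gtf")
  gn <- genes(txdb)

  if (via_2bit) {
    # Route 2: load the FASTA, write it out as 2-bit, point to the 2-bit file.
    dna <- readDNAStringSet(fasta_path)
    # Keep only the first token of each header so names match the GTF
    # (assumption about how the FASTA headers are laid out).
    names(dna) <- sub("\\s.*$", "", names(dna))
    twobit_path <- paste0(tools::file_path_sans_ext(fasta_path), ".2bit")
    export(dna, twobit_path)
    genome <- TwoBitFile(twobit_path)
  } else {
    # Route 1: index the (uncompressed) FASTA and use it via FaFile.
    indexFa(fasta_path)
    genome <- FaFile(fasta_path)
  }

  # Subsetting by name fails if an ID is not among the genes in the TxDb.
  extractUpstreamSeqs(genome, gn[gene_ids], width = width)
}

# Example call (placeholder file names and gene IDs):
# up <- get_upstream("genome.fa", "genes.gtf", c("geneA", "geneB"), width = 1000)
```

The result is a DNAStringSet with one upstream sequence per gene, which you can write back out with writeXStringSet() if needed.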
Note that using FaFile on a compressed FASTA file has been reported to be unreliable on Windows in the past.
Hope this helps,
H.