Answer: IRanges, GenomicRanges, GenomicFeatures?
On Sun, Oct 31, 2010 at 11:10 PM, Oleg Moskvin <moskvin at wisc.edu> wrote:
> Hello list members,
> For a RNA-seq analysis, what would you suggest to use to convert
> raw-sequence-based read coverage to annotated ORF-based coverage, if
> the genome of interest is NOT supported in neither UCSC nor ENSEMBL,
> which means that creation of a TranscriptDB object in a
> straightforward way (i.e. according to the GenomicFeatures pipeline)
> is impossible? What would you recommend to import a .gff file
> (containing annotation of a particular genome, from GenBank) into
> R/Bioconductor to eventually generate a gene-centric countTable
> readable by packages like DESeq?
Assuming I've understood your question and how you have your data
available to you, here is one (maybe too simple) approach:
I think I'd parse the GFF into a GRangesList object, where each item of
the list is a GRanges object that stores the exon structure of one of
your transcripts (or genes), which I'm assuming is what's in your GFF.
If you had your RNA-seq reads in their own GRanges object, you could then
run countOverlaps between the GRangesList of transcripts and your reads
pretty easily, and use the resulting per-transcript counts to build your
countTable.
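To make that concrete, here is a minimal sketch of the counting step. The coordinates, gene names, and the sample name "sample1" are all made up for illustration, and the toy GRanges objects stand in for whatever you parse out of your own GFF and alignments.

```r
library(GenomicRanges)

# Toy annotation: three exons belonging to two hypothetical genes.
exons <- GRanges(
  seqnames = c("chr1", "chr1", "chr1"),
  ranges   = IRanges(start = c(100, 300, 1000), end = c(200, 400, 1200)),
  strand   = c("+", "+", "-"),
  gene_id  = c("geneA", "geneA", "geneB")
)

# One list element per gene, each holding that gene's exons.
exons_by_gene <- split(exons, exons$gene_id)

# Toy reads: five 36 bp alignments, the last one outside any gene.
reads <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(120, 350, 390, 1100, 5000), width = 36),
  strand   = c("+", "+", "+", "-", "+")
)

# Number of reads overlapping each gene (strand-aware by default;
# pass ignore.strand = TRUE if your protocol is unstranded).
counts <- countOverlaps(exons_by_gene, reads)

# Genes as rows, samples as columns; cbind() more samples as needed.
count_table <- cbind(sample1 = counts)
count_table
```

Note that a read overlapping two different genes would be counted for both, so you may want to filter ambiguous reads first depending on your annotation.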
Hope that helps,
ps - I think rtracklayer has some facilities to import GFF files,
which might be helpful to you.
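Something along these lines might work for the import, as a rough sketch only. The tiny GFF3 file is invented so the example runs on its own, and grouping on a "locus_tag" attribute of CDS features is my assumption; your GenBank GFF may use different feature types or attribute names, so check the metadata columns you actually get back.

```r
library(rtracklayer)    # import()
library(GenomicRanges)  # split() on GRanges

# Write a small made-up GFF3 file to a temporary location.
gff_lines <- c(
  "##gff-version 3",
  paste("chr1", "example", "CDS", "100", "200", ".", "+", "0",
        "ID=cds1;locus_tag=geneA", sep = "\t"),
  paste("chr1", "example", "CDS", "300", "400", ".", "+", "0",
        "ID=cds2;locus_tag=geneA", sep = "\t"),
  paste("chr1", "example", "CDS", "1000", "1200", ".", "-", "0",
        "ID=cds3;locus_tag=geneB", sep = "\t")
)
gff_file <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_file)

# import() returns a GRanges; GFF attributes become metadata columns.
annot <- import(gff_file, format = "gff3")

# Keep the CDS features and group them by the (assumed) locus_tag column.
cds <- annot[annot$type == "CDS"]
cds_by_gene <- split(cds, cds$locus_tag)
cds_by_gene
```

The resulting cds_by_gene is a GRangesList, so it can go straight into countOverlaps as described above.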
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact