Question

Summarization by gene or exon or transcript

Michael Stadler ▴ 350

@michael-stadler-5887

Switzerland

Hi Reema,

If I understand your question correctly, I think the answer is: It depends.

Counting alignments per exon may allow you to pick up differential splicing or differential isoform usage unrelated to splicing (e.g. alternative promoter usage or alternative termination). However, robust estimation of exon levels will require much greater sequencing depth; assuming that a gene has on average about ten exons, then you would need about ten times more reads to get a similar magnitude of counts. If you don't have that data or are not interested in within-gene structural differences, gene level estimates may be the better choice. Of course, you could try out both and compare results.

You can easily get such counts from a bam file using countOverlaps (see workflow at http://www.bioconductor.org/help/workflows/high-throughput-sequencing/), or with the QuasR package, getting gene and exon counts is as simple as:

    gn <- qCount(proj, txdb, reportLevel="gene")
    ex <- qCount(proj, txdb, reportLevel="exon")

Michael

On 31.10.2013 21:19, Steve Lianoglou wrote:
> Hi,
>
> On Thu, Oct 31, 2013 at 1:04 PM, Reema Singh <reema28sep at gmail.com> wrote:
>> Hi Steve,
>>
>> Thank you for your reply,
>>
>> I just want to known what is the idea feature for summarizing read count
>> after alignment?. Gene,transcript,exons features from GFF/GTF files are
>> frequently used .
>
> If you are asking what the "ideal" format for storing summarized read
> counts is, I would have to say that in "the R world" that would be to
> use a SummarizedExperiment (it is a class defined in the GenomicRanges
> package).
>
> The rowData() of the SummarizedExperiment would contain the GRanges
> (or GRangesList) that define where the counts in each row of your
> assay are from, and the columns would tell you the counts for a given
> experiment.
>
> You could store your relevant sample data in `colData`, ie. phenotypic
> data for each experiment (column), like cell type, perturbation,
> whatever.
> See ?SummarizedExperiment for more info.
>
> If you were asking something else -- sorry, I'm still not getting what
> the question is and perhaps someone else can chime in.
>
> -steve
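
For readers who want to try the countOverlaps route without a real BAM file, here is a minimal sketch that is not part of the original thread. The exon coordinates, read positions, the gene name `geneA`, the sample name `sample1` and the `cell_type` value are all made up for illustration. It counts toy reads per exon and per gene, then stores the exon counts in a SummarizedExperiment. One thing to be aware of: in current Bioconductor releases the class lives in the SummarizedExperiment package and the ranges are accessed with rowRanges(), whereas the 2013 message quoted above refers to the GenomicRanges package and rowData().

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Hypothetical gene with three exons on chr1 (coordinates are invented).
exons <- GRanges("chr1",
                 IRanges(start = c(100, 300, 500), end = c(200, 400, 600)),
                 strand = "+")
names(exons) <- c("exon1", "exon2", "exon3")

# Toy "alignments": six 36-bp reads; the fifth one falls in an intron.
reads <- GRanges("chr1",
                 IRanges(start = c(110, 150, 190, 310, 250, 520), width = 36),
                 strand = "+")

# Exon level: number of reads overlapping each exon.
exon_counts <- countOverlaps(exons, reads)

# Gene level: group the exons into a GRangesList so that a read is
# counted once for the gene if it overlaps any of its exons.
genes <- GRangesList(geneA = exons)
gene_counts <- countOverlaps(genes, reads)

# Store the exon counts together with their ranges and sample annotation.
se <- SummarizedExperiment(
  assays    = list(counts = matrix(exon_counts, ncol = 1,
                                   dimnames = list(names(exons), "sample1"))),
  rowRanges = exons,
  colData   = DataFrame(cell_type = "hypothetical", row.names = "sample1"))

assay(se, "counts")  # count matrix: one row per exon, one column per sample
rowRanges(se)        # where each row's counts come from
colData(se)          # per-sample annotation
```

With these toy reads the three exons get 3, 1 and 1 counts while the gene as a whole gets 5, which is the depth argument made above in miniature: the same reads spread over several exons give smaller per-feature counts. For real data the reads would come from a BAM file, for example via readGAlignments() in the GenomicAlignments package, rather than from a hand-built GRanges.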

score 0 · Answer 1 · 2013-11-01
