Summarization by gene or exon or transcript
Hi Reema,

If I understand your question correctly, I think the answer is: It depends.

Counting alignments per exon may allow you to pick up differential splicing or differential isoform usage unrelated to splicing (e.g. alternative promoter usage or alternative termination).

However, robust estimation of exon levels will require much greater sequencing depth; assuming that a gene has on average about ten exons, you would need about ten times more reads to get a similar magnitude of counts. If you don't have that data or are not interested in within-gene structural differences, gene-level estimates may be the better choice.

Of course, you could try out both and compare results. You can easily get such counts from a BAM file using countOverlaps (see workflow at http://www.bioconductor.org/help/workflows/high-throughput-sequencing/). With the QuasR package, getting gene and exon counts is as simple as:

  gn <- qCount(proj, txdb, reportLevel="gene")
  ex <- qCount(proj, txdb, reportLevel="exon")

Michael

On 31.10.2013 21:19, Steve Lianoglou wrote:
> Hi,
>
> On Thu, Oct 31, 2013 at 1:04 PM, Reema Singh <reema28sep at gmail.com> wrote:
>> Hi Steve,
>>
>> Thank you for your reply,
>>
>> I just want to know what is the ideal feature for summarizing read count
>> after alignment? Gene, transcript, and exon features from GFF/GTF files are
>> frequently used.
>
> If you are asking what the "ideal" format for storing summarized read
> counts is, I would have to say that in "the R world" that would be to
> use a SummarizedExperiment (it is a class defined in the GenomicRanges
> package).
>
> The rowData() of the SummarizedExperiment would contain the GRanges
> (or GRangesList) that define where the counts in each row of your
> assay are from, and the columns would tell you the counts for a given
> experiment.
>
> You could store your relevant sample data in colData, i.e. phenotypic
> data for each experiment (column), like cell type, perturbation,
> whatever. See ?SummarizedExperiment for more info.
>
> If you were asking something else -- sorry, I'm still not getting what
> the question is and perhaps someone else can chime in.
>
> -steve
>
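
For anyone who wants to try the countOverlaps route on a small scale, the following is a minimal sketch on made-up ranges, not the linked workflow itself. The exon coordinates, the gene_id labels, the read positions, the sample name and the cell_type value are all invented for illustration, and the sketch assumes the GenomicRanges and SummarizedExperiment packages are installed. Note that in current Bioconductor releases the class lives in the SummarizedExperiment package and the ranges are accessed with rowRanges() rather than rowData().

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy annotation (hypothetical): two exons of geneA, one exon of geneB
exons <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 300, 1000), end = c(199, 399, 1099)),
  strand   = "+",
  gene_id  = c("geneA", "geneA", "geneB")
)

# Toy alignments (hypothetical): five 50 bp reads, the last one intergenic
reads <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(120, 150, 310, 1010, 5000), width = 50),
  strand   = "+"
)

# Exon-level summarization: one count per exon
exon_counts <- countOverlaps(exons, reads)

# Gene-level summarization: group exons by gene, then count reads
# that overlap any exon of each gene
genes <- split(exons, mcols(exons)$gene_id)
gene_counts <- countOverlaps(genes, reads)

# Store the gene-level counts together with their ranges and sample data
se <- SummarizedExperiment(
  assays    = list(counts = matrix(gene_counts, ncol = 1,
                                   dimnames = list(names(gene_counts), "sample1"))),
  rowRanges = genes,
  colData   = DataFrame(cell_type = "example", row.names = "sample1")
)

print(exon_counts)
print(gene_counts)
print(se)
```

Even in this toy case the per-exon counts are smaller than the per-gene counts, which is the depth trade-off described above. With real data, reads spanning two exons of the same gene are counted once at the gene level but once per exon at the exon level, so the two summaries are not simple sums of each other.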
Hi Michael,

Yes, this is what I wanted to know. Thank you. :)

Kind Regards

On Fri, Nov 1, 2013 at 2:02 PM, Michael Stadler <michael.stadler@fmi.ch> wrote:
> Hi Reema,
>
> If I understand your question correctly, I think the answer is: It depends.
>
> Counting alignments per exon may allow you to pick up differential
> splicing or differential isoform usage unrelated to splicing (e.g.
> alternative promoter usage or alternative termination).
>
> However, robust estimation of exon levels will require much greater
> sequencing depth; assuming that a gene has on average about ten exons,
> then you would need about ten times more reads to get a similar
> magnitude of counts. If you don't have that data or are not interested
> in within-gene structural differences, gene level estimates may be the
> better choice.
>
> Of course, you could try out both and compare results. You can easily
> get such counts from a bam file using countOverlaps (see workflow at
> http://www.bioconductor.org/help/workflows/high-throughput-sequencing/),
> or with the QuasR package, getting gene and exon counts is as simple as:
>
>   gn <- qCount(proj, txdb, reportLevel="gene")
>   ex <- qCount(proj, txdb, reportLevel="exon")
>
> Michael
>
> On 31.10.2013 21:19, Steve Lianoglou wrote:
> > Hi,
> >
> > On Thu, Oct 31, 2013 at 1:04 PM, Reema Singh <reema28sep@gmail.com> wrote:
> >> Hi Steve,
> >>
> >> Thank you for your reply,
> >>
> >> I just want to known what is the idea feature for summarizing read count
> >> after alignment?. Gene,transcript,exons features from GFF/GTF files are
> >> frequently used .
> >
> > If you are asking what the "ideal" format for storing summarized read
> > counts is, I would have to say that in "the R world" that would be to
> > use a SummarizedExperiment (it is a class defined in the GenomicRanges
> > package).
> >
> > The rowData() of the SummarizedExperiment would contain the GRanges
> > (or GRangesList) that define where the counts in each row of your
> > assay are from, and the columns would tell you the counts for a given
> > experiment.
> >
> > You could store your relevant sample data in colData, ie. phenotypic
> > data for each experiment (column), like cell type, perturbation,
> > whatever. See ?SummarizedExperiment for more info.
> >
> > If you were asking something else -- sorry, I'm still not getting what
> > the question is and perhaps someone else can chime in.
> >
> > -steve
> >

--
Reema Singh
PhD Scholar
Computational Biology and Bioinformatics
School of Computational and Integrative Sciences
Jawaharlal Nehru University
New Delhi-110067 INDIA