How to determine what kind of dispersion to use

Guest User

Hello listserve,

Most analyses in edgeR rightly assume that the data are not Poisson but instead follow a negative binomial (NB) distribution. This matters when shrinking the dispersions. However, I was wondering whether there is a graph or function in edgeR that I could use to determine what kind of dispersion (e.g. common or moderated tagwise) I need to apply in the exactTest function.

I'm not doing a typical RNA-seq experiment (it's RIP-seq), so I would like to test which parts of the classic workflow are appropriate for what I'm doing. For instance, can I still use the same equation to figure out the prior.df, or does that not apply to RIP-seq?

After comparing the different functions and their arguments, I'm wondering whether RIP-seq may pose a problem for the moderated dispersion, since the untagged IP will generally have fewer reads than the IP samples. Does that seem like a possibility?

Also, for the dispersion argument of exactTest(), are there good rules of thumb for when to use "common", "trended", "tagwise" or "auto"?

Thanks

-- output of sessionInfo():

    > sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] LSD_2.5            ellipse_0.3-8      schoolmath_0.4
    [4] colorRamps_2.3     RColorBrewer_1.0-5 gtools_3.2.1
    [7] MASS_7.3-29        edgeR_3.4.2        limma_3.18.12

    loaded via a namespace (and not attached):
    [1] tools_3.0.2

--
Sent via the guest posting facility at bioconductor.org.

Tags: graph, edgeR

Ryan C. Thompson

Scripps Research, La Jolla, CA

Hello,

In general, the advice is to use tagwise dispersions (which are, by default, moderated toward the dispersion trend according to the automatically determined prior df) unless that is not an option. The most common reason for being unable to use tagwise dispersions is a dataset with no biological replicates.

Regarding testing the edgeR dispersion assumptions in a RIP-seq context, I think the most important assumption to check is whether the RIP-seq samples have the same dispersions as the regular RNA-seq control samples. This matters because edgeR assumes that the dispersion for a given gene does not vary across samples or conditions. (This is analogous to doing a t-test under the assumption of equal variance between groups.) I would recommend splitting your dataset into RIP-seq only and RNA-seq only and estimating dispersions on each. Then call plotBCV on both datasets and see whether the common and trended dispersions look similar (you will probably want to pass the same xlim and ylim arguments to both plotBCV calls so the scales are comparable). Even if they differ, Gordon has replied previously that when the dispersions are different in each group, the test will at worst be over-conservative, meaning that you might get some false negatives but you should not get extra false positives.

However, I think the most important issue to look at with RIP-seq is probably the normalization factors. The assumption behind most normalization methods is that the "average" log fold change should be zero, i.e. that most genes are not changing; the methods differ in how they compute this average (trimmed mean, quantile, etc.). You need to think carefully about what assumption you can make about the relationship between a RIP-seq sample and the matched RNA-seq sample.
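To make "moderated toward the trend" and the role of the prior df concrete, here is a minimal Python sketch. It is not edgeR's estimator: edgeR moderates dispersions through a weighted (empirical Bayes) likelihood, whereas this sketch uses a closed-form average weighted by degrees of freedom, in the spirit of limma's moderated variances. The function name, the residual df of 4 (e.g. six libraries in two groups), and all dispersion values are illustrative assumptions.

```python
import numpy as np

def moderate_dispersions(genewise, trended, resid_df, prior_df):
    """Shrink gene-wise dispersions toward the trend (simplified linear form).

    Hypothetical helper for illustration only; edgeR uses weighted likelihood.
    A gene's own estimate carries weight resid_df, the trend carries prior_df.
    """
    genewise = np.asarray(genewise, dtype=float)
    trended = np.asarray(trended, dtype=float)
    return (resid_df * genewise + prior_df * trended) / (resid_df + prior_df)

raw = np.array([0.01, 0.50, 0.10])   # hypothetical gene-wise estimates
trend = np.full(3, 0.10)             # hypothetical trended values for those genes

# prior_df = 0 keeps the raw values; a huge prior_df collapses onto the trend.
for prior_df in (0, 10, 1000):
    print(prior_df, moderate_dispersions(raw, trend, resid_df=4, prior_df=prior_df))
```

A larger prior df means more borrowing from the trend, which is why the choice of prior df matters most when there are few replicates.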
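The suggestion to estimate dispersions separately for the RIP-seq and RNA-seq subsets and compare them can be illustrated with simulated data. In edgeR you would use its dispersion estimation functions on each subset and compare the plotBCV output; the Python sketch below instead uses a pooled method-of-moments estimate of the common dispersion (edgeR itself uses conditional maximum likelihood). It assumes equal library sizes, NB variance mu + phi * mu^2, and made-up true dispersions of 0.04 and 0.16, so the subsets should show common BCVs (sqrt of the dispersion) near 0.2 and 0.4.

```python
import numpy as np

def simulate_nb(rng, means, dispersion, n_reps):
    """Draw NB counts (genes x replicates) with variance mu + dispersion * mu**2."""
    nb_size = 1.0 / dispersion               # NB "size" parameter
    prob = nb_size / (nb_size + means)       # numpy's success probability
    return rng.negative_binomial(nb_size, prob[:, None], size=(means.size, n_reps))

def common_dispersion_mom(counts):
    """Pooled method-of-moments dispersion: sum(var - mean) / sum(mean**2).

    Illustrative stand-in; assumes equal library sizes and no normalization.
    """
    m = counts.mean(axis=1)
    v = counts.var(axis=1, ddof=1)
    keep = m > 0
    return np.sum(v[keep] - m[keep]) / np.sum(m[keep] ** 2)

rng = np.random.default_rng(1)
means = rng.uniform(50, 500, size=2000)                    # hypothetical gene means
rna = simulate_nb(rng, means, dispersion=0.04, n_reps=4)   # "RNA-seq" subset
rip = simulate_nb(rng, means, dispersion=0.16, n_reps=4)   # "RIP-seq" subset

results = {}
for label, counts in (("RNA-seq", rna), ("RIP-seq", rip)):
    results[label] = float(np.sqrt(common_dispersion_mom(counts)))
    print(f"{label}: common BCV ~ {results[label]:.2f}")
```

A clear gap like this one between the two subsets is the situation in which edgeR's equal-dispersion assumption is violated, and the test then leans toward being conservative.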
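The "most genes are not changing" assumption behind trimmed-mean normalization can be shown with a simplified, unweighted trimmed mean of log ratios. This is only a sketch, not edgeR's calcNormFactors: TMM additionally trims by average abundance and applies precision weights, which are omitted here. The helper name and the two toy libraries are hypothetical.

```python
import numpy as np

def trimmed_mean_factor(sample, reference, trim=0.3):
    """Normalization factor for `sample` relative to `reference` (simplified).

    Averages per-gene log2 ratios of library-size-scaled counts after dropping
    the `trim` fraction at each end. Multiplying the sample's library size by
    the returned factor makes that trimmed average log fold change zero, which
    is only appropriate if most genes are not changing.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    keep = (sample > 0) & (reference > 0)   # log ratios need nonzero counts
    m = np.sort(np.log2((sample[keep] / sample.sum())
                        / (reference[keep] / reference.sum())))
    k = int(trim * m.size)                  # values dropped at each end
    return float(2.0 ** m[k:m.size - k].mean())

reference = np.full(100, 100.0)             # hypothetical "RNA-seq" library

# Case 1: 10 of 100 genes enriched 8-fold, the rest unchanged.
few_enriched = reference.copy()
few_enriched[:10] *= 8
# Case 2: 60 of 100 genes enriched 8-fold, so "most genes unchanged" is false.
many_enriched = reference.copy()
many_enriched[:60] *= 8

f_few = trimmed_mean_factor(few_enriched, reference)
f_many = trimmed_mean_factor(many_enriched, reference)
print(f_few, f_many)
```

In case 1 the factor is 10/17, which rescales the library so the 90 unchanged genes match the reference exactly. In case 2 the trimmed window is dominated by enriched genes, so the factor (roughly 0.91) is far from the 10/52 that would align the 40 unchanged genes. A RIP pulldown compared with its RNA-seq input can easily fall into this second situation.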
Remember that, in general, high-throughput sequencing is not capable of absolute quantitation, since most sequencing methods produce the same quantity of reads regardless of the amount of input. Therefore, you cannot sidestep the issue of normalization; you have to make some *a priori* assumption about how to normalize the samples. I'm not sure what that assumption would be for RIP-seq, and it may depend on what question you want to ask. For example, if you are only going to test differential RIP pulldown relative to expression level, the normalization between RIP and RNA-seq matters less, because it will cancel out anyway.

Hopefully this clarifies some of the issues you need to contend with. In general, I expect that edgeR and similar methods are suitable for analyzing RIP-seq data.

-Ryan

(In reply to J [guest], Mon Feb 17 17:53:40 2014.)
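The claim that the RIP-versus-RNA normalization cancels when testing differential RIP pulldown relative to expression can be checked with a little arithmetic. This Python sketch assumes that the unknown RIP-to-RNA scaling factor `rip_scale` is the same in both conditions; if IP efficiency differs between conditions, the factor does not cancel. The helper names and the abundances are hypothetical.

```python
import math

def log2_enrichment(rip, rna, rip_scale=1.0):
    """log2(RIP / RNA) after multiplying RIP by an unknown scale factor."""
    return math.log2(rip * rip_scale) - math.log2(rna)

def differential_enrichment(rip_a, rna_a, rip_b, rna_b, rip_scale=1.0):
    """Change in log2 RIP enrichment between conditions A and B.

    The log2(rip_scale) term appears in both enrichments and cancels.
    """
    return (log2_enrichment(rip_a, rna_a, rip_scale)
            - log2_enrichment(rip_b, rna_b, rip_scale))

# Hypothetical abundances for one gene: the enrichment itself depends on the
# arbitrary scale, but the difference between conditions does not.
for c in (0.1, 1.0, 25.0):
    print(c, log2_enrichment(40.0, 10.0, c),
          differential_enrichment(40.0, 10.0, 20.0, 10.0, rip_scale=c))
```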
