Questions about DESeq2 (new version and filtering low counts)
@amandinefournierchu-lyonfr-5921
Last seen 9.6 years ago
Dear Mike and others,

Thank you, Mike, for your reply yesterday to my last question about PCA and transformed data. I have two other questions for you today ;-)

The first question is about the new version of DESeq2:
- I found about 140 DEG when I used a previous version (v1.0.9).
- Now I am using the new version v1.0.19 with exactly the same data and FDR threshold, and I find more DEG (about 360).

What explains this difference? I thought it was the new count-outlier detection, but when I turn that filtering off by using cooksCutoff=FALSE in nbinomWaldTest, I find ~370 DEG. What are the other differences between the two versions? (Only outlier detection is reported in the NEWS file.)

The second question is about filtering low counts: as I understand the vignette, the filtering is done after dispersion estimation, and then the Benjamini-Hochberg adjustment is simply redone. I would like to know why it is better to keep the previous estimates. Naively, I would have filtered the genes first and then estimated dispersions without the low counts. My understanding of statistics is poorer than yours, so could you explain the rationale for this order in a few words?

Thanks a lot in advance!

Best regards,
Amandine

-----
Amandine Fournier
Lyon Neuroscience Research Center & Lyon Civil Hospital (France)
DESeq2 • 1.1k views
@mikelove
Last seen 3 hours ago
United States
hi Amandine,

> The first question is about your new version of DESeq2:
> - I found about 140 DEG when I used a previous version (v1.0.9)
> - Now I am using your new version v1.0.19 with exactly the same data and FDR threshold, and I find more DEG (about 360).
> What explains this difference?

We had a lot of useful feedback within the first few weeks of the DESeq2 release, and there was an issue with likely underestimation of the width of the prior on the dispersion for experiments with few degrees of freedom (number of samples minus the number of parameters to estimate). This seemed important enough to make a change in release, which might have caused the change for you. The exact number of DEG at a given FDR threshold unfortunately involves the tails of statistical distributions, which can change with changes to the model parameters. But we have tried to avoid big changes since this one.

> The second question is about filtering low counts: as I understand the vignette, the filtering is done after dispersion estimation. Then we just redo the Benjamini-Hochberg adjustment.
> I would like to know why it is better to keep the previous estimates. Naively, I would have filtered the genes first and then estimated dispersions without the low counts.

The independent filtering on the mean of counts increases your power, and so the absolute number of genes with an FDR less than a given threshold. But there is nothing wrong with the estimates of dispersion, log fold change or p-values from the low count genes.
Including them should improve the estimation of the dispersion trend and prior, for instance.

Mike
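The point about only re-running the Benjamini-Hochberg adjustment can be sketched numerically. The example below is an illustration in Python (DESeq2 itself does this in R via p.adjust); the p-values are made up for demonstration. Filtering does not change any per-gene p-value, only the number of tests m entering the BH correction, so the adjusted p-values of the kept genes can only get smaller, which is where the extra power comes from.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same rule as R's p.adjust, method="BH")."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking the cumulative minimum of p * m / rank.
    for offset, i in enumerate(reversed(order)):
        rank = m - offset
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values: two strong signals plus six genes that a
# low-count filter would remove (uninformative, p near 1).
pvals = [0.001, 0.004, 0.2, 0.5, 0.7, 0.8, 0.9, 0.95]

all_genes = bh_adjust(pvals)     # m = 8, no filtering
filtered = bh_adjust(pvals[:4])  # m = 4 after dropping the low-count genes

print(all_genes[:2])  # adjusted p-values without filtering
print(filtered[:2])   # same genes, smaller adjusted p-values after filtering
```

At an FDR threshold of 0.005, for instance, no gene passes without filtering (smallest adjusted p is 0.008), while the strongest gene passes after filtering (adjusted p 0.004), even though its raw p-value and all the upstream estimates are untouched.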