minimal number of features tested in edgeR

Guest User ★ 13k

@guest-user-4897

Last seen 9.7 years ago

Hi,

I have a question about the minimal number of genes that can be tested in an edgeR analysis. In a study, edgeR has been used to test the differential expression of three viruses between two conditions, without considering the counts for any other features. That is, d$counts has only three rows (and 4 columns, as there are two replicates per condition). The library sizes, however, correspond to the total number of tags aligned to both these viruses and the genes of the host organism.

This seems inappropriate to me, as I don't understand how the dispersion could be estimated reliably from only three features, but maybe I'm wrong... May I have your opinion? In your view, what is the minimal number of features that we can test using edgeR?

Thank you in advance for your help.

Best regards,
Stéphanie

-- output of sessionInfo():

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_2.6.2  limma_3.12.0

loaded via a namespace (and not attached):
 [1] annotate_1.34.0      AnnotationDbi_1.18.0 Biobase_2.16.0
 [4] BiocGenerics_0.2.0   DBI_0.2-5            DESeq_1.8.2
 [7] genefilter_1.38.0    geneplotter_1.34.0   grid_2.15.0
[10] IRanges_1.14.3       RColorBrewer_1.0-5   RSQLite_0.11.1
[13] splines_2.15.0       stats4_2.15.0        survival_2.36-14
[16] xtable_1.7-0

--
Sent via the guest posting facility at bioconductor.org.
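For readers who want to picture the setup described in the question, here is a minimal sketch in R. The counts, sample names and library sizes are made-up placeholder values, not data from the study. The only point is that `lib.size` is passed explicitly to `DGEList()` instead of being taken from the column sums of the three viral rows. The edgeR part runs only if the package is installed.

```r
# Illustrative values only: three viral features, two replicates per condition.
counts <- matrix(
  c(120, 150,  80,  95,
     30,  42, 210, 260,
     55,  61,  58,  70),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("virusA", "virusB", "virusC"),
                  c("ctl1", "ctl2", "trt1", "trt2"))
)
group <- factor(c("ctl", "ctl", "trt", "trt"))

# Hypothetical totals of all aligned tags (virus + host) per library.
lib_size <- c(ctl1 = 2.1e7, ctl2 = 2.4e7, trt1 = 1.9e7, trt2 = 2.2e7)

if (requireNamespace("edgeR", quietly = TRUE)) {
  # lib.size overrides the default, which would be colSums(counts).
  d <- edgeR::DGEList(counts = counts, lib.size = lib_size, group = group)
  print(dim(d$counts))
  print(d$samples$lib.size)
}
```

With only the three viral rows in the table, the default library sizes would be the sums of those three rows, which is why the totals have to be supplied by hand in this kind of analysis.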

Organism edgeR • 812 views

updated 11.5 years ago by Mark Robinson ▴ 880 • written 11.5 years ago by Guest User ★ 13k

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 5.5 years ago

Hi Stephanie,

In theory, the minimal number of features you can test is 1. From your three rows (2 groups of 2 replicates), you have 6 degrees of freedom to estimate a common dispersion, as opposed to 2 with just one feature. This should "help", and I would consider that an improvement. Provided some other things fall into place (e.g. it is reasonable to assume, at least to a first-order approximation, that the features have the same dispersion), this should be ok.

You could also consider using other features (that you've presumably filtered?) purely for estimating the dispersion, and then test only the 3 features of interest. This only helps if those other features are representative, and it gets a bit hard to defend.

Anyways, these are just opinions and possibilities.

Best,
Mark

On 25.10.2012, at 11:23, Stephanie [guest] wrote:
> For you, what is the minimal number of features that we can test using edgeR?

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
e: mark.robinson at imls.uzh.ch
w: http://tiny.cc/mrobin
----------
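The degrees-of-freedom count in the answer, and the idea of estimating the dispersion on a larger set of features while testing only three, can be sketched roughly as follows. This is an illustration under assumptions, not Mark's own code: the counts are simulated placeholders, the row names are hypothetical, and whether host features are representative of the viral ones is exactly the open question raised in the answer. The edgeR part uses `estimateCommonDisp()` and the `dispersion` argument of `exactTest()`, and runs only if edgeR is installed.

```r
# Residual degrees of freedom: each feature contributes (samples - groups).
n_samples <- 4
n_groups  <- 2
df_per_feature <- n_samples - n_groups
df_one   <- df_per_feature * 1   # one feature
df_three <- df_per_feature * 3   # three features sharing a common dispersion
print(c(one = df_one, three = df_three))

if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  group <- factor(c("ctl", "ctl", "trt", "trt"))

  # Simulated placeholder counts: 500 "host" rows plus 3 rows of interest.
  all_counts <- matrix(rnbinom(503 * 4, mu = 100, size = 10), ncol = 4)
  rownames(all_counts) <- c(paste0("host", 1:500),
                            "virusA", "virusB", "virusC")

  # Estimate a common dispersion from all rows.
  d_all <- edgeR::DGEList(counts = all_counts, group = group)
  d_all <- edgeR::estimateCommonDisp(d_all)

  # Test only the three rows of interest, reusing that dispersion.
  d3 <- d_all[501:503, ]
  et <- edgeR::exactTest(d3, dispersion = d_all$common.dispersion)
  print(edgeR::topTags(et))
}
```

Because only three rows are passed to `exactTest()`, the multiple-testing adjustment reported by `topTags()` covers three tests rather than the full table.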

11.5 years ago • Mark Robinson ▴ 880