edgeR and tagwise dispersion: overcorrection for multiple tests?
Dear colleagues, good morning.

I am returning to an old issue because I am now much more certain of what I see, and I am beginning to wonder whether it is due to biology rather than to analytical tools or strategies.

=> Here is my sessionInfo() to begin with:

R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] edgeR_2.6.7 limma_3.12.1 R.utils_1.12.1 R.oo_1.9.8
[5] R.methodsS3_1.4.2

=> The experiment: RNA from five stimulated samples and five controls, mice, homogeneous stimulus, brain tissue, SAGE on SOLiD with good mapping in the UTR (also checked with genome-wide mapping). Tags were selected with the following, quite stringent, parameters: only in UTR; unique mapping; only one mismatch; begin with CATG. The samples are labelled {1 to 5}R for the stimulus and {1 to 5}C for the controls.

=> The MDS plot and a simple pairwise regression analysis of the tag counts (R vs C, R vs R and C vs C) reveal a clear division of the R samples into two groups: {1R,3R} and {2R,4R,5R}. In addition, one C sample (3C) overlaps with two R samples and was removed from the subset comparisons.

=> Three DEG calculations were performed:
(A) all C vs all R;
(B) all C minus 3C vs {1R,3R};
(C) all C minus 3C vs {2R,4R,5R}.

=> Settings: tagwise dispersion; normalization factors calculated on the libraries; filtering by minimal CPM across samples leaves between 6000 and 7000 genes for each comparison.

=> The results, which make me wonder about what is happening in the R (experiment) samples:

Comparison A (ALL vs ALL): TWO genes with significant FDR (the BH-corrected p-value, as I understand it)
Comparison B (ALL-3C vs 1R,3R): 2099 genes with significant FDR (!)
Comparison C (ALL-3C vs 2R,4R,5R): 20 genes with significant FDR

Now, excuse my ignorance, but subsetting one of the two groups being compared has a rather strong effect on the FDR here, which I have not found in many other similar analyses. In fact, when I first mailed the list, I was talking about 'overcorrection for multiple tests'.

Is there any reasonable explanation for this, apart from {1R,3R} and {2R,4R,5R} being totally different samples, which I rule out? Maybe a strong dependency between the genes involved in the response to the stimulus in the two R subgroups?

I include below the three MDS plots. Thanks for any answer, and again excuse me: maybe there is a trivial reason for this (such as the number of samples), but it is a unique situation among the many SAGE experiments I have analyzed with edgeR.

Kind regards,
Alessandro

--
Alessandro Guffanti - Head, Bioinformatics, Genomnia srl
Via Nerviano, 31 - 20020 Lainate, Milano, Italy
Ph: +39-0293305.702 Fax: +39-0293305.777
http://www.genomnia.com
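On the 'overcorrection' wording, a minimal sketch of the arithmetic behind a BH adjustment may help frame the question. It is written in plain Python rather than R/edgeR so that it is self-contained; in R the same adjustment is available as p.adjust(p, method = "BH"). The p-values below are invented for illustration (toy_pvalues is a hypothetical helper, not data from this experiment). The sketch only shows that, for a fixed number of tests, the count of BH-significant genes depends on how many small raw p-values there are, so very different counts for comparisons A, B and C would not by themselves indicate a change in the correction.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(pvalues, fdr=0.05):
    """Count tests whose BH-adjusted p-value is at or below `fdr`."""
    return sum(1 for q in bh_adjust(pvalues) if q <= fdr)


def toy_pvalues(n_tests, n_small, small_p=1e-6):
    """Hypothetical data: `n_small` tiny p-values plus an even grid on (0, 1)."""
    n_null = n_tests - n_small
    null_part = [(i + 0.5) / n_null for i in range(n_null)]
    return [small_p] * n_small + null_part


# Same number of tests (6500), different numbers of small raw p-values.
few = count_significant(toy_pvalues(6500, 20))
many = count_significant(toy_pvalues(6500, 2000))
print("significant with 20 small p-values:", few)
print("significant with 2000 small p-values:", many)
```

Whether the raw p-values in comparison A are large because pooling {1R,3R} with {2R,4R,5R} inflates the tagwise dispersion is an assumption that would need checking on the real data, for example by comparing the estimated dispersions across the three comparisons.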
