Question: Problem with p-values calculated by eBayes--corrected format
0
10.8 years ago by
Chen, Zhuoxun40 wrote:
Hi Bioconductors, I am really sorry about sending this email again. I didn't know that the table on my email will be lost and reformat. I corrected the format now. Thank you for your patience. ? I have a very weird problem with the statistics with my microarray data. I would like to ask for your help. I am running a microarray with 16 groups, 3 samples/group. On my genechip, every probe is spotted 2 times. By comparing two groups (let?s say A and B), I came across a gene that is very significant by running the following codes, with a p-value= 0.001669417 ---------------------------------------------------------------------- -------------------------------------- corfit <- duplicateCorrelation(Gvsn, design = design, ndups = 2, spacing = 1) fit <- ?lmFit(Gvsn, design = design, ndups = 2, spacing = 1, correlation = corfit$consensus) contrast.matrix <- makeContrasts(A-B, levels=design) fit2 <- contrasts.fit(fit, contrast.matrix) fit3 <- eBayes(fit2) ---------------------------------------------------------------------- -------------------------------------- Then, I looked at the raw data; copy and paste onto Excel and did a simple t-test ? A B 1 6.938162 7.093199 2 7.012382 8.05612 3 7.000305 6.999078 Avg 6.983616 7.382799 contrast 0.399182 p-value one tailed, unequal variance, t-test=0.179333 one tailed, equal variance, t-test=0.151844 ? ? The p-value is NOT even close to 0.05. Then I looked at the contrast of fit3$coefficient, it is 0.399182, which indicates the data input for the codes are correct. I don?t understand why it has such a huge difference on p-value between those two methods. Could somebody please help me with it? Thanks, Zhuoxun Chen ?? SessionInfo: ---------------------------------------------------------------------- --------------------------------------------------------------- R version 2.8.0 (2008-10-20) i386-pc-mingw32 ?locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 ?attached base packages: ?[1] grid????? splines?? tools???? stats???? graphics? grDevices utils???? datasets? methods?? base???? ?other attached packages: ?[1] gplots_2.6.0????????? gdata_2.4.2?????????? gtools_2.5.0????????? org.Hs.eg.db_2.2.6??? GSEABase_1.4.0?????? ?[6] PGSEA_1.10.0????????? Ruuid_1.20.0????????? Rgraphviz_1.20.2????? XML_1.94-0.1????????? bioDist_1.14.0?????? [11] GOstats_2.8.0???????? Category_2.8.0??????? genefilter_1.22.0???? survival_2.34-1?????? RBGL_1.18.0????????? [16] annotate_1.20.0?????? xtable_1.5-4????????? graph_1.20.0????????? eArrayCanary.db_1.0.0 annaffy_1.14.0?????? [21] KEGG.db_2.2.5???????? GO.db_2.2.5?????????? RSQLite_0.7-1???????? DBI_0.2-4???????????? AnnotationDbi_1.4.0? [26] statmod_1.3.6???????? RODBC_1.2-3?????????? RColorBrewer_1.0-2??? vsn_3.8.0???????????? affy_1.20.0????????? [31] Biobase_2.2.0???????? lattice_0.17-15?????? limma_2.16.3???????? ?loaded via a namespace (and not attached): [1] affyio_1.10.1??????? cluster_1.11.11????? preprocessCore_1.4.0 ---------------------------------------------------------------------- -------------------------------------------------------------- ?
microarray go probe • 542 views
modified 10.8 years ago by Kasper Daniel Hansen6.4k • written 10.8 years ago by Chen, Zhuoxun40
Answer: Problem with p-values calculated by eBayes--corrected format
0
10.8 years ago by
United States
Kasper Daniel Hansen6.4k wrote:
On Jan 9, 2009, at 9:21 , Chen, Zhuoxun wrote: > Hi Bioconductors, > > I am really sorry about sending this email again. I didn't know that > the table on my email will be lost and reformat. I corrected the > format now. Thank you for your patience. > > I have a very weird problem with the statistics with my microarray > data. I would like to ask for your help. > I am running a microarray with 16 groups, 3 samples/group. On my > genechip, every probe is spotted 2 times. > By comparing two groups (let?s say A and B), I came across a gene > that is very significant by running the following codes, with a p- > value= 0.001669417 > -------------------------------------------------------------------- ---------------------------------------- > corfit <- duplicateCorrelation(Gvsn, design = design, ndups = 2, > spacing = 1) > fit <- lmFit(Gvsn, design = design, ndups = 2, spacing = 1, > correlation = corfit$consensus) > contrast.matrix <- makeContrasts(A-B, levels=design) > fit2 <- contrasts.fit(fit, contrast.matrix) > fit3 <- eBayes(fit2) > -------------------------------------------------------------------- ---------------------------------------- > Then, I looked at the raw data; copy and paste onto Excel and did a > simple t-test > > A B > 1 6.938162 7.093199 > 2 7.012382 8.05612 > 3 7.000305 6.99907 This is 1 contrast with 3 samples in each group. But where is the data from the second probe? And what is the values of corfit? > > Avg 6.983616 7.382799 > contrast 0.399182 > > p-value > one tailed, unequal variance, t-test=0.179333 > one tailed, equal variance, t-test=0.151844 > > The p-value is NOT even close to 0.05. Then I looked at the contrast > of fit3$coefficient, it is 0.399182, which indicates the data input > for the codes are correct. > > I don?t understand why it has such a huge difference on p-value > between those two methods. Could somebody please help me with it? You are both allowing for correlation (which may or may not be sensible, that is hard to know unless you post more details) and you do an empirical Bayes correction. So you are pretty far from doing a standard t-test, and I see no big problem in method "A" giving a different answer from method "B" when the two methods are somewhat different.. Explaining in details what the difference is, is way beyond the scope of an email. A super short answer is that you combine information from having multiple spots measuing the same transcript and that you borrow information about the gene-level variance from looking at the behaviour of all genes. If you want more details, I suggest you read up on mixed models as well as empirical bayes correction. A good starting point will Gordon's sagmb article, cited in limma. Kasper