Understanding limma, fdr and topTable

0

Entering edit mode

Louisa A Rispoli/AS/EXP/UTIA ▴ 160

@louisa-a-rispoliasexputia-2436

Last seen 9.7 years ago

To all- I am a newbie trying to analyze microarray with minimal help and thought that I had figured out all. We have a simple task of comparing two groups with 8 replicates on the bovine genechip. I am attempting to understand the results that I obtain using limma and adjusting for fdr. I have tried reading the help vignettes on p.adjust and topTablle but no closer to understanding if the adjusted p-value represents the fdr or the q-value or something altogether different. Based on some recent work in another lab (by abstract) I know that I may need to use a less stringent fdr then 5% but I am unsure in limma how to change that value (or if it is feasible at all). Looking at the results that I obtained so far, I do not have any differentially expressed genes with the fdr adjustment. But I could be interpreting that wrong, since I was looking for values of the adjusted p to be lower then 0.05 and the smallest that I see is 0.7816, If someone could look at this and assist, any help, advice or reprimmand would be appreciated. Thanks Louisa "If we knew what we were doing, it wouldn't be called Research." - Albert Einstein Louisa Rispoli, Ph.D. Reproductive Physiology Department of Animal Science University of Tennessee, Knoxville A105 Johnson Animal Research and Teaching Unit 1750 Alcoa Highway Knoxville, TN 37920 phone:(865) 946-1874 fax:(865) 946-1010 email: lrispoli at utk.edu R version 2.7.1 (2008-06-23) Copyright (C) 2008 The R Foundation for Statistical Computing ISBN 3-900051-07-0 > library(affycoretools) Loading required package: affy Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, type 'openVignette()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation(pkgname)'. Loading required package: affyio Loading required package: preprocessCore Loading required package: limma Loading required package: GOstats Loading required package: graph Loading required package: GO.db Loading required package: AnnotationDbi Loading required package: DBI Loading required package: RSQLite Loading required package: annotate Loading required package: xtable Loading required package: RBGL Loading required package: Category Loading required package: genefilter Loading required package: survival Loading required package: splines Loading required package: biomaRt Loading required package: RCurl Attaching package: 'biomaRt' The following object(s) are masked from package:annotate : getGO Loading required package: gcrma Loading required package: matchprobes Loading required package: annaffy Loading required package: KEGG.db Attaching package: 'annaffy' The following object(s) are masked from package:RCurl : getURL > pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE, row.names=1) > data <- ReadAffy(phenoData=pd) > pData(data) amp trt PolyC-1.CEL PolyA Ctrl PolyC-2.CEL PolyA Ctrl PolyC-3.CEL PolyA Ctrl PolyC-4.CEL PolyA Ctrl PolyC-5.CEL PolyA Ctrl PolyC-6.CEL PolyA Ctrl PolyC-7.CEL PolyA Ctrl PolyC-8.CEL PolyA Ctrl PolyHS-1.CEL PolyA HS PolyHS-2.CEL PolyA HS PolyHS-3.CEL PolyA HS PolyHS-4.CEL PolyA HS PolyHS-5.CEL PolyA HS PolyHS-6.CEL PolyA HS PolyHS-7.CEL PolyA HS PolyHS-8.CEL PolyA HS WTC-1.CEL WT Ctrl WTC-2.CEL WT Ctrl WTC-3.CEL WT Ctrl WTC-4.CEL WT Ctrl WTC-5.CEL WT Ctrl WTC-6.CEL WT Ctrl WTC-7.CEL WT Ctrl WTC-8.CEL WT Ctrl WTHS-1.CEL WT HS WTHS-2.CEL WT HS WTHS-3.CEL WT HS WTHS-4.CEL WT HS WTHS-5.CEL WT HS WTHS-6.CEL WT HS WTHS-7.CEL WT HS WTHS-8.CEL WT HS > Poly.rma <- rma(data[,1:16]) Background correcting Normalizing Calculating Expression > pData(Poly.rma) amp trt PolyC-1.CEL PolyA Ctrl PolyC-2.CEL PolyA Ctrl PolyC-3.CEL PolyA Ctrl PolyC-4.CEL PolyA Ctrl PolyC-5.CEL PolyA Ctrl PolyC-6.CEL PolyA Ctrl PolyC-7.CEL PolyA Ctrl PolyC-8.CEL PolyA Ctrl PolyHS-1.CEL PolyA HS PolyHS-2.CEL PolyA HS PolyHS-3.CEL PolyA HS PolyHS-4.CEL PolyA HS PolyHS-5.CEL PolyA HS PolyHS-6.CEL PolyA HS PolyHS-7.CEL PolyA HS PolyHS-8.CEL PolyA HS > treatment <-c("C", "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS") > design <- model.matrix(~factor(treatment)) > colnames(design) <- c("Ctrl", "CvsHS") > design Ctrl CvsHS 1 1 0 2 1 0 3 1 0 4 1 0 5 1 0 6 1 0 7 1 0 8 1 0 9 1 1 10 1 1 11 1 1 12 1 1 13 1 1 14 1 1 15 1 1 16 1 1 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$`factor(treatment)` [1] "contr.treatment" > options(digits=4) > topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr") ID logFC AveExpr t P.Value adj.P.Val B 16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816 -1.722 12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816 -1.790 2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816 -2.061 2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816 -2.142 10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816 -2.207 10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816 -2.312 8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816 -2.374 2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816 -2.399 5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816 -2.400 19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816 -2.419 3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816 -2.434 14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816 -2.454 10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816 -2.456 2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816 -2.464 5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816 -2.481 21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816 -2.493 800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816 -2.511 27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816 -2.516 22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816 -2.543 4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816 -2.557 13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816 -2.568 12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816 -2.591 20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816 -2.594 17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816 -2.603 23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816 -2.608 > sessionInfo() R version 2.7.1 (2008-06-23) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] splines tools stats graphics grDevices utils datasets methods base other attached packages: [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1 KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0 [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0 Category_2.6.0 genefilter_1.20.0 survival_2.34-1 [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9 [19] DBI_0.2-4 graph_1.18.1 limma_2.14.5 affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0 [25] Biobase_2.0.1 loaded via a namespace (and not attached): [1] cluster_1.11.11 XML_1.95-3

Microarray GO limma Microarray GO limma • 3.5k views

ADD COMMENT • link updated 15.8 years ago by Robson, Samuel ▴ 10 • written 15.8 years ago by Louisa A Rispoli/AS/EXP/UTIA ▴ 160

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 25 minutes ago

United States

Hi Louisa, Louisa A Rispoli/AS/EXP/UTIA wrote: > To all- > > I am a newbie trying to analyze microarray with minimal help and thought > that I had figured out all. We have a simple task of comparing two groups > with 8 replicates on the bovine genechip. I am attempting to understand the > results that I obtain using limma and adjusting for fdr. I have tried > reading the help vignettes on p.adjust and topTablle but no closer to > understanding if the adjusted p-value represents the fdr or the q-value or > something altogether different. Based on some recent work in another lab > (by abstract) I know that I may need to use a less stringent fdr then 5% > but I am unsure in limma how to change that value (or if it is feasible at > all). Looking at the results that I obtained so far, I do not have any > differentially expressed genes with the fdr adjustment. But I could be > interpreting that wrong, since I was looking for values of the adjusted p > to be lower then 0.05 and the smallest that I see is 0.7816, If someone > could look at this and assist, any help, advice or reprimmand would be > appreciated. The adjusted p-value does indeed represent the FDR, and it looks like you have little evidence for differential expression. However, this doesn't mean that there is no differential expression, just that you don't have much evidence to say there is. Given the data in hand, the table you show does give those probesets that appear to be the most consistently changed, and those are the most likely to validate if you were to choose to do so. Looking at the way you fit the model, I wonder what the end goal of the experiment might have been. When I see that sort of thing in the lab, almost always the client is looking for the interaction (e.g., genes that react differently to the treatment with HS depending on if they are wt or polyA). Is that not the case here? Regardless, I assume at the very least you might want to know the difference between the hs and control wt samples as well. If you fit a model that includes these samples and then compute the contrasts you might get better results (depending on the intra-group variability of the wt samples), as the denominator of your contrast will be based on all 32 samples rather than just the 16 you are using currently. Best, Jim > > > Thanks > > Louisa > > "If we knew what we were doing, it wouldn't be called Research." - Albert > Einstein > > Louisa Rispoli, Ph.D. Reproductive Physiology > Department of Animal Science > University of Tennessee, Knoxville > A105 Johnson Animal Research and Teaching Unit > 1750 Alcoa Highway > Knoxville, TN 37920 > phone:(865) 946-1874 > fax:(865) 946-1010 > email: lrispoli at utk.edu > > R version 2.7.1 (2008-06-23) > Copyright (C) 2008 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 >> library(affycoretools) > Loading required package: affy > Loading required package: Biobase > Loading required package: tools > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > Loading required package: affyio > Loading required package: preprocessCore > Loading required package: limma > Loading required package: GOstats > Loading required package: graph > Loading required package: GO.db > Loading required package: AnnotationDbi > Loading required package: DBI > Loading required package: RSQLite > Loading required package: annotate > Loading required package: xtable > Loading required package: RBGL > Loading required package: Category > Loading required package: genefilter > Loading required package: survival > Loading required package: splines > Loading required package: biomaRt > Loading required package: RCurl > > Attaching package: 'biomaRt' > > > The following object(s) are masked from package:annotate : > > getGO > > Loading required package: gcrma > Loading required package: matchprobes > Loading required package: annaffy > Loading required package: KEGG.db > > Attaching package: 'annaffy' > > > The following object(s) are masked from package:RCurl : > > getURL > >> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE, > row.names=1) >> data <- ReadAffy(phenoData=pd) >> pData(data) > amp trt > PolyC-1.CEL PolyA Ctrl > PolyC-2.CEL PolyA Ctrl > PolyC-3.CEL PolyA Ctrl > PolyC-4.CEL PolyA Ctrl > PolyC-5.CEL PolyA Ctrl > PolyC-6.CEL PolyA Ctrl > PolyC-7.CEL PolyA Ctrl > PolyC-8.CEL PolyA Ctrl > PolyHS-1.CEL PolyA HS > PolyHS-2.CEL PolyA HS > PolyHS-3.CEL PolyA HS > PolyHS-4.CEL PolyA HS > PolyHS-5.CEL PolyA HS > PolyHS-6.CEL PolyA HS > PolyHS-7.CEL PolyA HS > PolyHS-8.CEL PolyA HS > WTC-1.CEL WT Ctrl > WTC-2.CEL WT Ctrl > WTC-3.CEL WT Ctrl > WTC-4.CEL WT Ctrl > WTC-5.CEL WT Ctrl > WTC-6.CEL WT Ctrl > WTC-7.CEL WT Ctrl > WTC-8.CEL WT Ctrl > WTHS-1.CEL WT HS > WTHS-2.CEL WT HS > WTHS-3.CEL WT HS > WTHS-4.CEL WT HS > WTHS-5.CEL WT HS > WTHS-6.CEL WT HS > WTHS-7.CEL WT HS > WTHS-8.CEL WT HS >> Poly.rma <- rma(data[,1:16]) > Background correcting > Normalizing > Calculating Expression >> pData(Poly.rma) > amp trt > PolyC-1.CEL PolyA Ctrl > PolyC-2.CEL PolyA Ctrl > PolyC-3.CEL PolyA Ctrl > PolyC-4.CEL PolyA Ctrl > PolyC-5.CEL PolyA Ctrl > PolyC-6.CEL PolyA Ctrl > PolyC-7.CEL PolyA Ctrl > PolyC-8.CEL PolyA Ctrl > PolyHS-1.CEL PolyA HS > PolyHS-2.CEL PolyA HS > PolyHS-3.CEL PolyA HS > PolyHS-4.CEL PolyA HS > PolyHS-5.CEL PolyA HS > PolyHS-6.CEL PolyA HS > PolyHS-7.CEL PolyA HS > PolyHS-8.CEL PolyA HS >> treatment <-c("C", > "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS") >> design <- model.matrix(~factor(treatment)) >> colnames(design) <- c("Ctrl", "CvsHS") >> design > Ctrl CvsHS > 1 1 0 > 2 1 0 > 3 1 0 > 4 1 0 > 5 1 0 > 6 1 0 > 7 1 0 > 8 1 0 > 9 1 1 > 10 1 1 > 11 1 1 > 12 1 1 > 13 1 1 > 14 1 1 > 15 1 1 > 16 1 1 > attr(,"assign") > [1] 0 1 > attr(,"contrasts") > attr(,"contrasts")$`factor(treatment)` > [1] "contr.treatment" >> options(digits=4) >> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr") > ID logFC AveExpr t P.Value adj.P.Val B > 16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816 -1.722 > 12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816 -1.790 > 2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816 -2.061 > 2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816 -2.142 > 10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816 -2.207 > 10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816 -2.312 > 8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816 -2.374 > 2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816 -2.399 > 5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816 -2.400 > 19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816 -2.419 > 3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816 -2.434 > 14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816 -2.454 > 10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816 -2.456 > 2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816 -2.464 > 5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816 -2.481 > 21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816 -2.493 > 800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816 -2.511 > 27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816 -2.516 > 22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816 -2.543 > 4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816 -2.557 > 13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816 -2.568 > 12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816 -2.591 > 20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816 -2.594 > 17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816 -2.603 > 23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816 -2.608 > >> sessionInfo() > R version 2.7.1 (2008-06-23) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] splines tools stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1 > KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0 > [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0 > Category_2.6.0 genefilter_1.20.0 survival_2.34-1 > [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 > GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9 > [19] DBI_0.2-4 graph_1.18.1 limma_2.14.5 > affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0 > [25] Biobase_2.0.1 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 XML_1.95-3 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, MS Biostatistician UMCCC cDNA and Affymetrix Core University of Michigan 1500 E Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD COMMENT • link 15.8 years ago James W. MacDonald 65k

0

Entering edit mode

Jim, Tom and Naomi and all- Thank you for your reponses. I am planning to look for genes (if any) that changed similaryly between the two groups depending on amplification procedure but was just getting starting looking at individually first. I appreciate the advice. Would pre-filtering to remove "absent" from all sample genes improve the situation or create a different problem, reading the archives there seems to be some disagreement if prefiltering before eBayes is appropriate. Thank you again I appreciate any help I get. Louisa "If we knew what we were doing, it wouldn't be called Research." - Albert Einstein Louisa Rispoli, Ph.D. Reproductive Physiology Department of Animal Science University of Tennessee, Knoxville A105 Johnson Animal Research and Teaching Unit 1750 Alcoa Highway Knoxville, TN 37920 phone:(865) 946-1874 fax:(865) 946-1010 email: lrispoli at utk.edu James MacDonald <jmacdon at="" med.umic="" h.edu=""> To Sent by: Louisa A Rispoli/AS/EXP/UTIA bioconductor-boun <larispoli at="" mail.ag.utk.edu=""> ces at stat.math.eth cc z.ch bioconductor at stat.math.ethz.ch Subject Re: [BioC] Understanding limma, fdr 07/07/2008 06:44 and topTable PM Hi Louisa, Louisa A Rispoli/AS/EXP/UTIA wrote: > To all- > > I am a newbie trying to analyze microarray with minimal help and thought > that I had figured out all. We have a simple task of comparing two groups > with 8 replicates on the bovine genechip. I am attempting to understand the > results that I obtain using limma and adjusting for fdr. I have tried > reading the help vignettes on p.adjust and topTablle but no closer to > understanding if the adjusted p-value represents the fdr or the q-value or > something altogether different. Based on some recent work in another lab > (by abstract) I know that I may need to use a less stringent fdr then 5% > but I am unsure in limma how to change that value (or if it is feasible at > all). Looking at the results that I obtained so far, I do not have any > differentially expressed genes with the fdr adjustment. But I could be > interpreting that wrong, since I was looking for values of the adjusted p > to be lower then 0.05 and the smallest that I see is 0.7816, If someone > could look at this and assist, any help, advice or reprimmand would be > appreciated. The adjusted p-value does indeed represent the FDR, and it looks like you have little evidence for differential expression. However, this doesn't mean that there is no differential expression, just that you don't have much evidence to say there is. Given the data in hand, the table you show does give those probesets that appear to be the most consistently changed, and those are the most likely to validate if you were to choose to do so. Looking at the way you fit the model, I wonder what the end goal of the experiment might have been. When I see that sort of thing in the lab, almost always the client is looking for the interaction (e.g., genes that react differently to the treatment with HS depending on if they are wt or polyA). Is that not the case here? Regardless, I assume at the very least you might want to know the difference between the hs and control wt samples as well. If you fit a model that includes these samples and then compute the contrasts you might get better results (depending on the intra-group variability of the wt samples), as the denominator of your contrast will be based on all 32 samples rather than just the 16 you are using currently. Best, Jim > > > Thanks > > Louisa > > "If we knew what we were doing, it wouldn't be called Research." - Albert > Einstein > > Louisa Rispoli, Ph.D. Reproductive Physiology > Department of Animal Science > University of Tennessee, Knoxville > A105 Johnson Animal Research and Teaching Unit > 1750 Alcoa Highway > Knoxville, TN 37920 > phone:(865) 946-1874 > fax:(865) 946-1010 > email: lrispoli at utk.edu > > R version 2.7.1 (2008-06-23) > Copyright (C) 2008 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 >> library(affycoretools) > Loading required package: affy > Loading required package: Biobase > Loading required package: tools > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > Loading required package: affyio > Loading required package: preprocessCore > Loading required package: limma > Loading required package: GOstats > Loading required package: graph > Loading required package: GO.db > Loading required package: AnnotationDbi > Loading required package: DBI > Loading required package: RSQLite > Loading required package: annotate > Loading required package: xtable > Loading required package: RBGL > Loading required package: Category > Loading required package: genefilter > Loading required package: survival > Loading required package: splines > Loading required package: biomaRt > Loading required package: RCurl > > Attaching package: 'biomaRt' > > > The following object(s) are masked from package:annotate : > > getGO > > Loading required package: gcrma > Loading required package: matchprobes > Loading required package: annaffy > Loading required package: KEGG.db > > Attaching package: 'annaffy' > > > The following object(s) are masked from package:RCurl : > > getURL > >> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE, > row.names=1) >> data <- ReadAffy(phenoData=pd) >> pData(data) > amp trt > PolyC-1.CEL PolyA Ctrl > PolyC-2.CEL PolyA Ctrl > PolyC-3.CEL PolyA Ctrl > PolyC-4.CEL PolyA Ctrl > PolyC-5.CEL PolyA Ctrl > PolyC-6.CEL PolyA Ctrl > PolyC-7.CEL PolyA Ctrl > PolyC-8.CEL PolyA Ctrl > PolyHS-1.CEL PolyA HS > PolyHS-2.CEL PolyA HS > PolyHS-3.CEL PolyA HS > PolyHS-4.CEL PolyA HS > PolyHS-5.CEL PolyA HS > PolyHS-6.CEL PolyA HS > PolyHS-7.CEL PolyA HS > PolyHS-8.CEL PolyA HS > WTC-1.CEL WT Ctrl > WTC-2.CEL WT Ctrl > WTC-3.CEL WT Ctrl > WTC-4.CEL WT Ctrl > WTC-5.CEL WT Ctrl > WTC-6.CEL WT Ctrl > WTC-7.CEL WT Ctrl > WTC-8.CEL WT Ctrl > WTHS-1.CEL WT HS > WTHS-2.CEL WT HS > WTHS-3.CEL WT HS > WTHS-4.CEL WT HS > WTHS-5.CEL WT HS > WTHS-6.CEL WT HS > WTHS-7.CEL WT HS > WTHS-8.CEL WT HS >> Poly.rma <- rma(data[,1:16]) > Background correcting > Normalizing > Calculating Expression >> pData(Poly.rma) > amp trt > PolyC-1.CEL PolyA Ctrl > PolyC-2.CEL PolyA Ctrl > PolyC-3.CEL PolyA Ctrl > PolyC-4.CEL PolyA Ctrl > PolyC-5.CEL PolyA Ctrl > PolyC-6.CEL PolyA Ctrl > PolyC-7.CEL PolyA Ctrl > PolyC-8.CEL PolyA Ctrl > PolyHS-1.CEL PolyA HS > PolyHS-2.CEL PolyA HS > PolyHS-3.CEL PolyA HS > PolyHS-4.CEL PolyA HS > PolyHS-5.CEL PolyA HS > PolyHS-6.CEL PolyA HS > PolyHS-7.CEL PolyA HS > PolyHS-8.CEL PolyA HS >> treatment <-c("C", > "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS") >> design <- model.matrix(~factor(treatment)) >> colnames(design) <- c("Ctrl", "CvsHS") >> design > Ctrl CvsHS > 1 1 0 > 2 1 0 > 3 1 0 > 4 1 0 > 5 1 0 > 6 1 0 > 7 1 0 > 8 1 0 > 9 1 1 > 10 1 1 > 11 1 1 > 12 1 1 > 13 1 1 > 14 1 1 > 15 1 1 > 16 1 1 > attr(,"assign") > [1] 0 1 > attr(,"contrasts") > attr(,"contrasts")$`factor(treatment)` > [1] "contr.treatment" >> options(digits=4) >> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr") > ID logFC AveExpr t P.Value adj.P.Val B > 16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816 -1.722 > 12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816 -1.790 > 2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816 -2.061 > 2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816 -2.142 > 10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816 -2.207 > 10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816 -2.312 > 8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816 -2.374 > 2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816 -2.399 > 5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816 -2.400 > 19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816 -2.419 > 3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816 -2.434 > 14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816 -2.454 > 10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816 -2.456 > 2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816 -2.464 > 5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816 -2.481 > 21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816 -2.493 > 800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816 -2.511 > 27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816 -2.516 > 22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816 -2.543 > 4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816 -2.557 > 13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816 -2.568 > 12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816 -2.591 > 20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816 -2.594 > 17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816 -2.603 > 23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816 -2.608 > >> sessionInfo() > R version 2.7.1 (2008-06-23) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] splines tools stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1 > KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0 > [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0 > Category_2.6.0 genefilter_1.20.0 survival_2.34-1 > [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 > GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9 > [19] DBI_0.2-4 graph_1.18.1 limma_2.14.5 > affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0 > [25] Biobase_2.0.1 > > loaded via a namespace (and not attached): > [1] cluster_1.11.11 XML_1.95-3 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, MS Biostatistician UMCCC cDNA and Affymetrix Core University of Michigan 1500 E Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 15.8 years ago Louisa A Rispoli/AS/EXP/UTIA ▴ 160

0

Entering edit mode

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

Dear Louisa, FDR works well to "adjust" p-values when the distribution of unadjusted p-values has a sharp spike near 0 and is otherwise pretty flat. What often happens if either the data are noisy or the fold changes are relatively small is that the unadjusted p-values are elevated near 0 and taper almost linearly into the flat part near p=1. In that case, you will have evidence for differential expression, but any list of genes you produce will have a high percentage of false detections (approximately 78.16% in your case). As James said, it is still the case that the smallest p-values belong to genes for which you have the most evidence of differential expression. So those are the ones to try to validate. The fits from your ebayes step include the p-values for each contrast. e.g. output$p.value. To get a histogram for the p-values for the first contrast hist(output$p.value[,1]) Or, you can obtain the histogram for the p-value for the overall F-test output$F.p.value. Naomi At 04:35 PM 7/7/2008, Louisa A Rispoli/AS/EXP/UTIA wrote: >To all- > >I am a newbie trying to analyze microarray with minimal help and thought >that I had figured out all. We have a simple task of comparing two groups >with 8 replicates on the bovine genechip. I am attempting to understand the >results that I obtain using limma and adjusting for fdr. I have tried >reading the help vignettes on p.adjust and topTablle but no closer to >understanding if the adjusted p-value represents the fdr or the q-value or >something altogether different. Based on some recent work in another lab >(by abstract) I know that I may need to use a less stringent fdr then 5% >but I am unsure in limma how to change that value (or if it is feasible at >all). Looking at the results that I obtained so far, I do not have any >differentially expressed genes with the fdr adjustment. But I could be >interpreting that wrong, since I was looking for values of the adjusted p >to be lower then 0.05 and the smallest that I see is 0.7816, If someone >could look at this and assist, any help, advice or reprimmand would be >appreciated. > > >Thanks > >Louisa > >"If we knew what we were doing, it wouldn't be called Research." - Albert >Einstein > >Louisa Rispoli, Ph.D. Reproductive Physiology >Department of Animal Science >University of Tennessee, Knoxville >A105 Johnson Animal Research and Teaching Unit >1750 Alcoa Highway >Knoxville, TN 37920 >phone:(865) 946-1874 >fax:(865) 946-1010 >email: lrispoli at utk.edu > >R version 2.7.1 (2008-06-23) >Copyright (C) 2008 The R Foundation for Statistical Computingmja >ISBN 3-900051-07-0 > > library(affycoretools) >Loading required package: affy >Loading required package: Biobase >Loading required package: tools > >Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > >Loading required package: affyio >Loading required package: preprocessCore >Loading required package: limma >Loading required package: GOstats >Loading required package: graph >Loading required package: GO.db >Loading required package: AnnotationDbi >Loading required package: DBI >Loading required package: RSQLite >Loading required package: annotate >Loading required package: xtable >Loading required package: RBGL >Loading required package: Category >Loading required package: genefilter >Loading required package: survival >Loading required package: splines >Loading required package: biomaRt >Loading required package: RCurl > >Attaching package: 'biomaRt' > > > The following object(s) are masked from package:annotate : > > getGO > >Loading required package: gcrma >Loading required package: matchprobes >Loading required package: annaffy >Loading required package: KEGG.db > >Attaching package: 'annaffy' > > > The following object(s) are masked from package:RCurl : > > getURL > > > pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE, >row.names=1) > > data <- ReadAffy(phenoData=pd) > > pData(data) > amp trt >PolyC-1.CEL PolyA Ctrl >PolyC-2.CEL PolyA Ctrl >PolyC-3.CEL PolyA Ctrl >PolyC-4.CEL PolyA Ctrl >PolyC-5.CEL PolyA Ctrl >PolyC-6.CEL PolyA Ctrl >PolyC-7.CEL PolyA Ctrl >PolyC-8.CEL PolyA Ctrl >PolyHS-1.CEL PolyA HS >PolyHS-2.CEL PolyA HS >PolyHS-3.CEL PolyA HS >PolyHS-4.CEL PolyA HS >PolyHS-5.CEL PolyA HS >PolyHS-6.CEL PolyA HS >PolyHS-7.CEL PolyA HS >PolyHS-8.CEL PolyA HS >WTC-1.CEL WT Ctrl >WTC-2.CEL WT Ctrl >WTC-3.CEL WT Ctrl >WTC-4.CEL WT Ctrl >WTC-5.CEL WT Ctrl >WTC-6.CEL WT Ctrl >WTC-7.CEL WT Ctrl >WTC-8.CEL WT Ctrl >WTHS-1.CEL WT HS >WTHS-2.CEL WT HS >WTHS-3.CEL WT HS >WTHS-4.CEL WT HS >WTHS-5.CEL WT HS >WTHS-6.CEL WT HS >WTHS-7.CEL WT HS >WTHS-8.CEL WT HS > > Poly.rma <- rma(data[,1:16]) >Background correcting >Normalizing >Calculating Expression > > pData(Poly.rma) > amp trt >PolyC-1.CEL PolyA Ctrl >PolyC-2.CEL PolyA Ctrl >PolyC-3.CEL PolyA Ctrl >PolyC-4.CEL PolyA Ctrl >PolyC-5.CEL PolyA Ctrl >PolyC-6.CEL PolyA Ctrl >PolyC-7.CEL PolyA Ctrl >PolyC-8.CEL PolyA Ctrl >PolyHS-1.CEL PolyA HS >PolyHS-2.CEL PolyA HS >PolyHS-3.CEL PolyA HS >PolyHS-4.CEL PolyA HS >PolyHS-5.CEL PolyA HS >PolyHS-6.CEL PolyA HS >PolyHS-7.CEL PolyA HS >PolyHS-8.CEL PolyA HS > > treatment <-c("C", >"C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS") > > design <- model.matrix(~factor(treatment)) > > colnames(design) <- c("Ctrl", "CvsHS") > > design > Ctrl CvsHS >1 1 0 >2 1 0 >3 1 0 >4 1 0 >5 1 0 >6 1 0 >7 1 0 >8 1 0 >9 1 1 >10 1 1 >11 1 1 >12 1 1 >13 1 1 >14 1 1 >15 1 1 >16 1 1 >attr(,"assign") >[1] 0 1 >attr(,"contrasts") >attr(,"contrasts")$`factor(treatment)` >[1] "contr.treatment" > > options(digits=4) > > topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr") > ID logFC AveExpr t P.Value adj.P.Val B >16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816 -1.722 >12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816 -1.790 >2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816 -2.061 >2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816 -2.142 >10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816 -2.207 >10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816 -2.312 >8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816 -2.374 >2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816 -2.399 >5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816 -2.400 >19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816 -2.419 >3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816 -2.434 >14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816 -2.454 >10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816 -2.456 >2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816 -2.464 >5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816 -2.481 >21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816 -2.493 >800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816 -2.511 >27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816 -2.516 >22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816 -2.543 >4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816 -2.557 >13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816 -2.568 >12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816 -2.591 >20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816 -2.594 >17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816 -2.603 >23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816 -2.608 > > > sessionInfo() >R version 2.7.1 (2008-06-23) >i386-pc-mingw32 > >locale: >LC_COLLATE=English_United States.1252;LC_CTYPE=English_United >States.1252;LC_MONETARY=English_United > >States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > >attached base packages: >[1] splines tools stats graphics grDevices utils datasets >methods base > >other attached packages: > [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1 >KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0 > [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0 >Category_2.6.0 genefilter_1.20.0 survival_2.34-1 >[13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 >GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9 >[19] DBI_0.2-4 graph_1.18.1 limma_2.14.5 >affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0 >[25] Biobase_2.0.1 > >loaded via a namespace (and not attached): >[1] cluster_1.11.11 XML_1.95-3 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111

ADD COMMENT • link 15.8 years ago Naomi Altman ★ 6.0k

0

Entering edit mode

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

The current wisdom seems to be that removing very low intensity genes from the dataset improves power. This certainly makes sense, because there is definitely a detection limit. I do not have a reference, but if you search the archived emails, you will find a lot of discussion about this. That being said, you should only filter genes that are low in every sample. Otherwise, you could filter out the most interesting genes - the ones that are expressed only under certain conditions. --Naomi At 08:27 AM 7/8/2008, Louisa A Rispoli/AS/EXP/UTIA wrote: >Jim, Tom and Naomi and all- > >Thank you for your reponses. I am planning to look for genes (if any) that >changed similaryly between the two groups depending on amplification >procedure but was just getting starting looking at individually first. I >appreciate the advice. Would pre-filtering to remove "absent" from all >sample genes improve the situation or create a different problem, reading >the archives there seems to be some disagreement if prefiltering before >eBayes is appropriate. Thank you again I appreciate any help I get. > >Louisa > > > "If we knew what we were doing, it wouldn't be called Research." - Albert >Einstein > >Louisa Rispoli, Ph.D. Reproductive Physiology >Department of Animal Science >University of Tennessee, Knoxville >A105 Johnson Animal Research and Teaching Unit >1750 Alcoa Highway >Knoxville, TN 37920 >phone:(865) 946-1874 >fax:(865) 946-1010 >email: lrispoli at utk.edu > > > > > > James MacDonald > <jmacdon at="" med.umic=""> h.edu> To > Sent by: Louisa A Rispoli/AS/EXP/UTIA > bioconductor-boun <larispoli at="" mail.ag.utk.edu=""> > ces at stat.math.eth cc > z.ch bioconductor at stat.math.ethz.ch > Subject > Re: [BioC] Understanding limma, fdr > 07/07/2008 06:44 and topTable > PM > > > > > > > > > >Hi Louisa, > >Louisa A Rispoli/AS/EXP/UTIA wrote: > > To all- > > > > I am a newbie trying to analyze microarray with minimal help and thought > > that I had figured out all. We have a simple task of comparing two groups > > with 8 replicates on the bovine genechip. I am attempting to understand >the > > results that I obtain using limma and adjusting for fdr. I have tried > > reading the help vignettes on p.adjust and topTablle but no closer to > > understanding if the adjusted p-value represents the fdr or the q-value >or > > something altogether different. Based on some recent work in another lab > > (by abstract) I know that I may need to use a less stringent fdr then 5% > > but I am unsure in limma how to change that value (or if it is feasible >at > > all). Looking at the results that I obtained so far, I do not have any > > differentially expressed genes with the fdr adjustment. But I could be > > interpreting that wrong, since I was looking for values of the adjusted p > > to be lower then 0.05 and the smallest that I see is 0.7816, If someone > > could look at this and assist, any help, advice or reprimmand would be > > appreciated. > >The adjusted p-value does indeed represent the FDR, and it looks like >you have little evidence for differential expression. However, this >doesn't mean that there is no differential expression, just that you >don't have much evidence to say there is. > >Given the data in hand, the table you show does give those probesets >that appear to be the most consistently changed, and those are the most >likely to validate if you were to choose to do so. > >Looking at the way you fit the model, I wonder what the end goal of the >experiment might have been. When I see that sort of thing in the lab, >almost always the client is looking for the interaction (e.g., genes >that react differently to the treatment with HS depending on if they are >wt or polyA). Is that not the case here? > >Regardless, I assume at the very least you might want to know the >difference between the hs and control wt samples as well. If you fit a >model that includes these samples and then compute the contrasts you >might get better results (depending on the intra-group variability of >the wt samples), as the denominator of your contrast will be based on >all 32 samples rather than just the 16 you are using currently. > >Best, > >Jim > > > > > > > Thanks > > > > Louisa > > > > "If we knew what we were doing, it wouldn't be called Research." - Albert > > Einstein > > > > Louisa Rispoli, Ph.D. Reproductive Physiology > > Department of Animal Science > > University of Tennessee, Knoxville > > A105 Johnson Animal Research and Teaching Unit > > 1750 Alcoa Highway > > Knoxville, TN 37920 > > phone:(865) 946-1874 > > fax:(865) 946-1010 > > email: lrispoli at utk.edu > > > > R version 2.7.1 (2008-06-23) > > Copyright (C) 2008 The R Foundation for Statistical Computing > > ISBN 3-900051-07-0 > >> library(affycoretools) > > Loading required package: affy > > Loading required package: Biobase > > Loading required package: tools > > > > Welcome to Bioconductor > > > > Vignettes contain introductory material. To view, type > > 'openVignette()'. To cite Bioconductor, see > > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > > > Loading required package: affyio > > Loading required package: preprocessCore > > Loading required package: limma > > Loading required package: GOstats > > Loading required package: graph > > Loading required package: GO.db > > Loading required package: AnnotationDbi > > Loading required package: DBI > > Loading required package: RSQLite > > Loading required package: annotate > > Loading required package: xtable > > Loading required package: RBGL > > Loading required package: Category > > Loading required package: genefilter > > Loading required package: survival > > Loading required package: splines > > Loading required package: biomaRt > > Loading required package: RCurl > > > > Attaching package: 'biomaRt' > > > > > > The following object(s) are masked from package:annotate : > > > > getGO > > > > Loading required package: gcrma > > Loading required package: matchprobes > > Loading required package: annaffy > > Loading required package: KEGG.db > > > > Attaching package: 'annaffy' > > > > > > The following object(s) are masked from package:RCurl : > > > > getURL > > > >> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE, > > row.names=1) > >> data <- ReadAffy(phenoData=pd) > >> pData(data) > > amp trt > > PolyC-1.CEL PolyA Ctrl > > PolyC-2.CEL PolyA Ctrl > > PolyC-3.CEL PolyA Ctrl > > PolyC-4.CEL PolyA Ctrl > > PolyC-5.CEL PolyA Ctrl > > PolyC-6.CEL PolyA Ctrl > > PolyC-7.CEL PolyA Ctrl > > PolyC-8.CEL PolyA Ctrl > > PolyHS-1.CEL PolyA HS > > PolyHS-2.CEL PolyA HS > > PolyHS-3.CEL PolyA HS > > PolyHS-4.CEL PolyA HS > > PolyHS-5.CEL PolyA HS > > PolyHS-6.CEL PolyA HS > > PolyHS-7.CEL PolyA HS > > PolyHS-8.CEL PolyA HS > > WTC-1.CEL WT Ctrl > > WTC-2.CEL WT Ctrl > > WTC-3.CEL WT Ctrl > > WTC-4.CEL WT Ctrl > > WTC-5.CEL WT Ctrl > > WTC-6.CEL WT Ctrl > > WTC-7.CEL WT Ctrl > > WTC-8.CEL WT Ctrl > > WTHS-1.CEL WT HS > > WTHS-2.CEL WT HS > > WTHS-3.CEL WT HS > > WTHS-4.CEL WT HS > > WTHS-5.CEL WT HS > > WTHS-6.CEL WT HS > > WTHS-7.CEL WT HS > > WTHS-8.CEL WT HS > >> Poly.rma <- rma(data[,1:16]) > > Background correcting > > Normalizing > > Calculating Expression > >> pData(Poly.rma) > > amp trt > > PolyC-1.CEL PolyA Ctrl > > PolyC-2.CEL PolyA Ctrl > > PolyC-3.CEL PolyA Ctrl > > PolyC-4.CEL PolyA Ctrl > > PolyC-5.CEL PolyA Ctrl > > PolyC-6.CEL PolyA Ctrl > > PolyC-7.CEL PolyA Ctrl > > PolyC-8.CEL PolyA Ctrl > > PolyHS-1.CEL PolyA HS > > PolyHS-2.CEL PolyA HS > > PolyHS-3.CEL PolyA HS > > PolyHS-4.CEL PolyA HS > > PolyHS-5.CEL PolyA HS > > PolyHS-6.CEL PolyA HS > > PolyHS-7.CEL PolyA HS > > PolyHS-8.CEL PolyA HS > >> treatment <-c("C", > > "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS") > >> design <- model.matrix(~factor(treatment)) > >> colnames(design) <- c("Ctrl", "CvsHS") > >> design > > Ctrl CvsHS > > 1 1 0 > > 2 1 0 > > 3 1 0 > > 4 1 0 > > 5 1 0 > > 6 1 0 > > 7 1 0 > > 8 1 0 > > 9 1 1 > > 10 1 1 > > 11 1 1 > > 12 1 1 > > 13 1 1 > > 14 1 1 > > 15 1 1 > > 16 1 1 > > attr(,"assign") > > [1] 0 1 > > attr(,"contrasts") > > attr(,"contrasts")$`factor(treatment)` > > [1] "contr.treatment" > >> options(digits=4) > >> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr") > > ID logFC AveExpr t P.Value adj.P.Val >B > > 16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816 >-1.722 > > 12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816 >-1.790 > > 2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816 >-2.061 > > 2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816 >-2.142 > > 10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816 >-2.207 > > 10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816 >-2.312 > > 8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816 >-2.374 > > 2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816 >-2.399 > > 5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816 >-2.400 > > 19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816 >-2.419 > > 3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816 >-2.434 > > 14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816 >-2.454 > > 10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816 >-2.456 > > 2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816 >-2.464 > > 5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816 >-2.481 > > 21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816 >-2.493 > > 800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816 >-2.511 > > 27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816 >-2.516 > > 22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816 >-2.543 > > 4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816 >-2.557 > > 13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816 >-2.568 > > 12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816 >-2.591 > > 20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816 >-2.594 > > 17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816 >-2.603 > > 23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816 >-2.608 > > > >> sessionInfo() > > R version 2.7.1 (2008-06-23) > > i386-pc-mingw32 > > > > locale: > > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > > States.1252;LC_MONETARY=English_United > > > > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] splines tools stats graphics grDevices utils datasets > > methods base > > > > other attached packages: > > [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1 > > KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0 > > [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0 > > Category_2.6.0 genefilter_1.20.0 survival_2.34-1 > > [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 > > GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9 > > [19] DBI_0.2-4 graph_1.18.1 limma_2.14.5 > > affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0 > > [25] Biobase_2.0.1 > > > > loaded via a namespace (and not attached): > > [1] cluster_1.11.11 XML_1.95-3 > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > >-- >James W. MacDonald, MS >Biostatistician >UMCCC cDNA and Affymetrix Core >University of Michigan >1500 E Medical Center Drive >7410 CCGC >Ann Arbor MI 48109 >734-647-5623 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111

ADD COMMENT • link 15.8 years ago Naomi Altman ★ 6.0k

0

Entering edit mode

I would add that removing those genes that are unchanged in any sample will also help reduce the multiplicity problem. Regardless of the expression level, those genes that never change expression are uninteresting by default, so e.g., if beta-actin is highly expressed at the same level in all samples we don't really care to test for differential expression for that gene since it apparently is not differentially expressed. Best, Jim Naomi Altman wrote: > The current wisdom seems to be that removing very low intensity genes > from the dataset improves power. This certainly makes sense, because > there is definitely a detection limit. I do not have a reference, but > if you search the archived emails, you will find a lot of discussion > about this. > > That being said, you should only filter genes that are low in every > sample. Otherwise, you could filter out the most interesting genes - > the ones that are expressed only under certain conditions. > > --Naomi > > > > At 08:27 AM 7/8/2008, Louisa A Rispoli/AS/EXP/UTIA wrote: >> Jim, Tom and Naomi and all- >> >> Thank you for your reponses. I am planning to look for genes (if any) >> that >> changed similaryly between the two groups depending on amplification >> procedure but was just getting starting looking at individually first. I >> appreciate the advice. Would pre-filtering to remove "absent" from all >> sample genes improve the situation or create a different problem, reading >> the archives there seems to be some disagreement if prefiltering before >> eBayes is appropriate. Thank you again I appreciate any help I get. >> >> Louisa >> >> >> "If we knew what we were doing, it wouldn't be called Research." - >> Albert >> Einstein >> >> Louisa Rispoli, Ph.D. Reproductive Physiology >> Department of Animal Science >> University of Tennessee, Knoxville >> A105 Johnson Animal Research and Teaching Unit >> 1750 Alcoa Highway >> Knoxville, TN 37920 >> phone:(865) 946-1874 >> fax:(865) 946-1010 >> email: lrispoli at utk.edu >> >> >> >> >> >> James MacDonald >> <jmacdon at="" med.umic="">> >> h.edu> To >> Sent by: Louisa A Rispoli/AS/EXP/UTIA >> bioconductor-boun <larispoli at="" mail.ag.utk.edu=""> >> >> ces at stat.math.eth cc >> z.ch bioconductor at stat.math.ethz.ch >> >> Subject >> Re: [BioC] Understanding limma, >> fdr >> 07/07/2008 06:44 and topTable >> PM >> >> >> >> >> >> >> >> >> >> Hi Louisa, >> >> Louisa A Rispoli/AS/EXP/UTIA wrote: >> > To all- >> > >> > I am a newbie trying to analyze microarray with minimal help and >> thought >> > that I had figured out all. We have a simple task of comparing two >> groups >> > with 8 replicates on the bovine genechip. I am attempting to understand >> the >> > results that I obtain using limma and adjusting for fdr. I have tried >> > reading the help vignettes on p.adjust and topTablle but no closer to >> > understanding if the adjusted p-value represents the fdr or the q-value >> or >> > something altogether different. Based on some recent work in another >> lab >> > (by abstract) I know that I may need to use a less stringent fdr >> then 5% >> > but I am unsure in limma how to change that value (or if it is feasible >> at >> > all). Looking at the results that I obtained so far, I do not have any >> > differentially expressed genes with the fdr adjustment. But I could be >> > interpreting that wrong, since I was looking for values of the >> adjusted p >> > to be lower then 0.05 and the smallest that I see is 0.7816, If someone >> > could look at this and assist, any help, advice or reprimmand would be >> > appreciated. >> >> The adjusted p-value does indeed represent the FDR, and it looks like >> you have little evidence for differential expression. However, this >> doesn't mean that there is no differential expression, just that you >> don't have much evidence to say there is. >> >> Given the data in hand, the table you show does give those probesets >> that appear to be the most consistently changed, and those are the most >> likely to validate if you were to choose to do so. >> >> Looking at the way you fit the model, I wonder what the end goal of the >> experiment might have been. When I see that sort of thing in the lab, >> almost always the client is looking for the interaction (e.g., genes >> that react differently to the treatment with HS depending on if they are >> wt or polyA). Is that not the case here? >> >> Regardless, I assume at the very least you might want to know the >> difference between the hs and control wt samples as well. If you fit a >> model that includes these samples and then compute the contrasts you >> might get better results (depending on the intra-group variability of >> the wt samples), as the denominator of your contrast will be based on >> all 32 samples rather than just the 16 you are using currently. >> >> Best, >> >> Jim >> >> > >> > >> > Thanks >> > >> > Louisa >> > >> > "If we knew what we were doing, it wouldn't be called Research." - >> Albert >> > Einstein >> > >> > Louisa Rispoli, Ph.D. Reproductive Physiology >> > Department of Animal Science >> > University of Tennessee, Knoxville >> > A105 Johnson Animal Research and Teaching Unit >> > 1750 Alcoa Highway >> > Knoxville, TN 37920 >> > phone:(865) 946-1874 >> > fax:(865) 946-1010 >> > email: lrispoli at utk.edu >> > >> > R version 2.7.1 (2008-06-23) >> > Copyright (C) 2008 The R Foundation for Statistical Computing >> > ISBN 3-900051-07-0 >> >> library(affycoretools) >> > Loading required package: affy >> > Loading required package: Biobase >> > Loading required package: tools >> > >> > Welcome to Bioconductor >> > >> > Vignettes contain introductory material. To view, type >> > 'openVignette()'. To cite Bioconductor, see >> > 'citation("Biobase")' and for packages 'citation(pkgname)'. >> > >> > Loading required package: affyio >> > Loading required package: preprocessCore >> > Loading required package: limma >> > Loading required package: GOstats >> > Loading required package: graph >> > Loading required package: GO.db >> > Loading required package: AnnotationDbi >> > Loading required package: DBI >> > Loading required package: RSQLite >> > Loading required package: annotate >> > Loading required package: xtable >> > Loading required package: RBGL >> > Loading required package: Category >> > Loading required package: genefilter >> > Loading required package: survival >> > Loading required package: splines >> > Loading required package: biomaRt >> > Loading required package: RCurl >> > >> > Attaching package: 'biomaRt' >> > >> > >> > The following object(s) are masked from package:annotate : >> > >> > getGO >> > >> > Loading required package: gcrma >> > Loading required package: matchprobes >> > Loading required package: annaffy >> > Loading required package: KEGG.db >> > >> > Attaching package: 'annaffy' >> > >> > >> > The following object(s) are masked from package:RCurl : >> > >> > getURL >> > >> >> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE, >> > row.names=1) >> >> data <- ReadAffy(phenoData=pd) >> >> pData(data) >> > amp trt >> > PolyC-1.CEL PolyA Ctrl >> > PolyC-2.CEL PolyA Ctrl >> > PolyC-3.CEL PolyA Ctrl >> > PolyC-4.CEL PolyA Ctrl >> > PolyC-5.CEL PolyA Ctrl >> > PolyC-6.CEL PolyA Ctrl >> > PolyC-7.CEL PolyA Ctrl >> > PolyC-8.CEL PolyA Ctrl >> > PolyHS-1.CEL PolyA HS >> > PolyHS-2.CEL PolyA HS >> > PolyHS-3.CEL PolyA HS >> > PolyHS-4.CEL PolyA HS >> > PolyHS-5.CEL PolyA HS >> > PolyHS-6.CEL PolyA HS >> > PolyHS-7.CEL PolyA HS >> > PolyHS-8.CEL PolyA HS >> > WTC-1.CEL WT Ctrl >> > WTC-2.CEL WT Ctrl >> > WTC-3.CEL WT Ctrl >> > WTC-4.CEL WT Ctrl >> > WTC-5.CEL WT Ctrl >> > WTC-6.CEL WT Ctrl >> > WTC-7.CEL WT Ctrl >> > WTC-8.CEL WT Ctrl >> > WTHS-1.CEL WT HS >> > WTHS-2.CEL WT HS >> > WTHS-3.CEL WT HS >> > WTHS-4.CEL WT HS >> > WTHS-5.CEL WT HS >> > WTHS-6.CEL WT HS >> > WTHS-7.CEL WT HS >> > WTHS-8.CEL WT HS >> >> Poly.rma <- rma(data[,1:16]) >> > Background correcting >> > Normalizing >> > Calculating Expression >> >> pData(Poly.rma) >> > amp trt >> > PolyC-1.CEL PolyA Ctrl >> > PolyC-2.CEL PolyA Ctrl >> > PolyC-3.CEL PolyA Ctrl >> > PolyC-4.CEL PolyA Ctrl >> > PolyC-5.CEL PolyA Ctrl >> > PolyC-6.CEL PolyA Ctrl >> > PolyC-7.CEL PolyA Ctrl >> > PolyC-8.CEL PolyA Ctrl >> > PolyHS-1.CEL PolyA HS >> > PolyHS-2.CEL PolyA HS >> > PolyHS-3.CEL PolyA HS >> > PolyHS-4.CEL PolyA HS >> > PolyHS-5.CEL PolyA HS >> > PolyHS-6.CEL PolyA HS >> > PolyHS-7.CEL PolyA HS >> > PolyHS-8.CEL PolyA HS >> >> treatment <-c("C", >> > "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS") >> >> design <- model.matrix(~factor(treatment)) >> >> colnames(design) <- c("Ctrl", "CvsHS") >> >> design >> > Ctrl CvsHS >> > 1 1 0 >> > 2 1 0 >> > 3 1 0 >> > 4 1 0 >> > 5 1 0 >> > 6 1 0 >> > 7 1 0 >> > 8 1 0 >> > 9 1 1 >> > 10 1 1 >> > 11 1 1 >> > 12 1 1 >> > 13 1 1 >> > 14 1 1 >> > 15 1 1 >> > 16 1 1 >> > attr(,"assign") >> > [1] 0 1 >> > attr(,"contrasts") >> > attr(,"contrasts")$`factor(treatment)` >> > [1] "contr.treatment" >> >> options(digits=4) >> >> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr") >> > ID logFC AveExpr t P.Value adj.P.Val >> B >> > 16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816 >> -1.722 >> > 12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816 >> -1.790 >> > 2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816 >> -2.061 >> > 2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816 >> -2.142 >> > 10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816 >> -2.207 >> > 10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816 >> -2.312 >> > 8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816 >> -2.374 >> > 2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816 >> -2.399 >> > 5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816 >> -2.400 >> > 19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816 >> -2.419 >> > 3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816 >> -2.434 >> > 14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816 >> -2.454 >> > 10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816 >> -2.456 >> > 2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816 >> -2.464 >> > 5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816 >> -2.481 >> > 21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816 >> -2.493 >> > 800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816 >> -2.511 >> > 27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816 >> -2.516 >> > 22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816 >> -2.543 >> > 4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816 >> -2.557 >> > 13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816 >> -2.568 >> > 12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816 >> -2.591 >> > 20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816 >> -2.594 >> > 17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816 >> -2.603 >> > 23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816 >> -2.608 >> > >> >> sessionInfo() >> > R version 2.7.1 (2008-06-23) >> > i386-pc-mingw32 >> > >> > locale: >> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United >> > States.1252;LC_MONETARY=English_United >> > >> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> > >> > attached base packages: >> > [1] splines tools stats graphics grDevices utils >> datasets >> > methods base >> > >> > other attached packages: >> > [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1 >> > KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0 >> > [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0 >> > Category_2.6.0 genefilter_1.20.0 survival_2.34-1 >> > [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2 >> > GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9 >> > [19] DBI_0.2-4 graph_1.18.1 limma_2.14.5 >> > affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0 >> > [25] Biobase_2.0.1 >> > >> > loaded via a namespace (and not attached): >> > [1] cluster_1.11.11 XML_1.95-3 >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at stat.math.ethz.ch >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> -- >> James W. MacDonald, MS >> Biostatistician >> UMCCC cDNA and Affymetrix Core >> University of Michigan >> 1500 E Medical Center Drive >> 7410 CCGC >> Ann Arbor MI 48109 >> 734-647-5623 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > Naomi S. Altman 814-865-3791 (voice) > Associate Professor > Dept. of Statistics 814-863-7114 (fax) > Penn State University 814-865-1348 (Statistics) > University Park, PA 16802-2111 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, MS Biostatistician UMCCC cDNA and Affymetrix Core University of Michigan 1500 E Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD REPLY • link 15.8 years ago James W. MacDonald 65k

0

Entering edit mode

> I would add that removing those genes that are unchanged in any sample > will also help reduce the multiplicity problem. Regardless of the > expression level, those genes that never change expression are > uninteresting by default, so e.g., if beta-actin is highly expressed at > the same level in all samples we don't really care to test for > differential expression for that gene since it apparently is not > differentially expressed. This doesn't make sense. How can I choose to filter out "unchanged" probesets without fitting a model of some sort, and making a probabilistic decision for each probeset about whether it is "unchanged" or not. Every probeset (save those below the detection limit) will exhibit variance (though the variance may be below the precision of the instrument to measure), right? You're not suggesting that there are some probesets with zero variance? It seems to me that this approach leads to a false/erroneous reduction in the multiplicity problem, as you've just moved the hypothesis testing into a separate "phase" of the analysis. And it also would mess up pooled variance estimates such as those used in eBayes-based methods (e.g. limma). So, while I might be willing to filter out known "dead" probesets (that I never see above detection threshold over many hundreds of assays), I'm in the camp that the statistics are corrupt if you filter without regard to its affect on multiplicity corrections. As an aside, it should be possible to fit some of the models using truncated/censored distributions (wherein the statistical model gets to know that there were X number of probesets with values < threshold, but doesn't pretend that those values are real). That's an idea for the model developers to ponder ... -Aaron

ADD REPLY • link 15.8 years ago aaron.j.mackey@gsk.com ▴ 70

0

Entering edit mode

aaron.j.mackey at gsk.com wrote: >> I would add that removing those genes that are unchanged in any sample >> will also help reduce the multiplicity problem. Regardless of the >> expression level, those genes that never change expression are >> uninteresting by default, so e.g., if beta-actin is highly expressed at >> the same level in all samples we don't really care to test for >> differential expression for that gene since it apparently is not >> differentially expressed. > > This doesn't make sense. How can I choose to filter out "unchanged" > probesets without fitting a model of some sort, and making a probabilistic > decision for each probeset about whether it is "unchanged" or not. Every > probeset (save those below the detection limit) will exhibit variance > (though the variance may be below the precision of the instrument to > measure), right? You're not suggesting that there are some probesets with > zero variance? I don't really understand your point here. First, I never suggested fitting a model of any kind to select unchanged probesets, unless computing the variance is some kind of newfangled model fitting that I don't understand. In addition, are you really claiming that a probeset that is 'below the detection limit' (whatever that means) will _not_ have any variance? I would say that doesn't make any sense. All expression values will exhibit some level of variance regardless of whether you might think they are 'below the detection limit'. > > It seems to me that this approach leads to a false/erroneous reduction in > the multiplicity problem, as you've just moved the hypothesis testing into > a separate "phase" of the analysis. And it also would mess up pooled > variance estimates such as those used in eBayes-based methods (e.g. > limma). So yes, if I had actually advocated fitting a model you would be correct. However, simply deciding to exclude probesets that have a low variance will not affect the hypothesis testing. Although it could have an effect on the computation of the pooled variance estimates if you remove too many probesets as the pooled variance might increase. But the same can be said for any filtering method. If you remove a lot of probesets of low intensity (say all those with an absent call) then you very well could be removing probesets with a higher variance and then mess up the estimate of the pooled variance as well. As with all statistics there are tradeoffs and assumptions that are being made regardless of what you do. > > So, while I might be willing to filter out known "dead" probesets (that I > never see above detection threshold over many hundreds of assays), I'm in > the camp that the statistics are corrupt if you filter without regard to > its affect on multiplicity corrections. I don't really know what you mean by 'detection limit'. Has someone published something somewhere that says a probeset with an expression value below X means the mRNA for that gene has not been detected? I am not sure how the filtering step will affect multiplicity corrections. If one were to use a two-stage modeling procedure that you seem to think I am advocating then of course the p-values themselves would be questionable as assumptions would have been violated. But I don't know where multiplicity correction comes into the equation. But personally I am not that much of a purist about multiplicity anyway. I have been known to select probesets based on adjusted p-value and a fold change criterion as well, which completely invalidates the meaning of the adjusted p-values. Best, Jim > > As an aside, it should be possible to fit some of the models using > truncated/censored distributions (wherein the statistical model gets to > know that there were X number of probesets with values < threshold, but > doesn't pretend that those values are real). That's an idea for the model > developers to ponder ... > > -Aaron > -- James W. MacDonald, MS Biostatistician UMCCC cDNA and Affymetrix Core University of Michigan 1500 E Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD REPLY • link 15.8 years ago James W. MacDonald 65k

0

Entering edit mode

James MacDonald wrote: > > > aaron.j.mackey at gsk.com wrote: >>> I would add that removing those genes that are unchanged in any >>> sample will also help reduce the multiplicity problem. Regardless of >>> the expression level, those genes that never change expression are >>> uninteresting by default, so e.g., if beta-actin is highly expressed >>> at the same level in all samples we don't really care to test for >>> differential expression for that gene since it apparently is not >>> differentially expressed. >> >> This doesn't make sense. How can I choose to filter out "unchanged" >> probesets without fitting a model of some sort, and making a >> probabilistic decision for each probeset about whether it is >> "unchanged" or not. Every probeset (save those below the detection >> limit) will exhibit variance (though the variance may be below the >> precision of the instrument to measure), right? You're not suggesting >> that there are some probesets with zero variance? > > I don't really understand your point here. First, I never suggested > fitting a model of any kind to select unchanged probesets, unless > computing the variance is some kind of newfangled model fitting that I > don't understand. When you compute the variance and decide to eliminate probes from consideration if the variance is below some value, you are performing a statistical test. Implicitly, you are assuming a vague sort of model that suggests that "if the variance is small enough, then the gene cannot be differentially expressed". This does not mean that this particular statistical test is either efficient or powerful. But it is, nevertheless, a test of differential expression, and so should not really be ignored when accounting for multiple testing. -- Kevin > > In addition, are you really claiming that a probeset that is 'below the > detection limit' (whatever that means) will _not_ have any variance? I > would say that doesn't make any sense. All expression values will > exhibit some level of variance regardless of whether you might think > they are 'below the detection limit'. > >> >> It seems to me that this approach leads to a false/erroneous reduction >> in the multiplicity problem, as you've just moved the hypothesis >> testing into a separate "phase" of the analysis. And it also would >> mess up pooled variance estimates such as those used in eBayes- based >> methods (e.g. limma). > > So yes, if I had actually advocated fitting a model you would be > correct. However, simply deciding to exclude probesets that have a low > variance will not affect the hypothesis testing. Although it could have > an effect on the computation of the pooled variance estimates if you > remove too many probesets as the pooled variance might increase. > > But the same can be said for any filtering method. If you remove a lot > of probesets of low intensity (say all those with an absent call) then > you very well could be removing probesets with a higher variance and > then mess up the estimate of the pooled variance as well. > > As with all statistics there are tradeoffs and assumptions that are > being made regardless of what you do. > >> >> So, while I might be willing to filter out known "dead" probesets >> (that I never see above detection threshold over many hundreds of >> assays), I'm in the camp that the statistics are corrupt if you filter >> without regard to its affect on multiplicity corrections. > > I don't really know what you mean by 'detection limit'. Has someone > published something somewhere that says a probeset with an expression > value below X means the mRNA for that gene has not been detected? > > I am not sure how the filtering step will affect multiplicity > corrections. If one were to use a two-stage modeling procedure that you > seem to think I am advocating then of course the p-values themselves > would be questionable as assumptions would have been violated. But I > don't know where multiplicity correction comes into the equation. > > But personally I am not that much of a purist about multiplicity anyway. > I have been known to select probesets based on adjusted p-value and a > fold change criterion as well, which completely invalidates the meaning > of the adjusted p-values. > > Best, > > Jim > > > >> >> As an aside, it should be possible to fit some of the models using >> truncated/censored distributions (wherein the statistical model gets >> to know that there were X number of probesets with values < threshold, >> but doesn't pretend that those values are real). That's an idea for >> the model developers to ponder ... >> >> -Aaron >> >

ADD REPLY • link 15.8 years ago Kevin R. Coombes ▴ 70

0

Entering edit mode

Kevin, thanks for the clarification, that was, in fact, exactly what I meant. -Aaron > James MacDonald wrote: > > aaron.j.mackey at gsk.com wrote: > >> This doesn't make sense. How can I choose to filter out "unchanged" > >> probesets without fitting a model of some sort, and making a > >> probabilistic decision for each probeset about whether it is > >> "unchanged" or not. Every probeset (save those below the detection > >> limit) will exhibit variance (though the variance may be below the > >> precision of the instrument to measure), right? You're not suggesting > >> that there are some probesets with zero variance? > > > > I don't really understand your point here. First, I never suggested > > fitting a model of any kind to select unchanged probesets, unless > > computing the variance is some kind of newfangled model fitting that I > > don't understand. > > When you compute the variance and decide to eliminate probes from > consideration if the variance is below some value, you are performing a > statistical test. Implicitly, you are assuming a vague sort of model > that suggests that "if the variance is small enough, then the gene > cannot be differentially expressed". This does not mean that this > particular statistical test is either efficient or powerful. But it is, > nevertheless, a test of differential expression, and so should not > really be ignored when accounting for multiple testing. > > -- Kevin

ADD REPLY • link 15.8 years ago aaron.j.mackey@gsk.com ▴ 70

0

Entering edit mode

Jenny Drnevich ★ 2.0k

@jenny-drnevich-2812

Last seen 4 weeks ago

United States

HI all, My views on filter based on variance run more towards Aaron's. We had another conversation about this recently on the list (https://stat.ethz.ch/pipermail/bioconductor/2008-June/022941.html). Robert Gentleman asked me if I had any evidence to support my suspicions; I've seen some cases where even throwing "Absent" probe sets affects the eBayes calculation of the p-value so that they are larger than when all probe sets are used. I tried a couple of years ago to really investigate this, but I couldn't find a good way to adequately generate microarray data with known numbers of DE genes. Does anybody know of a good microarray data simulator that gives data that looks like real data? If so, I could do some playing around with simulations and present a poster at BioC that we can hack apart :) Cheers, Jenny At 10:17 AM 7/8/2008, aaron.j.mackey at gsk.com wrote: > > I would add that removing those genes that are unchanged in any sample > > will also help reduce the multiplicity problem. Regardless of the > > expression level, those genes that never change expression are > > uninteresting by default, so e.g., if beta-actin is highly expressed at > > the same level in all samples we don't really care to test for > > differential expression for that gene since it apparently is not > > differentially expressed. > >This doesn't make sense. How can I choose to filter out "unchanged" >probesets without fitting a model of some sort, and making a probabilistic >decision for each probeset about whether it is "unchanged" or not. Every >probeset (save those below the detection limit) will exhibit variance >(though the variance may be below the precision of the instrument to >measure), right? You're not suggesting that there are some probesets with >zero variance? > >It seems to me that this approach leads to a false/erroneous reduction in >the multiplicity problem, as you've just moved the hypothesis testing into >a separate "phase" of the analysis. And it also would mess up pooled >variance estimates such as those used in eBayes-based methods (e.g. >limma). > >So, while I might be willing to filter out known "dead" probesets (that I >never see above detection threshold over many hundreds of assays), I'm in >the camp that the statistics are corrupt if you filter without regard to >its affect on multiplicity corrections. > >As an aside, it should be possible to fit some of the models using >truncated/censored distributions (wherein the statistical model gets to >know that there were X number of probesets with values < threshold, but >doesn't pretend that those values are real). That's an idea for the model >developers to ponder ... > >-Aaron > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at illinois.edu

ADD COMMENT • link 15.8 years ago Jenny Drnevich ★ 2.0k

0

Entering edit mode

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

I agree with Aaron. --Naomi At 11:17 AM 7/8/2008, aaron.j.mackey at gsk.com wrote: > > I would add that removing those genes that are unchanged in any sample > > will also help reduce the multiplicity problem. Regardless of the > > expression level, those genes that never change expression are > > uninteresting by default, so e.g., if beta-actin is highly expressed at > > the same level in all samples we don't really care to test for > > differential expression for that gene since it apparently is not > > differentially expressed. > >This doesn't make sense. How can I choose to filter out "unchanged" >probesets without fitting a model of some sort, and making a probabilistic >decision for each probeset about whether it is "unchanged" or not. Every >probeset (save those below the detection limit) will exhibit variance >(though the variance may be below the precision of the instrument to >measure), right? You're not suggesting that there are some probesets with >zero variance? > >It seems to me that this approach leads to a false/erroneous reduction in >the multiplicity problem, as you've just moved the hypothesis testing into >a separate "phase" of the analysis. And it also would mess up pooled >variance estimates such as those used in eBayes-based methods (e.g. >limma). > >So, while I might be willing to filter out known "dead" probesets (that I >never see above detection threshold over many hundreds of assays), I'm in >the camp that the statistics are corrupt if you filter without regard to >its affect on multiplicity corrections. > >As an aside, it should be possible to fit some of the models using >truncated/censored distributions (wherein the statistical model gets to >know that there were X number of probesets with values < threshold, but >doesn't pretend that those values are real). That's an idea for the model >developers to ponder ... > >-Aaron > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111

ADD COMMENT • link 15.8 years ago Naomi Altman ★ 6.0k

0

Entering edit mode

Robson, Samuel ▴ 10

@robson-samuel-2899

Last seen 9.7 years ago

I have used SIMAGE before to create simulated data sets. Their model contains many factors that create noise similar to that seen probe-to- probe, chip-to-chip and experiment-to-experiment to make it comparable to a 'real' experiment. http://bioinformatics.biol.rug.nl/websoftware/simage/simage_start.php The paper is SIMAGE: simulation of DNA-microarray gene expression data, Albers C.J., BMC Bioinformatics, 2006, 13;7:205. Cheers, Sam Sam Robson MOAC Doctoral Training Center Coventry House University of Warwick Coventry CV4 7AL -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch on behalf of Jenny Drnevich Sent: Tue 7/8/2008 6:17 PM To: aaron.j.mackey@gsk.com; James MacDonald Cc: Naomi Altman; Louisa A Rispoli/AS/EXP/UTIA; bioconductor@stat.math.ethz.ch Subject: Re: [BioC] Understanding limma, fdr and topTable HI all, My views on filter based on variance run more towards Aaron's. We had another conversation about this recently on the list (https://stat.ethz.ch/pipermail/bioconductor/2008-June/022941.html). Robert Gentleman asked me if I had any evidence to support my suspicions; I've seen some cases where even throwing "Absent" probe sets affects the eBayes calculation of the p-value so that they are larger than when all probe sets are used. I tried a couple of years ago to really investigate this, but I couldn't find a good way to adequately generate microarray data with known numbers of DE genes. Does anybody know of a good microarray data simulator that gives data that looks like real data? If so, I could do some playing around with simulations and present a poster at BioC that we can hack apart :) Cheers, Jenny At 10:17 AM 7/8/2008, aaron.j.mackey@gsk.com wrote: > > I would add that removing those genes that are unchanged in any sample > > will also help reduce the multiplicity problem. Regardless of the > > expression level, those genes that never change expression are > > uninteresting by default, so e.g., if beta-actin is highly expressed at > > the same level in all samples we don't really care to test for > > differential expression for that gene since it apparently is not > > differentially expressed. > >This doesn't make sense. How can I choose to filter out "unchanged" >probesets without fitting a model of some sort, and making a probabilistic >decision for each probeset about whether it is "unchanged" or not. Every >probeset (save those below the detection limit) will exhibit variance >(though the variance may be below the precision of the instrument to >measure), right? You're not suggesting that there are some probesets with >zero variance? > >It seems to me that this approach leads to a false/erroneous reduction in >the multiplicity problem, as you've just moved the hypothesis testing into >a separate "phase" of the analysis. And it also would mess up pooled >variance estimates such as those used in eBayes-based methods (e.g. >limma). > >So, while I might be willing to filter out known "dead" probesets (that I >never see above detection threshold over many hundreds of assays), I'm in >the camp that the statistics are corrupt if you filter without regard to >its affect on multiplicity corrections. > >As an aside, it should be possible to fit some of the models using >truncated/censored distributions (wherein the statistical model gets to >know that there were X number of probesets with values < threshold, but >doesn't pretend that those values are real). That's an idea for the model >developers to ponder ... > >-Aaron > >_______________________________________________ >Bioconductor mailing list >Bioconductor@stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich@illinois.edu _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD COMMENT • link 15.8 years ago Robson, Samuel ▴ 10

Login before adding your answer.