Question

limma design and contrast matrix for paired experiment

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 15 minutes ago

WEHI, Melbourne, Australia

Dear David, No, your design matrix is incorrect because it ignores the pairing by sample in your experimental design. You can treat Sample as either a fixed or a random factor. The fixed approach is closely analogous to a classical paired t-test. For an experiment like yours, the fixed approach is explained in Section 3.5 of the edgeR User's Guide: http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst /doc/edgeRUsersGuide.pdf You can follow the same construction of the design matrix even though you are using limma. The random approach is a bit more aggressive. For an experiment like yours, the random approach is explained in Section 8.7 of the limma User's Guide: http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst /doc/usersguide.pdf I would probably recommend the first approach for your data. However the second approach is necessary if you want to test for differences between the two cell lines. Best wishes Gordon > Date: Fri, 11 Jan 2013 18:31:23 +0100 > From: David Westergaard <david at="" harsk.dk=""> > To: bioconductor at r-project.org > Subject: [BioC] limma design and contrast matrix for paired > experiment > > Hello, > > I am analysing microarray data performed on two cell cultures, in > which the gene expression were measured before (C) and after treatment > (T), so that the targets look like this: > > Cell-line Treatment Sample > Cell 1 C 1 > Cell 1 T 1 > Cell 2 C 2 > Cell 2 T 2 > Cell 1 C 3 > Cell 1 T 3 > Cell 2 C 4 > Cell 2 T 4 > Cell 1 C 5 > Cell 1 T 5 > Cell 2 C 6 > Cell 2 T 6 > > All experiments were performed on single-channel Agilent arrays, with > 4 samples pr. slide. I am interested in determining the differentially > expressed genes between Cell1 before and after treatment, as well as > Cell2 before and after treatment. This is the preliminary code: > > # Load and normalize data > RG <- read.maimages(targets$FileName,source="agilent.median",green.o nly=TRUE) > # Assume there is a col called FileName in the targets section > RG <- backgroundCorrect(RG, method="normexp", offset=16) > RGNorm <- normalizeBetweenArrays(RG, method="quantile") > RGNorm.ave <- avereps(RGNorm, ID=RGNorm$genes$ProbeName) > > # Create design > Pairing <- paste(rep(c('C1-','C1-','C2-','C2-'),3),c(1,1,2,2,1,1,2,2 ,1,1,2,2),rep(c('C','T'),6),sep='') > pair <- factor(Pairing,levels=unique(Pairing)) > design <- model.matrix( ~ 0 + pair ) > colnames(design) <- c('C1.C','C1.T','C2.C','C2.T') > > # Fit data > fit <- lmFit(RGNorm.ave, design=design) > > cont.matrix <- makeContrasts(C1 = C1.T-C1.C, C2=C2.T-C2.C, levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > fit2 <- eBayes(fit2) > > > For the experiment, is the design/contrast matrix a proper choice to > answer the questions of 'Which genes are differentially expressed in > Cell 1' and 'Which genes are differentially expressed in Cell 2'? > Further, should I do any technical correction, such as > duplicateCorrelation or similar? The reason I am asking is that even > at p<=0.01 I am getting a very high number of differentially expressed > probes (4500ish for Cell 1, and 7500ish for Cell 2, respectively), and > I want to make sure this is biological significance, and not some > technical aspect I have missed. > > Thanks in advance. > > Best, > David > > >> sessionInfo() > R version 2.14.1 (2011-12-22) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] statmod_1.4.16 limma_3.10.3 > > loaded via a namespace (and not attached): > [1] tcltk_2.14.1 tools_2.14.1 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

Microarray limma edgeR Microarray limma edgeR • 5.6k views

ADD COMMENT • link 11.3 years ago Gordon Smyth 50k

score 0 · Answer 1 · 2013-01-13

Dear Gordon, Thank you for your reply. Following the example from the edgeR users guide, I revised the model: CellLine <- factor(rep(c('C1','C1','C2','C2'),3),levels=c('C1','C2')) Sample <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6),levels=c(1,2,3,4,5,6)) Treatment <- factor(rep(c('C','T'),6),levels=c('C','T')) design <- model.matrix( ~ CellLine + CellLine:Sample + CellLine:Treatment ) I am however, not, a bit unsure of which coefficients I am interested in. It seems logical that I should be interested in 'CellLineC1:TreatmentT' and 'CellLineC2:TreatmentT' to answer the question of 'Which genes are differentially expressed in CellLine 1 due to treatment X' and 'Which genes are differentially expressed in CellLine 2 due to treatment X'. Is that correct? Best, David 2013/1/13 Gordon K Smyth <smyth at="" wehi.edu.au="">: > Dear David, > > No, your design matrix is incorrect because it ignores the pairing by sample > in your experimental design. > > You can treat Sample as either a fixed or a random factor. The fixed > approach is closely analogous to a classical paired t-test. For an > experiment like yours, the fixed approach is explained in Section 3.5 of the > edgeR User's Guide: > > http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/in st/doc/edgeRUsersGuide.pdf > > You can follow the same construction of the design matrix even though you > are using limma. > > The random approach is a bit more aggressive. For an experiment like yours, > the random approach is explained in Section 8.7 of the limma User's Guide: > > http://www.bioconductor.org/packages/release/bioc/vignettes/limma/in st/doc/usersguide.pdf > > I would probably recommend the first approach for your data. However the > second approach is necessary if you want to test for differences between the > two cell lines. > > Best wishes > Gordon > >> Date: Fri, 11 Jan 2013 18:31:23 +0100 >> From: David Westergaard <david at="" harsk.dk=""> >> To: bioconductor at r-project.org >> Subject: [BioC] limma design and contrast matrix for paired >> experiment >> >> Hello, >> >> I am analysing microarray data performed on two cell cultures, in >> which the gene expression were measured before (C) and after treatment >> (T), so that the targets look like this: >> >> Cell-line Treatment Sample >> Cell 1 C 1 >> Cell 1 T 1 >> Cell 2 C 2 >> Cell 2 T 2 >> Cell 1 C 3 >> Cell 1 T 3 >> Cell 2 C 4 >> Cell 2 T 4 >> Cell 1 C 5 >> Cell 1 T 5 >> Cell 2 C 6 >> Cell 2 T 6 >> >> All experiments were performed on single-channel Agilent arrays, with >> 4 samples pr. slide. I am interested in determining the differentially >> expressed genes between Cell1 before and after treatment, as well as >> Cell2 before and after treatment. This is the preliminary code: >> >> # Load and normalize data >> RG <- >> read.maimages(targets$FileName,source="agilent.median",green.only=T RUE) >> # Assume there is a col called FileName in the targets section >> RG <- backgroundCorrect(RG, method="normexp", offset=16) >> RGNorm <- normalizeBetweenArrays(RG, method="quantile") >> RGNorm.ave <- avereps(RGNorm, ID=RGNorm$genes$ProbeName) >> >> # Create design >> Pairing <- >> paste(rep(c('C1-','C1-','C2-','C2-'),3),c(1,1,2,2,1,1,2,2,1,1,2,2), rep(c('C','T'),6),sep='') >> pair <- factor(Pairing,levels=unique(Pairing)) >> design <- model.matrix( ~ 0 + pair ) >> colnames(design) <- c('C1.C','C1.T','C2.C','C2.T') >> >> # Fit data >> fit <- lmFit(RGNorm.ave, design=design) >> >> cont.matrix <- makeContrasts(C1 = C1.T-C1.C, C2=C2.T-C2.C, levels=design) >> fit2 <- contrasts.fit(fit, cont.matrix) >> fit2 <- eBayes(fit2) >> >> >> For the experiment, is the design/contrast matrix a proper choice to >> answer the questions of 'Which genes are differentially expressed in >> Cell 1' and 'Which genes are differentially expressed in Cell 2'? >> Further, should I do any technical correction, such as >> duplicateCorrelation or similar? The reason I am asking is that even >> at p<=0.01 I am getting a very high number of differentially expressed >> probes (4500ish for Cell 1, and 7500ish for Cell 2, respectively), and >> I want to make sure this is biological significance, and not some >> technical aspect I have missed. >> >> Thanks in advance. >> >> Best, >> David >> >> >>> sessionInfo() >> >> R version 2.14.1 (2011-12-22) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] statmod_1.4.16 limma_3.10.3 >> >> loaded via a namespace (and not attached): >> [1] tcltk_2.14.1 tools_2.14.1 >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}

score 0 · Answer 2 · 2013-01-15

Dear David, You have not followed the advice in the edgeR user's guide that "The design matrix will be easier to construct in R if we re-number the patients within each disease group:" limma will still give you correct results, but you will have extra non-estimable coefficients in your linear model. Best wishes Gordon > Date: Mon, 14 Jan 2013 00:14:18 +0100 > From: David Westergaard <david at="" harsk.dk=""> > To: Gordon K Smyth <smyth at="" wehi.edu.au=""> > Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> > Subject: Re: [BioC] limma design and contrast matrix for paired > experiment > > Dear Gordon, > > Thank you for your reply. Following the example from the edgeR users > guide, I revised the model: > > CellLine <- factor(rep(c('C1','C1','C2','C2'),3),levels=c('C1','C2')) > Sample <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6),levels=c(1,2,3,4,5,6)) > Treatment <- factor(rep(c('C','T'),6),levels=c('C','T')) > > design <- model.matrix( ~ CellLine + CellLine:Sample + CellLine:Treatment ) > > I am however, not, a bit unsure of which coefficients I am interested > in. It seems logical that I should be interested in > 'CellLineC1:TreatmentT' and 'CellLineC2:TreatmentT' to answer the > question of 'Which genes are differentially expressed in CellLine 1 > due to treatment X' and 'Which genes are differentially expressed in > CellLine 2 due to treatment X'. Is that correct? > > Best, > David > > 2013/1/13 Gordon K Smyth <smyth at="" wehi.edu.au="">: >> Dear David, >> >> No, your design matrix is incorrect because it ignores the pairing by sample >> in your experimental design. >> >> You can treat Sample as either a fixed or a random factor. The fixed >> approach is closely analogous to a classical paired t-test. For an >> experiment like yours, the fixed approach is explained in Section 3.5 of the >> edgeR User's Guide: >> >> http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/i nst/doc/edgeRUsersGuide.pdf >> >> You can follow the same construction of the design matrix even though you >> are using limma. >> >> The random approach is a bit more aggressive. For an experiment like yours, >> the random approach is explained in Section 8.7 of the limma User's Guide: >> >> http://www.bioconductor.org/packages/release/bioc/vignettes/limma/i nst/doc/usersguide.pdf >> >> I would probably recommend the first approach for your data. However the >> second approach is necessary if you want to test for differences between the >> two cell lines. >> >> Best wishes >> Gordon >> >>> Date: Fri, 11 Jan 2013 18:31:23 +0100 >>> From: David Westergaard <david at="" harsk.dk=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] limma design and contrast matrix for paired >>> experiment >>> >>> Hello, >>> >>> I am analysing microarray data performed on two cell cultures, in >>> which the gene expression were measured before (C) and after treatment >>> (T), so that the targets look like this: >>> >>> Cell-line Treatment Sample >>> Cell 1 C 1 >>> Cell 1 T 1 >>> Cell 2 C 2 >>> Cell 2 T 2 >>> Cell 1 C 3 >>> Cell 1 T 3 >>> Cell 2 C 4 >>> Cell 2 T 4 >>> Cell 1 C 5 >>> Cell 1 T 5 >>> Cell 2 C 6 >>> Cell 2 T 6 >>> >>> All experiments were performed on single-channel Agilent arrays, with >>> 4 samples pr. slide. I am interested in determining the differentially >>> expressed genes between Cell1 before and after treatment, as well as >>> Cell2 before and after treatment. This is the preliminary code: >>> >>> # Load and normalize data >>> RG <- >>> read.maimages(targets$FileName,source="agilent.median",green.only= TRUE) >>> # Assume there is a col called FileName in the targets section >>> RG <- backgroundCorrect(RG, method="normexp", offset=16) >>> RGNorm <- normalizeBetweenArrays(RG, method="quantile") >>> RGNorm.ave <- avereps(RGNorm, ID=RGNorm$genes$ProbeName) >>> >>> # Create design >>> Pairing <- >>> paste(rep(c('C1-','C1-','C2-','C2-'),3),c(1,1,2,2,1,1,2,2,1,1,2,2) ,rep(c('C','T'),6),sep='') >>> pair <- factor(Pairing,levels=unique(Pairing)) >>> design <- model.matrix( ~ 0 + pair ) >>> colnames(design) <- c('C1.C','C1.T','C2.C','C2.T') >>> >>> # Fit data >>> fit <- lmFit(RGNorm.ave, design=design) >>> >>> cont.matrix <- makeContrasts(C1 = C1.T-C1.C, C2=C2.T-C2.C, levels=design) >>> fit2 <- contrasts.fit(fit, cont.matrix) >>> fit2 <- eBayes(fit2) >>> >>> >>> For the experiment, is the design/contrast matrix a proper choice to >>> answer the questions of 'Which genes are differentially expressed in >>> Cell 1' and 'Which genes are differentially expressed in Cell 2'? >>> Further, should I do any technical correction, such as >>> duplicateCorrelation or similar? The reason I am asking is that even >>> at p<=0.01 I am getting a very high number of differentially expressed >>> probes (4500ish for Cell 1, and 7500ish for Cell 2, respectively), and >>> I want to make sure this is biological significance, and not some >>> technical aspect I have missed. >>> >>> Thanks in advance. >>> >>> Best, >>> David >>> >>> >>>> sessionInfo() >>> >>> R version 2.14.1 (2011-12-22) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] statmod_1.4.16 limma_3.10.3 >>> >>> loaded via a namespace (and not attached): >>> [1] tcltk_2.14.1 tools_2.14.1 >>> >> ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}