Questions about using Limma

0

Entering edit mode

wei tie ▴ 30

@wei-tie-4062

Last seen 9.6 years ago

Hello , I am currently working on 30 Affymetrix arrays for a time course experiment using Limma package. I have two groups (A and B) with samples taken at 0hr, 1hr, 6hr, 12hrs and 24hrs. The following is how my targets file looks like: FileName Targets Files 1-3 A.0hr Files 4-6 A.1hr Files 7-9 A.6hr Files 10-12 A.12hr Files 13-15 A.24hr Files 16-18 B.0hr Files 19-21 B.1hr Files 22-24 B.6hr Files 25-27 B.12hr Files 28-30 B.24hr If I apply the lmFit() function to all the 30 arrays as follows: sample<-c("A0","A1","A6","A12","A24","B0","B1","B6","B12","B24") all<-factor(rep(sample,each=3),levels=sample) design<-model.matrix(~0+all) colnames(design)<-sample fit1 <- lmFit(eset,design) contrast<- makeContrasts(B1-B0,A1-A0, (B1-B0)-(A1-A0),levels=design) fit2<- contrasts.fit(fit1, contrast) fit2<- eBayes(fit2) results1<-decideTests(fit2,method="global",adjust.method="BH",p.value= 0.05,lfc=1) summary(results1) B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) -1 1941 1133 1155 0 26434 27285 28635 1 1879 1836 464 the result is that 1619(1155+464) genes changed at the first 1hr differently between group A and group B. If I apply the lmFit() function only to the 12 samples taken at 0hr and 1hr , and use the following target file: FileName Targets Files 1-3 A.0hr Files 4-6 A.1hr Files 7-9 B.0hr Files 10-12 B.1hr then use the following script: sample1<-c("A0","A1","B0","B1") part<-factor(rep(sample,each=3),levels=sample) design<-model.matrix(~0+part) colnames(design)<-sample1 fit3 <- lmFit(eset1,design) contrast<- makeContrasts(B1-B0,A1-A0, (B1-B0)-(A1-A0),levels=design) fit4<- contrasts.fit(fit3, contrast) fit4<- eBayes(fit4) results2<-decideTests(fit4,method="global",adjust.method="BH",p.value= 0.05,lfc=1) summary(results2) B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) -1 1501 825 563 0 27478 28194 29483 1 1275 1235 208 the genes changing differently between group A and group B in terms of interaction decreased to 771(563+208). If I use the method="separate" in decideTests() Results3<-decideTests(fit4,method="separate",adjust.method="BH",p.valu e=0.05,lfc=1) summary(results3) B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) -1 1763 876 67 0 26915 28058 30169 1 1576 1320 18 the genes which changed differently between group A and group B are much fewer (67+18=85). Why do I get different lists of differentially expressed genes when I use the three approaches? Which result should I choose? I would really appreciate if someone can give some help. Thank you, Regards, Wei Tie

limma limma • 1.1k views

ADD COMMENT • link updated 14.0 years ago by James W. MacDonald 65k • written 14.0 years ago by wei tie ▴ 30

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 11 hours ago

United States

Hi Wei, wei tie wrote: > Hello , > > I am currently working on 30 Affymetrix arrays for a time course > experiment using Limma package. I have two groups (A and B) with > samples taken at 0hr, 1hr, 6hr, 12hrs and 24hrs. > > The following is how my targets file looks like: > FileName Targets > Files 1-3 A.0hr > Files 4-6 A.1hr > Files 7-9 A.6hr > Files 10-12 A.12hr > Files 13-15 A.24hr > Files 16-18 B.0hr > Files 19-21 B.1hr > Files 22-24 B.6hr > Files 25-27 B.12hr > Files 28-30 B.24hr > > If I apply the lmFit() function to all the 30 arrays as follows: > > sample<-c("A0","A1","A6","A12","A24","B0","B1","B6","B12","B24") > all<-factor(rep(sample,each=3),levels=sample) > design<-model.matrix(~0+all) > colnames(design)<-sample > fit1 <- lmFit(eset,design) > contrast<- makeContrasts(B1-B0,A1-A0, (B1-B0)-(A1-A0),levels=design) > fit2<- contrasts.fit(fit1, contrast) > fit2<- eBayes(fit2) > results1<-decideTests(fit2,method="global",adjust.method="BH",p.valu e=0.05,lfc=1) > summary(results1) > B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) > -1 1941 1133 1155 > 0 26434 27285 28635 > 1 1879 1836 464 > > the result is that 1619(1155+464) genes changed at the first 1hr > differently between group A and group B. > > If I apply the lmFit() function only to the 12 samples taken at 0hr > and 1hr , and use the following target file: > FileName Targets > Files 1-3 A.0hr > Files 4-6 A.1hr > Files 7-9 B.0hr > Files 10-12 B.1hr > > then use the following script: > sample1<-c("A0","A1","B0","B1") > part<-factor(rep(sample,each=3),levels=sample) > design<-model.matrix(~0+part) > colnames(design)<-sample1 > fit3 <- lmFit(eset1,design) > contrast<- makeContrasts(B1-B0,A1-A0, (B1-B0)-(A1-A0),levels=design) > fit4<- contrasts.fit(fit3, contrast) > fit4<- eBayes(fit4) > results2<-decideTests(fit4,method="global",adjust.method="BH",p.valu e=0.05,lfc=1) > summary(results2) > B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) > -1 1501 825 563 > 0 27478 28194 29483 > 1 1275 1235 208 > > the genes changing differently between group A and group B in terms of > interaction decreased to 771(563+208). > > If I use the method="separate" in decideTests() > Results3<-decideTests(fit4,method="separate",adjust.method="BH",p.va lue=0.05,lfc=1) > summary(results3) > B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) > -1 1763 876 67 > 0 26915 28058 30169 > 1 1576 1320 18 > > the genes which changed differently between group A and group B are > much fewer (67+18=85). > > Why do I get different lists of differentially expressed genes when I > use the three approaches? Which result should I choose? You get different results because you have different degrees of freedom for your statistics when you go from using all data to a subset. Fewer degrees of freedom translates to less power to detect differences. When you use global vs separate for decideTests, you are adjusting for multiplicity over different numbers of comparisons (when you use global, you pile all the p-values into one vector and then do multiplicity adjustment, when using separate, the adjustment is done by comparison). I don't think you will find anybody who is willing to tell you which one you should choose, especially via email. You might consider consulting with a local statistician if you need help. Best, Jim > > I would really appreciate if someone can give some help. > > Thank you, > Regards, > Wei Tie > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

ADD COMMENT • link 14.0 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks for reply from Heidi and James. Is there any disadvantage for using more degrees of freedom? Best Wei

ADD REPLY • link 14.0 years ago wei tie ▴ 30

0

Entering edit mode

Heidi Dvinge ★ 2.0k

@heidi-dvinge-2195

Last seen 9.6 years ago

Hello Wei, okay, not actually an answer to your question, but since you have a time series you might also want to consider the package "timecourse". I've found it quite useful for medium to long time series. Best wishes \Heidi =================== Loans that change lives https://www.kiva.org/ On 5 May 2010, at 14:09, wei tie wrote: > Hello , > > I am currently working on 30 Affymetrix arrays for a time course > experiment using Limma package. I have two groups (A and B) with > samples taken at 0hr, 1hr, 6hr, 12hrs and 24hrs. > > The following is how my targets file looks like: > FileName Targets > Files 1-3 A.0hr > Files 4-6 A.1hr > Files 7-9 A.6hr > Files 10-12 A.12hr > Files 13-15 A.24hr > Files 16-18 B.0hr > Files 19-21 B.1hr > Files 22-24 B.6hr > Files 25-27 B.12hr > Files 28-30 B.24hr > > If I apply the lmFit() function to all the 30 arrays as follows: > > sample<-c("A0","A1","A6","A12","A24","B0","B1","B6","B12","B24") > all<-factor(rep(sample,each=3),levels=sample) > design<-model.matrix(~0+all) > colnames(design)<-sample > fit1 <- lmFit(eset,design) > contrast<- makeContrasts(B1-B0,A1-A0, (B1-B0)-(A1-A0),levels=design) > fit2<- contrasts.fit(fit1, contrast) > fit2<- eBayes(fit2) > results1<-decideTests > (fit2,method="global",adjust.method="BH",p.value=0.05,lfc=1) > summary(results1) > B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) > -1 1941 1133 1155 > 0 26434 27285 28635 > 1 1879 1836 464 > > the result is that 1619(1155+464) genes changed at the first 1hr > differently between group A and group B. > > If I apply the lmFit() function only to the 12 samples taken at 0hr > and 1hr , and use the following target file: > FileName Targets > Files 1-3 A.0hr > Files 4-6 A.1hr > Files 7-9 B.0hr > Files 10-12 B.1hr > > then use the following script: > sample1<-c("A0","A1","B0","B1") > part<-factor(rep(sample,each=3),levels=sample) > design<-model.matrix(~0+part) > colnames(design)<-sample1 > fit3 <- lmFit(eset1,design) > contrast<- makeContrasts(B1-B0,A1-A0, (B1-B0)-(A1-A0),levels=design) > fit4<- contrasts.fit(fit3, contrast) > fit4<- eBayes(fit4) > results2<-decideTests > (fit4,method="global",adjust.method="BH",p.value=0.05,lfc=1) > summary(results2) > B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) > -1 1501 825 563 > 0 27478 28194 29483 > 1 1275 1235 208 > > the genes changing differently between group A and group B in terms of > interaction decreased to 771(563+208). > > If I use the method="separate" in decideTests() > Results3<-decideTests > (fit4,method="separate",adjust.method="BH",p.value=0.05,lfc=1) > summary(results3) > B1 - B0 A1 - A0 (B1 - B0) - (A1 - A0) > -1 1763 876 67 > 0 26915 28058 30169 > 1 1576 1320 18 > > the genes which changed differently between group A and group B are > much fewer (67+18=85). > > Why do I get different lists of differentially expressed genes when I > use the three approaches? Which result should I choose? > > I would really appreciate if someone can give some help. > > Thank you, > Regards, > Wei Tie > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/ > gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]

ADD COMMENT • link 14.0 years ago Heidi Dvinge ★ 2.0k

Login before adding your answer.