Comparison of diff. t-statistics, Limma and rowttests

0

Entering edit mode

Boel Brynedal ▴ 200

@boel-brynedal-2091

Last seen 9.6 years ago

Dear List, I have affy hgu133plus2 arrays from individuals with disease, in two different stages of the disease. I've earlier used rowttests and FDR correction. Now I was playing around with limma to see what I could do (added different covariates etc) but also investigated the most simple setting, comparing the two different stages directly using Limma. The first thing that struck me was that limma "finds" only half the amount of significantly diff expressed genes. So I started to look at the t-statistics from limma. Then I stumbled across this: when I do a qq-plot of the ordinary t-statistics they are far from normally distributed, and actually totally strange. See attached plot comparing the ordinary t, the moderate t (both from Limma) as well as t-statistics from rowttests ("Diff_tStatistics_Limma.jpg"). Am I doing something completely wrong? The assumption of equal variance taken using ordinary t could not create this, could it? Please help me figure out what's wrong here, I'm hoping I've done some stupid mistake. What else could explain this? Thank you. Best wishes, Boel My code and sessionInfo: # eset is a filtered, gcrma normalized ExpressionSet with ~10 000 probe sets, 24 arrays. library(limma) library(Biobase) library(genefilter) specific<-factor(c(rep("stageA",10),rep("stageB",14)), levels=c("stageB","stageA")) design<-model.matrix(~specific) fit<-lmFit(eset,design) Fit<-eBayes(fit) ordinary.t <- fit3$coef / fit3$stdev.unscaled / fit3$sigma moderate.t<-Fit$t[,2] rowttests.t<-rowttests(eset,fac=specific) par(mfrow=c(1,3)) qqnorm(ordinary.t,main="fit ordinary.t") qqnorm(moderate.t, main=" Fit moderate.t") qqnorm(rowttests.t[,1], main= "rowttests.t") dev2bitmap("Diff_tStatistics_Limma.jpg",type="jpeg", height = 5, width = 15, res = 75) > sessionInfo() R version 2.7.1 (2008-06-23) x86_64-unknown-linux-gnu locale: ... attached base packages: [1] splines tools stats graphics grDevices utils datasets [8] methods base other attached packages: [1] genefilter_1.20.0 survival_2.34-1 Biobase_2.0.1 limma_2.14.5 loaded via a namespace (and not attached): [1] annotate_1.18.0 AnnotationDbi_1.2.2 DBI_0.2-4 [4] RSQLite_0.6-9 --~*~**~***~*~***~**~*~-- Boel Brynedal, MSc, PhD student Karolinska Institutet Department of Clinical neuroscience

hgu133plus2 affy limma gcrma hgu133plus2 affy limma gcrma • 1.2k views

ADD COMMENT • link 15.8 years ago Boel Brynedal ▴ 200

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 16 days ago

EMBL European Molecular Biology Laborat…

Dear Boel it seems that your attachment didn't go through. Can you post the plots on Flickr, Youtube, Picasaweb or the like? -- Best wishes Wolfgang ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber 25/07/2008 08:33 Boel Brynedal scripsit > Dear List, > > I have affy hgu133plus2 arrays from individuals with disease, in two > different stages of the disease. I've earlier used rowttests and FDR > correction. Now I was playing around with limma to see what I could do > (added different covariates etc) but also investigated the most simple > setting, comparing the two different stages directly using Limma. The > first thing that struck me was that limma "finds" only half the amount > of significantly diff expressed genes. So I started to look at the > t-statistics from limma. Then I stumbled across this: when I do a > qq-plot of the ordinary t-statistics they are far from normally > distributed, and actually totally strange. See attached plot comparing > the ordinary t, the moderate t (both from Limma) as well as t-statistics > from rowttests ("Diff_tStatistics_Limma.jpg"). > > Am I doing something completely wrong? The assumption of equal variance > taken using ordinary t could not create this, could it? Please help me > figure out what's wrong here, I'm hoping I've done some stupid mistake. > What else could explain this? Thank you. > > Best wishes, > Boel > > My code and sessionInfo: > > # eset is a filtered, gcrma normalized ExpressionSet with ~10 000 probe > sets, 24 arrays. > library(limma) > library(Biobase) > library(genefilter) > specific<-factor(c(rep("stageA",10),rep("stageB",14)), > levels=c("stageB","stageA")) > design<-model.matrix(~specific) > fit<-lmFit(eset,design) > Fit<-eBayes(fit) > > ordinary.t <- fit3$coef / fit3$stdev.unscaled / fit3$sigma > moderate.t<-Fit$t[,2] > rowttests.t<-rowttests(eset,fac=specific) > > par(mfrow=c(1,3)) > qqnorm(ordinary.t,main="fit ordinary.t") > qqnorm(moderate.t, main=" Fit moderate.t") > qqnorm(rowttests.t[,1], main= "rowttests.t") > dev2bitmap("Diff_tStatistics_Limma.jpg",type="jpeg", height = 5, width = > 15, res = 75) > >> sessionInfo() > R version 2.7.1 (2008-06-23) > x86_64-unknown-linux-gnu > > locale: > ... > > attached base packages: > [1] splines tools stats graphics grDevices utils datasets > [8] methods base > > other attached packages: > [1] genefilter_1.20.0 survival_2.34-1 Biobase_2.0.1 limma_2.14.5 > > loaded via a namespace (and not attached): > [1] annotate_1.18.0 AnnotationDbi_1.2.2 DBI_0.2-4 > [4] RSQLite_0.6-9 > > --~*~**~***~*~***~**~*~-- > Boel Brynedal, MSc, PhD student > Karolinska Institutet > Department of Clinical neuroscience > > > > -------------------------------------------------------------------- ---- > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 15.8 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

Boel Brynedal ▴ 200

@boel-brynedal-2091

Last seen 9.6 years ago

Dear List, Thank you (Wolfgang and Paolo) for telling me the attachment did not get through. This is a link to the picture: http://picasaweb.google.se/Boelbubblan/Statistics/photo Cheers, Boel --~*~**~***~*~***~**~*~-- Boel Brynedal, MSc, PhD student Karolinska Institutet Department of Clinical neuroscience ----- Original Message ----- From: Boel Brynedal <boel.brynedal@ki.se> Date: Friday, July 25, 2008 9:33 am Subject: Comparison of diff. t-statistics, Limma and rowttests To: bioconductor at stat.math.ethz.ch > Dear List, > > I have affy hgu133plus2 arrays from individuals with disease, in two > different stages of the disease. I've earlier used rowttests and FDR > correction. Now I was playing around with limma to see what I could do > (added different covariates etc) but also investigated the most simple > setting, comparing the two different stages directly using Limma. The > first thing that struck me was that limma "finds" only half the amount > of significantly diff expressed genes. So I started to look at the > t-statistics from limma. Then I stumbled across this: when I do a > qq-plot of the ordinary t-statistics they are far from normally > distributed, and actually totally strange. See attached plot comparing > the ordinary t, the moderate t (both from Limma) as well as t- > statisticsfrom rowttests ("Diff_tStatistics_Limma.jpg"). > > Am I doing something completely wrong? The assumption of equal > variancetaken using ordinary t could not create this, could it? > Please help me > figure out what's wrong here, I'm hoping I've done some stupid > mistake.What else could explain this? Thank you. > > Best wishes, > Boel > > My code and sessionInfo: > > # eset is a filtered, gcrma normalized ExpressionSet with ~10 000 > probesets, 24 arrays. > library(limma) > library(Biobase) > library(genefilter) > specific<-factor(c(rep("stageA",10),rep("stageB",14)), > levels=c("stageB","stageA")) > design<-model.matrix(~specific) > fit<-lmFit(eset,design) > Fit<-eBayes(fit) > > ordinary.t <- fit3$coef / fit3$stdev.unscaled / fit3$sigma > moderate.t<-Fit$t[,2] > rowttests.t<-rowttests(eset,fac=specific) > > par(mfrow=c(1,3)) > qqnorm(ordinary.t,main="fit ordinary.t") > qqnorm(moderate.t, main=" Fit moderate.t") > qqnorm(rowttests.t[,1], main= "rowttests.t") > dev2bitmap("Diff_tStatistics_Limma.jpg",type="jpeg", height = 5, > width = > 15, res = 75) > > > sessionInfo() > R version 2.7.1 (2008-06-23) > x86_64-unknown-linux-gnu > > locale: > ... > > attached base packages: > [1] splines tools stats graphics grDevices utils > datasets[8] methods base > > other attached packages: > [1] genefilter_1.20.0 survival_2.34-1 Biobase_2.0.1 limma_2.14.5 > > loaded via a namespace (and not attached): > [1] annotate_1.18.0 AnnotationDbi_1.2.2 DBI_0.2-4 > [4] RSQLite_0.6-9 > > --~*~**~***~*~***~**~*~-- > Boel Brynedal, MSc, PhD student > Karolinska Institutet > Department of Clinical neuroscience > >

ADD COMMENT • link 15.8 years ago Boel Brynedal ▴ 200

0

Entering edit mode

Dear Boel How does the "pairs" plot look like for the matrix with rows = genes, columns = three different ways of computing t? Can you single out the data for one particular gene where you get a big difference (e.g. where ordinary.t is so large) and trace back how the computations in lmFit produce that result? Best wishes Wolfgang ------------------------------------------------------------------ Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber 25/07/2008 12:19 Boel Brynedal scripsit > Dear List, > > Thank you (Wolfgang and Paolo) for telling me the attachment did not get > through. This is a link to the picture: > http://picasaweb.google.se/Boelbubblan/Statistics/photo > > Cheers, > Boel > > --~*~**~***~*~***~**~*~-- > Boel Brynedal, MSc, PhD student > Karolinska Institutet > Department of Clinical neuroscience > > > ----- Original Message ----- > From: Boel Brynedal <boel.brynedal at="" ki.se=""> > Date: Friday, July 25, 2008 9:33 am > Subject: Comparison of diff. t-statistics, Limma and rowttests > To: bioconductor at stat.math.ethz.ch > >> Dear List, >> >> I have affy hgu133plus2 arrays from individuals with disease, in two >> different stages of the disease. I've earlier used rowttests and FDR >> correction. Now I was playing around with limma to see what I could do >> (added different covariates etc) but also investigated the most simple >> setting, comparing the two different stages directly using Limma. The >> first thing that struck me was that limma "finds" only half the amount >> of significantly diff expressed genes. So I started to look at the >> t-statistics from limma. Then I stumbled across this: when I do a >> qq-plot of the ordinary t-statistics they are far from normally >> distributed, and actually totally strange. See attached plot comparing >> the ordinary t, the moderate t (both from Limma) as well as t- >> statisticsfrom rowttests ("Diff_tStatistics_Limma.jpg"). >> >> Am I doing something completely wrong? The assumption of equal >> variancetaken using ordinary t could not create this, could it? >> Please help me >> figure out what's wrong here, I'm hoping I've done some stupid >> mistake.What else could explain this? Thank you. >> >> Best wishes, >> Boel >> >> My code and sessionInfo: >> >> # eset is a filtered, gcrma normalized ExpressionSet with ~10 000 >> probesets, 24 arrays. >> library(limma) >> library(Biobase) >> library(genefilter) >> specific<-factor(c(rep("stageA",10),rep("stageB",14)), >> levels=c("stageB","stageA")) >> design<-model.matrix(~specific) >> fit<-lmFit(eset,design) >> Fit<-eBayes(fit) >> >> ordinary.t <- fit3$coef / fit3$stdev.unscaled / fit3$sigma >> moderate.t<-Fit$t[,2] >> rowttests.t<-rowttests(eset,fac=specific) >> >> par(mfrow=c(1,3)) >> qqnorm(ordinary.t,main="fit ordinary.t") >> qqnorm(moderate.t, main=" Fit moderate.t") >> qqnorm(rowttests.t[,1], main= "rowttests.t") >> dev2bitmap("Diff_tStatistics_Limma.jpg",type="jpeg", height = 5, >> width = >> 15, res = 75) >> >>> sessionInfo() >> R version 2.7.1 (2008-06-23) >> x86_64-unknown-linux-gnu >> >> locale: >> ... >> >> attached base packages: >> [1] splines tools stats graphics grDevices utils >> datasets[8] methods base >> >> other attached packages: >> [1] genefilter_1.20.0 survival_2.34-1 Biobase_2.0.1 limma_2.14.5 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.18.0 AnnotationDbi_1.2.2 DBI_0.2-4 >> [4] RSQLite_0.6-9 >> >> --~*~**~***~*~***~**~*~-- >> Boel Brynedal, MSc, PhD student >> Karolinska Institutet >> Department of Clinical neuroscience >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 15.8 years ago Wolfgang Huber ★ 13k

Login before adding your answer.