Unexpectedly high FDR using contrasts in Limma?
@gordon-smyth
Last seen 2 minutes ago
WEHI, Melbourne, Australia
Dear Jose,

It all looks normal and expected to me. Are you surprised that you get a significant MBD1 comparison but not DBL-MBD1? Don't forget that direct comparisons are more precise, and hence more powerful, than indirect comparisons. Look at the stdev.unscaled values from limma and you'll see larger values for DBL-MBD1.

Other things, like including a dye-effect, background correcting, or otherwise checking the data, often help give more powerful results.

Best wishes
Gordon

> Date: Thu, 05 Apr 2007 19:45:01 +0100
> From: J.delasHeras at ed.ac.uk
> Subject: [BioC] Unexpectedly high FDR using contrasts in Limma?
> To: bioconductor at stat.math.ethz.ch
>
> Hi everyone,
>
> I'm analysing (limma 2.9.10) a set of twelve 2-colour cDNA arrays
> (4 experiments of 3 slides each) using a common reference design. I
> find that I get very high adjusted P values (BH) when I use contrasts
> to compare some of these experiments.
>
> No adjusted P value is lower than 0.9, which would indicate there's
> not enough evidence that any gene behaves differently in either
> experiment, and I find that surprising. Yes, all four experiments are
> quite similar, but there are some genes that do stand out enough (I'd
> think!) and I have confirmation by RT that this is the case.
>
> So I am wondering whether I don't fully understand where these
> adjusted p values come from, whether I'm not asking the question I
> think I am asking, or whether there's an error in my code.
>
> Here's a summary of what I did:
>
> - 12 files read with read.maimages
>
>> targets   # H226 is the common reference
>    SlideNumber     FileName   Cy3   Cy5
> 1           25 140025-3.gpr  H226  MBD1
> 2           26 140026-3.gpr  MBD1  H226
> 3           23 140023-3.gpr  H226  MBD1
> 4           19 140019-3.gpr  MBD2  H226
> 5           17 140017-3.gpr  H226  MBD2
> 6           15 140015-3.gpr  MBD2  H226
> 7           20 140020-3.gpr  H226 MeCP2
> 8           18 140018-3.gpr MeCP2  H226
> 9           16 140016-3.gpr MeCP2  H226
> 10          22 140022-3.gpr  H226   DBL
> 11          24 140024-3.gpr   DBL  H226
> 12          14 140014-3.gpr   DBL  H226
>
> - background corrected using a custom method: RGList object 'RG.s'
>
>> design   # H226 as common reference
>    DBL MBD1 MBD2 MeCP2
> 1    0    1    0     0
> 2    0   -1    0     0
> 3    0    1    0     0
> 4    0    0   -1     0
> 5    0    0    1     0
> 6    0    0   -1     0
> 7    0    0    0     1
> 8    0    0    0    -1
> 9    0    0    0    -1
> 10   1    0    0     0
> 11  -1    0    0     0
> 12  -1    0    0     0
>
>> MA.sw <- normalizeWithinArrays(RG.s, layout, bc.method="none",
>>                                method="printtiploess")
>
>> fit.sw <- lmFit(MA.sw, design, method="ls")
>> eb.sw <- eBayes(fit.sw, proportion=0.01)
>
> - I get the adjusted P values from 'topTable':
>
>> tt.sw.MBD1  <- topTable(eb.sw, coef=2, n=number, genelist=gal,
>>                         adjust.method="BH", sort.by="P")
>> tt.sw.MBD2  <- topTable(eb.sw, coef=3, n=number, genelist=gal,
>>                         adjust.method="BH", sort.by="P")
>> tt.sw.MeCP2 <- topTable(eb.sw, coef=4, n=number, genelist=gal,
>>                         adjust.method="BH", sort.by="P")
>> tt.sw.DBL   <- topTable(eb.sw, coef=1, n=number, genelist=gal,
>>                         adjust.method="BH", sort.by="P")
>
> The values I get here look alright and make sense.
>
> Of the 4 experiments, the 4th one (DBL) is a control experiment. I
> would like to compare the other three to it, to see what genes are
> differentially expressed between each of the first three experiments
> and the control. I wanted to use contrasts for this. This is how I
> did it:
>
>> levels <- c("DBL", "MBD1", "MBD2", "MeCP2")
>> contrasts <- makeContrasts(DBL, MBD1, MBD2, MeCP2,
>>                            DBL-MBD1, DBL-MBD2, DBL-MeCP2,
>>                            levels=levels)
>
>> contrasts
>         Contrasts
> Levels   DBL MBD1 MBD2 MeCP2 DBL - MBD1 DBL - MBD2 DBL - MeCP2
>   DBL      1    0    0     0          1          1           1
>   MBD1     0    1    0     0         -1          0           0
>   MBD2     0    0    1     0          0         -1           0
>   MeCP2    0    0    0     1          0          0          -1
>
>> fitc <- contrasts.fit(fit.sw, contrasts)
>> ebfitc <- eBayes(fitc)
>
> Then I use 'topTable' again to get the adjusted p values etc.:
>
>> tt.cMBD1  <- topTable(ebfitc, coef=5, n=number, genelist=gal,
>>                       adjust.method="BH", sort.by="P")
>> tt.cMBD2  <- topTable(ebfitc, coef=6, n=number, genelist=gal,
>>                       adjust.method="BH", sort.by="P")
>> tt.cMeCP2 <- topTable(ebfitc, coef=7, n=number, genelist=gal,
>>                       adjust.method="BH", sort.by="P")
>
> The top of the list makes sense: I picked up the genes I was
> expecting. However, the adjusted p values are terrible, so I wonder
> whether this is real or I am making a mistake somewhere.
>
> As an example, here's a gene I know to be differentially expressed:
> CDKN1C. This gene has negligible expression in the common reference,
> it has negligible expression after the control experiment (DBL) is
> performed, but it is clearly expressed after the MBD1 experiment is
> performed (verified by RT). This is what I get:
>
> A.mean = 10.48 (it goes up to 11.6 in the individual slides of the
> MBD1 experiment, and stays low in all others)
>
> M (MBD1) = 2.98
> P (MBD1) = 6.07e-05
> adj p val (MBD1) = 0.00044
>
> M (DBL) = -0.44
> P (DBL) = 0.38
> adj p val (DBL) = 0.56
>
> M (DBL-MBD1) = -3.43
> P (DBL-MBD1) = 0.00036
> adj p val (DBL-MBD1) = 0.95
>
> I am surprised that this gene (and several others like it) gives me
> such poor adjusted p values...
>
> Why?
>
> Jose
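A minimal sketch of the checks Gordon suggests, reusing the object names fit.sw and ebfitc from Jose's session above (the column names assume the design and contrasts printed there). Differencing two coefficients, each estimated from three arrays of similar precision, inflates stdev.unscaled by roughly sqrt(2); and the BH-adjusted p-value depends on the whole p-value distribution, not just the one gene:

    library(limma)

    # Unscaled standard error of the direct MBD1 coefficient:
    # about 1/sqrt(3) for three replicate arrays
    median(fit.sw$stdev.unscaled[, "MBD1"])

    # Unscaled standard error of the indirect DBL-MBD1 contrast: roughly
    # sqrt(2) times larger, because it differences two estimated effects
    median(ebfitc$stdev.unscaled[, "DBL - MBD1"])

    # Toy illustration of the BH behaviour seen for DBL-MBD1: the adjusted
    # value for the top-ranked gene is min over ranks j of p_(j) * m / j,
    # so one small raw p-value among thousands of near-uniform ones still
    # adjusts to something large. m = 5000 is an arbitrary stand-in for
    # the number of genes on the array.
    set.seed(1)
    p <- c(0.00036, runif(4999))
    min(p.adjust(p, method = "BH"))  # usually nowhere near 0.05

For the values quoted in this thread, 0.58 x sqrt(2) is about 0.82, in line with the 0.87 reported in the follow-up below; for an individual gene the small excess over sqrt(2) can come from missing spots or weights.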
@jdelasherasedacuk-1189
Last seen 8.7 years ago
United Kingdom
Hi Gordon,

> It all looks normal and expected to me. Are you surprised that you
> get a significant MBD1 comparison but not DBL-MBD1? Don't forget
> that direct comparisons are more precise, and hence more powerful,
> than indirect comparisons.

I was REALLY hoping that I was doing something wrong... :(

I know that the direct comparison is more powerful (although I don't know just *how much* more powerful), but the problem here is that we might want to use different controls in the future, so this way I can add a new control experiment easily, without repeating the whole experiment.

I am surprised and not. It surprises me because the difference between the DBL experiment (no effect) and the MBD1 experiment (clear-cut effect) looks remarkable to me, at least for 10-20 genes. At the same time, these experiments were tough, in that there is a lot of biological variation: for each experiment 3 replicates were made, and they vary a lot in the level of the observed effect... So I guess it's not that surprising that when I make a contrast, the overall variation is high enough to make it too difficult to pick out any individual gene with any degree of confidence. I was still hoping, though :-)

> Look at the stdev.unscaled from limma and you'll see larger values
> for DBL-MBD1.

Indeed: from 0.58 for MBD1 to 0.87 for DBL-MBD1.

> Other things like including a dye-effect, background correcting, or
> otherwise checking the data often help give more powerful results.

I thought the dye effect was incorporated by default in limma, without specifying it explicitly.

I have noticed that background correction can have a big effect on the final results. I really don't like straight subtraction of background estimates when the estimate is simply the signal measured locally around the spots. When I can, I prefer to use estimates based on the signal from spots that have DNA spotted but don't hybridise, and subtract that. I've tried other methods too; normexp can work pretty well once you hit the right offset...

I've been meaning to try single channel analysis in limma. I wonder if that could help me improve matters a little, by allowing me to compare MBD1 against DBL directly... However, the bottom line remains that there's a lot of variation in these experiments, so if the preliminary results look OK we might just have to increase the number of slides (we have the slides for free; it's only the cost of transfections + labelling). Hmmm, gotta think.

Many thanks for your reply, Gordon. It was helpful as usual, just not what I wanted to hear :-)

best,

Jose

--
Dr. Jose I. de las Heras                     Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology   Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology       Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
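For reference, the normexp route mentioned above is a one-liner in limma. This is only a sketch: 'RG' stands in for the raw RGList, and offset = 50 is an arbitrary illustrative value, not a recommendation from this thread:

    library(limma)

    # normexp background correction; the offset is added to the corrected
    # intensities to stabilise the log-ratios of low-intensity spots
    RG.nexp <- backgroundCorrect(RG, method = "normexp", offset = 50)

    # then normalise as before; bc.method = "none" keeps
    # normalizeWithinArrays from re-subtracting the background by default
    MA <- normalizeWithinArrays(RG.nexp, layout, bc.method = "none",
                                method = "printtiploess")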
On Sat, April 7, 2007 12:43 am, J.delasHeras at ed.ac.uk wrote:

>> Other things like including a dye-effect, background correcting, or
>> otherwise checking the data often help give more powerful results.
>
> I thought the dye effect was incorporated by default in limma, without
> specifying it explicitly.

I'm referring to probe-specific dye-effects. See page 36 of the User's Guide. limma automatically takes into account the arrangement of dye swaps when computing the treatment effects, but it doesn't automatically include probe-specific dye-effects. In an experiment such as yours, including a dye-effect term, which is just an intercept term for your experiment, can globally reduce the standard errors and hence increase statistical significance overall.

> I've been meaning to try single channel analysis in limma. I wonder if
> that could help me improve matters a little, by allowing me to compare
> MBD1 against DBL directly...

Single channel analysis doesn't turn an indirect comparison into a direct one; that is a structural matter. For your experiment, single channel analysis simply recovers information from the A-values as well as the M-values, usually about 8% extra information.

Best wishes
Gordon
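For concreteness, here is a sketch of both suggestions in R, reusing MA.sw, design, and targets from the session earlier in this thread. The separate-channel part follows the general limma workflow for two-colour data (targetsA2C, intraspotCorrelation, lmscFit), not code posted in this thread, so treat it as an outline:

    library(limma)

    # 1. Probe-specific dye effect: an all-ones intercept column. Every
    #    M-value is log2(Cy5/Cy3), so the dye bias enters each array with
    #    the same sign and is absorbed by this coefficient. Coefficient
    #    numbering in topTable() shifts by one as a result.
    design.dye <- cbind(Dye = 1, design)
    fit.dye <- eBayes(lmFit(MA.sw, design.dye))

    # 2. Separate-channel analysis: one target per channel, a consensus
    #    within-spot correlation, then a single-channel linear model
    targets.sc <- targetsA2C(targets)
    u <- unique(targets.sc$Target)
    design.sc <- model.matrix(~0 + factor(targets.sc$Target, levels = u))
    colnames(design.sc) <- u
    corfit <- intraspotCorrelation(MA.sw, design.sc)
    fit.sc <- lmscFit(MA.sw, design.sc,
                      correlation = corfit$consensus.correlation)

    # MBD1 vs DBL is then a contrast of channel-level coefficients; as
    # noted above, this is still structurally an indirect comparison
    cont <- makeContrasts(MBD1 - DBL, levels = design.sc)
    fit.sc2 <- eBayes(contrasts.fit(fit.sc, cont))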