Question

limma: instances of highly variable paired ratios, but very small p-values

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

Dear Ryan, The limma algorithm is very well understood by now, and there are many bioinformatications at the FHCRC who could probably answer your question. I find it hard to make a response to your email because I just see a jumble of numbers. I don't have a firm understanding of what your experimental design is, what model you've fitted, what the numbers are that you've calculated, or what you want me to see. To comment more, I'd need to see your experimental design and the complete pipeline of commands used to generate the output given. Best wishes Gordon I don't a firm idea of what your experimental design > Date: Wed, 14 Nov 2012 17:23:43 -0800 > From: Ryan Basom <rbasom at="" gmail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] limma: instances of highly variable paired ratios, but > very small p-values > > Hi, > > I've performed a limma analysis on paired samples that were run on > Illumina HT12 arrays, with three replicates in each condition. I'm a > bit troubled by the results though, as there are several probes that > have very small adjusted p-values, though when looking at the paired > ratio values, they vary quite a bit. Here are a few examples where the > comparison is long-short, and the samples are paired by the letters > A,B,C. After the adj.P.Val column, I've calculated the paired sample > ratio values, these three columns are followed by the signal intensities > from each sample: > > ProbeID TargetID logFC AveExpr P.Value adj.P.Val long-short.A > long-short.B long-short.C long.A long.B long.C short.A short.B short.C > 1450390 RPL17 -1.3733092649 10.2105020267 4.35314891863083e-17 > 4.55305891127872e-14 -1.277287712 -0.6714209686 -2.1712191142 > 9.5618416199 9.5085763086 9.5011242541 10.8391293319 10.1799972771 > 11.6723433683 1230376 ALAS2 1.4395987013 10.0069363572 > 4.9058551374517e-17 4.76463659313792e-14 0.356310701 2.2275874983 > 1.7348979044 9.1085909827 10.4991863428 12.5724297981 8.7522802817 > 8.2715988445 10.8375318936 3420451 RSL24D1 -1.2585240828 8.0288125742 > 6.26229691539969e-15 4.73046950881609e-12 -0.9845918613 -0.6335605827 > -2.1574198045 7.5348228063 7.4471358372 7.2166929547 8.5194146676 > 8.0806964199 9.3741127592 > > > I'd assumed that limma would have been more sensitive to this and am > wondering if anyone could please explain why this may be occurring. > > Thanks, > > Ryan > > __ > Ryan Basom > Systems Analyst/Programmer > Genomics Resource > Fred Hutchinson Cancer Research Center > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

Cancer limma Cancer limma • 890 views

ADD COMMENT • link updated 13.1 years ago by Ryan Basom ▴ 20 • written 13.1 years ago by Gordon Smyth 53k

score 0 · Answer 1 · 2012-11-18

Dear Gordon, I apologize about the lack of clarity in my initial post. In my experiment, I'm working with three pairs of twins that differ in sleep duration. In my targets file, the twins are paired by "Source," and are grouped by sleep duration - either "short" or "long." I want try to identify significantly differentially expressed probes between the "short" and "long" groups. > targets Sample Group Source 1 short.A short A 2 short.B short B 3 short.C short C 4 long.A long A 5 long.B long B 6 long.C long C "d" is a data.frame that contains quantile normalized, log2 scaled signal intensity values, with samples represented in columns. The data set has undergone signal intensity and variance filtering steps, resulting in 13,597 rows of values that were used with limma. > head(d) long.A long.B long.C short.A short.B short.C 1690577 9.717804 10.068060 9.969054 11.74844 10.84940 12.46522 2120040 9.234160 9.415976 9.463955 10.95052 10.42986 11.84757 3180438 9.724018 9.370251 9.758478 11.32368 10.51315 11.97926 3130324 8.980456 9.163279 9.203548 10.57555 10.13772 11.71771 6380255 9.624958 9.118592 9.364389 11.02301 10.20351 11.85628 6560164 10.097376 9.852433 10.089217 11.55226 10.78285 12.20962 design<-model.matrix(~targets$Source+targets$Group) fit <- lmFit(d, design) fit <- eBayes(fit) results<-topTable(fit, coef="targets$Groupshort", number=nrow(d)) What I'm puzzled about are instances of probes with very small p-values, when the sample paired ratios vary quite a bit. Below are a few examples. > exampleProbes<-c(1450390,1230376,3420451) > results[results$ID %in% exampleProbes,] ID logFC AveExpr t P.Value adj.P.Val B 13 1450390 -1.373309 10.210502 -9.807005 4.353149e-17 4.553059e-14 28.26206 14 1230376 1.439599 10.006936 9.785405 4.905855e-17 4.764637e-14 28.14655 18 3420451 -1.258524 8.028813 -8.903744 6.262297e-15 4.730470e-12 23.45921 > signalIntensities<-d[row.names(d) %in% exampleProbes,] > signalIntensities long.A long.B long.C short.A short.B short.C 1450390 9.561842 9.508576 9.501124 10.839129 10.179997 11.672343 1230376 9.108591 10.499186 12.572430 8.752280 8.271599 10.837532 3420451 7.534823 7.447136 7.216693 8.519415 8.080696 9.374113 > ratios<-signalIntensities[1:3]-signalIntensities[4:6] > ratios long.A long.B long.C 1450390 -1.2772877 -0.6714210 -2.171219 1230376 0.3563107 2.2275875 1.734898 3420451 -0.9845919 -0.6335606 -2.157420 Thank you very much for any assistance with this. Best, Ryan On Fri, Nov 16, 2012 at 10:36 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Ryan, > > The limma algorithm is very well understood by now, and there are many > bioinformatications at the FHCRC who could probably answer your question. > > I find it hard to make a response to your email because I just see a > jumble of numbers. I don't have a firm understanding of what your > experimental design is, what model you've fitted, what the numbers are that > you've calculated, or what you want me to see. To comment more, I'd need > to see your experimental design and the complete pipeline of commands used > to generate the output given. > > Best wishes > Gordon > > I don't a firm idea of what your experimental design > > Date: Wed, 14 Nov 2012 17:23:43 -0800 >> From: Ryan Basom <rbasom@gmail.com> >> To: bioconductor@r-project.org >> Subject: [BioC] limma: instances of highly variable paired ratios, but >> very small p-values >> >> Hi, >> >> I've performed a limma analysis on paired samples that were run on >> Illumina HT12 arrays, with three replicates in each condition. I'm a bit >> troubled by the results though, as there are several probes that have very >> small adjusted p-values, though when looking at the paired ratio values, >> they vary quite a bit. Here are a few examples where the comparison is >> long-short, and the samples are paired by the letters A,B,C. After the >> adj.P.Val column, I've calculated the paired sample ratio values, these >> three columns are followed by the signal intensities from each sample: >> >> ProbeID TargetID logFC AveExpr P.Value adj.P.Val long-short.A >> long-short.B long-short.C long.A long.B long.C short.A short.B short.C >> 1450390 RPL17 -1.3733092649 10.2105020267 4.35314891863083e-17 >> 4.55305891127872e-14 -1.277287712 -0.6714209686 -2.1712191142 9.5618416199 >> 9.5085763086 9.5011242541 10.8391293319 10.1799972771 11.6723433683 1230376 >> ALAS2 1.4395987013 10.0069363572 4.9058551374517e-17 4.76463659313792e-14 >> 0.356310701 2.2275874983 1.7348979044 9.1085909827 10.4991863428 >> 12.5724297981 8.7522802817 8.2715988445 10.8375318936 3420451 RSL24D1 >> -1.2585240828 8.0288125742 6.26229691539969e-15 4.73046950881609e-12 >> -0.9845918613 -0.6335605827 -2.1574198045 7.5348228063 7.4471358372 >> 7.2166929547 8.5194146676 8.0806964199 9.3741127592 >> >> >> I'd assumed that limma would have been more sensitive to this and am >> wondering if anyone could please explain why this may be occurring. >> >> Thanks, >> >> Ryan >> >> __ >> Ryan Basom >> Systems Analyst/Programmer >> Genomics Resource >> Fred Hutchinson Cancer Research Center >> >> > ______________________________**______________________________**____ ______ > The information in this email is confidential and inte...{{dropped:10}}

score 0 · Answer 2 · 2012-11-19

Dear Ryan, Thanks, this is much clearer now. There isn't any contradiction in the results you give, and the results are essentially as I would want them to be. The aim of the moderated t-test is to test whether the log-ratios are consistently greater or lesser than zero, not to test whether the log-ratios are equal. For each of the three genes you give, the three pairwise log-ratios are consistently greater than or less than zero. For each gene, the three log-ratios are of the same sign, some quite large and none very small. Hence we would conclude that the treatment (sleep duration) does have a worthwhile effect for these genes. Note that the moderated t-test does not penalize a gene for having a large variance as much as the ordinary t-test would do. Compared to the ordinary t-test, the moderated t-test gives more weight to the treatment fold change. It can therefore be viewed as a compromise between the ordinary t-test and ranking by fold-change. To see whether the compromise is being drawn, look at fit$df.prior. If this equal to 3 (the residual df for you experiment), then the moderated t-test is giving equal weight to the genewise variance and to the global variance. If df.prior is much larger than 3, then the moderated t-test is using mostly the global variance. The larger the df.prior, the closer the moderated t-test will come to ranking genes by fold change. The smaller the df.prior, the closer the moderated t-test will come to ranking by ordinary t-test. Larger genewise variances are penalized more when df.prior is low and less when df.prior is large. I don't see a strong reason to suspect a problem in your analysis, but I should say that I do not necessarily recommend filtering by variance in conjunction with the eBayes procedure. There is a risk that, by removing genes with small overall variances, you are changing the natural properties of the data, leading limma to think that the variances are more consistent than they actually are, and leading limma to set df.prior larger than would be ideal. You will almost certainly find that, if you remove the filter-by-variance step, then limma will choose a lower value for df.prior than at present. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Sun, 18 Nov 2012, Ryan Basom wrote: > Dear Gordon, > > I apologize about the lack of clarity in my initial post. In my > experiment, I'm working with three pairs of twins that differ in sleep > duration. In my targets file, the twins are paired by "Source," and are > grouped by sleep duration - either "short" or "long." I want try to > identify significantly differentially expressed probes between the "short" > and "long" groups. > >> targets > Sample Group Source > 1 short.A short A > 2 short.B short B > 3 short.C short C > 4 long.A long A > 5 long.B long B > 6 long.C long C > > > "d" is a data.frame that contains quantile normalized, log2 scaled signal > intensity values, with samples represented in columns. The data set has > undergone signal intensity and variance filtering steps, resulting in > 13,597 rows of values that were used with limma. > >> head(d) > long.A long.B long.C short.A short.B short.C > 1690577 9.717804 10.068060 9.969054 11.74844 10.84940 12.46522 > 2120040 9.234160 9.415976 9.463955 10.95052 10.42986 11.84757 > 3180438 9.724018 9.370251 9.758478 11.32368 10.51315 11.97926 > 3130324 8.980456 9.163279 9.203548 10.57555 10.13772 11.71771 > 6380255 9.624958 9.118592 9.364389 11.02301 10.20351 11.85628 > 6560164 10.097376 9.852433 10.089217 11.55226 10.78285 12.20962 > > > design<-model.matrix(~targets$Source+targets$Group) > fit <- lmFit(d, design) > fit <- eBayes(fit) > results<-topTable(fit, coef="targets$Groupshort", number=nrow(d)) > > What I'm puzzled about are instances of probes with very small p-values, > when the sample paired ratios vary quite a bit. Below are a few examples. > >> exampleProbes<-c(1450390,1230376,3420451) >> results[results$ID %in% exampleProbes,] > ID logFC AveExpr t P.Value adj.P.Val B > 13 1450390 -1.373309 10.210502 -9.807005 4.353149e-17 4.553059e-14 28.26206 > 14 1230376 1.439599 10.006936 9.785405 4.905855e-17 4.764637e-14 28.14655 > 18 3420451 -1.258524 8.028813 -8.903744 6.262297e-15 4.730470e-12 23.45921 > > >> signalIntensities<-d[row.names(d) %in% exampleProbes,] >> signalIntensities > long.A long.B long.C short.A short.B short.C > 1450390 9.561842 9.508576 9.501124 10.839129 10.179997 11.672343 > 1230376 9.108591 10.499186 12.572430 8.752280 8.271599 10.837532 > 3420451 7.534823 7.447136 7.216693 8.519415 8.080696 9.374113 > >> ratios<-signalIntensities[1:3]-signalIntensities[4:6] >> ratios > long.A long.B long.C > 1450390 -1.2772877 -0.6714210 -2.171219 > 1230376 0.3563107 2.2275875 1.734898 > 3420451 -0.9845919 -0.6335606 -2.157420 > > > Thank you very much for any assistance with this. > > Best, > > Ryan > > > > > On Fri, Nov 16, 2012 at 10:36 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Ryan, >> >> The limma algorithm is very well understood by now, and there are many >> bioinformatications at the FHCRC who could probably answer your question. >> >> I find it hard to make a response to your email because I just see a >> jumble of numbers. I don't have a firm understanding of what your >> experimental design is, what model you've fitted, what the numbers are that >> you've calculated, or what you want me to see. To comment more, I'd need >> to see your experimental design and the complete pipeline of commands used >> to generate the output given. >> >> Best wishes >> Gordon >> >> I don't a firm idea of what your experimental design >> >> Date: Wed, 14 Nov 2012 17:23:43 -0800 >>> From: Ryan Basom <rbasom at="" gmail.com=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] limma: instances of highly variable paired ratios, but >>> very small p-values >>> >>> Hi, >>> >>> I've performed a limma analysis on paired samples that were run on >>> Illumina HT12 arrays, with three replicates in each condition. I'm a bit >>> troubled by the results though, as there are several probes that have very >>> small adjusted p-values, though when looking at the paired ratio values, >>> they vary quite a bit. Here are a few examples where the comparison is >>> long-short, and the samples are paired by the letters A,B,C. After the >>> adj.P.Val column, I've calculated the paired sample ratio values, these >>> three columns are followed by the signal intensities from each sample: >>> >>> ProbeID TargetID logFC AveExpr P.Value adj.P.Val long-short.A >>> long-short.B long-short.C long.A long.B long.C short.A short.B short.C >>> 1450390 RPL17 -1.3733092649 10.2105020267 4.35314891863083e-17 >>> 4.55305891127872e-14 -1.277287712 -0.6714209686 -2.1712191142 9.5618416199 >>> 9.5085763086 9.5011242541 10.8391293319 10.1799972771 11.6723433683 1230376 >>> ALAS2 1.4395987013 10.0069363572 4.9058551374517e-17 4.76463659313792e-14 >>> 0.356310701 2.2275874983 1.7348979044 9.1085909827 10.4991863428 >>> 12.5724297981 8.7522802817 8.2715988445 10.8375318936 3420451 RSL24D1 >>> -1.2585240828 8.0288125742 6.26229691539969e-15 4.73046950881609e-12 >>> -0.9845918613 -0.6335605827 -2.1574198045 7.5348228063 7.4471358372 >>> 7.2166929547 8.5194146676 8.0806964199 9.3741127592 >>> >>> >>> I'd assumed that limma would have been more sensitive to this and am >>> wondering if anyone could please explain why this may be occurring. >>> >>> Thanks, >>> >>> Ryan >>> >>> __ >>> Ryan Basom >>> Systems Analyst/Programmer >>> Genomics Resource >>> Fred Hutchinson Cancer Research Center ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}