limma, voom, weights: factorial versus nested interaction results
@reiner-schulz-5678
Dear all,

I am analysing an RNA-seq experiment with a two-way factorial design, and I am observing different results for two contrasts of interest depending on whether I use a factorial model matrix or a nested interaction model matrix. I only observe this if the data are pre-processed with voom:

> counts.voom= voom( ...

i.e., when there are observation-level weights passed to lmFit. If I only pass counts.voom$E (no observation-level weights) to lmFit, the results for the two contrasts of interest are identical for the factorial and nested interaction model matrices.

Is this to be expected, and if so, what is the explanation?

Much appreciated, Reiner

PS: some more details ...

> strain
 [1] MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT MUT WT  WT  WT  WT  WT  WT
[20] WT  WT  WT  WT  WT  WT  WT
Levels: WT MUT
> treat
 [1] U  T2 T1 U  T2 T1 U  U  T2 T1 U  T2 T1 U  T2 T1 U  T2 T1 U  U  T2 T1 U  T2
[26] T1
Levels: U T1 T2

> cond= factor( paste( sep='.', strain, treat),
+               lev=c('WT.U', 'MUT.U', 'WT.T1', 'MUT.T1', 'WT.T2', 'MUT.T2'))
> design.matrix= model.matrix( ~0 + cond)
> design.matrix
   WT.U MUT.U WT.T1 MUT.T1 WT.T2 MUT.T2
1     0     1     0      0     0      0
2     0     0     0      0     0      1
3     0     0     0      1     0      0
4     0     1     0      0     0      0
5     0     0     0      0     0      1
6     0     0     0      1     0      0
7     0     1     0      0     0      0
8     0     1     0      0     0      0
9     0     0     0      0     0      1
10    0     0     0      1     0      0
11    0     1     0      0     0      0
12    0     0     0      0     0      1
13    0     0     0      1     0      0
14    1     0     0      0     0      0
15    0     0     0      0     1      0
16    0     0     1      0     0      0
17    1     0     0      0     0      0
18    0     0     0      0     1      0
19    0     0     1      0     0      0
20    1     0     0      0     0      0
21    1     0     0      0     0      0
22    0     0     0      0     1      0
23    0     0     1      0     0      0
24    1     0     0      0     0      0
25    0     0     0      0     1      0
26    0     0     1      0     0      0

> design.matrix.alt= model.matrix( ~strain + strain:treat)
> design.matrix.alt
   Intercept MUT WT:T1 MUT:T1 WT:T2 MUT:T2
1          1   1     0      0     0      0
2          1   1     0      0     0      1
3          1   1     0      1     0      0
4          1   1     0      0     0      0
5          1   1     0      0     0      1
6          1   1     0      1     0      0
7          1   1     0      0     0      0
8          1   1     0      0     0      0
9          1   1     0      0     0      1
10         1   1     0      1     0      0
11         1   1     0      0     0      0
12         1   1     0      0     0      1
13         1   1     0      1     0      0
14         1   0     0      0     0      0
15         1   0     0      0     1      0
16         1   0     1      0     0      0
17         1   0     0      0     0      0
18         1   0     0      0     1      0
19         1   0     1      0     0      0
20         1   0     0      0     0      0
21         1   0     0      0     0      0
22         1   0     0      0     1      0
23         1   0     1      0     0      0
24         1   0     0      0     0      0
25         1   0     0      0     1      0
26         1   0     1      0     0      0

Observed equalities (as expected):

MUT == MUT.U - WT.U
WT:T1 == WT.T1 - WT.U
MUT:T1 == MUT.T1 - MUT.U
WT:T2 == WT.T2 - WT.U
MUT:T2 == MUT.T2 - MUT.U
MUT:T1 - WT:T1 == (MUT.T1 - MUT.U) - (WT.T1 - WT.U)
MUT:T2 - WT:T2 == (MUT.T2 - MUT.U) - (WT.T2 - WT.U)

Observed inequalities (only with voom pre-processed data; unexpected):

WT:T1 - WT:T2 != (WT.T1 - WT.U) - (WT.T2 - WT.U)
MUT:T1 - MUT:T2 != (MUT.T1 - MUT.U) - (MUT.T2 - MUT.U)
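For reference, the two design matrices above parameterize the same six group means and are related by an invertible linear transformation, so ordinary (unweighted) least squares gives identical fitted values and identical estimates for any contrast written in either parameterization. A minimal check of this, assuming design.matrix and design.matrix.alt are in the workspace as shown above:

# design.matrix.alt lies exactly in the column space of design.matrix,
# so the least-squares reconstruction is exact (up to floating point).
M <- solve(crossprod(design.matrix), crossprod(design.matrix, design.matrix.alt))
all.equal(design.matrix %*% M, design.matrix.alt, check.attributes = FALSE)
# should be TRUE: both matrices span the same column space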
@ryan-c-thompson-5618
It would probably help to have the exact code that you used to call voom. voom can take a lot of arguments, and I'm not sure which ones would make a difference here.
hi Ryan,

The voom call is:

counts.voom= voom( counts, design= design.matrix, plot=TRUE,
                   lib.size= nf*colSums( counts), normalize.method= 'none')

where nf= calcNormFactors( counts) and design.matrix is as in my original post. Fitting is done using:

fit.limma= lmFit( counts.voom$E, design= design.matrix, weights= counts.voom$weights)

and:

fit.limma.alt= lmFit( counts.voom$E, design= design.matrix.alt, weights= counts.voom$weights)

where design.matrix.alt is also as in my original post. So the only difference is in the design matrix; the expression values and weights are the same. Yet I observe the unexpected inequalities listed in my original post. Without weights= counts.voom$weights, everything is as expected.

Relevant bits from sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)
limma_3.14.3 edgeR_3.0.7 DESeq_1.10.1

best, r
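The contrast step itself is not shown above. In limma it would typically be specified with makeContrasts() and contrasts.fit(); a sketch of what that step might look like for the two comparisons of interest follows (the contrast names are illustrative, the code assumes the design columns are named as printed above, and it is an assumption rather than a quote from the post):

# Factorial (one-way) parameterization: both comparisons are simple
# differences of group means.
cont <- makeContrasts(WT.T1vsT2  = WT.T1 - WT.T2,
                      MUT.T1vsT2 = MUT.T1 - MUT.T2,
                      levels = design.matrix)
fit2 <- eBayes(contrasts.fit(fit.limma, cont))

# Nested parameterization: the same comparisons are differences of
# interaction coefficients. A numeric contrast matrix is used because the
# column names ("WT:T1" etc.) are not syntactically valid R names.
cont.alt <- cbind(WT.T1vsT2  = c(0, 0, 1, 0, -1, 0),
                  MUT.T1vsT2 = c(0, 0, 0, 1, 0, -1))
fit2.alt <- eBayes(contrasts.fit(fit.limma.alt, cont.alt))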
@gordon-smyth
Dear Reiner,

You don't mention contrasts.fit(), but I assume that you are using that function to get equivalent quantities from the two design matrices.

If I am guessing your problem correctly, then the issue is that contrasts.fit() gives results that are not exact in the presence of voom weights. See the warning note on the help page for contrasts.fit.

You will get the exact correct t-statistic if the contrast you want is a coefficient of your design matrix. You should also get exact correct results when using the one-way layout design formula ~0+cond, but you may get an approximate test statistic if you use a contrast from the nested formula. The fold changes are exact in all cases.

There was a thread about this a few years ago, see:

https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030947.html

Basically I have promised to change contrasts.fit() to give exact results in all cases, even though this will make it much slower in some cases. I have not so far found the time to do it, but will definitely do so by the time we publish the voom method.

Best wishes
Gordon
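In the example above, one way to follow this advice is to re-parameterize so that each comparison of interest is itself a coefficient, for instance by making T2 the reference level of treat in the nested formula. A sketch (the object names are illustrative, and the coefficient names are those R would generate):

treat2 <- relevel(treat, ref = "T2")
design.T2ref <- model.matrix(~ strain + strain:treat2)
# With T2 as the baseline, the coefficients strainWT:treat2T1 and
# strainMUT:treat2T1 are exactly WT.T1 - WT.T2 and MUT.T1 - MUT.T2.
# Both parameterizations span the same column space, so the voom weights
# computed earlier can be re-used unchanged.
fit.T2ref <- eBayes(lmFit(counts.voom$E, design.T2ref, weights = counts.voom$weights))
topTable(fit.T2ref, coef = "strainWT:treat2T1")
topTable(fit.T2ref, coef = "strainMUT:treat2T1")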
Thank you very much, Gordon.

Out of curiosity, why did you develop the voom method when there is edgeR? Are there good, perhaps data-dependent, reasons for using one over the other?

Best, Reiner
Dear Reiner,

Well, I don't honestly know yet which of edgeR or voom is better in what circumstances. I feel that I can't know this until I have pushed both methods as far as they can go, something that I am still doing, and then have tried them both out on a wide variety of datasets.

I started the edgeR project way back in 2004, before RNA-Seq technology even existed. My thought then was that non-normal methods would be essential. And I think it remains true that edgeR has a clear edge over other methods for the SAGE data that was available then.

edgeR performs very well, but it has not solved all our data analysis problems for RNA-Seq data, not yet anyway.

voom has surprised me by performing much better than I expected. It solves the biggest problem with normal-based methods by estimating the mean-variance relationship adaptively. It holds its size correctly in almost any circumstance, scales to very large datasets, can adaptively estimate the amount of smoothing necessary, and gives immediate access to the whole downstream limma pipeline, including gene set testing.

My feeling at the moment is that edgeR is superior for small counts but that voom is safer and more reliable for noisy, heterogeneous data. Only edgeR can estimate the biological coefficient of variation (as we defined this in our 2012 paper). But we are actively working on both methods, and are open to what we find.

Best wishes
Gordon
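For reference, the two pipelines being compared look roughly like this (a minimal sketch based on the standard calls in the limma and edgeR documentation; the object names and the contrast tested are illustrative):

library(edgeR)   # also loads limma

# voom + limma: estimate the mean-variance trend and carry it into the
# linear model as observation-level weights (mirroring the calls earlier
# in this thread).
nf   <- calcNormFactors(counts)
v    <- voom(counts, design.matrix, lib.size = nf * colSums(counts))
fitv <- eBayes(lmFit(v, design.matrix))

# edgeR: negative binomial GLMs with empirical Bayes dispersion estimation.
dge  <- DGEList(counts = counts)
dge  <- calcNormFactors(dge)
dge  <- estimateGLMCommonDisp(dge, design.matrix)
dge  <- estimateGLMTrendedDisp(dge, design.matrix)
dge  <- estimateGLMTagwiseDisp(dge, design.matrix)
fite <- glmFit(dge, design.matrix)
lrt  <- glmLRT(fite, contrast = c(-1, 1, 0, 0, 0, 0))   # e.g. MUT.U vs WT.U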
Dear Gordon,

Thanks again. That's very helpful.

I am working with iCLIP data. In terms of size, it is probably similar to SAGE: only 10^5 to 10^6 tags/reads for a typical human sample. I have applied both edgeR and voom, and the results do not seem to be widely different; they are not what I expected, but that's another matter.

Best, Reiner
Hi Gordon,

Thanks for the great explanation. Could you expand on what you mean by "noisy heterogeneous data"? That term would seem to describe just about any RNA-seq data set to some degree, but I assume you have something specific in mind. Are you talking about data with a large BCV, or maybe data where different genes have widely varying dispersions?

Also, if it's not too much trouble, could you expand on how voom changes the underlying assumptions of limma? Specifically, which of the assumptions of ANOVA are relaxed and which are not changed? It still confuses me that a normal-based method could work on data that are not expected to be normal, and it's not clear to me how estimating the mean-variance relationship adaptively would make up for using the "wrong" distribution (normal vs. negative binomial).

Thanks,
-Ryan Thompson