interpretation complex design limma
dfrtyu • 0

Hi everyone, and Prof. Gordon Smyth,

Please help me understand how best to interpret the two sets of contrasts used with limma below. The objective was to pool higher/secondary-level groups, as well as first-level groups of samples, within the design to get differential gene expression (DGE).

Given a design matrix and the logCPM output with mean-variance weights from voom(), four people followed the logic of ordinary designs and added the 'higher/secondary-level' contrasts as sums, as below:

ct <- makeContrasts(
  g2v1 = (group2_dead + group2_alive) - (group1_dead + group1_alive),
  g2v1dead = group2_dead - group1_dead,
  g2v1alive = group2_alive - group1_alive,
  status = (group1_dead + group2_dead) - (group1_alive + group2_alive),
  levels = design)

# eBayes() has no contrasts argument, so apply the contrasts with contrasts.fit() first
b <- eBayes(contrasts.fit(lmFit(data, design), contrasts = ct))


My question is: does this approach yield interpretable DE results, or should it be discarded completely in favour of dividing each pooled sum by the number of groups pooled, as below?

ct <- makeContrasts(
  g2v1 = (group2_dead + group2_alive)/2 - (group1_dead + group1_alive)/2,
  g2v1dead = group2_dead - group1_dead,
  g2v1alive = group2_alive - group1_alive,
  status = (group1_dead + group2_dead)/2 - (group1_alive + group2_alive)/2,
  levels = design)

# eBayes() has no contrasts argument, so apply the contrasts with contrasts.fit() first
b <- eBayes(contrasts.fit(lmFit(data, design), contrasts = ct))

sessionInfo( )
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] GEOmetadb_1.52.0    RSQLite_2.2.7       GSA_1.03.1          sva_3.38.0         
 [5] BiocParallel_1.24.1 genefilter_1.72.1   mgcv_1.8-31         nlme_3.1-148       
 [9] oligo_1.54.1        Biostrings_2.58.0   XVector_0.30.0      IRanges_2.24.1     
[13] S4Vectors_0.28.1    oligoClasses_1.52.0 affy_1.68.0         forcats_0.5.1      
[17] stringr_1.4.0       dplyr_1.0.6         purrr_0.3.4         readr_1.4.0        
[21] tidyr_1.1.3         tibble_3.1.1        ggplot2_3.3.5       tidyverse_1.3.1    
[25] limma_3.46.0        GEOquery_2.58.0     Biobase_2.50.0      BiocGenerics_0.36.1
limma • 329 views

When you fit a linear model and make comparisons, you are always computing the average for each group, and you make comparisons by calculating differences between those averages. In your first set of contrasts you are computing sums, whereas in the second you are computing averages. In other words,

g2v1=(group2_dead+group2_alive) - (group1_dead+group1_alive)

is the sum of group 2 minus the sum of group 1, which isn't something you would normally care to know, whereas

g2v1=(group2_dead+group2_alive)/2 - (group1_dead+group1_alive)/2

is the average of group 2 minus the average of group 1, which is a readily interpretable quantity.
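
To make the difference concrete, here is a small base-R sketch for a single gene. The per-group mean log-expression values are made up purely for illustration (they are not from the data in the question), and the weight vectors simply mirror the two g2v1 contrasts above.

```r
# Hypothetical per-group mean log2 expression for one gene
# (illustrative numbers only, not taken from the poster's data)
cell_means <- c(group1_dead = 5.0, group1_alive = 6.0,
                group2_dead = 7.0, group2_alive = 9.0)

# Contrast weights, in the same order as cell_means
sum_contrast  <- c(-1, -1, 1, 1)      # (g2_dead + g2_alive) - (g1_dead + g1_alive)
mean_contrast <- c(-1, -1, 1, 1) / 2  # same, with each side divided by 2

sum_diff  <- sum(sum_contrast  * cell_means)  # difference of sums
mean_diff <- sum(mean_contrast * cell_means)  # difference of averages

print(c(sum_diff = sum_diff, mean_diff = mean_diff))
```

With these numbers the averaged contrast is the log-fold-change between the group 2 average and the group 1 average, while the summed contrast is exactly twice that value and has no direct fold-change reading.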


Many thanks for the reply, @James MacDonald!

Indeed, it is probably unnecessary to use g2v1=(group2_dead+group2_alive), hence the question about interpretation vis-a-vis the concept of DE. Part of the reason I asked about interpretability is that a 'non-expert' queried me about the inputs all being sums of log data.

I guess you are indicating that such a contrast is not interpretable.


The two contrast matrices you give will yield identical lists of DE genes, p-values and FDRs. The only difference will be in the log-fold-changes, which will differ by a factor of 2 for the pooled contrasts (g2v1 and status); g2v1dead and g2v1alive are the same in both. As long as you know what the logFCs mean, both choices lead to the same conclusions, but I would always use the mean-mean contrast myself rather than sum-sum.
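
The scale invariance can be checked with a minimal base-R sketch. This is an illustration under stated assumptions, not the limma pipeline itself: the data are simulated for one gene, an ordinary least-squares fit via lm.fit() stands in for lmFit(), and the helper contrast_t() is a hypothetical name introduced here. It computes an ordinary t-statistic rather than limma's moderated one, but rescaling a contrast multiplies the estimate and its standard error by the same constant in either case.

```r
set.seed(1)

# Simulated log-expression for one gene: 4 groups x 5 samples (hypothetical data)
lv <- c("group1_dead", "group1_alive", "group2_dead", "group2_alive")
group <- factor(rep(lv, each = 5), levels = lv)
y <- rnorm(20, mean = rep(c(5, 6, 7, 9), each = 5))

# Cell-means design: one coefficient per group, no intercept
design <- model.matrix(~ 0 + group)
fit <- lm.fit(design, y)
beta <- fit$coefficients
s2 <- sum(fit$residuals^2) / fit$df.residual  # residual variance
XtX_inv <- solve(crossprod(design))           # unscaled covariance of coefficients

# Hypothetical helper: estimate and ordinary t-statistic for contrast weights w
contrast_t <- function(w) {
  est <- sum(w * beta)
  se <- sqrt(s2 * drop(t(w) %*% XtX_inv %*% w))
  c(estimate = est, t = est / se)
}

sum_res  <- contrast_t(c(-1, -1, 1, 1))      # sum-sum contrast
mean_res <- contrast_t(c(-1, -1, 1, 1) / 2)  # mean-mean contrast

print(rbind(sum = sum_res, mean = mean_res))
```

The two rows share the same t-statistic (and hence the same p-value), while the estimate in the sum row is twice the one in the mean row.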

