Interpretation of a complex design in limma
Question by dfrtyu (@grateshak-10586), United Kingdom

Hi everyone, and Prof. Gordon Smyth,

Please help me understand how best to interpret the two sets of limma contrasts below. The objective was to pool higher/secondary-level groups, as well as the first-level groups of samples, within the design to obtain differentially expressed genes (DGE).
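The contrasts below assume a cell-means design, with one column per combined group (group1_dead, group1_alive, group2_dead, group2_alive). A minimal base-R sketch of such a design, assuming a hypothetical layout of three replicates per group (the actual sample sizes are not given in the question):

```r
# Hypothetical sample layout: 3 replicates in each of the four cells
# (group1/group2 crossed with dead/alive); the names are illustrative only.
group <- factor(rep(c("group1_dead", "group1_alive",
                      "group2_dead", "group2_alive"), each = 3))

# Cell-means ("no intercept") design: one indicator column per combined
# group, so each fitted coefficient is the average of that group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

print(design)
```

When `levels = design` is passed to limma's `makeContrasts()`, the level names are taken from `colnames(design)`, so the names used inside the contrasts must match these column names exactly.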

So, given a design and the logCPM mean-variance output (i.e. from the voom() function), four people applied the logic of ordinary designs and therefore added the 'higher/secondary-level' contrasts as follows:

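library(limma)

# Summed (sum-sum) contrasts: pooled groups are added, not averaged.
# 'data' is the voom() output (logCPM values with precision weights).
ct <- makeContrasts(
  g2v1      = (group2_dead + group2_alive) - (group1_dead + group1_alive),
  g2v1dead  = group2_dead - group1_dead,
  g2v1alive = group2_alive - group1_alive,
  status    = (group1_dead + group2_dead) - (group1_alive + group2_alive),
  levels    = design)

b <- eBayes(contrasts.fit(lmFit(data, design), contrasts = ct))
summary(decideTests(b))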

My question is: does this approach give the resulting DE genes any meaningful interpretation, or should it be discarded completely in favour of dividing by the number of pooled groups, as below?

# Averaged (mean-mean) contrasts: pooled contrasts are divided by 2
ct <- makeContrasts(
  g2v1      = (group2_dead + group2_alive)/2 - (group1_dead + group1_alive)/2,
  g2v1dead  = group2_dead - group1_dead,
  g2v1alive = group2_alive - group1_alive,
  status    = (group1_dead + group2_dead)/2 - (group1_alive + group2_alive)/2,
  levels    = design)

b <- eBayes(contrasts.fit(lmFit(data, design), contrasts = ct))
summary(decideTests(b))

sessionInfo( )
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] GEOmetadb_1.52.0    RSQLite_2.2.7       GSA_1.03.1          sva_3.38.0         
 [5] BiocParallel_1.24.1 genefilter_1.72.1   mgcv_1.8-31         nlme_3.1-148       
 [9] oligo_1.54.1        Biostrings_2.58.0   XVector_0.30.0      IRanges_2.24.1     
[13] S4Vectors_0.28.1    oligoClasses_1.52.0 affy_1.68.0         forcats_0.5.1      
[17] stringr_1.4.0       dplyr_1.0.6         purrr_0.3.4         readr_1.4.0        
[21] tidyr_1.1.3         tibble_3.1.1        ggplot2_3.3.5       tidyverse_1.3.1    
[25] limma_3.46.0        GEOquery_2.58.0     Biobase_2.50.0      BiocGenerics_0.36.1
Answer by James W. MacDonald (@james-w-macdonald-5106), United States

When you fit a linear model and make comparisons, you are always computing the average for each group, and you make comparisons by calculating differences between those averages. In your first contrast you are computing sums, whereas in the second you are computing averages. In other words, in

g2v1=(group2_dead+group2_alive) - (group1_dead+group1_alive)

That is the sum of group 2 minus the sum of group 1, which isn't something you would normally care to know.

g2v1=(group2_dead+group2_alive)/2 - (group1_dead+group1_alive)/2

is the average of group 2 minus the average of group 1, which is a readily interpretable quantity.
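A small worked example may make the difference concrete. The group means below are made-up log2-CPM values for a single gene (purely hypothetical), and each contrast is just a weighted sum of those means:

```r
# Hypothetical average log2-CPM values for one gene in each group
# (numbers made up purely for illustration).
means <- c(group1_dead = 5.0, group1_alive = 6.0,
           group2_dead = 7.0, group2_alive = 8.0)

# Contrast weights, in the same order as 'means'
sum_sum   <- c(-1,   -1,   1,   1)    # (g2_dead + g2_alive) - (g1_dead + g1_alive)
mean_mean <- c(-0.5, -0.5, 0.5, 0.5)  # the same contrast divided by 2

est_sum  <- sum(sum_sum * means)    # 4: sum of group 2 minus sum of group 1
est_mean <- sum(mean_mean * means)  # 2: average of group 2 minus average of group 1
c(sum_sum = est_sum, mean_mean = est_mean)
```

On the log2 scale the mean-mean value of 2 reads directly as a 4-fold change between the average of group 2 and the average of group 1, whereas the sum-sum value of 4 would naively read as a 16-fold change, which no pair of groups actually shows.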

Many thanks for the reply, @James MacDonald!

Indeed, it is probably unnecessary to compute g2v1=(group2_dead+group2_alive), hence my question about its interpretation with respect to the concept of DE. Part of why I asked about interpretability is that a 'non-expert' queried me about the inputs all being sums of log data.

I take it you are indicating that such a contrast is not interpretable.

The two different contrast matrices you give will yield identical lists of DE genes, p-values and FDRs. The only difference will be in the log-fold-changes, which will differ by a factor of 2 for the pooled contrasts (the first and fourth, g2v1 and status). As long as you know what the logFCs mean, both choices lead to the same conclusions, but I would always use the mean-mean contrast myself rather than the sum-sum one.
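The reason the p-values are unchanged is that multiplying a contrast by a constant multiplies both its estimate and its standard error by that same constant, so the t-statistic cancels out. A minimal sketch for one simulated gene, using ordinary least squares via base R's `lm()` as a stand-in for limma (an assumption made so the example runs without Bioconductor; limma's moderated standard errors scale with the contrast weights in the same way):

```r
set.seed(1)
lvls  <- c("group1_dead", "group1_alive", "group2_dead", "group2_alive")
group <- factor(rep(lvls, each = 3), levels = lvls)

# One simulated gene with a different true mean in each group
y   <- rnorm(12, mean = 6) + as.integer(group)
fit <- lm(y ~ 0 + group)  # cell-means fit: coefficients are group averages

# Estimate and t-statistic for a contrast with weights w (OLS, not moderated)
contrast_t <- function(fit, w) {
  est <- sum(w * coef(fit))
  se  <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
  c(logFC = est, t = est / se)
}

res_sum  <- contrast_t(fit, c(-1, -1, 1, 1))          # sum-sum
res_mean <- contrast_t(fit, c(-0.5, -0.5, 0.5, 0.5))  # mean-mean
rbind(sum_sum = res_sum, mean_mean = res_mean)
```

The `logFC` column differs by exactly a factor of 2 between the two rows, while the `t` column is identical, so any p-value or FDR computed from it is identical too.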
