edgeR contrasts confusion when multiple groups are involved
1
0
Entering edit mode
@506e191c
Last seen 12 hours ago
Ireland

Hi all,

I have control (C) and disease (D) groups, and the status of a mutation in each sample as absent (0), heterozygous (1), and homozygous (2) so that my design matrix looks like this:

                C.0     C.1     C.2     D.0       D.1       D.2
sample.1        1       0       0       0         0         0
sample.2        1       0       0       0         0         0
sample.3        1       0       0       0         0         0
sample.4        0       1       0       0         0         0
sample.5        0       0       1       0         0         0
sample.6        0       0       1       0         0         0
sample.7        0       0       0       1         0         0
sample.8        0       0       0       1         0         0
sample.9        0       0       0       0         1         0
sample.10       0       0       0       0         1         0
sample.11       0       0       0       0         0         1
sample.12       0       0       0       0         0         1

Is there a way to format my contrasts argument to allow a comparison between D.1 and all C groups? I have tried the following, which was wrong:

qlf <- glmQLFTest(fit,contrast=c(-1,-1,-1,0,1,0))

A more correct answer (but still not correct) is the following:

qlf <- glmQLFTest(fit,contrast=c(-0.33,-0.33,-0.33,0,1,0))

Is there a simple way of writing what I want or am I better off collapsing C.0, C.1, and C.2 into a single control group and comparing D.1 vs C? Thank you.

sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EnhancedVolcano_1.10.0 ggrepel_0.9.1          edgeR_3.34.1          
 [4] limma_3.48.3           BiocManager_1.30.16    ggthemes_4.2.4        
 [7] gorr_1.4.4             forcats_0.5.1          stringr_1.4.0         
[10] dplyr_1.0.7            purrr_0.3.4            readr_2.0.1           
[13] tidyr_1.1.3            tibble_3.1.4           ggplot2_3.3.5         
[16] tidyverse_1.3.1       

loaded via a namespace (and not attached):
 [1] fs_1.5.0           lubridate_1.7.10   bit64_4.0.5        ash_1.0-15        
 [5] RColorBrewer_1.1-2 httr_1.4.2         repr_1.1.3         tools_4.1.1       
 [9] backports_1.2.1    utf8_1.2.2         R6_2.5.1           KernSmooth_2.23-20
[13] vipor_0.4.5        DBI_1.1.1          colorspace_2.0-2   withr_2.4.2       
[17] tidyselect_1.1.1   ggrastr_0.2.3      ggalt_0.4.0        bit_4.0.4         
[21] curl_4.3.2         compiler_4.1.1     extrafontdb_1.0    cli_3.0.1         
[25] rvest_1.0.1        Cairo_1.5-12.2     xml2_1.3.2         labeling_0.4.2    
[29] scales_1.1.1       proj4_1.0-10.1     askpass_1.1        pbdZMQ_0.3-5      
[33] digest_0.6.28      base64enc_0.1-3    pkgconfig_2.0.3    htmltools_0.5.2   
[37] extrafont_0.17     dbplyr_2.1.1       fastmap_1.1.0      maps_3.4.0        
[41] rlang_0.4.11       readxl_1.3.1       rstudioapi_0.13    farver_2.1.0      
[45] generics_0.1.0     jsonlite_1.7.2     vroom_1.5.5        magrittr_2.0.1    
[49] Rcpp_1.0.7         ggbeeswarm_0.6.0   IRkernel_1.2       munsell_0.5.0     
[53] fansi_0.4.2        lifecycle_1.0.1    stringi_1.7.4      MASS_7.3-54       
[57] grid_4.1.1         parallel_4.1.1     crayon_1.4.1       lattice_0.20-45   
[61] splines_4.1.1      IRdisplay_1.0      haven_2.4.3        hms_1.1.1         
[65] locfit_1.5-9.4     pillar_1.6.3       uuid_0.1-4         reprex_2.0.1      
[69] glue_1.4.2         evaluate_0.14      modelr_0.1.8       vctrs_0.3.8       
[73] tzdb_0.1.2         Rttf2pt1_1.3.9     cellranger_1.1.0   gtable_0.3.0      
[77] openssl_1.4.5      assertthat_0.2.1   broom_0.7.9        beeswarm_0.4.0    
[81] ellipsis_0.3.2
edgeR • 104 views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 22 minutes ago
WEHI, Melbourne, Australia

Why do you say that the second contrast is "still not correct"? It seems the obvious way to make the comparison you want.

You can collapse C.0, C.1 and C.2 into one group if mutation status has no effect in the controls, otherwise not. Collapsing into one group is not actually wrong but it will be more conservative than the contrast if there is genuine DE between C.0, C.1 and C.2.

ADD COMMENT
0
Entering edit mode

I had been using the collapsed C.0, C.1, C.2 as the ground truth, as the results from the second contrast were quite close to the collapsed 'ground truth' (which is conservative as you say).

Apologies for beating a dead horse, but I'm struggling to understand how the second contrast is less conservative than the collapse method:

D.1 - (C.0 + C.1 + C.2)/3  #contrast method
D.1 - C                    #collapse method

Could you explain this please?

Thank you very much for you help.

ADD REPLY
0
Entering edit mode

Because the residual variability (as measured by the two dispersion parameters) is smaller. If you fit the full correct model (with C.0, C.1 and C.2) then the residual dispersion is estimated from pure error. If you fit an overly simple model (with C) then the residual dispersion is inflated by systematic differences.

ADD REPLY

Login before adding your answer.

Traffic: 281 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6