edgeR contrasts confusion when multiple groups are involved
@506e191c
Ireland

Hi all,

I have control (C) and disease (D) groups, and I have recorded the status of a mutation in each sample as absent (0), heterozygous (1) or homozygous (2), so my design matrix looks like this:

                C.0     C.1     C.2     D.0       D.1       D.2
sample.1        1       0       0       0         0         0
sample.2        1       0       0       0         0         0
sample.3        1       0       0       0         0         0
sample.4        0       1       0       0         0         0
sample.5        0       0       1       0         0         0
sample.6        0       0       1       0         0         0
sample.7        0       0       0       1         0         0
sample.8        0       0       0       1         0         0
sample.9        0       0       0       0         1         0
sample.10       0       0       0       0         1         0
sample.11       0       0       0       0         0         1
sample.12       0       0       0       0         0         1


Is there a way to write my contrast argument so that it compares D.1 with all of the C groups? I tried the following, which was wrong:

qlf <- glmQLFTest(fit,contrast=c(-1,-1,-1,0,1,0))


This is closer, but still not quite right:

qlf <- glmQLFTest(fit,contrast=c(-0.33,-0.33,-0.33,0,1,0))


Is there a simple way of writing what I want, or am I better off collapsing C.0, C.1 and C.2 into a single control group and comparing D.1 vs C? Thank you.

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.17.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] EnhancedVolcano_1.10.0 ggrepel_0.9.1          edgeR_3.34.1
[4] limma_3.48.3           BiocManager_1.30.16    ggthemes_4.2.4
[7] gorr_1.4.4             forcats_0.5.1          stringr_1.4.0
[13] tidyr_1.1.3            tibble_3.1.4           ggplot2_3.3.5
[16] tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] fs_1.5.0           lubridate_1.7.10   bit64_4.0.5        ash_1.0-15
[5] RColorBrewer_1.1-2 httr_1.4.2         repr_1.1.3         tools_4.1.1
[9] backports_1.2.1    utf8_1.2.2         R6_2.5.1           KernSmooth_2.23-20
[13] vipor_0.4.5        DBI_1.1.1          colorspace_2.0-2   withr_2.4.2
[17] tidyselect_1.1.1   ggrastr_0.2.3      ggalt_0.4.0        bit_4.0.4
[21] curl_4.3.2         compiler_4.1.1     extrafontdb_1.0    cli_3.0.1
[25] rvest_1.0.1        Cairo_1.5-12.2     xml2_1.3.2         labeling_0.4.2
[33] digest_0.6.28      base64enc_0.1-3    pkgconfig_2.0.3    htmltools_0.5.2
[37] extrafont_0.17     dbplyr_2.1.1       fastmap_1.1.0      maps_3.4.0
[45] generics_0.1.0     jsonlite_1.7.2     vroom_1.5.5        magrittr_2.0.1
[49] Rcpp_1.0.7         ggbeeswarm_0.6.0   IRkernel_1.2       munsell_0.5.0
[53] fansi_0.4.2        lifecycle_1.0.1    stringi_1.7.4      MASS_7.3-54
[57] grid_4.1.1         parallel_4.1.1     crayon_1.4.1       lattice_0.20-45
[61] splines_4.1.1      IRdisplay_1.0      haven_2.4.3        hms_1.1.1
[65] locfit_1.5-9.4     pillar_1.6.3       uuid_0.1-4         reprex_2.0.1
[69] glue_1.4.2         evaluate_0.14      modelr_0.1.8       vctrs_0.3.8
[73] tzdb_0.1.2         Rttf2pt1_1.3.9     cellranger_1.1.0   gtable_0.3.0
[77] openssl_1.4.5      assertthat_0.2.1   broom_0.7.9        beeswarm_0.4.0
[81] ellipsis_0.3.2

@gordon-smyth
WEHI, Melbourne, Australia

Why do you say that the second contrast is "still not correct"? It seems the obvious way to make the comparison you want.
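A minimal sketch of the same contrast built with limma's makeContrasts() (limma is installed with edgeR), assuming the group-means design from the question with columns named C.0 to D.2. One small point: the hand-typed weights -0.33, -0.33, -0.33, 1 sum to 0.01 rather than 0, whereas makeContrasts() produces exact thirds. The `fit` object is assumed to be the poster's existing glmQLFit() result and is not rebuilt here.

```r
library(limma)  # provides makeContrasts(); edgeR already depends on limma

# Rebuild the group-means design from the question: one column per group
group <- factor(c("C.0", "C.0", "C.0", "C.1", "C.2", "C.2",
                  "D.0", "D.0", "D.1", "D.1", "D.2", "D.2"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# D.1 versus the average of the three control groups, with exact 1/3 weights
con <- makeContrasts(D1vsC = D.1 - (C.0 + C.1 + C.2) / 3, levels = design)
print(con)

# The weights now sum to zero (up to floating-point error)
stopifnot(abs(sum(con)) < 1e-12)

# With the existing edgeR fit from glmQLFit(), the test would then be:
# qlf <- glmQLFTest(fit, contrast = con)
# topTags(qlf)
```

Naming the contrast (`D1vsC`) is optional; it only labels the column of the returned matrix.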

You can collapse C.0, C.1 and C.2 into one group if mutation status has no effect in the controls, but not otherwise. Collapsing into one group is not actually wrong, but it will be more conservative than the contrast if there is genuine DE between C.0, C.1 and C.2.
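For comparison, a base-R sketch of the two designs under discussion, using the sample layout from the question (object names such as `group_collapsed` and `con_collapsed` are hypothetical). Collapsing saves two coefficients, so the dispersion is estimated on 8 rather than 6 residual degrees of freedom; those extra degrees of freedom only help if C.0, C.1 and C.2 genuinely do not differ.

```r
# Sample layout from the question
group <- factor(c("C.0", "C.0", "C.0", "C.1", "C.2", "C.2",
                  "D.0", "D.0", "D.1", "D.1", "D.2", "D.2"))

# Full model: one coefficient per group (six columns)
design_full <- model.matrix(~ 0 + group)
colnames(design_full) <- levels(group)

# Collapsed model: merge the three control genotypes into a single "C" level
# (assigning a duplicated level name merges those levels)
group_collapsed <- group
levels(group_collapsed)[levels(group_collapsed) %in% c("C.0", "C.1", "C.2")] <- "C"
design_collapsed <- model.matrix(~ 0 + group_collapsed)
colnames(design_collapsed) <- levels(group_collapsed)

# Residual degrees of freedom left for estimating the dispersion
df_full <- nrow(design_full) - ncol(design_full)
df_collapsed <- nrow(design_collapsed) - ncol(design_collapsed)
cat("residual df, full:", df_full, " collapsed:", df_collapsed, "\n")

# D.1 vs C in the collapsed design (columns C, D.0, D.1, D.2)
con_collapsed <- c(-1, 0, 1, 0)
```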

I had been treating the collapsed C.0, C.1, C.2 analysis as the ground truth, because the results from the second contrast were quite close to it (although, as you say, the collapsed analysis is conservative).

Apologies for beating a dead horse, but I'm struggling to understand how the second contrast is less conservative than the collapse method:

D.1 - (C.0 + C.1 + C.2)/3  #contrast method
D.1 - C                    #collapse method


Thank you very much for your help.

Because the residual variability (as measured by the two dispersion parameters) is smaller. If you fit the full correct model (with C.0, C.1 and C.2) then the residual dispersion is estimated from pure error. If you fit an overly simple model (with C) then the residual dispersion is inflated by systematic differences.
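A small Gaussian analogue of this argument, using lm() on simulated data rather than edgeR's negative binomial and quasi-likelihood dispersions; the true group means and the noise level are invented purely for illustration. When C.0, C.1 and C.2 genuinely differ, the collapsed model's residual standard deviation absorbs those systematic differences and should come out noticeably larger than the full model's, which reflects pure error only.

```r
set.seed(1)

# Sample layout from the question
group <- factor(c("C.0", "C.0", "C.0", "C.1", "C.2", "C.2",
                  "D.0", "D.0", "D.1", "D.1", "D.2", "D.2"))
group_collapsed <- group
levels(group_collapsed)[1:3] <- "C"  # merge C.0, C.1 and C.2 into one level

# Hypothetical true means: the control genotypes differ systematically
true_mean <- c(C.0 = 0, C.1 = 5, C.2 = -5, D.0 = 0, D.1 = 2, D.2 = 0)
y <- unname(true_mean[as.character(group)]) + rnorm(length(group), sd = 1)

fit_full <- lm(y ~ 0 + group)                 # full, correct model
fit_collapsed <- lm(y ~ 0 + group_collapsed)  # overly simple model

# Residual SD: the Gaussian counterpart of the dispersion estimate
s_full <- summary(fit_full)$sigma
s_collapsed <- summary(fit_collapsed)$sigma
cat("residual SD, full model:     ", round(s_full, 2), "\n")
cat("residual SD, collapsed model:", round(s_collapsed, 2), "\n")
```

The larger residual SD in the collapsed fit inflates the standard error of D.1 vs C, which is why that comparison is more conservative even though it has more residual degrees of freedom.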