Question

Main effect on multiple comparisons in limma

0

Entering edit mode

roger.moreno.justicia • 0

@035265c1

Last seen 9 weeks ago

Denmark

Dear limma experts,

I am running a limma analysis where I am testing for differential expression analysis and I have two variables I want to test. One is tissue and can be muscle or adipose_tissue, the other one is timepoint, which can be pre and post. When entering my contrasts, if I define my groupings as a combination of both variables (muscle_pre, muscle_post, AT_pre, AT_post), I understand that the main effect of timepoint should be written as:

(muscle_post + AT_post)/2 - (muscle_pre + AT_pre)/2

Nonetheless, if I only group my samples as pre and post (ingoring tissue) and use this contrast:

post - pre

The results differsubstantially. My first contrast gives many more significantly different genes. When using the multiple comparison approach I get 853 significantly different proteins, whereas when using the post - pre contrast I get 280.

Why is that?

Many thanks in advance!

Roger

First contrast:


metadata <- data.frame(
    "sample_id" = colnames(data_matrix),
    "grouping" = rep(c(
        "AT_post",
        "AT_pre",
        "muscle_post",
        "muscle_pre"
    ), 8),
    "subject_id" = c(
        rep("S01", 4),
        rep("S02", 4),
        rep("S03", 4),
        rep("S04", 4),
        rep("S05", 4),
        rep("S06", 4),
        rep("S07", 4),
        rep("S08", 4)
    )
)

design_matrix <- model.matrix(~ 0 + metadata$grouping + metadata$subject_id)

colnames(design_matrix) <- c(
    "AT_post",
    "AT_pre",
    "muscle_post",
    "muscle_pre",
    "S02",
    "S03",
    "S04",
    "S05",
    "S06",
    "S07",
    "S08"
)

# Define the contrasts:

contrast_matrix <- limma::makeContrasts(
    muscleVsAT = (muscle_pre + muscle_post) / 2 - (AT_pre + AT_post) / 2,
    PostVsPre = (muscle_post + AT_post) / 2 - (muscle_pre + AT_pre) / 2,
    baselinemuscleVsAT = muscle_pre - AT_pre,
    musclePostVsPre = muscle_post - muscle_pre,
    ATPostVsPre = AT_post - AT_pre,
    levels = colnames(design_matrix)
)


fit_1 <- limma::eBayes(
    limma::lmFit(
        data_matrix,
        design_matrix
    )
)

ebayes_fit <- limma::eBayes(
    limma::contrasts.fit(
        fit_1,
        contrast_matrix
    )
)

# Extracting results:

DE_results <- limma::topTable(
    ebayes_fit,
    number = Inf
)

length(DE_results |> 
           dplyr::filter(adj.P.Val < 0.05) |> 
           dplyr::pull(genes))

[1] 853

Second contrast:

metadata <- data.frame(
    "sample_id" = colnames(data_matrix),
    "grouping" = rep(c(
        "post",
        "pre"
    ), 16),
    "subject_id" = c(
        rep("S01", 4),
        rep("S02", 4),
        rep("S03", 4),
        rep("S04", 4),
        rep("S05", 4),
        rep("S06", 4),
        rep("S07", 4),
        rep("S08", 4)
    )
)

design_matrix <- model.matrix(~ 0 + metadata$grouping + metadata$subject_id)

colnames(design_matrix) <- c(
    "post",
    "pre",
    "S02",
    "S03",
    "S04",
    "S05",
    "S06",
    "S07",
    "S08"
)

# Define the contrasts:

contrast_matrix <- limma::makeContrasts(
    PostVsPre = post - pre,
    levels = colnames(design_matrix)
)


fit_1 <- limma::eBayes(
    limma::lmFit(
        data_matrix,
        design_matrix
    )
)

ebayes_fit <- limma::eBayes(
    limma::contrasts.fit(
        fit_1,
        contrast_matrix
    )
)

# Extracting results:

DE_results <- limma::topTable(
    ebayes_fit,
    number = Inf
)

length(DE_results |> 
           dplyr::filter(adj.P.Val < 0.05) |> 
           dplyr::pull(genes))

[1] 280

sessionInfo( )

R version 4.3.2 (2023-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_United Kingdom.utf8 [2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 [4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

time zone: Europe/Copenhagen tzcode source: internal

attached base packages: [1] stats graphics grDevices datasets utils methods
[7] base

other attached packages: [1] usethis_2.2.2

loaded via a namespace (and not attached): [1] bit_4.0.5 jsonlite_1.8.8 gtable_0.3.4
[4] limma_3.58.1 dplyr_1.1.4 compiler_4.3.2
[7] renv_1.0.3 crayon_1.5.2 Rcpp_1.0.12
[10] tidyselect_1.2.0 parallel_4.3.2 tidyr_1.3.1
[13] scales_1.3.0 yaml_2.3.8 fastmap_1.1.1
[16] statmod_1.5.0 here_1.0.1 ggplot2_3.4.4
[19] R6_2.5.1 labeling_0.4.3 pak_0.7.1
[22] generics_0.1.3 htmlwidgets_1.6.4 ggrepel_0.9.5
[25] tibble_3.2.1 munsell_0.5.0 rprojroot_2.0.4
[28] pillar_1.9.0 tzdb_0.4.0 rlang_1.1.3
[31] utf8_1.2.4 fs_1.6.3 lazyeval_0.2.2
[34] bit64_4.0.5 viridisLite_0.4.2 plotly_4.10.4
[37] cli_3.6.2 withr_3.0.0 magrittr_2.0.3
[40] crosstalk_1.2.1 digest_0.6.34 grid_4.3.2
[43] vroom_1.6.5 rstudioapi_0.15.0 lifecycle_1.0.4
[46] vctrs_0.6.5 glue_1.7.0 data.table_1.14.10 [49] farver_2.1.1 fansi_1.0.6 colorspace_2.1-0
[52] httr_1.4.7 purrr_1.0.2 ellipsis_0.3.2
[55] tools_4.3.2 pkgconfig_2.0.3 htmltools_0.5.7

```

limmaGUI lim MultipleComparison multip • 213 views

ADD COMMENT • link updated 10 weeks ago by Gordon Smyth 50k • written 10 weeks ago by roger.moreno.justicia • 0

score 0 · Answer 1 · 2024-02-17

The first analysis adjusts for differences between the two tissues (muscle and adipose) whereas the second doesn't. The first approach will therefore produce a lower residual variance and hence more DE genes.

To put the question around the other way, how could you reasonably expect to misrepresent the nature of your data, ignoring the differences between the two tissues, and still get good results?

By the way, I don't recommend main effects. In my opinion, conditional effects are more useful than main effects for genomic analyses, see for example: How does one test for interactions in EdgeR?