Question

Multiple testing across both contrasts and genes using the "nestedF" method

2

Entering edit mode

fl ▴ 20

@fl-16173

Last seen 10 days ago

Germany

Hello,

I'm carrying out an analysis using voomWithQualityWeights. I used the decideTests function with default settings and also with the method nestedF. I noticed that nestedF identifies many more differentially expressed genes relative to the separate method. I read here that it might be desirable to use the nestedF method when an experiment includes many contrasts. I'm creating 15 contrasts, would it be justified to use the nestedF method in this analysis? Any help would be greatly appreciated.

Here is the code:

txi_kallisto <- tximport::tximport(input_files,
  type = "kallisto",
  tx2gene = transcript_to_gene,
  countsFromAbundance = "lengthScaledTPM"
)

group <- samples_table$condition
dge <- edgeR::DGEList(txi_kallisto$counts, group = group)

keep <- edgeR::filterByExpr(dge, min.count = 10, min.total.count = 15, min.prop = 0.7)
dge <- dge[keep, ]

dge <- edgeR::calcNormFactors(dge, method = "TMM")

head(dge$samples)
       group lib.size norm.factors
ACE-C15   ACE 13184400    0.8924529
ACE-C24   ACE 12104176    1.2864667
ACE-C46   ACE 20154212    0.9370196
ACE-C58   ACE 13473161    1.1760556
BCY-C09   BCY 16361992    0.9140883
BCY-C44   BCY  8405135    0.9635902

# "block" and "ct.room" are two batch factors
design <- model.matrix(~ 0 + condition + block + ct.room, data = samples_table)
colnames(design) <- gsub("condition", "", colnames(design))

Then, I created a matrix of contrasts, which looks like this:

contrs_mat[1:4, 1:4]

      Contrasts
Levels CONvsACE CONvsBCY CONvsCCY CONvsCIM
   CON        1        1        1        1
   ACE       -1        0        0        0
   BCY        0       -1        0        0
   CCY        0        0       -1        0

vwts <- limma::voomWithQualityWeights(dge, design, plot = TRUE)
vwts_fit <- limma::lmFit(vwts)
contrs_fit <- limma::contrasts.fit(vwts_fit, contrasts = contrs_mat)
contrs_fit <- limma::eBayes(contrs_fit, robust = TRUE)

dt <- limma::decideTests(contrs_fit, method = "separate",
  adjust.method = "BH", p.value = 0.05
)
summary(dt)[, 1:4]
       CONvsACE CONvsBCY CONvsCCY CONvsCIM
Down          1        0       49      213
NotSig     9779     9785     9638     9402
Up            5        0       98      170


dt_nestedf <- limma::decideTests(contrs_fit, method = "nestedF",
  adjust.method = "BH", p.value = 0.05
)
summary(dt_nestedf)[, 1:4]
       CONvsACE CONvsBCY CONvsCCY CONvsCIM
Down        152       45      139      272
NotSig     9532     9691     9546     9349
Up          101       49      100      164

Lastly, I extracted 10 significant genes in a treatment of interest and plotted the CPMs:

IMIgenes <- dt_nestedf %>%
  as.data.frame() %>%
  dplyr::select(matches("IMI")) %>%
  tibble::rownames_to_column(var = "ens_gene") %>%
  dplyr::filter_all(all_vars(. != 0)) %>% # Choose rows/cases that equal 1 or -1
  head(n = 10)

cpm <- edgeR::cpm(dge)

Here is the plot: enter image description here

sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggthemr_1.1.0      Glimma_1.14.0      edgeR_3.28.1       limma_3.42.2       tximport_1.14.0    rhdf5_2.30.1      
 [7] biomaRt_2.42.0     statmod_1.4.34     RColorBrewer_1.1-2 forcats_0.5.0      stringr_1.4.0      dplyr_0.8.4       
[13] purrr_0.3.3        readr_1.3.1        tidyr_1.0.2        tibble_2.1.3       ggplot2_3.2.1      tidyverse_1.3.0

limma rnaseq • 605 views

ADD COMMENT • link 4.2 years ago • updated 4.1 years ago fl ▴ 20

score 2 · Answer 1 · 2020-03-03

I don't recommend the nestedF method merely because many contrasts are being tested. The default decideTests method still performs fine in that circumstance.

Please see the ?decideTests help page or Section 13.3 of the limma User's Guide for advice about which decideTests adjustment method to use. The limma User's Guide says:

method="nestedF" has a more specialised aim to give greater weight to probes which are significant for two or more contrasts. Most multiple testing methods tend to underestimate the number of such probes. There is some practical experience to suggest that method="nestedF" gives less conservative results when finding probes which respond to several different contrasts at once. However this method should still be viewed as experimental. It provides formal false discovery rate control at the probe level only, not at the contrast level.

To find out what nestedF does in more detail, read the documentation page ?classifyTestsF. It does give often give more DE genes than the default method. The total number of genes with any DE contrasts might not change much, but the contrasts with the fewest DE genes with get more entries from "nestedF" than from the default method. It does not provide error rate control for individual contrasts, only at the overall F-test level. For genes with a significant F-test overall, the nestedF method fills in contrasts that are "probably" DE for that gene even if the contrasts do not achieve conventional significance levels. You should use it only if your emphasis is on classifying genes according to which contrasts they are DE for.