Hello,
I'm carrying out an analysis using voomWithQualityWeights
. I used the decideTests
function with default settings and also with the method nestedF
. I noticed that nestedF
identifies many more differentially expressed genes relative to the separate
method. I read here that it might be desirable to use the nestedF
method when an experiment includes many contrasts. I'm creating 15 contrasts, would it be justified to use the nestedF
method in this analysis? Any help would be greatly appreciated.
Here is the code:
txi_kallisto <- tximport::tximport(input_files,
type = "kallisto",
tx2gene = transcript_to_gene,
countsFromAbundance = "lengthScaledTPM"
)
group <- samples_table$condition
dge <- edgeR::DGEList(txi_kallisto$counts, group = group)
keep <- edgeR::filterByExpr(dge, min.count = 10, min.total.count = 15, min.prop = 0.7)
dge <- dge[keep, ]
dge <- edgeR::calcNormFactors(dge, method = "TMM")
head(dge$samples)
group lib.size norm.factors
ACE-C15 ACE 13184400 0.8924529
ACE-C24 ACE 12104176 1.2864667
ACE-C46 ACE 20154212 0.9370196
ACE-C58 ACE 13473161 1.1760556
BCY-C09 BCY 16361992 0.9140883
BCY-C44 BCY 8405135 0.9635902
# "block" and "ct.room" are two batch factors
design <- model.matrix(~ 0 + condition + block + ct.room, data = samples_table)
colnames(design) <- gsub("condition", "", colnames(design))
Then, I created a matrix of contrasts, which looks like this:
contrs_mat[1:4, 1:4]
Contrasts
Levels CONvsACE CONvsBCY CONvsCCY CONvsCIM
CON 1 1 1 1
ACE -1 0 0 0
BCY 0 -1 0 0
CCY 0 0 -1 0
vwts <- limma::voomWithQualityWeights(dge, design, plot = TRUE)
vwts_fit <- limma::lmFit(vwts)
contrs_fit <- limma::contrasts.fit(vwts_fit, contrasts = contrs_mat)
contrs_fit <- limma::eBayes(contrs_fit, robust = TRUE)
dt <- limma::decideTests(contrs_fit, method = "separate",
adjust.method = "BH", p.value = 0.05
)
summary(dt)[, 1:4]
CONvsACE CONvsBCY CONvsCCY CONvsCIM
Down 1 0 49 213
NotSig 9779 9785 9638 9402
Up 5 0 98 170
dt_nestedf <- limma::decideTests(contrs_fit, method = "nestedF",
adjust.method = "BH", p.value = 0.05
)
summary(dt_nestedf)[, 1:4]
CONvsACE CONvsBCY CONvsCCY CONvsCIM
Down 152 45 139 272
NotSig 9532 9691 9546 9349
Up 101 49 100 164
Lastly, I extracted 10 significant genes in a treatment of interest and plotted the CPMs:
IMIgenes <- dt_nestedf %>%
as.data.frame() %>%
dplyr::select(matches("IMI")) %>%
tibble::rownames_to_column(var = "ens_gene") %>%
dplyr::filter_all(all_vars(. != 0)) %>% # Choose rows/cases that equal 1 or -1
head(n = 10)
cpm <- edgeR::cpm(dge)
Here is the plot:
sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggthemr_1.1.0 Glimma_1.14.0 edgeR_3.28.1 limma_3.42.2 tximport_1.14.0 rhdf5_2.30.1
[7] biomaRt_2.42.0 statmod_1.4.34 RColorBrewer_1.1-2 forcats_0.5.0 stringr_1.4.0 dplyr_0.8.4
[13] purrr_0.3.3 readr_1.3.1 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.3.0
Thank you for your response, Professor Smyth. Yes, the main purpose of this analysis is classifying genes according to which contrasts they are DE for. I also have a basic question regarding the
limma
output and would appreciate your help. When I run thewrite.fit
function:limma::write.fit(contrs_fit, dt_nestedf, file = "results.txt", adjust = "BH", method = "separate", F.adjust = "BH")
The p-values for the
nestedF
results would be the columnsF.p.value
andF.p.value.adj
? And these values correspond to the numeric matrix with elements classified as -1, 0 or 1 bydecideTests
?