Unexpectedly low number of DE genes?
1
0
Entering edit mode
mhalache • 0
@e91d524f
Last seen 18 months ago
United Kingdom

Hello!

This is my first attempt to do DESeq2 analysis. I followed the vignettes and tutorials I could find online as best, but got unexpectedly low number of DE genes. In the experiment, there are 2 subjects (X90 and X92) and 2 paired samples from each (Tom=control and Venus=treated). The counts were obtained using salmon, and for the DESeq2 analysis (DESeq2_1.36.0) I have used the salmon.merged.gene_counts_length_scaled.tsv, as recommended.

salmon_table<-read.table("results/star_salmon/salmon.merged.gene_counts_length_scaled.tsv",
                          sep="\t",header=TRUE,row.names="gene_id")
cts <- salmon_table[,c(2:5)]      # sub-setting to exclude a subject with problematic sequencing quality  
cts <- as.matrix(round(cts))

coldata<-read.csv("metadata.csv",header=TRUE,row.names=1)
coldata<-coldata[1:4,]              # sub-setting to exclude a subject with problematic sequencing quality

dds<- DESeqDataSetFromMatrix(countData = cts,
                                         colData = coldata,
                                         design= ~ mouse+genotype)

dds <- dds[ rowSums(counts(dds)) > 1, ]

vsd<-vst(dds, blind=TRUE)
pcaData <- plotPCA(vsd, intgroup=c("mouse","genotype"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=genotype, shape=mouse)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  coord_fixed() +
  geom_label_repel(aes(label = name), colour = I(alpha("black", 0.85)), 
                    size = 3,box.padding = 1.5,fill="white",max.overlaps = 30 );

enter image description here

dds<-DESeq(dds)

res_genotype_Venus_vs_Tom_noshrink<- results(dds, contrast=c("genotype","Venus","Tom"), independentFiltering=TRUE)

summary(res_genotype_Venus_vs_Tom_noshrink)
out of 30674 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 12, 0.039%
LFC < 0 (down)     : 57, 0.19%
outliers [1]       : 0, 0%
low counts [2]     : 16057, 52%
(mean count < 20)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

About half (52%) of the genes were filtered due to low counts (<20). Looking at the MA plot, it appears that there are some genes high-ish log2FC exhibiting such average counts.

plot(log2(res_genotype_Venus_vs_Tom_noshrink$baseMean), 
     res_genotype_Venus_vs_Tom_noshrink$log2FoldChange)

enter image description here

I had to plot it as above, since the default plotting command plotMA() produced

plotMA(res_genotype_Venus_vs_Tom_noshrink)

enter image description here

I also performed shrinking and it looks like the genes with < 2^^5 = 32 counts have shrunk log2FC set to ~0 (which, from what I read online can be due to either "low counts" or "high dispersion")

res_genotype_Venus_vs_Tom <- lfcShrink(dds, coef="genotype_Venus_vs_Tom", 
   type="apeglm",res=res_genotype_Venus_vs_Tom_noshrink)

plot(log2(res_genotype_Venus_vs_Tom$baseMean), 
  res_genotype_Venus_vs_Tom$log2FoldChange)

enter image description here

My questions are: am I doing something wrong? Am I missing some DE genes from the mean count < 20 region? And how DESeq2 arrived at the cut-off value of 20?

Thank you in advance for any insight/suggestions!

sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.5 (Nitrogen)

Matrix products: default
BLAS:   /exports/eddie3_apps_local/apps/community/roslin/R/4.2.0/lib64/R/lib/libRblas.so
LAPACK: /exports/eddie3_apps_local/apps/community/roslin/R/4.2.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] apeglm_1.18.0               PCAtools_2.8.0
 [3] tximport_1.24.0             readr_2.1.3
 [5] tzdb_0.3.0                  RColorBrewer_1.1-3
 [7] edgeR_3.38.4                limma_3.52.4
 [9] patchwork_1.1.2             viridis_0.6.2
[11] viridisLite_0.4.0           EnhancedVolcano_1.14.0
[13] ggrepel_0.9.1               cowplot_1.1.1
[15] org.Mm.eg.db_3.15.0         AnnotationDbi_1.58.0
[17] DOSE_3.22.1                 clusterProfiler_4.4.4
[19] pheatmap_1.0.12             ggplot2_3.3.6
[21] DESeq2_1.36.0               memoise_2.0.1
[23] SummarizedExperiment_1.26.1 Biobase_2.56.0
[25] MatrixGenerics_1.8.1        matrixStats_0.62.0
[27] GenomicRanges_1.48.0        GenomeInfoDb_1.32.4
[29] IRanges_2.30.0              S4Vectors_0.34.0
[31] BiocGenerics_0.42.0         Matrix_1.5-1

loaded via a namespace (and not attached):
  [1] shadowtext_0.1.2          fastmatch_1.1-3
  [3] plyr_1.8.7                igraph_1.3.5
  [5] lazyeval_0.2.2            splines_4.2.0
  [7] BiocParallel_1.30.3       digest_0.6.29
  [9] yulab.utils_0.0.5         GOSemSim_2.22.0
 [11] GO.db_3.15.0              fansi_1.0.3
 [13] magrittr_2.0.3            ScaledMatrix_1.4.1
 [15] Biostrings_2.64.1         annotate_1.74.0
 [17] graphlayouts_0.8.2        bdsmatrix_1.3-6
 [19] enrichplot_1.16.2         colorspace_2.0-3
 [21] blob_1.2.3                dplyr_1.0.10
 [23] crayon_1.5.1              RCurl_1.98-1.9
 [25] jsonlite_1.8.2            scatterpie_0.1.8
 [27] genefilter_1.78.0         survival_3.3-1
 [29] ape_5.6-2                 glue_1.6.2
 [31] polyclip_1.10-0           gtable_0.3.0
 [33] zlibbioc_1.42.0           XVector_0.36.0
 [35] DelayedArray_0.22.0       BiocSingular_1.12.0
 [37] scales_1.2.0              mvtnorm_1.1-3
 [39] DBI_1.1.3                 Rcpp_1.0.9
 [41] emdbook_1.3.12            xtable_1.8-4
 [43] gridGraphics_0.5-1        tidytree_0.4.1
 [45] dqrng_0.3.0               bit_4.0.4
 [47] rsvd_1.0.5                httr_1.4.4
 [49] fgsea_1.22.0              ellipsis_0.3.2
 [51] pkgconfig_2.0.3           XML_3.99-0.11
 [53] farver_2.1.1              locfit_1.5-9.6
 [55] utf8_1.2.2                labeling_0.4.2
 [57] ggplotify_0.1.0           tidyselect_1.1.2
 [59] rlang_1.0.4               reshape2_1.4.4
 [61] munsell_0.5.0             tools_4.2.0
 [63] cachem_1.0.6              downloader_0.4
 [65] cli_3.3.0                 generics_0.1.3
 [67] RSQLite_2.2.18            stringr_1.4.0
 [69] fastmap_1.1.0             ggtree_3.4.4
 [71] bit64_4.0.5               tidygraph_1.2.2
 [73] purrr_0.3.5               KEGGREST_1.36.3
 [75] ggraph_2.0.6              nlme_3.1-158
 [77] sparseMatrixStats_1.8.0   aplot_0.1.7
 [79] DO.db_2.9                 compiler_4.2.0
 [81] png_0.1-7                 treeio_1.20.2
 [83] tibble_3.1.7              tweenr_2.0.2
 [85] geneplotter_1.74.0        stringi_1.7.8
 [87] lattice_0.20-45           vctrs_0.4.1
 [89] pillar_1.8.0              lifecycle_1.0.1
 [91] data.table_1.14.2         bitops_1.0-7
 [93] irlba_2.3.5.1             qvalue_2.28.0
 [95] R6_2.5.1                  gridExtra_2.3
 [97] codetools_0.2-18          MASS_7.3-58
 [99] withr_2.5.0               GenomeInfoDbData_1.2.8
[101] parallel_4.2.0            hms_1.1.2
[103] grid_4.2.0                ggfun_0.0.7
[105] beachmat_2.12.0           coda_0.19-4
[107] tidyr_1.2.1               DelayedMatrixStats_1.18.1
[109] bbmle_1.0.25              ggforce_0.4.1
[111] numDeriv_2016.8-1.1
DESeq2 • 802 views
ADD COMMENT
0
Entering edit mode
swbarnes2 ★ 1.3k
@swbarnes2-14086
Last seen 20 hours ago
San Diego

1) Looking at your PCA, I am not surprised that you don't have many DE genes. You have very little power with so few samples, and the first PC is clearly associated with mouse, not genotype. 2) I think your MAPlot looks like that because limma has an MAplot command as well, and R is using that, not DESeq's. Specify that you want to use DESeq's version, and it should work the way you expect.

I see no reason to think that DESeq is not giving you the correct results for the experiment you ran. You have very few samples, and your read counts are low.

ADD COMMENT
0
Entering edit mode

Thanks, makes sense!

ADD REPLY

Login before adding your answer.

Traffic: 570 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6