I have an RNAseq dataset, where one of the genes I intend to analyze has hundreds of counts ranging from 10 to 12, with a few counts > 9000. I process this data in Deseq2 and get that the gene is differentially expressed across several samples of interest. What can justify these strange counts? Are the results reliable?
library("DESeq2")
counts2=C
dds <- DESeqDataSetFromMatrix(countData=counts2,
colData=expdesign_fucos,
design=~subtype, tidy = TRUE)
# Rodando a função DESeq
dds <- DESeq(dds)
#normalized.counts = counts(dds, normalized=T, replaced=TRUE)
res = results(dds, alpha = 0.01, lfcThreshold =1,
contr = c("subtype","Her2","LumB"))
summary(res)
out of 15 with nonzero total read count
adjusted p-value < 0.01
LFC > 1.00 (up) : 0, 0%
LFC < -1.00 (down) : 1, 6.7%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 12)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
with(colData(dds), table(subtype))
> with(colData(dds), table(subtype))
subtype
Basal Her2 LumA LumB
187 82 563 215
mcols(dds)$maxCooks
> mcols(dds)$maxCooks
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
The results don't seem to be correct for me as this gene (POFUT1) usually has very low values (between 10-12), some samples with large counts (>9000), and looking at the counts table doesn't show significant differences between #groups.
>subset(res, res$lfcSE < 2 & res$padj < 0.01)
log2 fold change (MLE): subtype Her2 vs LumB
Wald test p-value: subtype Her2 vs LumB
DataFrame with 1 row and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
POFUT1 12.2556 -2.15519 0.171235 -6.74621 1.51756e-11 2.27633e-10
sessionInfo( )
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnvStats_2.4.0 genefilter_1.72.1 DESeq2_1.30.1
[4] SummarizedExperiment_1.20.0 Biobase_2.50.0 MatrixGenerics_1.2.1
[7] matrixStats_0.58.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
[10] IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.1
loaded via a namespace (and not attached):
[1] rgl_0.107.10 Rcpp_1.0.6 locfit_1.5-9.4
[4] lattice_0.20-41 digest_0.6.27 assertthat_0.2.1
[7] utf8_1.2.1 R6_2.5.0 RSQLite_2.2.7
[10] httr_1.4.2 ggplot2_3.3.3 pillar_1.6.1
[13] zlibbioc_1.36.0 rlang_0.4.11 annotate_1.68.0
[16] blob_1.2.1 Matrix_1.3-2 splines_4.0.4
[19] BiocParallel_1.24.1 geneplotter_1.68.0 htmlwidgets_1.5.3
[22] RCurl_1.98-1.3 bit_4.0.4 munsell_0.5.0
[25] DelayedArray_0.16.3 xfun_0.23 compiler_4.0.4
[28] pkgconfig_2.0.3 htmltools_0.5.1.1 tidyselect_1.1.1
[31] tibble_3.1.1 GenomeInfoDbData_1.2.4 XML_3.99-0.6
[34] fansi_0.4.2 crayon_1.4.1 dplyr_1.0.6
[37] MASS_7.3-53.1 bitops_1.0-7 grid_4.0.4
[40] jsonlite_1.7.2 xtable_1.8-4 gtable_0.3.0
[43] lifecycle_1.0.0 DBI_1.1.1 magrittr_2.0.1
[46] scales_1.1.1 cachem_1.0.5 XVector_0.30.0
[49] robustbase_0.93-8 qpcR_1.4-1 ellipsis_0.3.2
[52] vctrs_0.3.8 generics_0.1.0 RColorBrewer_1.1-2
[55] tools_4.0.4 bit64_4.0.5 glue_1.4.2
[58] DEoptimR_1.0-9 purrr_0.3.4 crosstalk_1.1.1
[61] fastmap_1.1.0 survival_3.2-10 AnnotationDbi_1.52.0
[64] colorspace_2.0-1 minpack.lm_1.2-1 memoise_2.0.0
[67] knitr_1.33