I am running the DTU analysis with DRIMseq.
I was wondering what could be the reasons why I get so many NAs in the
results() (in the lr, pvalue and adj_pvalue columns)?
Out of 161750 features, I get 128038 with NAs (running results with the
level="feature" option). At the gene level, I get 33272 NAs out of 40110 genes).
Does it seem like a "reasonable" proportion?
Can I make it a little less stringent in some way?
Note that I was first filtering with
dmFilter(), which I then removed to see if that made a difference, but it doesn't.
On this data set I first performed the differential expression analysis at the gene-level using DESeq2, and then was asked to perform the differential transcript usage, so I gave DRIMSeq a try.
What worries me is that about half the genes that appear significantly differentially expressed between 2 experimental groups (padj < 0.05 in DESeq2) show NA in the lr, pvalue and adj_pvalue fields, in the DRIMSeq gene-level results table.
I know these are different algorithms, and one should not expect the same results, but this is dramatically different!
Do you have any suggestion on how I can "improve" this, or comments in general about this issue?
These are the steps I run:
# import counts (from SALMON) txi <- tximport(files, type="salmon", txOut=TRUE, countsFromAbundance="scaledTPM") # export count-scale data cts <- txi$counts cts <- cts[rowSums(cts) > 0,] # Build a data frame with the gene ID, transcript ID, and columns for each sample (counts). txdf was created from a gencode gtf counts <- data.frame(gene_id=txdf$GENEID, feature_id=txdf$TXNAME, cts) # create a dmDSdata object d <- dmDSdata(counts=counts, samples=sample_table) # build design matrix design_full <- model.matrix(~0+Group, data=DRIMSeq::samples(d)) # Estimate precision set.seed(1) d <- dmPrecision(d, design=design_full) # Fit model d <- dmFit(d, design=design_full, verbose=1) # hypothesis testing dtest <- dmTest(d, contrast=mycontrast) # gene-level results res <- results(dtest) # feature/transcript-level results res.txp <- results(dtest, level = "feature")
R version 4.0.0 (2020-04-24) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Scientific Linux 7.2 (Nitrogen) Matrix products: default BLAS/LAPACK: /nfs/software/as/el7.2/EasyBuild/CRG/software/OpenBLAS/0.3.9-GCC-9.3.0/lib/libopenblas_sandybridgep-r0.3.9.so locale:  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C  LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8  LC_PAPER=en_US.UTF-8 LC_NAME=C  LC_ADDRESS=C LC_TELEPHONE=C  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages:  stats4 parallel stats graphics grDevices utils datasets  methods base other attached packages:  stageR_1.12.0 SummarizedExperiment_1.20.0  MatrixGenerics_1.2.1 matrixStats_0.59.0  GenomicFeatures_1.42.3 AnnotationDbi_1.52.0  Biobase_2.50.0 GenomicRanges_1.42.0  GenomeInfoDb_1.26.7 IRanges_2.24.1  S4Vectors_0.28.1 BiocGenerics_0.36.1  tximport_1.18.0 DRIMSeq_1.18.0 loaded via a namespace (and not attached):  httr_1.4.2 edgeR_3.32.1 jsonlite_1.7.2  bit64_4.0.5 assertthat_0.2.1 askpass_1.1  BiocFileCache_1.14.0 blob_1.2.1 GenomeInfoDbData_1.2.4  Rsamtools_2.6.0 progress_1.2.2 pillar_1.6.1  RSQLite_2.2.7 lattice_0.20-44 glue_1.4.2  limma_3.46.0 XVector_0.30.0 colorspace_2.0-2  Matrix_1.3-4 plyr_1.8.6 XML_3.99-0.6  pkgconfig_2.0.3 biomaRt_2.46.3 zlibbioc_1.36.0  purrr_0.3.4 scales_1.1.1 BiocParallel_1.24.1  tibble_3.1.2 openssl_1.4.4 generics_0.1.0  ggplot2_3.3.5 ellipsis_0.3.2 cachem_1.0.5  magrittr_2.0.1 crayon_1.4.1 memoise_2.0.0  fansi_0.5.0 MASS_7.3-54 xml2_1.3.2  tools_4.0.0 prettyunits_1.1.1 hms_1.1.0  lifecycle_1.0.0 stringr_1.4.0 munsell_0.5.0  locfit_1.5-9.4 DelayedArray_0.16.3 Biostrings_2.58.0  compiler_4.0.0 rlang_0.4.11 grid_4.0.0  RCurl_1.98-1.3 rstudioapi_0.13 rappdirs_0.3.3  bitops_1.0-7 gtable_0.3.0 DBI_1.1.1  curl_4.3.2 reshape2_1.4.4 R6_2.5.0  GenomicAlignments_1.26.0 dplyr_1.0.7 rtracklayer_1.50.0  fastmap_1.1.0 bit_4.0.4 utf8_1.2.1  readr_1.4.0 stringi_1.6.2 Rcpp_1.0.6  vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.1