Entering edit mode
@aleksandarbaburskig-23324
Last seen 3.9 years ago
Hi, I am trying to remove (filter out) low expressed exons / genes to speed up DEXSeq 1.34.0 analysis.
1) What would be the best way to do that? I am not sure how to do that starting from dexseq object created by DEXSeqDataSetFromHTSeq() function. Can you please post some example code for filtering?
2) What would be the suggested minimal exon / gene expression? Can you please give me some starting directions for filtering?
Here is my code:
library(argparser)
library(dplyr)
library(readr)
library(DEXSeq)
p <- arg_parser("inputs")
p <- add_argument(p, "--countFiles", nargs=Inf)
p <- add_argument(p, "--flattenedGTF", nargs=1)
p <- add_argument(p, "--sampleTable", nargs=1)
p <- add_argument(p, "--factor", nargs=1)
args <- parse_args(p)
count_files = args$countFiles
flattened_gtf = args$flattenedGTF
sampleTable = read.csv(args$sampleTable)
factor_ = args$factor
# Number of cores / workers
BPPARAM = MulticoreParam(workers = 18)
# Create null and full formula
formula_null = as.formula("~ sample + exon")
formula_full = as.formula(paste0("~ sample + exon + ", factor_, ":exon"))
#DEXSeq
dxd = DEXSeqDataSetFromHTSeq(
countfiles = count_files,
sampleData = sampleTable,
design = formula_full,
flattenedfile = flattened_gtf)
dxd_norm = estimateSizeFactors( dxd )
dxd_esf = estimateDispersions(dxd_norm,
formula=formula_full,
BPPARAM = BPPARAM)
dxd_test = testForDEU(dxd_esf,
reducedModel=formula_null,
fullModel=formula_full,
BPPARAM = BPPARAM)
dxd_foldchange = estimateExonFoldChanges(dxd_test,
fitExpToVar=factor_,
denominator="normal",
BPPARAM = BPPARAM)
dxr1 = DEXSeqResults( dxd_foldchange )
Thanks!
Hi aleksandar.baburski.g Seems Alejandro Reyes is busy these days so he'll take some time to answer.
Nevertheless, the DEXSeqResults function automatically will remove those exons that have low counts when compared to the mean of normalized counts and these genes and exons thereof will have NA in the padj column. So when you set padj<0.1 or 0.05 cutoff, they will not appear in your output of Differentially used Exons.
Thanks, I was actually thinking about removing low expressed genes / exons from dexseq object before normalisation, dispersion estimation and testing for DEU (and I do not know how to do that). Dispersion estimation step takes 10+ hours for large number of samples (70+), thus I am trying to reduce the number of exons in dxd object... Hopefully, that will speed up the analysis, or not?