Question

DEXSeq - what is the best way to filter out low expressed exons / genes?

0

Entering edit mode

aleksandar.baburski.g • 0

@aleksandarbaburskig-23324

Last seen 4.8 years ago

Hi, I am trying to remove (filter out) low expressed exons / genes to speed up DEXSeq 1.34.0 analysis.

1) What would be the best way to do that? I am not sure how to do that starting from dexseq object created by DEXSeqDataSetFromHTSeq() function. Can you please post some example code for filtering?

2) What would be the suggested minimal exon / gene expression? Can you please give me some starting directions for filtering?

Here is my code:

library(argparser)
library(dplyr)
library(readr)
library(DEXSeq)

p <- arg_parser("inputs")
p <- add_argument(p, "--countFiles", nargs=Inf)
p <- add_argument(p, "--flattenedGTF", nargs=1)
p <- add_argument(p, "--sampleTable", nargs=1)
p <- add_argument(p, "--factor", nargs=1)

args <- parse_args(p)

count_files = args$countFiles
flattened_gtf = args$flattenedGTF
sampleTable = read.csv(args$sampleTable) 
factor_ = args$factor

# Number of cores / workers
BPPARAM = MulticoreParam(workers = 18)

# Create null and full formula
formula_null = as.formula("~ sample + exon")
formula_full = as.formula(paste0("~ sample + exon + ", factor_, ":exon"))

#DEXSeq
dxd = DEXSeqDataSetFromHTSeq(
      countfiles = count_files,
      sampleData = sampleTable,
      design = formula_full,
      flattenedfile = flattened_gtf)

dxd_norm = estimateSizeFactors( dxd )

dxd_esf = estimateDispersions(dxd_norm, 
                          formula=formula_full,
                          BPPARAM = BPPARAM)

dxd_test = testForDEU(dxd_esf,
                  reducedModel=formula_null,
                  fullModel=formula_full,
                  BPPARAM = BPPARAM)  

dxd_foldchange = estimateExonFoldChanges(dxd_test,
                                     fitExpToVar=factor_,
                                     denominator="normal",
                                     BPPARAM = BPPARAM) 
dxr1 = DEXSeqResults( dxd_foldchange )

Thanks!

dexseq • 1.8k views

ADD COMMENT • link 5.5 years ago • updated 5.0 years ago aleksandar.baburski.g • 0

0

Entering edit mode

Hi aleksandar.baburski.g Seems Alejandro Reyes is busy these days so he'll take some time to answer.

Nevertheless, the DEXSeqResults function automatically will remove those exons that have low counts when compared to the mean of normalized counts and these genes and exons thereof will have NA in the padj column. So when you set padj<0.1 or 0.05 cutoff, they will not appear in your output of Differentially used Exons.

ADD REPLY • link 5.1 years ago rohitsatyam102 ▴ 20

0

Entering edit mode

Thanks, I was actually thinking about removing low expressed genes / exons from dexseq object before normalisation, dispersion estimation and testing for DEU (and I do not know how to do that). Dispersion estimation step takes 10+ hours for large number of samples (70+), thus I am trying to reduce the number of exons in dxd object... Hopefully, that will speed up the analysis, or not?

ADD REPLY • link 5.0 years ago aleksandar.baburski.g • 0