DEXSeq - what is the best way to filter out low expressed exons / genes?
0
0
Entering edit mode
@aleksandarbaburskig-23324
Last seen 3.8 years ago

Hi, I am trying to remove (filter out) low expressed exons / genes to speed up DEXSeq 1.34.0 analysis.

1) What would be the best way to do that? I am not sure how to do that starting from dexseq object created by DEXSeqDataSetFromHTSeq() function. Can you please post some example code for filtering?

2) What would be the suggested minimal exon / gene expression? Can you please give me some starting directions for filtering?

Here is my code:

library(argparser)
library(dplyr)
library(readr)
library(DEXSeq)

p <- arg_parser("inputs")
p <- add_argument(p, "--countFiles", nargs=Inf)
p <- add_argument(p, "--flattenedGTF", nargs=1)
p <- add_argument(p, "--sampleTable", nargs=1)
p <- add_argument(p, "--factor", nargs=1)

args <- parse_args(p)

count_files = args$countFiles
flattened_gtf = args$flattenedGTF
sampleTable = read.csv(args$sampleTable) 
factor_ = args$factor

# Number of cores / workers
BPPARAM = MulticoreParam(workers = 18)

# Create null and full formula
formula_null = as.formula("~ sample + exon")
formula_full = as.formula(paste0("~ sample + exon + ", factor_, ":exon"))

#DEXSeq
dxd = DEXSeqDataSetFromHTSeq(
      countfiles = count_files,
      sampleData = sampleTable,
      design = formula_full,
      flattenedfile = flattened_gtf)

dxd_norm = estimateSizeFactors( dxd )

dxd_esf = estimateDispersions(dxd_norm, 
                          formula=formula_full,
                          BPPARAM = BPPARAM)

dxd_test = testForDEU(dxd_esf,
                  reducedModel=formula_null,
                  fullModel=formula_full,
                  BPPARAM = BPPARAM)  

dxd_foldchange = estimateExonFoldChanges(dxd_test,
                                     fitExpToVar=factor_,
                                     denominator="normal",
                                     BPPARAM = BPPARAM) 
dxr1 = DEXSeqResults( dxd_foldchange )

Thanks!

dexseq • 1.5k views
ADD COMMENT
0
Entering edit mode

Hi aleksandar.baburski.g Seems Alejandro Reyes is busy these days so he'll take some time to answer.

Nevertheless, the DEXSeqResults function automatically will remove those exons that have low counts when compared to the mean of normalized counts and these genes and exons thereof will have NA in the padj column. So when you set padj<0.1 or 0.05 cutoff, they will not appear in your output of Differentially used Exons.

ADD REPLY
0
Entering edit mode

Thanks, I was actually thinking about removing low expressed genes / exons from dexseq object before normalisation, dispersion estimation and testing for DEU (and I do not know how to do that). Dispersion estimation step takes 10+ hours for large number of samples (70+), thus I am trying to reduce the number of exons in dxd object... Hopefully, that will speed up the analysis, or not?

ADD REPLY

Login before adding your answer.

Traffic: 620 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6