While running DEXSeq2 on a Drosophila dataset in triplicates I noticed that some of my differentially regulated exons within my DEXSeqResults object (FDR < 0.05) have an undefined estimated foldchange but do have an adjusted p-value and associated counts. They also do not have estimated exon coefficient values.
I did investigate this and noticed that there are differences in the way DEXSeqResults is filtering the exons to the way estimatelog2Foldchanges is doing it:
within DEXSeqResults the first line is:
LRTresults <- results(object, filter=rowMeans( featureCounts(object) ) )
while the estimatelog2Foldchanges function is using:
testablegenes <- as.character( unique( groupIDs(object)[!is.na( results(object)$padj )] ) )
to define the testable genes.
Since I expected the log2FoldChange estimation to be based on the same filtering as the DEXSeqResults I got a little bit confused. According to the description of the results function of DESeq2 the default filtering as done within results(object) should be based on the mean of normalised counts which is retrieved from the object using the counts function. But the DEXSeqResults function is using the unnormalised counts as a standard filtering while the estimatelog2FoldChanges is using the DESeq2 default which is the mean of normalised counts as retrieved by counts(object). For an object as produced by testForDEU (DEXSeq) the counts do not include only the counts per exon but also the sum of the rest of the counts per gene (for 6 samples there is a 12 column matrix returned).
Which kind of filtering method is the correct one? If I go with the DESeq2 default than it should be
LRTresults <- results(object, filter=rowMeans(featureCounts(object, normalized=TRUE)))
filtering on the mean of normalised counts and the same should apply to the determination of the testablegenes.
#here are my commands
mmatrix="~ sample + exon + group:exon"
dxd <- DEXSeqDataSetFromHTSeq(targets$file,
formulaFullModel = as.formula(mmatrix)
formulaReducedModel = ~ sample + exon
counts <- featureCounts(dxd, normalized=FALSE)
dxd <- estimateSizeFactors (dxd)
dxd <- estimateDispersions (dxd, formula=formulaFullModel, BPPARAM=BPPARAM)
dxd <- testForDEU (dxd, reducedModel= formulaReducedModel, fullModel = formulaFullModel, BPPARAM=BPPARAM)
dxd <- estimateExonFoldChanges(dxd, fitExpToVar=conds, BPPARAM=BPPARAM)
Thanks in advance for comments on this one.
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
 LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
 stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
 DEXSeq_1.12.1 WriteXLS_3.5.1 biomaRt_2.22.0 rtracklayer_1.26.2 BiocParallel_1.0.0 DESeq2_1.6.2 RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3 GenomicRanges_1.18.3
 GenomeInfoDb_1.2.3 IRanges_2.0.0 S4Vectors_0.4.0 Biobase_2.26.0 BiocGenerics_0.12.1 vimcom_0.9-93 setwidth_1.0-3 colorout_1.0-3
loaded via a namespace (and not attached):
 acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 Biostrings_2.34.0 bitops_1.0-6 brew_1.0-6
 checkmate_1.5.0 cluster_1.15.3 codetools_0.2-9 colorspace_1.2-4 DBI_0.3.1 digest_0.6.4 fail_1.2 foreach_1.4.2 foreign_0.8-61
 Formula_1.1-2 genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.1 ggplot2_1.0.0 grid_3.1.2 gtable_0.1.2 Hmisc_3.14-6 hwriter_1.3.2
 iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-35 munsell_0.4.2 nnet_7.3-8 plyr_1.8.1 proto_0.3-10
 RColorBrewer_1.0-5 RCurl_1.95-4.4 reshape2_1.4 rpart_4.1-8 Rsamtools_1.18.2 RSQLite_1.0.0 scales_0.2.4 sendmailR_1.2-1 splines_3.1.2
 statmod_1.4.20 stringr_0.6.2 survival_2.37-7 tools_3.1.2 XML_3.98-1.1 xtable_1.7-4 XVector_0.6.0 zlibbioc_1.12.0