While running DEXSeq2 on a Drosophila dataset in triplicates I noticed that some of my differentially regulated exons within my DEXSeqResults object (FDR < 0.05) have an undefined estimated foldchange but do have an adjusted p-value and associated counts. They also do not have estimated exon coefficient values.
I did investigate this and noticed that there are differences in the way DEXSeqResults is filtering the exons to the way estimatelog2Foldchanges is doing it:
within DEXSeqResults the first line is:
LRTresults <- results(object, filter=rowMeans( featureCounts(object) ) )
while the estimatelog2Foldchanges function is using:
testablegenes <- as.character( unique( groupIDs(object)[!is.na( results(object)$padj )] ) )
to define the testable genes.
Since I expected the log2FoldChange estimation to be based on the same filtering as the DEXSeqResults I got a little bit confused. According to the description of the results function of DESeq2 the default filtering as done within results(object) should be based on the mean of normalised counts which is retrieved from the object using the counts function. But the DEXSeqResults function is using the unnormalised counts as a standard filtering while the estimatelog2FoldChanges is using the DESeq2 default which is the mean of normalised counts as retrieved by counts(object). For an object as produced by testForDEU (DEXSeq) the counts do not include only the counts per exon but also the sum of the rest of the counts per gene (for 6 samples there is a 12 column matrix returned).
Which kind of filtering method is the correct one? If I go with the DESeq2 default than it should be
LRTresults <- results(object, filter=rowMeans(featureCounts(object, normalized=TRUE)))
filtering on the mean of normalised counts and the same should apply to the determination of the testablegenes.
#here are my commands
mmatrix="~ sample + exon + group:exon"
dxd <- DEXSeqDataSetFromHTSeq(targets$file,
sampleData=sampleTable,
design =as.formula(mmatrix),
flattenedfile=gfffile)
formulaFullModel = as.formula(mmatrix)
formulaReducedModel = ~ sample + exon
counts <- featureCounts(dxd, normalized=FALSE)
dxd <- estimateSizeFactors (dxd)
dxd <- estimateDispersions (dxd, formula=formulaFullModel, BPPARAM=BPPARAM)
dxd <- testForDEU (dxd, reducedModel= formulaReducedModel, fullModel = formulaFullModel, BPPARAM=BPPARAM)
dxd <- estimateExonFoldChanges(dxd, fitExpToVar=conds, BPPARAM=BPPARAM)
Thanks in advance for comments on this one.
Nastasja
sessionInfo():
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DEXSeq_1.12.1 WriteXLS_3.5.1 biomaRt_2.22.0 rtracklayer_1.26.2 BiocParallel_1.0.0 DESeq2_1.6.2 RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3 GenomicRanges_1.18.3
[10] GenomeInfoDb_1.2.3 IRanges_2.0.0 S4Vectors_0.4.0 Biobase_2.26.0 BiocGenerics_0.12.1 vimcom_0.9-93 setwidth_1.0-3 colorout_1.0-3
loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 Biostrings_2.34.0 bitops_1.0-6 brew_1.0-6
[10] checkmate_1.5.0 cluster_1.15.3 codetools_0.2-9 colorspace_1.2-4 DBI_0.3.1 digest_0.6.4 fail_1.2 foreach_1.4.2 foreign_0.8-61
[19] Formula_1.1-2 genefilter_1.48.1 geneplotter_1.44.0 GenomicAlignments_1.2.1 ggplot2_1.0.0 grid_3.1.2 gtable_0.1.2 Hmisc_3.14-6 hwriter_1.3.2
[28] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-35 munsell_0.4.2 nnet_7.3-8 plyr_1.8.1 proto_0.3-10
[37] RColorBrewer_1.0-5 RCurl_1.95-4.4 reshape2_1.4 rpart_4.1-8 Rsamtools_1.18.2 RSQLite_1.0.0 scales_0.2.4 sendmailR_1.2-1 splines_3.1.2
[46] statmod_1.4.20 stringr_0.6.2 survival_2.37-7 tools_3.1.2 XML_3.98-1.1 xtable_1.7-4 XVector_0.6.0 zlibbioc_1.12.0
Hi Alejandro,
I just noticed that the genes which do not have foldchanges assigned to one exon which has a padj value tend to have more than one exon which does not have a padj.
The gene I had a look at has 7 exons.
Nastasja
Hi Nastasja,
Could you send me your objects, as well as the example of the gene that you are mentioning, so I can have a closer look?
Alejandro
Hi Alejandro,
since the data is unpublished I have to talk to the responsible person and come back to you. I may be able to reproduce it with some other already published data.
Nastasja
For the record, Nastasja and I continued with some e-mail exchanges and were able to solve this issue.
Alejandro
Hi Alejandro, I am noticing the same with mine dataset i.e. log2fold as NA with values in pads. Did you reran DEXSeq with changed parameter or tackled DEXSeq object? Mine experiment design is exactly like the thread starter. Thanks. Surendra