Question

Undefined log2Foldchanges within DEXSeq for significantly regulated exons

1

Entering edit mode

n.kreim ▴ 20

@nkreim-7374

Last seen 9.5 years ago

Germany

While running DEXSeq2 on a Drosophila dataset in triplicates I noticed that some of my differentially regulated exons within my DEXSeqResults object (FDR < 0.05) have an undefined estimated foldchange but do have an adjusted p-value and associated counts. They also do not have estimated exon coefficient values.

I did investigate this and noticed that there are differences in the way DEXSeqResults is filtering the exons to the way estimatelog2Foldchanges is doing it:

within DEXSeqResults the first line is:

LRTresults <- results(object, filter=rowMeans( featureCounts(object) ) )

while the estimatelog2Foldchanges function is using:

testablegenes <- as.character( unique( groupIDs(object)[!is.na( results(object)$padj )] ) )

to define the testable genes.

Since I expected the log2FoldChange estimation to be based on the same filtering as the DEXSeqResults I got a little bit confused. According to the description of the results function of DESeq2 the default filtering as done within results(object) should be based on the mean of normalised counts which is retrieved from the object using the counts function. But the DEXSeqResults function is using the unnormalised counts as a standard filtering while the estimatelog2FoldChanges is using the DESeq2 default which is the mean of normalised counts as retrieved by counts(object). For an object as produced by testForDEU (DEXSeq) the counts do not include only the counts per exon but also the sum of the rest of the counts per gene (for 6 samples there is a 12 column matrix returned).

Which kind of filtering method is the correct one? If I go with the DESeq2 default than it should be
LRTresults <- results(object, filter=rowMeans(featureCounts(object, normalized=TRUE)))
filtering on the mean of normalised counts and the same should apply to the determination of the testablegenes.

#here are my commands
mmatrix="~ sample + exon + group:exon"
dxd <- DEXSeqDataSetFromHTSeq(targets$file,
           sampleData=sampleTable,
           design =as.formula(mmatrix),
           flattenedfile=gfffile)

formulaFullModel = as.formula(mmatrix)
formulaReducedModel = ~ sample + exon
counts <- featureCounts(dxd, normalized=FALSE)
dxd <- estimateSizeFactors (dxd)
dxd <- estimateDispersions (dxd, formula=formulaFullModel, BPPARAM=BPPARAM)
dxd <- testForDEU (dxd, reducedModel= formulaReducedModel, fullModel = formulaFullModel, BPPARAM=BPPARAM)
dxd <- estimateExonFoldChanges(dxd, fitExpToVar=conds, BPPARAM=BPPARAM)

Thanks in advance for comments on this one.

Nastasja

sessionInfo():

R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DEXSeq_1.12.1 WriteXLS_3.5.1 biomaRt_2.22.0 rtracklayer_1.26.2 BiocParallel_1.0.0 DESeq2_1.6.2 RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3 GenomicRanges_1.18.3
[10] GenomeInfoDb_1.2.3 IRanges_2.0.0 S4Vectors_0.4.0 Biobase_2.26.0 BiocGenerics_0.12.1 vimcom_0.9-93 setwidth_1.0-3 colorout_1.0-3

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3         annotate_1.44.0         AnnotationDbi_1.28.1    base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8              Biostrings_2.34.0       bitops_1.0-6            brew_1.0-6
[10] checkmate_1.5.0         cluster_1.15.3          codetools_0.2-9         colorspace_1.2-4        DBI_0.3.1               digest_0.6.4            fail_1.2                foreach_1.4.2           foreign_0.8-61
[19] Formula_1.1-2           genefilter_1.48.1       geneplotter_1.44.0      GenomicAlignments_1.2.1 ggplot2_1.0.0           grid_3.1.2              gtable_0.1.2            Hmisc_3.14-6            hwriter_1.3.2
[28] iterators_1.0.7         lattice_0.20-29         latticeExtra_0.6-26     locfit_1.5-9.1          MASS_7.3-35             munsell_0.4.2           nnet_7.3-8              plyr_1.8.1              proto_0.3-10
[37] RColorBrewer_1.0-5      RCurl_1.95-4.4          reshape2_1.4            rpart_4.1-8             Rsamtools_1.18.2        RSQLite_1.0.0           scales_0.2.4            sendmailR_1.2-1         splines_3.1.2
[46] statmod_1.4.20          stringr_0.6.2           survival_2.37-7         tools_3.1.2             XML_3.98-1.1            xtable_1.7-4            XVector_0.6.0           zlibbioc_1.12.0

DEXSeq • 3.5k views

ADD COMMENT • link 11.0 years ago n.kreim ▴ 20

0

Entering edit mode

Alejandro Reyes ★ 1.9k

@alejandro-reyes-5124

Last seen 18 months ago

Novartis Institutes for BioMedical Rese…

Hi Nastasja,

Thanks for noticing this!

The filtering should be done in the normalized counts, as you mentioned. I just added to the svn versions (both release branch (1.12.2) and devel) these changes.

Regarding the fold changes, is it possible that this is the case particularly for genes with many exons? If so, could you try to increase the maxMf parameter of the estimateExonFoldChanges (this should allow estimating fold changes for such genes, but it will take some time to compute!)?

Alejandro

ADD COMMENT • link 11.0 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

n.kreim ▴ 20

@nkreim-7374

Last seen 9.5 years ago

Germany

Hi Alejandro,

thanks for the answer I will check out the changes.

I found the maxMf parameter in another post and tried it out. But it did not change anything. Even when increasing it to 150,000. That was when I started to have a look at it.

In my case the genes which do not have a log2foldchange estimate are not part of the testable genes within the function estimateExonFoldChanges()

testablegenes <- as.character( unique( groupIDs(object)[!is.na( results(object)$padj )] ) ).

Which is still the case after using the new filtering within the DEXSeq results.

I was wondering if this may be due to the fact that the results function in this line is using the counts function to determine the filtering. If I understood it correctly the results function is part of the DESeq2 package and will use as a filtering the default which is the baseMean which I am not sure how it is calculated in this case. Or because the filtering is done on genes instead of exons? Frankly I am a little bit lost here between dexseq and deseq2 functions.

Nastasja

ADD COMMENT • link 11.0 years ago n.kreim ▴ 20

0

Entering edit mode

Alejandro Reyes ★ 1.9k

@alejandro-reyes-5124

Last seen 18 months ago

Novartis Institutes for BioMedical Rese…

Hi Nastasja,

I see, just to verify, were you using the development version when you tried the maxMf parameter?

Alejandro

ADD COMMENT • link 11.0 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

n.kreim ▴ 20

@nkreim-7374

Last seen 9.5 years ago

Germany

Hi Alejandro,

I was not using the development version but added the changes which were within the two functions estimateExonFoldChanges and fitAndArrangeCoefs to the released Bioconductor version.

I do not know if this was clear in the post before. The maxMf parameter is used within the geteffects() function within estimateExonFoldChanges but the genes which do not have log2Foldchanges are not within the testablegene list fed to the function.

In the current development source core.R from Bioconductor line 105 defines the testablegenes and in line 142 the effects are calculated for all testablegenes.

These testablegenes somehow do not include all genes which have exons with estimated padj but some of them are missing. And I assume that one cannot calculate the foldchanges without the associated effects which are assigned during the call to geteffects() (if I understood the code correctly).

Nastasja

ADD COMMENT • link 11.0 years ago n.kreim ▴ 20

0

Entering edit mode

Hi Alejandro,

I just noticed that the genes which do not have foldchanges assigned to one exon which has a padj value tend to have more than one exon which does not have a padj.

The gene I had a look at has 7 exons.

Nastasja

ADD REPLY • link 11.0 years ago n.kreim ▴ 20

0

Entering edit mode

Hi Nastasja,

Could you send me your objects, as well as the example of the gene that you are mentioning, so I can have a closer look?

Alejandro

ADD REPLY • link 11.0 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Hi Alejandro,

since the data is unpublished I have to talk to the responsible person and come back to you. I may be able to reproduce it with some other already published data.

Nastasja

ADD REPLY • link 11.0 years ago n.kreim ▴ 20

0

Entering edit mode

For the record, Nastasja and I continued with some e-mail exchanges and were able to solve this issue.

Alejandro

ADD REPLY • link 10.9 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Hi Alejandro, I am noticing the same with mine dataset i.e. log2fold as NA with values in pads. Did you reran DEXSeq with changed parameter or tackled DEXSeq object? Mine experiment design is exactly like the thread starter. Thanks. Surendra

ADD REPLY • link 9.8 years ago skdhanraj • 0

score 1 · Accepted Answer · 2015-02-25

1

Entering edit mode

n.kreim ▴ 20

@nkreim-7374

Last seen 9.5 years ago

Germany

Hi all,

Alejandro told me he fixed the issues within the svn repository for release version 1.12.2 and the devel version 1.13.6.

So thanks again for the fast fix Alejandro.

Nastasja

ADD COMMENT • link 11.0 years ago n.kreim ▴ 20