Search
Question: Undefined log2Foldchanges within DEXSeq for significantly regulated exons
1
gravatar for n.kreim
2.8 years ago by
n.kreim20
Germany
n.kreim20 wrote:

While running DEXSeq2 on a Drosophila dataset in triplicates I noticed that some of my differentially regulated exons within my DEXSeqResults object (FDR < 0.05)  have an undefined estimated foldchange but do have an adjusted p-value and associated counts. They also do not have estimated exon coefficient values.

I did investigate this and noticed that there are differences in the way DEXSeqResults is filtering the exons to the way estimatelog2Foldchanges is doing it:

within DEXSeqResults the first line is:

LRTresults <- results(object, filter=rowMeans( featureCounts(object) ) )

while the estimatelog2Foldchanges function is using:

testablegenes <- as.character( unique( groupIDs(object)[!is.na( results(object)$padj )] ) )

to define the testable genes.

Since I expected the log2FoldChange estimation to be based on the same filtering as the DEXSeqResults I got a little bit confused. According to the description of the results function of DESeq2 the default filtering as done within results(object) should be based on the mean of normalised counts which is retrieved from the object using the counts function. But the DEXSeqResults function is using the unnormalised counts as a standard filtering while the estimatelog2FoldChanges is using the DESeq2 default which is the mean of normalised counts as retrieved by counts(object). For an object as produced by testForDEU (DEXSeq) the counts do not include only the counts per exon but also the sum of the rest of the counts per gene (for 6 samples there is a 12 column matrix returned).

Which kind of filtering method is the correct one? If I go with the DESeq2 default than it should be
LRTresults <- results(object, filter=rowMeans(featureCounts(object, normalized=TRUE)))  
filtering on the mean of normalised counts and the same should apply to the determination of the testablegenes.

 

#here are my commands
mmatrix="~ sample + exon + group:exon"
dxd <- DEXSeqDataSetFromHTSeq(targets$file,
             sampleData=sampleTable,
             design =as.formula(mmatrix),
             flattenedfile=gfffile)
 

formulaFullModel = as.formula(mmatrix)
formulaReducedModel = ~ sample + exon
counts <- featureCounts(dxd, normalized=FALSE)
dxd <- estimateSizeFactors (dxd)
dxd <- estimateDispersions (dxd, formula=formulaFullModel, BPPARAM=BPPARAM)
dxd <- testForDEU (dxd, reducedModel= formulaReducedModel, fullModel = formulaFullModel, BPPARAM=BPPARAM)
dxd <- estimateExonFoldChanges(dxd, fitExpToVar=conds, BPPARAM=BPPARAM)

Thanks in advance for comments on this one.

Nastasja

 

sessionInfo():

R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.12.1             WriteXLS_3.5.1            biomaRt_2.22.0            rtracklayer_1.26.2        BiocParallel_1.0.0        DESeq2_1.6.2              RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3               GenomicRanges_1.18.3     
[10] GenomeInfoDb_1.2.3        IRanges_2.0.0             S4Vectors_0.4.0           Biobase_2.26.0            BiocGenerics_0.12.1       vimcom_0.9-93             setwidth_1.0-3            colorout_1.0-3           

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3         annotate_1.44.0         AnnotationDbi_1.28.1    base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8              Biostrings_2.34.0       bitops_1.0-6            brew_1.0-6             
[10] checkmate_1.5.0         cluster_1.15.3          codetools_0.2-9         colorspace_1.2-4        DBI_0.3.1               digest_0.6.4            fail_1.2                foreach_1.4.2           foreign_0.8-61         
[19] Formula_1.1-2           genefilter_1.48.1       geneplotter_1.44.0      GenomicAlignments_1.2.1 ggplot2_1.0.0           grid_3.1.2              gtable_0.1.2            Hmisc_3.14-6            hwriter_1.3.2          
[28] iterators_1.0.7         lattice_0.20-29         latticeExtra_0.6-26     locfit_1.5-9.1          MASS_7.3-35             munsell_0.4.2           nnet_7.3-8              plyr_1.8.1              proto_0.3-10           
[37] RColorBrewer_1.0-5      RCurl_1.95-4.4          reshape2_1.4            rpart_4.1-8             Rsamtools_1.18.2        RSQLite_1.0.0           scales_0.2.4            sendmailR_1.2-1         splines_3.1.2          
[46] statmod_1.4.20          stringr_0.6.2           survival_2.37-7         tools_3.1.2             XML_3.98-1.1            xtable_1.7-4            XVector_0.6.0           zlibbioc_1.12.0


 

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by n.kreim20
1
gravatar for n.kreim
2.8 years ago by
n.kreim20
Germany
n.kreim20 wrote:

Hi all,

Alejandro told me he fixed the issues within the svn repository for release version 1.12.2 and the devel version 1.13.6.

So thanks again for the fast fix Alejandro.

Nastasja

ADD COMMENTlink written 2.8 years ago by n.kreim20
0
gravatar for Alejandro Reyes
2.8 years ago by
Alejandro Reyes1.5k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.5k wrote:

Hi Nastasja,

Thanks for noticing this!

The filtering should be done in the normalized counts, as you mentioned. I just added to the svn versions (both release branch (1.12.2) and devel) these changes.

Regarding the fold changes, is it possible that this is the case particularly for genes with many exons? If so, could you try to increase the maxMf parameter of the estimateExonFoldChanges (this should allow estimating fold changes for such genes, but it will take some time to compute!)?

Alejandro

ADD COMMENTlink written 2.8 years ago by Alejandro Reyes1.5k
0
gravatar for n.kreim
2.8 years ago by
n.kreim20
Germany
n.kreim20 wrote:

Hi Alejandro,

 

thanks for the answer I will check out the changes.

I found the maxMf parameter in another post and tried it out. But it did not change anything. Even when increasing it to 150,000. That was when I started to have a look at it.

In my case the genes which do not have a log2foldchange estimate are not part of the testable genes within the function estimateExonFoldChanges()

testablegenes <- as.character( unique( groupIDs(object)[!is.na( results(object)$padj )] ) ).

Which is still the case after using the new filtering within the DEXSeq results.

I was wondering if this may be due to the fact that the results function in this line is using the counts function to determine the filtering. If I understood it correctly the results function is part of the DESeq2 package and will use as a filtering the default which is the baseMean which I am not sure how it is calculated in this case. Or because the filtering is done on genes instead of exons? Frankly I am a little bit lost here between dexseq and deseq2 functions.

Nastasja

 

ADD COMMENTlink written 2.8 years ago by n.kreim20
0
gravatar for Alejandro Reyes
2.8 years ago by
Alejandro Reyes1.5k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.5k wrote:

Hi Nastasja,

I see, just to verify, were you using the development version when you tried the maxMf parameter? 

Alejandro

ADD COMMENTlink written 2.8 years ago by Alejandro Reyes1.5k
0
gravatar for n.kreim
2.8 years ago by
n.kreim20
Germany
n.kreim20 wrote:

Hi Alejandro,

I was not using the development version but added the changes which were within the two functions estimateExonFoldChanges and fitAndArrangeCoefs to the released Bioconductor version.

I do not know if this was clear in the post before. The maxMf parameter is used within the geteffects() function within estimateExonFoldChanges but the genes which do not have log2Foldchanges are not within the testablegene list fed to the function.

In the current development source core.R from Bioconductor line 105 defines the testablegenes and in line 142 the effects are calculated for all testablegenes.

These testablegenes somehow do not include all genes which have exons with estimated padj but some of them are missing.  And I assume that one cannot calculate the foldchanges without the associated effects which are assigned during the call to geteffects() (if I understood the code correctly).

Nastasja

 

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by n.kreim20

Hi Alejandro,

I just noticed that the genes which do not have foldchanges assigned to one exon which has a padj value tend to have more than one exon which does not have a padj.

The gene I had a look at has 7 exons.

Nastasja

ADD REPLYlink written 2.8 years ago by n.kreim20

Hi Nastasja,

Could you send me your objects, as well as the example of the gene that you are mentioning, so I can have a closer look? 

Alejandro

ADD REPLYlink written 2.8 years ago by Alejandro Reyes1.5k

Hi Alejandro,

since the data is unpublished I have to talk to the responsible person and come back to you. I may be able to reproduce it with some other already published data.

Nastasja

ADD REPLYlink written 2.8 years ago by n.kreim20

For the record, Nastasja and I continued with some e-mail exchanges and were able to solve this issue.

Alejandro

ADD REPLYlink written 2.8 years ago by Alejandro Reyes1.5k

Hi Alejandro, I am noticing the same with mine dataset i.e. log2fold as NA with values in pads. Did you reran DEXSeq with changed parameter or tackled DEXSeq object? Mine experiment design is exactly like the thread starter. Thanks. Surendra

 

 

 

ADD REPLYlink modified 19 months ago • written 19 months ago by skdhanraj0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 267 users visited in the last hour