DEXSeq offset term
Guest User ★ 13k
Hi all,

I am interested in using DEXSeq to look for differential expression across seven conditions, each with ~7 biological replicates. I have created an ExonCountSet object and am trying to estimate the dispersions from my model. I have two questions:

1) Is it possible to supply an offset term to the glm? This would be very helpful when trying to incorporate normalization info, since the count data is restricted to integers. I noticed that the glmnb.fit object called by DEXSeq contains a mf$offset variable.

2) Has anyone seen and/or corrected the following error when trying to use multiple cores to estimate dispersions?

    ecs <- estimateDispersions(ecs, nCores=3)
    Estimating Cox-Reid exon dispersion estimates using 3 cores. (Progress report: one dot per 100 genes)
    The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
    Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.

This message repeats several times and also contains the following:

    Tcl_ServiceModeHook: Notifier not initialized.

Thanks a bunch for any potential insight!

-- output of sessionInfo():

    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-apple-darwin12.0.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] splines stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
     [1] DEXSeq_1.2.1 Biobase_2.16.0 BiocGenerics_0.2.0
     [4] RColorBrewer_1.0-5 scales_0.2.2 cqn_1.2.0
     [7] quantreg_4.91 SparseM_0.96 preprocessCore_1.18.0
    [10] nor1mix_1.1-3 mclust_4.0 plyr_1.7.1
    [13] multicore_0.1-7

    loaded via a namespace (and not attached):
    [1] biomaRt_2.12.0 colorspace_1.1-1 dichromat_1.2-4 hwriter_1.3
    [5] labeling_0.1 munsell_0.4 RCurl_1.95-0 statmod_1.4.16
    [9] stringr_0.6.1 tcltk_2.15.1 tools_2.15.1 XML_3.95-0

--
Sent via the guest posting facility at bioconductor.org.
Alejandro Reyes ★ 1.9k
Novartis Institutes for BioMedical Rese…
Dear Alicia,

Thanks for your input!

> Hi all,
>
> I am interested in using DEXSeq to look for differential expression across seven conditions, each with ~7 biological replicates. I have created an ExonCountSet object and am trying to estimate the dispersions from my model. I have two questions: 1) Is it possible to supply an offset term to the glm? This would be very helpful when trying to incorporate normalization info, since the count data is restricted to integers. I noticed that the glmnb.fit object called by DEXSeq contains a mf$offset variable. Additionally, 2) Has anyone seen and/or corrected the following error when trying to use multiple cores to estimate dispersions?

DEXSeq is designed to test for differences in exon usage, not differential expression. To test for differential expression you can use other packages like DESeq or edgeR. Anyway, in both DESeq and DEXSeq you can assign normalization factors using:

    sizeFactors(yourobject) <- your normalization factors

Or why don't you use the normalization factors from DESeq/DEXSeq?

> ecs <- estimateDispersions(ecs, nCores=3)
> Estimating Cox-Reid exon dispersion estimates using 3 cores. (Progress report: one dot per 100 genes)
> The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
> Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
> This message repeats several times and also contains the following: Tcl_ServiceModeHook: Notifier not initialized.

If you are testing for differential exon usage, do you get the error message using only a single core? I believe that it is a problem with the combination of multicore and Apple... some "googling" took me to this:

https://stat.ethz.ch/pipermail/r-sig-mac/2009-August/006426.html

Best wishes,
Alejandro
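To make the `sizeFactors` suggestion concrete, here is a minimal base-R sketch of the median-of-ratios calculation that DESeq-style size factors are based on. The count matrix and the helper name `medianOfRatios` are made up for illustration; in practice the package's own estimation function would normally be used, and the commented-out assignment at the end assumes an existing ExonCountSet called `ecs`, as in the question.

```r
# Toy count matrix: rows are counting bins (exons), columns are samples.
# The numbers are invented for illustration only.
counts <- matrix(
  c( 10,  20,  40,
    100, 200, 400,
      5,  10,  20,
     50, 100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("exon", 1:4), c("s1", "s2", "s3"))
)

# Hypothetical helper: one size factor per sample, computed as the median
# over rows of (count / geometric mean of that row across samples).
medianOfRatios <- function(counts) {
  logGeoMeans <- rowMeans(log(counts))
  keep <- is.finite(logGeoMeans)  # skip rows that contain a zero count
  apply(counts, 2, function(cnts) {
    exp(median(log(cnts[keep]) - logGeoMeans[keep]))
  })
}

sf <- medianOfRatios(counts)
print(sf)

# With a real ExonCountSet 'ecs' (not constructed here), the reply above
# suggests assigning such a vector, or any custom one, via:
# sizeFactors(ecs) <- sf
```

Any positive numeric vector with one entry per sample could be assigned the same way, which is why this route only covers sample-level normalization.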
Hi Alejandro,

Thanks for your rapid response!

On Tue, Oct 9, 2012 at 1:37 AM, Alejandro Reyes <alejandro.reyes@embl.de> wrote:
> DEXSeq is designed to test for differences in exon usage, not differential expression.

I guess I should clarify: I have already used edgeR to look for overall differential expression, but I also wanted to see if I could pin down the functional components where the DE is occurring, i.e. which exon(s).

> Anyway, in both DESeq and DEXSeq you can assign normalization factors using:
>
> sizeFactors(yourobject) <- your normalization factors
>
> or why dont you use the normalization factors from DESeq/DEXSeq?

Thanks a bunch! I should have read the sizeFactors documentation more carefully. The reason I really wanted to be able to supply my own normalization factors is that we have some large sample-specific GC effects on expression that most normalization techniques will be unable to take into account. Using conditional quantile normalization (cqn from Bioconductor) has shown a large improvement in our count distributions.

Also, there are no issues with a single core. I was just hoping to parallelize the process, since it takes several hours to run with our dataset.

Thanks again,
Alicia
Hi Alicia,

On 09/10/12 19:16, Alicia Martin wrote:
>> Anyway, in both DESeq and DEXSeq you can assign normalization factors using:
>>
>> sizeFactors(yourobject) <- your normalization factors
>>
>> or why dont you use the normalization factors from DESeq/DEXSeq?
>
> Thanks a bunch! I should have read the sizeFactors documentation more
> carefully. The reason I really wanted to be able to supply my own
> normalization factors is because we have some large sample-specific GC
> effects on expression that most normalization techniques will be unable to
> take into account. Using conditional quantile normalization (cqn from
> bioconductor) has shown a large improvement in our count distributions.

Well, actually, you can only specify one size factor per sample that way. If you use CQN, you probably want to have different size factors for every gene. Changing DESeq/DEXSeq to allow for this has been on my to-do list for a while, and I still haven't done it, I'm afraid.

On the other hand, as you have already looked into the code anyway and found the place where we specify the offsets, you may be able to put this together yourself. The function 'testGeneForDEU' calls 'modelFrameForGene', which returns a model frame to be used in the GLM fit. This model frame has a column 'sizeFactors', and the log of that column is used as the offset. If you are comfortable with looking into the innards of DEXSeq, you could patch testGeneForDEU: right after the call to modelFrameForGene, modify the model frame it returns by multiplying the sizeFactors column by the correction factors supplied by CQN.

But, of course, we should offer a proper interface to get this information in. After all, everything else is in place. I hope we get to that soon.

Simon
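The patch Simon describes can be illustrated on a toy model frame with base R alone. This is only a sketch of the idea, not DEXSeq's code: the data frame below is invented (only the column name `sizeFactors` comes from the reply), the `correction` vector is a hypothetical stand-in for gene-specific factors derived from CQN, and a Poisson GLM is swapped in for the negative-binomial fit to keep the example self-contained.

```r
# Toy model frame for a single gene, one row per sample. Apart from
# 'sizeFactors', the column names are hypothetical; the real model frame
# returned by modelFrameForGene has a different layout.
mf <- data.frame(
  sample      = paste0("s", 1:6),
  condition   = factor(rep(c("A", "B"), each = 3)),
  count       = c(12, 30, 18, 40, 95, 61),
  sizeFactors = c(0.8, 1.9, 1.1, 0.9, 2.1, 1.3)
)

# Hypothetical gene-specific correction factors, one per sample
# (standing in for values that would be derived from CQN).
correction <- c(1.10, 0.95, 1.00, 0.90, 1.05, 1.20)

# The patch: scale the sizeFactors column before the fit.
mfPatched <- mf
mfPatched$sizeFactors <- mf$sizeFactors * correction

# log(sizeFactors) enters the model as an offset. A Poisson family is used
# here purely for simplicity; it is not the model DEXSeq fits.
fitPlain   <- glm(count ~ condition + offset(log(sizeFactors)),
                  family = poisson(), data = mf)
fitPatched <- glm(count ~ condition + offset(log(sizeFactors)),
                  family = poisson(), data = mfPatched)

# The offsets of the two fits differ by exactly log(correction).
print(as.numeric(fitPatched$offset - fitPlain$offset))
print(coef(fitPlain))
print(coef(fitPatched))
```

Because the correction is applied per gene, each gene's model frame would need its own `correction` vector, which is exactly what a single `sizeFactors(yourobject) <- ...` assignment cannot express.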