DEXSeq - identical p-values for all exons in a gene

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.7 years ago

Hi All, I have been running DEXSeq successfully on my dataset using the default, in-built linear models, but have been repeatedly running into problems when attempting to add an extra 'blocking' term to the model. I have checked the vignette and I thought my usage was correct: formulaFullModel <- ~ sample + exon + libType:exon + condition:exon formulaReducedModel <- ~ sample + exon + libType:exon ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile) ecs <- estimateSizeFactors(ecs) ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') ecs<- fitDispersionFunction(ecs) ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt') To check whether my problem was related to the extra term or just manually specifying the model, I ran DEXSeq using only a basic model (no additional terms: formulaFullModel <- ~ sample + exon + condition:exon formulaReducedModel <- ~ sample + exon ), and had the same problem. This suggests it must be something to do with how I am specifying the model or providing it to the functions...? This is the sample table setup for the basic model: X countFile condition 1 H1 H.r1.counts control 2 M1 M.r1.counts treated 3 H2ac H.r2ac.counts control 4 M2ac M.r2ac.counts treated 5 Hshap H.NA.counts control 6 Mshap M.NA.counts treated I assume the cause is something silly on my part but I am stumped. Any help would be greatly appreciated! Thanks! P.S. I also tried running the pasilla dataset in the same way but that time it worked, normal p-values. -- output of sessionInfo(): Sorry but this session info might not be quite right - the original run was done on a server and the workspace was saved and opened in RStudio. This is the session info from RStudio. R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] BiocInstaller_1.12.0 DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 GenomicRanges_1.14.4 hwriter_1.3 [6] IRanges_1.20.6 RCurl_1.95-4.1 Rsamtools_1.14.2 statmod_1.4.18 stats4_3.0.2 [11] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0 -- Sent via the guest posting facility at bioconductor.org.

DEXSeq DEXSeq • 1.4k views

ADD COMMENT • link updated 10.3 years ago by Steve Lianoglou ★ 13k • written 10.3 years ago by Guest User ★ 13k

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi Katherine, You mention running into "the same problem" no matter how the formulas are encoded, but I do not see anywhere in your email what this problem actually is. Can you please explain and show explicitly what the problem actually is? Is an error being thrown somewhere, or? Thanks, -steve On Thu, Jan 16, 2014 at 2:48 PM, KP [guest] <guest at="" bioconductor.org=""> wrote: > > Hi All, > > I have been running DEXSeq successfully on my dataset using the default, in-built linear models, but have been repeatedly running into problems when attempting to add an extra 'blocking' term to the model. > > I have checked the vignette and I thought my usage was correct: > > formulaFullModel <- ~ sample + exon + libType:exon + condition:exon > formulaReducedModel <- ~ sample + exon + libType:exon > > ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile) > ecs <- estimateSizeFactors(ecs) > ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') > ecs<- fitDispersionFunction(ecs) > > ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt') > > To check whether my problem was related to the extra term or just manually specifying the model, I ran DEXSeq using only a basic model (no additional terms: formulaFullModel <- ~ sample + exon + condition:exon > formulaReducedModel <- ~ sample + exon ), and had the same problem. This suggests it must be something to do with how I am specifying the model or providing it to the functions...? > > This is the sample table setup for the basic model: > > X countFile condition > 1 H1 H.r1.counts control > 2 M1 M.r1.counts treated > 3 H2ac H.r2ac.counts control > 4 M2ac M.r2ac.counts treated > 5 Hshap H.NA.counts control > 6 Mshap M.NA.counts treated > > > I assume the cause is something silly on my part but I am stumped. Any help would be greatly appreciated! > > Thanks! > > P.S. I also tried running the pasilla dataset in the same way but that time it worked, normal p-values. > > -- output of sessionInfo(): > > Sorry but this session info might not be quite right - the original run was done on a server and the workspace was saved and opened in RStudio. This is the session info from RStudio. > > R version 3.0.2 (2013-09-25) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 > [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.12.0 DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 GenomicRanges_1.14.4 hwriter_1.3 > [6] IRanges_1.20.6 RCurl_1.95-4.1 Rsamtools_1.14.2 statmod_1.4.18 stats4_3.0.2 > [11] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Computational Biologist Genentech

ADD COMMENT • link 10.3 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Thanks Steve, Sorry, the problem is the issue title - that for every gene, I get one p-value for all of the exons in that gene. This didn't happen when I ran DEXSeq without specifying the model. e.g. here is the top of my results table: geneID exonID dispersion pvalue padjust meanBase log2fold(treated/control) 1/2-SBSRNA4 E001 0.11143032705478 0.910345433347734 1 30.9067618720574 0.015363670607322 1/2-SBSRNA4 E002 0.583064733453923 0.910345433347734 1 6.62219699771489 -0.131125949462748 1/2-SBSRNA4 E003 0.767713143685937 0.910345433347734 1 2.21699055323382 0.0731263380599324 A1BG E001 0.123758213037027 0.229711870771802 1 117.754224941408 0.249562974820941 A1BG E002 0.110317995296594 0.229711870771802 1 152.932234776212 -0.0091268347351396 A1BG E003 0.275216588214621 0.229711870771802 1 7.30724196291172 -0.4763859471178 A1BG E004 0.625237938618851 0.229711870771802 1 10.2174627774744 -0.623376548608748 A1BG E005 1e+08 NA NA 0 0.0978995620155234 A1BG E006 0.435477490574276 0.229711870771802 1 4.18241118313392 -0.0329868528499689 I've noticed the format for specifying the model seems to have changed - the original paper used something like: formuladispersion <- count ~ sample + (condition + type) * exon formula0 <- count ~ sample + type * exon + condition formula1 <- count ~ sample + type * exon + condition * I(exon == exonID) Whereas the current vignette uses the format I used: formula1 <- ~ sample + exon + type:exon + condition:exon formula0 <- ~ sample + exon + type:exon Thanks for your help! Cheers, Kat ________________________________________ From: mailinglist.honeypot@gmail.com [mailinglist.honeypot@gmail.com] On Behalf Of Steve Lianoglou [lianoglou.steve@gene.com] Sent: Friday, 17 January 2014 9:33 AM To: KP [guest] Cc: bioconductor at r-project.org list; Pillman, Katherine (Health) Subject: Re: [BioC] DEXSeq - identical p-values for all exons in a gene Hi Katherine, You mention running into "the same problem" no matter how the formulas are encoded, but I do not see anywhere in your email what this problem actually is. Can you please explain and show explicitly what the problem actually is? Is an error being thrown somewhere, or? Thanks, -steve On Thu, Jan 16, 2014 at 2:48 PM, KP [guest] <guest at="" bioconductor.org=""> wrote: > > Hi All, > > I have been running DEXSeq successfully on my dataset using the default, in-built linear models, but have been repeatedly running into problems when attempting to add an extra 'blocking' term to the model. > > I have checked the vignette and I thought my usage was correct: > > formulaFullModel <- ~ sample + exon + libType:exon + condition:exon > formulaReducedModel <- ~ sample + exon + libType:exon > > ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile) > ecs <- estimateSizeFactors(ecs) > ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') > ecs<- fitDispersionFunction(ecs) > > ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt') > > To check whether my problem was related to the extra term or just manually specifying the model, I ran DEXSeq using only a basic model (no additional terms: formulaFullModel <- ~ sample + exon + condition:exon > formulaReducedModel <- ~ sample + exon ), and had the same problem. This suggests it must be something to do with how I am specifying the model or providing it to the functions...? > > This is the sample table setup for the basic model: > > X countFile condition > 1 H1 H.r1.counts control > 2 M1 M.r1.counts treated > 3 H2ac H.r2ac.counts control > 4 M2ac M.r2ac.counts treated > 5 Hshap H.NA.counts control > 6 Mshap M.NA.counts treated > > > I assume the cause is something silly on my part but I am stumped. Any help would be greatly appreciated! > > Thanks! > > P.S. I also tried running the pasilla dataset in the same way but that time it worked, normal p-values. > > -- output of sessionInfo(): > > Sorry but this session info might not be quite right - the original run was done on a server and the workspace was saved and opened in RStudio. This is the session info from RStudio. > > R version 3.0.2 (2013-09-25) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 > [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.12.0 DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 GenomicRanges_1.14.4 hwriter_1.3 > [6] IRanges_1.20.6 RCurl_1.95-4.1 Rsamtools_1.14.2 statmod_1.4.18 stats4_3.0.2 > [11] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Computational Biologist Genentech

ADD REPLY • link 10.3 years ago Pillman, Katherine Health ▴ 40

0

Entering edit mode

There is supposed to be one p-value per exon, because you are testing each exon separately to see if it has a different count to what you'd expect if there was no splicing. In other words, the exons with a low p-value are the ones that are included in the transcript more or less in one condition that the other, accounting for differential expression. -------------------------------------- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia

ADD REPLY • link 10.3 years ago Dario Strbenac ★ 1.5k

0

Entering edit mode

Hi In your code >> ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile) >> ecs <- estimateSizeFactors(ecs) >> ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') >> ecs<- fitDispersionFunction(ecs) >> >> ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt') you specify the full model formula in 'testForDEU' but do not in 'estimateDispersions'. You have noticed that we changed the way how formulae should be specified. Before, you needed three formulae, now only two, because we got rid of the need for a special "dispersion formula". Rather, you use the full model formula for dispersion estimation, too. However, you have to pass the formula to _both_ functions. Simon

ADD REPLY • link 10.3 years ago Simon Anders ★ 3.7k

0

Entering edit mode

I should add - for specific explanations on these recent changes, please see the Appendix in the vignette. - If the problem persists, please post your sample table. Simon

ADD REPLY • link 10.3 years ago Simon Anders ★ 3.7k

0

Entering edit mode

Hi Dario, Thanks for your reply. Your explanation is correct and makes sense, however my problem is not that I get one p-value per exon. The problem is that the exact same p-value is reported for ALL exons in the gene (or 'NA'), which is not supposed to happen (you're supposed to get different p-values for each exon). See below: geneID exonID dispersion pvalue padjust meanBase log2fold(treated/control) 1/2-SBSRNA4 E001 0.11143032705478 0.910345433347734 1 30.9067618720574 0.015363670607322 1/2-SBSRNA4 E002 0.583064733453923 0.910345433347734 1 6.62219699771489 -0.131125949462748 1/2-SBSRNA4 E003 0.767713143685937 0.910345433347734 1 2.21699055323382 0.0731263380599324 A1BG E001 0.123758213037027 0.229711870771802 1 117.754224941408 0.249562974820941 A1BG E002 0.110317995296594 0.229711870771802 1 152.932234776212 -0.0091268347351396 A1BG E003 0.275216588214621 0.229711870771802 1 7.30724196291172 -0.4763859471178 A1BG E004 0.625237938618851 0.229711870771802 1 10.2174627774744 -0.623376548608748 A1BG E005 1e+08 NA NA 0 0.0978995620155234 A1BG E006 0.435477490574276 0.229711870771802 1 4.18241118313392 -0.0329868528499689 I just tried running models in the old format (as per DEXSeq paper script: ) and kind of seemed to fix the problem... (Yay!??) Old format models: formuladispersion <- count ~ sample + (condition + type) * exon formula0 <- count ~ sample + type * exon + condition formula1 <- count ~ sample + type * exon + condition * I(exon == exonID) But I'm hesitant to change to this method permanently because a) DEXSeq has been updated since the paper was published so it makes more sense to use the current recommendations (as per vignette) than old ones, and b) the old models don't even seem to be very similar to the ones in the vignette and I don't understand how/whether the differences will affect the analysis. Also, the vignette now uses the full model for estimateDispersions, rather than a separate formula and I don't know why or what the effect of not doing this would be, either. For the record, the current vignette models are: formula1 <- ~ sample + exon + type:exon + condition:exon formula0 <- ~ sample + exon + type:exon Cheers, Kat ________________________________________ There is supposed to be one p-value per exon, because you are testing each exon separately to see if it has a different count to what you'd expect if there was no splicing. In other words, the exons with a low p-value are the ones that are included in the transcript more or less in one condition that the other, accounting for differential expression. Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia ________________________________________ From: mailinglist.honeypot@gmail.com [mailinglist.honeypot@gmail.com] On Behalf Of Steve Lianoglou [lianoglou.steve@gene.com] Sent: Friday, 17 January 2014 9:33 AM To: KP [guest] Cc: bioconductor at r-project.org list; Pillman, Katherine (Health) Subject: Re: [BioC] DEXSeq - identical p-values for all exons in a gene Hi Katherine, You mention running into "the same problem" no matter how the formulas are encoded, but I do not see anywhere in your email what this problem actually is. Can you please explain and show explicitly what the problem actually is? Is an error being thrown somewhere, or? Thanks, -steve On Thu, Jan 16, 2014 at 2:48 PM, KP [guest] <guest at="" bioconductor.org=""> wrote: > > Hi All, > > I have been running DEXSeq successfully on my dataset using the default, in-built linear models, but have been repeatedly running into problems when attempting to add an extra 'blocking' term to the model. > > I have checked the vignette and I thought my usage was correct: > > formulaFullModel <- ~ sample + exon + libType:exon + condition:exon > formulaReducedModel <- ~ sample + exon + libType:exon > > ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile) > ecs <- estimateSizeFactors(ecs) > ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') > ecs<- fitDispersionFunction(ecs) > > ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt') > > To check whether my problem was related to the extra term or just manually specifying the model, I ran DEXSeq using only a basic model (no additional terms: formulaFullModel <- ~ sample + exon + condition:exon > formulaReducedModel <- ~ sample + exon ), and had the same problem. This suggests it must be something to do with how I am specifying the model or providing it to the functions...? > > This is the sample table setup for the basic model: > > X countFile condition > 1 H1 H.r1.counts control > 2 M1 M.r1.counts treated > 3 H2ac H.r2ac.counts control > 4 M2ac M.r2ac.counts treated > 5 Hshap H.NA.counts control > 6 Mshap M.NA.counts treated > > > I assume the cause is something silly on my part but I am stumped. Any help would be greatly appreciated! > > Thanks! > > P.S. I also tried running the pasilla dataset in the same way but that time it worked, normal p-values. > > -- output of sessionInfo(): > > Sorry but this session info might not be quite right - the original run was done on a server and the workspace was saved and opened in RStudio. This is the session info from RStudio. > > R version 3.0.2 (2013-09-25) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 > [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.12.0 DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 GenomicRanges_1.14.4 hwriter_1.3 > [6] IRanges_1.20.6 RCurl_1.95-4.1 Rsamtools_1.14.2 statmod_1.4.18 stats4_3.0.2 > [11] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Computational Biologist Genentech

ADD REPLY • link 10.3 years ago Pillman, Katherine Health ▴ 40

0

Entering edit mode

Hi Simon, Thanks for your response to my message. I'm sorry but I don't understand what you mean (perhaps this is the root of my problem!). You wrote that I did not specify the full model formula to estimateDispersions. I am confused because I thought that was what I was doing when I wrote "formula=formulaFullModel" in the line: >> ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') Can you please explain further what I have done incorrectly? Do I have the syntax wrong or something? Or maybe you mean the whole model (i.e. both formulas, somehow)? Is it correct that my usage for testForDEU() was correct, but that my problem lies in my call of estimateDispersions()? Also, this is my sample table: X countFile condition type end 1 H1 H.r1.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts control plus pe 2 M1 M.r1.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts treated plus pe 3 H2ac H.r2ac.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts control all pe 4 M2ac M.r2ac.RNA-seq_mRNA.tophat2_pe.hg19.nsorted.counts treated all pe 5 Hs H.NA.RNA-seq_mRNA.tophat2_se.hg19.nsorted.counts control plus se 6 Ms M.NA.RNA-seq_mRNA.tophat2_se.hg19.nsorted.counts treated plus se Thanks again for your help! Cheers, Kat (P.S. Thanks so much for your work on DEXSeq and HTSeq, they are both fantastic packages which has saved us a countless amount of time and energy!!) ________________________________________ Hi In your code >> ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile) >> ecs <- estimateSizeFactors(ecs) >> ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') >> ecs<- fitDispersionFunction(ecs) >> >> ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt') you specify the full model formula in 'testForDEU' but do not in 'estimateDispersions'. You have noticed that we changed the way how formulae should be specified. Before, you needed three formulae, now only two, because we got rid of the need for a special "dispersion formula". Rather, you use the full model formula for dispersion estimation, too. However, you have to pass the formula to _both_ functions. Simon ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: mailinglist.honeypot@gmail.com [mailinglist.honeypot@gmail.com] On Behalf Of Steve Lianoglou [lianoglou.steve@gene.com] Sent: Friday, 17 January 2014 9:33 AM To: KP [guest] Cc: bioconductor at r-project.org list; Pillman, Katherine (Health) Subject: Re: [BioC] DEXSeq - identical p-values for all exons in a gene Hi Katherine, You mention running into "the same problem" no matter how the formulas are encoded, but I do not see anywhere in your email what this problem actually is. Can you please explain and show explicitly what the problem actually is? Is an error being thrown somewhere, or? Thanks, -steve On Thu, Jan 16, 2014 at 2:48 PM, KP [guest] <guest at="" bioconductor.org=""> wrote: > > Hi All, > > I have been running DEXSeq successfully on my dataset using the default, in-built linear models, but have been repeatedly running into problems when attempting to add an extra 'blocking' term to the model. > > I have checked the vignette and I thought my usage was correct: > > formulaFullModel <- ~ sample + exon + libType:exon + condition:exon > formulaReducedModel <- ~ sample + exon + libType:exon > > ecs <- read.HTSeqCounts(countfiles=countFiles, design=sampleTable, flattenedfile=annotationfile) > ecs <- estimateSizeFactors(ecs) > ecs <- estimateDispersions(ecs, formula=formulaFullModel, nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') > ecs<- fitDispersionFunction(ecs) > > ecs <- testForDEU(ecs, formula0=formulaReducedModel, formula1=formulaFullModel, nCores=cores, quiet=FALSE, file='testforDEU_progress_report.txt') > > To check whether my problem was related to the extra term or just manually specifying the model, I ran DEXSeq using only a basic model (no additional terms: formulaFullModel <- ~ sample + exon + condition:exon > formulaReducedModel <- ~ sample + exon ), and had the same problem. This suggests it must be something to do with how I am specifying the model or providing it to the functions...? > > This is the sample table setup for the basic model: > > X countFile condition > 1 H1 H.r1.counts control > 2 M1 M.r1.counts treated > 3 H2ac H.r2ac.counts control > 4 M2ac M.r2ac.counts treated > 5 Hshap H.NA.counts control > 6 Mshap M.NA.counts treated > > > I assume the cause is something silly on my part but I am stumped. Any help would be greatly appreciated! > > Thanks! > > P.S. I also tried running the pasilla dataset in the same way but that time it worked, normal p-values. > > -- output of sessionInfo(): > > Sorry but this session info might not be quite right - the original run was done on a server and the workspace was saved and opened in RStudio. This is the session info from RStudio. > > R version 3.0.2 (2013-09-25) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 > [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.12.0 DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 GenomicRanges_1.14.4 hwriter_1.3 > [6] IRanges_1.20.6 RCurl_1.95-4.1 Rsamtools_1.14.2 statmod_1.4.18 stats4_3.0.2 > [11] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Computational Biologist Genentech

ADD REPLY • link 10.3 years ago Pillman, Katherine Health ▴ 40

0

Entering edit mode

Hi Katherine On 19/01/14 23:34, Pillman, Katherine (Health) wrote: > Thanks for your response to my message. I'm sorry but I don't > understand what you mean (perhaps this is the root of my problem!). > You wrote that I did not specify the full model formula to > estimateDispersions. I am confused because I thought that was what I > was doing when I wrote "formula=formulaFullModel" in the line: > >>> ecs <- estimateDispersions(ecs, formula=formulaFullModel, >>> nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') Actually, this line is correct and does contain the formula as it should. Not sure, how I have overlooked this last time. Very sorry! So, your problem must lie somewhere else. Maybe something is wrong with the counts. Could you post the output of "countTableForGene" for one of the genes that have all identical p values? countTableForGene( ecs, "A1BG", normalized=FALSE ) countTableForGene( ecs, "A1BG", normalized=TRUE ) Maybe also post the full code and sample tables once more, because there is some strange mismatch: The formulas you posted earlier mention a factor "libType", which does not appear in your sample table. There, you have, besides "condition", the factors "type" and "end". Simon

ADD REPLY • link 10.3 years ago Simon Anders ★ 3.7k

0

Entering edit mode

Hi Simon/everyone, Thanks for your help troubleshooting this, I believe I have figured out the problem. Embarrassingly, it was a version issue - the version of DEXSeq on my server was not the most current version (it was version 1.7.x). I should have checked earlier but didn?t think to because I didn't realise there had been a recent version change and my basic analyses ran without problems... I presume that the way the model needs to be specified changed between the previous (v1.7.x) and current (v1.8.0) versions? Anyway, I updated BiocGenerics and DEXSeq and now I get unique p-values for each exon in the gene when I specify the models! :) I noticed that the ?news? page for DEXSeq hasn?t been updated yet with the changes for version 1.8.0. Would you mind please adding the changes? Thanks again for your help and sorry for the trouble. Cheers, Katherine ________________________________________ From: Simon Anders [anders@embl.de] Sent: Monday, 20 January 2014 6:45 PM To: bioconductor at r-project.org; Pillman, Katherine (Health) Subject: Re: [BioC] DEXSeq - identical p-values for all exons in a gene Hi Katherine On 19/01/14 23:34, Pillman, Katherine (Health) wrote: > Thanks for your response to my message. I'm sorry but I don't > understand what you mean (perhaps this is the root of my problem!). > You wrote that I did not specify the full model formula to > estimateDispersions. I am confused because I thought that was what I > was doing when I wrote "formula=formulaFullModel" in the line: > >>> ecs <- estimateDispersions(ecs, formula=formulaFullModel, >>> nCores=cores, quiet=FALSE, file='dexseq_progress_report.txt') Actually, this line is correct and does contain the formula as it should. Not sure, how I have overlooked this last time. Very sorry! So, your problem must lie somewhere else. Maybe something is wrong with the counts. Could you post the output of "countTableForGene" for one of the genes that have all identical p values? countTableForGene( ecs, "A1BG", normalized=FALSE ) countTableForGene( ecs, "A1BG", normalized=TRUE ) Maybe also post the full code and sample tables once more, because there is some strange mismatch: The formulas you posted earlier mention a factor "libType", which does not appear in your sample table. There, you have, besides "condition", the factors "type" and "end". Simon

ADD REPLY • link 10.3 years ago Pillman, Katherine Health ▴ 40

Login before adding your answer.