Question

DEXseq: several "1" and "NA" in padjust column

0

Entering edit mode

Vincenzo Capece ▴ 140

@vincenzo-capece-4556

Last seen 9.6 years ago

Dear bioconductorers, I am using DEXseq for the splicing analysis. The pipeline works perfectly, but I have some doubts about the output: on 351000 exons I have just 10 padjust significant ones. Googling I found this thread, in which the user developed my same pipeline and in which she has the same problem: https://stat.ethz.ch/pipermail/bioconductor/2012-February/043476.html This thread finished in a "private" way, but I need to know if there are any bugs(I think fixed, because It past one year); moreover I am interested knowing if the problem was at "user level" or "source code level" and, in the first case, how she fixed the bug. Thanks in advance and congratulation for your package: it is awesome. Regards, Vincenzo [[alternative HTML version deleted]]

DEXSeq DEXSeq • 931 views

ADD COMMENT • link updated 11.2 years ago by Steve Lianoglou ★ 13k • written 11.2 years ago by Vincenzo Capece ▴ 140

score 0 · Answer 1 · 2013-02-05

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi Vincenzo, On Tue, Feb 5, 2013 at 5:11 AM, Vincenzo Capece <vivo0304 at="" gmail.com=""> wrote: > Dear bioconductorers, > I am using DEXseq for the splicing analysis. > The pipeline works perfectly, but I have some doubts about the output: on > 351000 exons I have just 10 padjust significant ones. > Googling I found this thread, in which the user developed my same pipeline > and in which she has the same problem: > https://stat.ethz.ch/pipermail/bioconductor/2012-February/043476.html > This thread finished in a "private" way, but I need to know if there are > any bugs(I think fixed, because It past one year); The thread actually finished off "publicly." Perhaps you thing it went private because you see messages like "embedded and charset-unspecified text was scrubbed" in the emails of the thread that follow the one you posted? If you click on the URL below, it's there: https://stat.ethz.ch/pipermail/bioconductor/attachments/20120213/5f13a 499/attachment.pl That having been said, it's hard to say what (if anything) is going wrong with your analysis without you posting any source code, but you can calculate the adjust pvalues yourself. After running the testForDEU step, the fData(ecs)$pvalue and fData(ecs)$padjust columns should be filled. One could calculate the padjust column manually by doing: adjusted <- p.adjust(fData(ecs)$pvalue, 'BH') but if the values in the pvalue column are NA, so too will be the adjust pvalues. If you have many NA pvalues after your testForDEU, then there might be several reasons why they are there: (1) These exons have been marked as "untestable" (fData(ecs)$testable) during the estimateDispersions step, which might be due to: (a) read counts for the exon below the "minCount" required per exon (b) the exon is in a gene with more than "maxExon" exons (c) the gene only has one exon? (2) The calculation of the pvalue errored and didn't return anything -- not sure why this would happen so frequently, though. (3) ... ? ... > moreover I am interested > knowing if the problem was at "user level" or "source code level" and, in > the first case, how she fixed the bug. I'm not sure if this can be answered without your data, but you will first need to post the code yo used. As for the original poster that you mentioned -- you'll see how her problem was fixed when you click on the URL's in the messages that look "hidden" from view. > Thanks in advance and congratulation for your package: it is awesome. Hear, hear! HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 11.2 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Thanks a lot Steve. This is the last part of my pipeline(the first part is just the concatenation of strings to build the paths to input files, but they are right): > samples condition replicate 28M_1 28 1 28M_2 28 2 28M_3 28 3 3M_1 3 1 3M_2 3 2 3M_3 3 3 exonCounts=read.HTSeqCounts(countfiles = fullFileNames,design = samples,flattenedfile = annotationFile) exonCounts=estimateSizeFactors(exonCounts) #sizeFactors(exonCounts) exonCounts=estimateDispersions(exonCounts,nCores=32) exonCounts=fitDispersionFunction(exonCounts) exonCounts=estimatelog2FoldChanges(exonCounts,nCores=32) exonCounts=testForDEU(exonCounts,nCores=32) res=DEUresultTable(exonCounts) DEXSeqHTML(exonCounts, path=pathToExonFiles ,color = c("#DDDDDD", "#CCFFFF")) plotOut=paste(pathToExonFiles,"res.csv",sep="/") write.csv(res,file=plotOut) This is the command that you suggested me to use: > table(fData(exonCounts)$testable) FALSE TRUE 151884 199146 If you think that my pipeline is right, I can send you the file "res.csv" in private way. Thanks a lot On 5 February 2013 12:17, Steve Lianoglou <mailinglist.honeypot@gmail.com>wrote: > Hi Vincenzo, > > On Tue, Feb 5, 2013 at 5:11 AM, Vincenzo Capece <vivo0304@gmail.com> > wrote: > > Dear bioconductorers, > > I am using DEXseq for the splicing analysis. > > The pipeline works perfectly, but I have some doubts about the output: on > > 351000 exons I have just 10 padjust significant ones. > > Googling I found this thread, in which the user developed my same > pipeline > > and in which she has the same problem: > > https://stat.ethz.ch/pipermail/bioconductor/2012-February/043476.html > > This thread finished in a "private" way, but I need to know if there are > > any bugs(I think fixed, because It past one year); > > The thread actually finished off "publicly." Perhaps you thing it went > private because you see messages like "embedded and > charset-unspecified text was scrubbed" in the emails of the thread > that follow the one you posted? > > If you click on the URL below, it's there: > > https://stat.ethz.ch/pipermail/bioconductor/attachments/20120213/5f1 3a499/attachment.pl > > That having been said, it's hard to say what (if anything) is going > wrong with your analysis without you posting any source code, but you > can calculate the adjust pvalues yourself. > > After running the testForDEU step, the fData(ecs)$pvalue and > fData(ecs)$padjust columns should be filled. One could calculate the > padjust column manually by doing: > > adjusted <- p.adjust(fData(ecs)$pvalue, 'BH') > > but if the values in the pvalue column are NA, so too will be the > adjust pvalues. > > If you have many NA pvalues after your testForDEU, then there might be > several reasons why they are there: > > (1) These exons have been marked as "untestable" (fData(ecs)$testable) > during the estimateDispersions step, which might be due to: > (a) read counts for the exon below the "minCount" required per exon > (b) the exon is in a gene with more than "maxExon" exons > (c) the gene only has one exon? > (2) The calculation of the pvalue errored and didn't return anything > -- not sure why this would happen so frequently, though. > (3) ... ? ... > > > moreover I am interested > > knowing if the problem was at "user level" or "source code level" and, in > > the first case, how she fixed the bug. > > I'm not sure if this can be answered without your data, but you will > first need to post the code yo used. > > As for the original poster that you mentioned -- you'll see how her > problem was fixed when you click on the URL's in the messages that > look "hidden" from view. > > > Thanks in advance and congratulation for your package: it is awesome. > > Hear, hear! > > HTH, > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > -- Regards, Capece Vincenzo [[alternative HTML version deleted]]

ADD REPLY • link 11.2 years ago Vincenzo Capece ▴ 140

0

Entering edit mode

Hi, Your code seems correct -- although I think the vignette has you you estimate log2foldChanges after differential exon usage testing, but that shouldn't matter. It also looks as if you have plenty of testable exons (almost 200k), so the question is: (1) For every exon that is testable, do you get a value in the `pvalue` column for it? That is to say, the only NA's in the pvalue column should be for rows that are not `testable`, eg: R> all(!is.na(subset(fData(exonCounts), testable)$pvalue)) (2) Are the NA's in your padjust column only present because the pvalue column is also NA? R> allis.na(fData(exonCounts)$pvalue) == is.na(fData(exonCounts)$padjust)) (3) If 1 and 2 are kosher -- we might be running out of things to debug, and perhaps the high pvalues are what we're stuck with here. What if you plot the log2fold(what/ever) aginst the -log10(pvalue)? Do you see large fold changes that are getting small pvalues that look like they should be "real"? That is to say, large fold changes might come from a region of extreme low expression (although the vst xform I think does try to mitigate that), so while the fold change is "high", the pvalue won't be. HTH, -steve On Tue, Feb 5, 2013 at 6:50 AM, Vincenzo Capece <vivo0304 at="" gmail.com=""> wrote: > Thanks a lot Steve. > This is the last part of my pipeline(the first part is just the > concatenation of strings to build the paths to input files, but they are > right): > >> samples > condition replicate > 28M_1 28 1 > 28M_2 28 2 > 28M_3 28 3 > 3M_1 3 1 > 3M_2 3 2 > 3M_3 3 3 > > exonCounts=read.HTSeqCounts(countfiles = fullFileNames,design = > samples,flattenedfile = annotationFile) > > exonCounts=estimateSizeFactors(exonCounts) > > #sizeFactors(exonCounts) > > exonCounts=estimateDispersions(exonCounts,nCores=32) > > exonCounts=fitDispersionFunction(exonCounts) > > exonCounts=estimatelog2FoldChanges(exonCounts,nCores=32) > > exonCounts=testForDEU(exonCounts,nCores=32) > > res=DEUresultTable(exonCounts) > > DEXSeqHTML(exonCounts, path=pathToExonFiles ,color = c("#DDDDDD", > "#CCFFFF")) > > plotOut=paste(pathToExonFiles,"res.csv",sep="/") > write.csv(res,file=plotOut) > > This is the command that you suggested me to use: > >> table(fData(exonCounts)$testable) > > FALSE TRUE > 151884 199146 > > If you think that my pipeline is right, I can send you the file "res.csv" in > private way. > > Thanks a lot > > > On 5 February 2013 12:17, Steve Lianoglou <mailinglist.honeypot at="" gmail.com=""> > wrote: >> >> Hi Vincenzo, >> >> On Tue, Feb 5, 2013 at 5:11 AM, Vincenzo Capece <vivo0304 at="" gmail.com=""> >> wrote: >> > Dear bioconductorers, >> > I am using DEXseq for the splicing analysis. >> > The pipeline works perfectly, but I have some doubts about the output: >> > on >> > 351000 exons I have just 10 padjust significant ones. >> > Googling I found this thread, in which the user developed my same >> > pipeline >> > and in which she has the same problem: >> > https://stat.ethz.ch/pipermail/bioconductor/2012-February/043476.html >> > This thread finished in a "private" way, but I need to know if there are >> > any bugs(I think fixed, because It past one year); >> >> The thread actually finished off "publicly." Perhaps you thing it went >> private because you see messages like "embedded and >> charset-unspecified text was scrubbed" in the emails of the thread >> that follow the one you posted? >> >> If you click on the URL below, it's there: >> >> https://stat.ethz.ch/pipermail/bioconductor/attachments/20120213/5f 13a499/attachment.pl >> >> That having been said, it's hard to say what (if anything) is going >> wrong with your analysis without you posting any source code, but you >> can calculate the adjust pvalues yourself. >> >> After running the testForDEU step, the fData(ecs)$pvalue and >> fData(ecs)$padjust columns should be filled. One could calculate the >> padjust column manually by doing: >> >> adjusted <- p.adjust(fData(ecs)$pvalue, 'BH') >> >> but if the values in the pvalue column are NA, so too will be the >> adjust pvalues. >> >> If you have many NA pvalues after your testForDEU, then there might be >> several reasons why they are there: >> >> (1) These exons have been marked as "untestable" (fData(ecs)$testable) >> during the estimateDispersions step, which might be due to: >> (a) read counts for the exon below the "minCount" required per exon >> (b) the exon is in a gene with more than "maxExon" exons >> (c) the gene only has one exon? >> (2) The calculation of the pvalue errored and didn't return anything >> -- not sure why this would happen so frequently, though. >> (3) ... ? ... >> >> > moreover I am interested >> > knowing if the problem was at "user level" or "source code level" and, >> > in >> > the first case, how she fixed the bug. >> >> I'm not sure if this can be answered without your data, but you will >> first need to post the code yo used. >> >> As for the original poster that you mentioned -- you'll see how her >> problem was fixed when you click on the URL's in the messages that >> look "hidden" from view. >> >> > Thanks in advance and congratulation for your package: it is awesome. >> >> Hear, hear! >> >> HTH, >> -steve >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> | Memorial Sloan-Kettering Cancer Center >> | Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact > > > > > -- > Regards, > Capece Vincenzo -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 11.2 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Dear Steve, > all(!is.na(subset(fData(exonCounts), testable)$pvalue)) [1] TRUE > allis.na(fData(exonCounts)$pvalue) == is.na(fData(exonCounts)$padjust)) [1] TRUE Do you see large fold changes that are getting small pvalues that look like they should be "real"? No I don't Thanks About the point 3, I am not quite sure if I understand well. Anyway, I asked to a collegue and he said that On 5 February 2013 14:12, Steve Lianoglou <mailinglist.honeypot@gmail.com>wrote: > Hi, > > Your code seems correct -- although I think the vignette has you you > estimate log2foldChanges after differential exon usage testing, but > that shouldn't matter. > > It also looks as if you have plenty of testable exons (almost 200k), > so the question is: > > (1) For every exon that is testable, do you get a value in the > `pvalue` column for it? That is to say, the only NA's in the pvalue > column should be for rows that are not `testable`, eg: > > R> all(!is.na(subset(fData(exonCounts), testable)$pvalue)) > > (2) Are the NA's in your padjust column only present because the > pvalue column is also NA? > > R> allis.na(fData(exonCounts)$pvalue) == is.na > (fData(exonCounts)$padjust)) > > (3) If 1 and 2 are kosher -- we might be running out of things to > debug, and perhaps the high pvalues are what we're stuck with here. > What if you plot the log2fold(what/ever) aginst the -log10(pvalue)? Do > you see large fold changes that are getting small pvalues that look > like they should be "real"? That is to say, large fold changes might > come from a region of extreme low expression (although the vst xform I > think does try to mitigate that), so while the fold change is "high", > the pvalue won't be. > > HTH, > -steve > > On Tue, Feb 5, 2013 at 6:50 AM, Vincenzo Capece <vivo0304@gmail.com> > wrote: > > Thanks a lot Steve. > > This is the last part of my pipeline(the first part is just the > > concatenation of strings to build the paths to input files, but they are > > right): > > > >> samples > > condition replicate > > 28M_1 28 1 > > 28M_2 28 2 > > 28M_3 28 3 > > 3M_1 3 1 > > 3M_2 3 2 > > 3M_3 3 3 > > > > exonCounts=read.HTSeqCounts(countfiles = fullFileNames,design = > > samples,flattenedfile = annotationFile) > > > > exonCounts=estimateSizeFactors(exonCounts) > > > > #sizeFactors(exonCounts) > > > > exonCounts=estimateDispersions(exonCounts,nCores=32) > > > > exonCounts=fitDispersionFunction(exonCounts) > > > > exonCounts=estimatelog2FoldChanges(exonCounts,nCores=32) > > > > exonCounts=testForDEU(exonCounts,nCores=32) > > > > res=DEUresultTable(exonCounts) > > > > DEXSeqHTML(exonCounts, path=pathToExonFiles ,color = c("#DDDDDD", > > "#CCFFFF")) > > > > plotOut=paste(pathToExonFiles,"res.csv",sep="/") > > write.csv(res,file=plotOut) > > > > This is the command that you suggested me to use: > > > >> table(fData(exonCounts)$testable) > > > > FALSE TRUE > > 151884 199146 > > > > If you think that my pipeline is right, I can send you the file > "res.csv" in > > private way. > > > > Thanks a lot > > > > > > On 5 February 2013 12:17, Steve Lianoglou < > mailinglist.honeypot@gmail.com> > > wrote: > >> > >> Hi Vincenzo, > >> > >> On Tue, Feb 5, 2013 at 5:11 AM, Vincenzo Capece <vivo0304@gmail.com> > >> wrote: > >> > Dear bioconductorers, > >> > I am using DEXseq for the splicing analysis. > >> > The pipeline works perfectly, but I have some doubts about the output: > >> > on > >> > 351000 exons I have just 10 padjust significant ones. > >> > Googling I found this thread, in which the user developed my same > >> > pipeline > >> > and in which she has the same problem: > >> > https://stat.ethz.ch/pipermail/bioconductor/2012-February/043476.html > >> > This thread finished in a "private" way, but I need to know if there > are > >> > any bugs(I think fixed, because It past one year); > >> > >> The thread actually finished off "publicly." Perhaps you thing it went > >> private because you see messages like "embedded and > >> charset-unspecified text was scrubbed" in the emails of the thread > >> that follow the one you posted? > >> > >> If you click on the URL below, it's there: > >> > >> > https://stat.ethz.ch/pipermail/bioconductor/attachments/20120213/5f1 3a499/attachment.pl > >> > >> That having been said, it's hard to say what (if anything) is going > >> wrong with your analysis without you posting any source code, but you > >> can calculate the adjust pvalues yourself. > >> > >> After running the testForDEU step, the fData(ecs)$pvalue and > >> fData(ecs)$padjust columns should be filled. One could calculate the > >> padjust column manually by doing: > >> > >> adjusted <- p.adjust(fData(ecs)$pvalue, 'BH') > >> > >> but if the values in the pvalue column are NA, so too will be the > >> adjust pvalues. > >> > >> If you have many NA pvalues after your testForDEU, then there might be > >> several reasons why they are there: > >> > >> (1) These exons have been marked as "untestable" (fData(ecs)$testable) > >> during the estimateDispersions step, which might be due to: > >> (a) read counts for the exon below the "minCount" required per exon > >> (b) the exon is in a gene with more than "maxExon" exons > >> (c) the gene only has one exon? > >> (2) The calculation of the pvalue errored and didn't return anything > >> -- not sure why this would happen so frequently, though. > >> (3) ... ? ... > >> > >> > moreover I am interested > >> > knowing if the problem was at "user level" or "source code level" and, > >> > in > >> > the first case, how she fixed the bug. > >> > >> I'm not sure if this can be answered without your data, but you will > >> first need to post the code yo used. > >> > >> As for the original poster that you mentioned -- you'll see how her > >> problem was fixed when you click on the URL's in the messages that > >> look "hidden" from view. > >> > >> > Thanks in advance and congratulation for your package: it is awesome. > >> > >> Hear, hear! > >> > >> HTH, > >> -steve > >> > >> -- > >> Steve Lianoglou > >> Graduate Student: Computational Systems Biology > >> | Memorial Sloan-Kettering Cancer Center > >> | Weill Medical College of Cornell University > >> Contact Info: http://cbio.mskcc.org/~lianos/contact > > > > > > > > > > -- > > Regards, > > Capece Vincenzo > > > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > -- Regards, Capece Vincenzo [[alternative HTML version deleted]]

ADD REPLY • link 11.2 years ago Vincenzo Capece ▴ 140