DEXSeq all p-values are 1


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hello,

I'm trying to use DEXSeq to identify alternative exon usage. Using DESeq I've identified ~200 differentially expressed genes in my gene set. I've basically applied the guidelines from the manual to my data set: single-end reads, in duplicate, +/- treatment.

I've played around with the parameters in different ways, but no matter how I do it the adjusted p-values all come out as 1 or N/A. The non-adjusted p-values are pretty high, so I reckon the adjusted p-values are "true". However, when I go through single genes I find exons that have really high fold-change values, indicating differential expression.

Is this a result one can expect (due to e.g. high variance in replicates), or is it possible that something is wrong in my analysis?

Regards,
Philip

-- output of sessionInfo():

R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] DEXSeq_1.4.0        Biobase_2.16.0      BiocGenerics_0.2.0
[4] BiocInstaller_1.4.9 svMisc_0.9-65       JavaGD_0.5-5
[7] rJava_0.9-3

loaded via a namespace (and not attached):
[1] RCurl_1.91-1   XML_3.9-4      biomaRt_2.14.0 hwriter_1.3    plyr_1.7.1
[6] statmod_1.4.16 stringr_0.6.1

--
Sent via the guest posting facility at bioconductor.org.

GO DEXSeq • 943 views

ADD COMMENT • link updated 11.4 years ago by Steve Lianoglou ★ 13k • written 11.4 years ago by Guest User ★ 13k


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 13 months ago

United States

Hi,

It's hard to provide any meaningful help, since we really don't have any information about your data, or what your code looks like (code examples), to identify a problem.

But:

On Thu, Dec 13, 2012 at 10:08 AM, Philip [guest] <guest at bioconductor.org> wrote:
> Hello,
>
> I'm trying to use DEXSeq to identify alternative exon usage. Using DESeq I've identified ~200 differentially expressed genes in my gene set. I've basically applied the guidelines from the manual to my data set - single reads in duplicates +/- treatment.

Does that mean you have four single-end read runs: two biological replicates for (+) treatment, and two for (-) treatment?

> I've played around with the parameters in different ways,

Which parameters?
What ways?
How did you count reads per bin?
How did you define the bins?

> but no matter how I do it the adjusted p-values all come out as 1 or N/A. The non-adjusted p-values are pretty high, so I reckon the adjusted p-values are "true", however, when I go through single genes I find exons that have really high fold-change values indicating differential expression.

High dispersion values can make high fold changes statistically insignificant.

Did you explore the quality of your replicates? How? How do they look?

> Is this a result one can expect (due to e.g. high variance in replicates)

That can explain it.

There is, of course, always the possibility that there is very little differential splicing in your experiment.

> or is it possible that something is wrong in my analysis?

It is always possible that there is something wrong with an analysis. As I mentioned at the start, without knowing more about your analysis and seeing the code, there is no way that anybody can answer this question.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 11.4 years ago Steve Lianoglou ★ 13k
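The remark that high dispersion can make a large fold change statistically insignificant can be illustrated with a rough calculation. The sketch below is not DEXSeq's test (DEXSeq fits a GLM per gene); it assumes a negative binomial count model with variance mu + dispersion * mu^2 and uses a delta-method Wald approximation on the log fold change. The function name `approx_p_value` and the exon counts are hypothetical.

```r
# Rough Wald-style check (NOT DEXSeq's actual test): how the negative
# binomial dispersion changes the evidence for a fixed fold change.
# Var(count) = mu + dispersion * mu^2; by the delta method the variance of
# the log of a mean over n replicates is roughly (1/mu + dispersion) / n.
approx_p_value <- function(mu_control, mu_treated, dispersion, n_per_group) {
  se <- sqrt((1 / mu_control + dispersion) / n_per_group +
             (1 / mu_treated + dispersion) / n_per_group)
  z <- log(mu_treated / mu_control) / se
  2 * pnorm(-abs(z))  # two-sided p-value under a normal approximation
}

# Hypothetical exon: 4-fold change (50 -> 200 normalized reads), 2 vs 2 samples.
p_low_disp  <- approx_p_value(50, 200, dispersion = 0.01, n_per_group = 2)
p_high_disp <- approx_p_value(50, 200, dispersion = 1,    n_per_group = 2)
print(c(low_dispersion = p_low_disp, high_dispersion = p_high_disp))
```

Under these assumptions the same 4-fold change is overwhelming evidence at a dispersion of 0.01 and unremarkable at a dispersion of 1, which is why inspecting the dispersion estimates is a sensible first diagnostic.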


I realize that it can be hard to nail down the source (or sources) of this (possible) error.

So, the data is four single-end runs: two treated and two control-treated samples. The design I set up looks like this:

                        condition  replicate
./treatedsample1.txt    Treatment  1
./treatedsample2.txt    Treatment  2
./controlsample1.txt    Vehicle    1
./controlsample2.txt    Vehicle    2

After using read.HTSeqCounts I use estimateSizeFactors, followed by estimateDispersions. For estimateDispersions I get the same results whether I set minCount to 0 or 10, and whether I leave maxExon at its default of 70 or set it to something higher, like 1000. I have not tried to change initialGuess, since I don't really grasp what it means. I also leave formula unchanged. Afterwards I use fitDispersionFunction and testForDEU with default parameters.

My ExonCountSet was created with the package's Python scripts. I used a GFF file from UCSC which is of the same genome build as used for the read mapping.

There is variance across my replicates, but it doesn't seem too extreme. I could, as mentioned, call differentially expressed genes with DESeq without problems. Do the values for dispBeforeSharing, dispFitCoefs, or dispFitted tell me anything about this?

ADD REPLY • link 11.4 years ago Philip Jonsson ▴ 20
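The sequence of calls described in the post above can be written out as a short script. This is only a sketch, assuming the ExonCountSet-based interface of the DEXSeq 1.4 release named in the sessionInfo() output; later DEXSeq releases use a different interface. The wrapper name `run_dexseq_sketch`, its argument names, and the use of `DEUresultTable` to collect the results are assumptions that do not come from the thread.

```r
# Sketch of the workflow described above, assuming the DEXSeq 1.4-era
# ExonCountSet interface. Nothing DEXSeq-specific runs until the
# (hypothetical) wrapper function is called.
run_dexseq_sketch <- function(count_files, design, flattened_gff) {
  library(DEXSeq)  # needs the old ExonCountSet-based release

  ecs <- read.HTSeqCounts(countfiles = count_files,
                          design = design,
                          flattenedfile = flattened_gff)
  ecs <- estimateSizeFactors(ecs)
  ecs <- estimateDispersions(ecs)    # minCount, maxExon, formula left at defaults
  ecs <- fitDispersionFunction(ecs)
  ecs <- testForDEU(ecs)
  DEUresultTable(ecs)                # assumed accessor for the per-exon results
}

# Design table matching the post; row names are the count files quoted there.
design <- data.frame(
  condition = c("Treatment", "Treatment", "Vehicle", "Vehicle"),
  replicate = c(1, 2, 1, 2),
  row.names = c("./treatedsample1.txt", "./treatedsample2.txt",
                "./controlsample1.txt", "./controlsample2.txt")
)
print(design)
```

Posting a script like this, together with the path to the flattened annotation file actually used, is the kind of code example asked for earlier in the thread.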


Hi,

On Thu, Dec 13, 2012 at 11:59 AM, Philip Jonsson <philip.jonsson at gmail.com> wrote:
> There is variance across my replicates, but it doesn't seem too extreme. I could, as mentioned, call differentially expressed genes with DESeq without problems. Do the values for dispBeforeSharing, dispFitCoefs, or dispFitted tell me anything about this?

There is a `plotDispEsts` function which is defined in the DEXSeq vignette (not sure why it's not included & exported in the package) which is useful to see how your dispersions look.

This is the function:

    plotDispEsts = function( cds, ymin, linecol="#ff000080",
      xlab = "mean of normalized counts", ylab = "dispersion",
      log = "xy", cex = 0.45, ... )
    {
      # mean normalized count per exon; drop exons with no reads
      px = rowMeans( counts( cds, normalized=TRUE ) )
      sel = (px>0)
      px = px[sel]

      # per-exon dispersion estimates before sharing information across exons
      py = fData(cds)$dispBeforeSharing[sel]
      if(missing(ymin))
          ymin = 10^floor(log10(min(py[py>0], na.rm=TRUE))-0.1)

      # points below ymin are drawn as triangles on the lower axis
      plot(px, pmax(py, ymin), xlab=xlab, ylab=ylab,
        log=log, pch=ifelse(py<ymin, 6, 16), cex=cex, ... )

      # overlay the fitted dispersion-mean trend
      xg = 10^seq( -.5, 5, length.out=100 )
      fun = function(x) { cds@dispFitCoefs[1] + cds@dispFitCoefs[2] / x }
      lines( xg, fun(xg), col=linecol, lwd=4)
    }

After you call estimateDispersions, you can call that on your ExonCountSet to see how your dispersions "look".

That having been said, the description of the steps you are taking in your analysis sounds about right, so it might be (1) a data quality problem (but you say they look fine); or (2) no real differential exon usage ;-)

Perhaps Simon and crew can chime in with some tips they might have from experience with other data, since I guess they're likely to be the most seasoned DEXSeq-ers.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 11.4 years ago Steve Lianoglou ★ 13k
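Alongside the dispersion plot, it can help to see why adjusted p-values of exactly 1 follow naturally from "pretty high" raw p-values. The sketch below uses base R's `p.adjust` with the Benjamini-Hochberg method on made-up p-values; the number of exons and the p-values themselves are hypothetical, and it assumes the adjusted column was produced by a Benjamini-Hochberg style correction.

```r
# Benjamini-Hochberg adjustment on made-up raw p-values (not real DEXSeq output).
n_exons <- 20000

# One exon with a "promising" raw p-value of 0.001; all others between 0.05 and 1.
raw_p <- c(1e-3, seq(0.05, 1, length.out = n_exons - 1))
adj_p <- p.adjust(raw_p, method = "BH")
print(range(adj_p))   # the whole adjusted vector sits at (or extremely near) 1

# With this many tests, the best exon needs a raw p-value on the order of
# fdr / n_exons before it survives the correction on its own.
raw_p_strong <- raw_p
raw_p_strong[1] <- 4e-6
adj_strong <- p.adjust(raw_p_strong, method = "BH")
print(adj_strong[1])
```

So a column of adjusted p-values equal to 1 does not by itself indicate a bug; it is what the correction returns when no raw p-value is small relative to the number of exons tested.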


With only two replicates for each condition, you will not have a huge amount of power to detect differential expression. Combine that with the fact that you are looking at exons and not whole genes, so your counts will be much lower, and you will have a difficult time achieving statistical significance.

-Ryan

ADD REPLY • link 11.4 years ago Ryan C. Thompson ★ 7.9k
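The two effects named in this reply, few replicates and low exon-level counts, can be put into rough numbers. The sketch below approximates the power of a two-sided Wald-type test on the log fold change under a negative binomial model, using a delta-method standard error. It is an illustration under those assumptions, not DEXSeq's GLM test; the function name `approx_power`, the dispersion of 0.1, and the mean counts (20 for an exon, 500 for a gene) are all hypothetical.

```r
# Approximate two-sided power of a Wald-type test on the log fold change
# (delta-method approximation; an illustration, not DEXSeq's GLM test).
approx_power <- function(mu_control, fold_change, dispersion, n_per_group,
                         alpha = 0.05) {
  mu_treated <- mu_control * fold_change
  se <- sqrt((1 / mu_control + dispersion) / n_per_group +
             (1 / mu_treated + dispersion) / n_per_group)
  shift <- abs(log(fold_change)) / se   # true effect in standard-error units
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(shift - z_crit) + pnorm(-shift - z_crit)
}

# 2-fold change, dispersion 0.1, for a low-count exon and a high-count gene.
replicates <- c(2, 3, 5, 10)
power_exon <- sapply(replicates, function(n) approx_power(20, 2, 0.1, n))
power_gene <- sapply(replicates, function(n) approx_power(500, 2, 0.1, n))
print(data.frame(replicates, power_exon, power_gene))
```

The table is computed at an unadjusted alpha of 0.05; passing a much smaller `alpha` to mimic a multiple-testing threshold lowers every entry further, which is consistent with the difficulty described above.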
