DEXSeq: Problems trying to reproduce the example of the vignette

0

Entering edit mode

Simone ▴ 180

@simone-5854

Last seen 5.9 years ago

Hi! As the subject says, I've got some problems with DEXSeq. For the moment, I am just trying to reproduce the example of the Vignette. The first error that occurrs is when I try to plot the per-gene disperion estimates using plotDispEsts: > library("DEXSeq") > library("pasilla") > library("parallel") > data("pasillaExons", package="pasilla") > pasillaExons <- estimateSizeFactors(pasillaExons) > pasillaExons <- estimateDispersions(pasillaExons, nCores=3) Estimating Cox-Reid exon dispersion estimates using 3 cores. (Progress report: one dot per 100 genes) > pasillaExons <- fitDispersionFunction(pasillaExons) > plotDispEsts(pasillaExons) Error: is(cds, "CountDataSet") is not TRUE > traceback() 4: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), ch), call. = FALSE, domain = NA) 3: stopifnot(is(cds, "CountDataSet")) 2: fitInfo(cds, name = name) 1: plotDispEsts(pasillaExons) What am I doing wrong here? In general I work with the vignette which comes with the package (last revision 2013-02-27), but I found a piece of code in an older version which seems to work however: > meanvalues <- rowMeans(counts(pasillaExons)) > plot(meanvalues, fData(pasillaExons)$dispBeforeSharing, log = "xy", main = "mean vs CR dispersion") > x <- 0.01:max(meanvalues) > y <- pasillaExons at dispFitCoefs[1] + pasillaExons at dispFitCoefs[2]/x > lines(x, y, col = "red") The second problem I have got is the MA plot after testing for differential exon usage: > pasillaExons <- testForDEU(pasillaExons, nCores=3) Testing for differential exon usage using 3 cores. (Progress report: one dot per 100 genes) > pasillaExons <- estimatelog2FoldChanges(pasillaExons) > res <- DEUresultTable(pasillaExons) > plotMA(with(res, data.frame(baseMean=meanBase, log2FoldChange=log2fold, padj=padjust)), ylim=c(-4,4), cex=0.8) This works fine. But I wanted to change the threshold for the color highlighting to "padjust < 0.05" (although this wouldn't change anything in this case, I just wanted to try). So I did the following: > plotMA(with(res, data.frame(baseMean=meanBase, log2FoldChange=log2fold, padj=padjust)), ylim=c(-4,4), cex=0.8, col=ifelse(res$padj >= 0.05, "gray32", "red3")) But whenever I use the col parameter with whatever value (also with simply one color), the output is different and more data points than before get displayed. I do not really understand this effect. I worked with R 2.15 and the newest version of DEXSeq but updated everything yesterday (R and all packages, see sessionInfo below) to see if this changes something, but it didn't. Of course I can just copy the function from the source and define my own one changing the threshold, but I'd prefer to understand what's going on here. An another small question regarding this plot: Reading the vignette I could not find out what the triangles mean which sometimes appear. Do they indicate outliers (lying outside the plotting area), or is this something completely different? And my last question: I read in http://seqanswers.com/forums/showthread.php?t=13299 that it should be possible with DEXSeq to analyze paired samples as well. In my case I have for every individual patient two (or more) different celltypes which I would like to analyze, so I would have to use a paired test for differential exon usage. In the thread mentioned it was said that there is more information about how to do this in the vignette available, but I cannot find it. How would I have to add the patient information to account for paired samples in DEXSeq? Sorry for asking that many (pretty ignorant) questions, it's the first time I am trying to use a package dealing with RNAseq-data, and I hope someone can help. Best whishes, Simone > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] pasilla_0.2.16 DESeq_1.12.0 lattice_0.20-15 locfit_1.5-9.1 DEXSeq_1.6.0 [6] Biobase_2.20.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.6 biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-5 [6] DBI_0.2-7 genefilter_1.42.0 geneplotter_1.38.0 GenomicRanges_1.12.4 grid_3.0.1 [11] hwriter_1.3 IRanges_1.18.1 RColorBrewer_1.0-5 RCurl_1.95-4.1 Rsamtools_1.12.3 [16] RSQLite_0.11.4 splines_3.0.1 statmod_1.4.17 stats4_3.0.1 stringr_0.6.2 [21] survival_2.37-4 tools_3.0.1 XML_3.96-1.1 xtable_1.7-1 zlibbioc_1.6.0

DEXSeq DEXSeq • 1.7k views

ADD COMMENT • link updated 10.9 years ago by Wolfgang Huber ★ 13k • written 10.9 years ago by Simone ▴ 180

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 16 days ago

EMBL European Molecular Biology Laborat…

Dear Simone thank you for your feedback! Your problem with 'plotDispEsts' is an oversight of ours in the last release of the DEXSeq package: the function 'plotDispEsts' in the DEXSeq vignette is defined 'on the fly' in the global workspace by the vignette (see http://bioconductor.org/p ackages/release/bioc/vignettes/DEXSeq/inst/doc/DEXSeq.R chunk 10). What happens in your case is that you probably didn't run this, and picked up the DESeq::plotDispEsts function instead, which throws the error. The sane solution, of course, would be for us to define this as a generic function (e.g. in BiocGenerics) and dispatch on argument type. This will be the case in the next release. Till then, as a workaround, please run the R code in the vignette from the .R file provided with the package, rather than copy-pasting it from the PDF. I am not sure I understand your problem with 'plotMA'; what do you get from table(res$padjust, useNA="always") Perhaps it has to do with NA in res$padjust, which would lead to some points having NA colours (and thus not being shown) only when you use below-mentioned ifelse-expression. As for your third question, isn't this solved by just adding another covariate for 'patient' which will absorb the patient effects? Hope this helps, best wishes Wolfgang On Jun 12, 2013, at 3:30 pm, Simone <enomis.bioc at="" gmail.com=""> wrote: > Hi! > > As the subject says, I've got some problems with DEXSeq. For the > moment, I am just trying to reproduce the example of the Vignette. The > first error that occurrs is when I try to plot the per-gene disperion > estimates using plotDispEsts: > >> library("DEXSeq") >> library("pasilla") >> library("parallel") >> data("pasillaExons", package="pasilla") > >> pasillaExons <- estimateSizeFactors(pasillaExons) >> pasillaExons <- estimateDispersions(pasillaExons, nCores=3) > Estimating Cox-Reid exon dispersion estimates using 3 cores. (Progress > report: one dot per 100 genes) >> pasillaExons <- fitDispersionFunction(pasillaExons) > >> plotDispEsts(pasillaExons) > Error: is(cds, "CountDataSet") is not TRUE >> traceback() > 4: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), > ch), call. = FALSE, domain = NA) > 3: stopifnot(is(cds, "CountDataSet")) > 2: fitInfo(cds, name = name) > 1: plotDispEsts(pasillaExons) > > What am I doing wrong here? In general I work with the vignette which > comes with the package (last revision 2013-02-27), but I found a piece > of code in an older version which seems to work however: > >> meanvalues <- rowMeans(counts(pasillaExons)) >> plot(meanvalues, fData(pasillaExons)$dispBeforeSharing, log = "xy", main = "mean vs CR dispersion") >> x <- 0.01:max(meanvalues) >> y <- pasillaExons at dispFitCoefs[1] + pasillaExons at dispFitCoefs[2]/x >> lines(x, y, col = "red") > > The second problem I have got is the MA plot after testing for > differential exon usage: > >> pasillaExons <- testForDEU(pasillaExons, nCores=3) > Testing for differential exon usage using 3 cores. (Progress report: > one dot per 100 genes) >> pasillaExons <- estimatelog2FoldChanges(pasillaExons) >> res <- DEUresultTable(pasillaExons) >> plotMA(with(res, data.frame(baseMean=meanBase, log2FoldChange=log2fold, padj=padjust)), ylim=c(-4,4), cex=0.8) > > This works fine. But I wanted to change the threshold for the color > highlighting to "padjust < 0.05" (although this wouldn't change > anything in this case, I just wanted to try). So I did the following: > >> plotMA(with(res, data.frame(baseMean=meanBase, log2FoldChange=log2fold, padj=padjust)), ylim=c(-4,4), cex=0.8, col=ifelse(res$padj >= 0.05, "gray32", "red3")) > > But whenever I use the col parameter with whatever value (also with > simply one color), the output is different and more data points than > before get displayed. I do not really understand this effect. > > I worked with R 2.15 and the newest version of DEXSeq but updated > everything yesterday (R and all packages, see sessionInfo below) to > see if this changes something, but it didn't. Of course I can just > copy the function from the source and define my own one changing the > threshold, but I'd prefer to understand what's going on here. > > An another small question regarding this plot: Reading the vignette I > could not find out what the triangles mean which sometimes appear. Do > they indicate outliers (lying outside the plotting area), or is this > something completely different? > > And my last question: I read in > http://seqanswers.com/forums/showthread.php?t=13299 that it should be > possible with DEXSeq to analyze paired samples as well. In my case I > have for every individual patient two (or more) different celltypes > which I would like to analyze, so I would have to use a paired test > for differential exon usage. In the thread mentioned it was said that > there is more information about how to do this in the vignette > available, but I cannot find it. How would I have to add the patient > information to account for paired samples in DEXSeq? > > Sorry for asking that many (pretty ignorant) questions, it's the first > time I am trying to use a package dealing with RNAseq-data, and I hope > someone can help. > > Best whishes, > Simone > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > LC_TIME=en_US.UTF-8 > [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 > LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 > LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] pasilla_0.2.16 DESeq_1.12.0 lattice_0.20-15 > locfit_1.5-9.1 DEXSeq_1.6.0 > [6] Biobase_2.20.0 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] annotate_1.38.0 AnnotationDbi_1.22.6 biomaRt_2.16.0 > Biostrings_2.28.0 bitops_1.0-5 > [6] DBI_0.2-7 genefilter_1.42.0 geneplotter_1.38.0 > GenomicRanges_1.12.4 grid_3.0.1 > [11] hwriter_1.3 IRanges_1.18.1 RColorBrewer_1.0-5 > RCurl_1.95-4.1 Rsamtools_1.12.3 > [16] RSQLite_0.11.4 splines_3.0.1 statmod_1.4.17 > stats4_3.0.1 stringr_0.6.2 > [21] survival_2.37-4 tools_3.0.1 XML_3.96-1.1 > xtable_1.7-1 zlibbioc_1.6.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.9 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

Dear Wolfgang, thank you very much for your reply. > plotDispEsts > as a workaround, please run the R code in the vignette from the > .R file provided with the package I missed this! Now it works fine. > I am not sure I understand your problem with 'plotMA'; what do you get from > table(res$padjust, useNA="always") Indeed, there are NA-values in the table. But when I remove all NAs of the data.frame the plot still behaves different when adding the color parameter. Different data points than without setting the parameter seem to get highlighted and additionally some data points disappear. I still do not really understand what's going on here and how I should effectively change the coloring regarding the p-value threshold I want to use in my analysis without causing such "side effects". Any suggestion? Sorry for the confusion, but I would really like to understand this. > As for your third question, isn't this solved by just adding another > covariate for 'patient' which will absorb the patient effects? Well, yes, I also thought that I'd do it like this, but just wanted to be sure that I am not missing any additional parameter I would have to set explicitely for paired tests or something. So simply adjusting the code to the following should be the right way to account for paired patient samples, shouldn't it? formuladispersion <- count ~ sample + (condition + patient) * exon myExons <- estimateDispersions(myExons, formula=formuladispersion) myExons <- fitDispersionFunction(myExons) formula0 <- count ~ sample + patient * exon + condition formula1 <- count ~ sample + patient * exon + condition * I(exon == exonID) myExons <- testForDEU(myExons, formula0=formula0, formula1=formula1) Best wishes, Simone

ADD REPLY • link 10.9 years ago Simone ▴ 180

Login before adding your answer.