HTMLReport with Ensembl IDs, with multifactorial design
1
0
Entering edit mode
@federico-marini-6465
Last seen 5 months ago
Germany

Dear all,

as a spin-off of this other topic (HTMLReport of DESeq2 results with Ensembl Gene Ids), here's my question:

I have a dataset with multifactorial design, aligned and quantified with Ensembl references/annotations, and I would like to get the DE genes in each contrast to be reported in the nice framework of ReportingTools.

Now, I have 3 levels for one factor (condition) and 2 for the other one (tissue)

ddsmf <- DESeqDataSetFromMatrix(countMatrix,
colData = data.frame(condition=samplesDesign$strain, tissue = samplesDesign$tissueID),
design = ~ tissue + condition)

ddsmf <- DESeq(ddsmf,parallel=T)

res_mf_tissue1vs2 <- results(ddsmf, contrast = c("tissue","t1","t2"))

res_mf_condition1vs2 <- results(ddsmf, contrast = c("condition","c1","c2"))
res_mf_condition1vs3 <- results(ddsmf, contrast = c("condition","c1","c3"))
res_mf_condition2vs3 <- results(ddsmf, contrast = c("condition","c2","c3"))

library("hwriter")
library("biomaRt")
library("ReportingTools")

add.anns <-  function(df, mart, ...) {

nm <- rownames(df)
anns <- getBM( attributes = c("ensembl_gene_id", "mgi_symbol", "description"), filters = "ensembl_gene_id", values = nm, mart = mart)
anns <- anns[match(nm, anns[, 1]), ]
colnames(anns) <- c("Ensembl", "Gene Symbol", "Gene Description")
df <- cbind(anns, df[, 2:ncol(df)]) # slight modification to the suggestion of James
rownames(df) <- nm
df
}

function(df, ...){
naind <- is.na(df$Ensembl) df$Ensembl <- hwrite(as.character(df$Ensembl), link = paste0("http://www.ensembl.org/Mus_musculus/Gene/Summary?g=", as.character(df$Ensembl)), table = FALSE)
df$Ensembl[naind] <- "" return(df) } mart <- useMart("ensembl",dataset="mmusculus_gene_ensembl") Now, my best case scenario is that I specify the contrast as in the results function of DESeq2, and the corresponding DE genes are extracted and reported. Something like... rt_cfrcond <- HTMLReport(shortName = 'report_cond',title = 'report_cond',reportDirectory = "./_reports") publish(ddsmf, rt_cfrcond, .modifyDF = list(add.anns,modifyReportDF,ensemblLinks), mart = mart,factor=colData(ddsmf)$condition)
finish(rt_cfrcond)

When I provide the factor as

colData(ddsmf)$condition then I get the comparison between first and last one (as probably expected from the DESeq2 behaviour), also with the nice boxplots (for all 3 factors, actually). So I can't do it for the other comparisons I had in mind - only by "cheating" and providing the DESeqResults (where I provide the contrasts), but then I lose the nice boxplots of normalized counts. More strangely, when I provide the tissues as the contrast, the results are provided as if I am contrasting conditions... rt_tis <- HTMLReport(shortName = 'report_tissues',title = 'report_tissues',reportDirectory = "./_reports") publish(ddsmf, rt_tis, .modifyDF = list(add.anns,modifyReportDF,ensemblLinks), mart = mart,factor=colData(ddsmf)$tissue)
finish(rt_tis)

As in the other post I linked, I tried already with some searching around, but was unsuccessful in finding *the* solution for my problem, so I thought I could be helped/and maybe helpful to others here.

So, my question is somehow two-fold:

• How can I provide the contrasts correctly when checking on the factor with 3 levels?
• What am I doing wrong/what do I need to tweak for the contrast on tissue to be reported correctly?

Thanks in advance to all who can provide hints and suggestions - I hope my case can be interesting to others too!

Federico

Here is my sessionInfo() output:

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] ReportingTools_2.6.0      AnnotationDbi_1.28.1
[3] Biobase_2.26.0            RSQLite_1.0.0
[5] DBI_0.3.1                 knitr_1.9
[7] biomaRt_2.22.0            hwriter_1.3.2
[9] BiocParallel_1.0.3        DESeq2_1.6.3
[13] GenomicRanges_1.18.4      GenomeInfoDb_1.2.4
[15] IRanges_2.0.1             S4Vectors_0.4.0
[17] BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3          annotate_1.44.0          AnnotationForge_1.8.2
[4] base64enc_0.1-2          BatchJobs_1.5            BBmisc_1.9
[7] Biostrings_2.34.1        biovizBase_1.14.1        bitops_1.0-6
[10] brew_1.0-6               BSgenome_1.34.1          Category_2.32.0
[13] checkmate_1.5.1          cluster_2.0.1            codetools_0.2-11
[16] colorspace_1.2-6         dichromat_2.0-0          digest_0.6.8
[19] edgeR_3.8.6              evaluate_0.5.5           fail_1.2
[22] foreach_1.4.2            foreign_0.8-63           formatR_1.0
[25] Formula_1.2-0            genefilter_1.48.1        geneplotter_1.44.0
[28] GenomicAlignments_1.2.2  GenomicFeatures_1.18.3   GGally_0.5.0
[31] ggbio_1.14.0             ggplot2_1.0.0            GO.db_3.0.0
[34] GOstats_2.32.0           graph_1.44.1             grid_3.1.2
[37] gridExtra_0.9.1          GSEABase_1.28.0          gtable_0.1.2
[40] Hmisc_3.15-0             iterators_1.0.7          lattice_0.20-30
[43] latticeExtra_0.6-26      limma_3.22.7             locfit_1.5-9.1
[46] MASS_7.3-39              Matrix_1.1-5             munsell_0.4.2
[49] nnet_7.3-9               OrganismDbi_1.8.1        PFAM.db_3.0.0
[52] plyr_1.8.1               proto_0.3-10             R.methodsS3_1.7.0
[55] R.oo_1.19.0              R.utils_2.0.0            RBGL_1.42.0
[58] RColorBrewer_1.1-2       RCurl_1.95-4.5           reshape_0.8.5
[61] reshape2_1.4.1           rpart_4.1-9              Rsamtools_1.18.3
[64] rtracklayer_1.26.2       scales_0.2.4             sendmailR_1.2-1
[67] splines_3.1.2            stringr_0.6.2            survival_2.38-1
[70] tools_3.1.2              VariantAnnotation_1.12.9 XML_3.98-1.1
[73] xtable_1.7-4             XVector_0.6.0            zlibbioc_1.12.0
DESeq2 ReportingTools • 994 views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

Passing in a DESeqDataSet to publish() is a bit complicated, as it doesn't contain what you need, so results() has to be run on it first. From ?publish:

 DESeqDataSet:
‘publish(object, publicationType, factor = NULL, n = 1000,
pvalueCutoff = 0.01, lfc = 0, contrast = NULL, resultName =
NULL, make.plots = TRUE, ..., name)’ To coerce a
‘DESeqDataSet’ to a ‘data.frame’, we use the ‘results’
function from the ‘DESeq2’ package, therefore ‘DESeq’, or
something similar must be run prior to ‘publish’. ‘contrast’
and ‘resultName’ are passed to the ‘results’ function as the
‘contrast’ and ‘name’ parameters, please consult the
documentation for ‘results’ to see how these are specified.
If present, ‘annotation.db’ is used to find gene-level
annotations for the rows in the ‘DESeqDataSet’. Stripplots
showing the expression values for each transcript are
produced based on the normalized counts from the
‘DESeqDataSet’, grouped by the levels in ‘factor’.

It is in general really cool that you can pass various objects into publish() and get reasonable results. But in this case, publish() needs more information in order to generate the results you are looking for.

I find that the element of surprise is reduced the more you simplify the input yourself. So while you can pass in a DESeqDataSet and get stuff out, using a DESeqResults object will probably yield less surprising results. And considering you already have them floating around, that seems like a reasonable thing to do:

 DESeqResults:
‘publish(object, publicationType, DataSet = NULL,
annotation.db = NULL, n = 500, pvalueCutoff = 0.01, lfc = 0,
make.plots = TRUE, ..., name)’ To coerce a ‘DESeqResults’
object to a ‘data.frame’, we filter the results ‘DataFrame’
such that the absolute log fold changes is greater than ‘lfc’
and the adjusted p-value is less than ‘pvalueCutoff’. If
present, ‘annotation.db’ is used to find gene-level
annotations for the rows in the ‘DESeqResults’. If
‘make.plots’ is TRUE, stripplots showing the expression
values for each transcript are produced based on the
normalized counts from the ‘DataSet’, which should be of
class ‘DESeqDataSet’, grouped by the levels in ‘factor’.

But in the end, ReportingTools uses a data.frame to create the output. If you want to have ultimate control of the output, you can just generate whatever data.frame you want to present, and then use that. But then you have to cob together some code to make the sweet glyphs that ReportingTools puts in the HTML tables.

As with all things there are tradeoffs; the more control you have over the finished product, the more work you have to do to get there.

<shameless plug>

If you want to be a control freak like me, see makeImages in my affycoretools package

</shameless plug>

0
Entering edit mode

I agree with James. In DESeq2 version 1.4 we added a class DESeqResults to wrap up the results object, which is a simple DataFrame. This allows downstream packages like ReportingTools to annotate the results table directly, so that downstream packages can avoid trying to mimic a results() call internally.

You say,

"only by cheating and providing the DESeqResults (where I provide the contrasts), but then I lose the nice boxplots of normalized counts."

But did you try passing the DESeqResults to object, and the DESeqDataSet to DataSet?

0
Entering edit mode

Thank you James.

The shameful part for me was that I did not spot correctly the contrast parameter.

With that, if specified correctly, it seems to work like a charm.

Where I should not thank you is for teasing the inner strings of my tiny control freak - but hey, in the end that has always been a push to try and learn a little more on coding especially in R :)

@Mike: you're right, that way works also pretty well. Thank you too for chiming in!

Federico