Question: HTMLReport of DESeq2 results with Ensembl Gene Ids
0
4.7 years ago by
United States
noushin.farnoud0 wrote:

Hello All,

I would like to generate an HTML report from DESeq2 results of RNASeq analysis. I have followed the example in the ReportingTool documentation and have no problem with that, but unlike that example my deseq result dataset does not have Entrez IDs and instead it has Ensembl IDs. And I have not been able to adapt the report function for my dataset.

I also tried converting Ensembl to Entrez IDs prior to using the "publish" function (below), but the issue is that multiple of my dataset row names become NULL following Ensemble-to-Entrez conversion and that causes error in the downstream "publish" function (I can eliminate those, but prefer not to).

Below I have included an example where mockRnaSeqData has Entrez IDs and report1 is successfully created. The second dataset mydata_Ensemble is a subset of this data which has Ensembl Gene IDs. I greatly appreciate if you could tell me how you modify publish command to work with mydata_Ensemble?

Many Thanks
Noushin

rm(list=ls())
library(DESeq2)

data(mockRnaSeqData)
mockRnaSeqData <- mockRnaSeqData[1:5,]
## the second dataset which has arbitrary Ensembl gene IDs (instead of the Entrez IDs)
mydata_Ensemble <- mockRnaSeqData[1:5,]
row.names(mydata_Ensemble) <- c("ENSMUSG00000030359", "ENSMUSG00000020804", "ENSMUSG00000025375", "ENSMUSG00000015243", "ENSMUSG00000028125")

conditions <- c(rep("case",3), rep("control", 3))
mockRna.dse <- DESeqDataSetFromMatrix(countData = mockRnaSeqData,colData = as.data.frame(conditions), design = ~ conditions)

## repeat the same for the Ensembl-dataset example
test.dse <- DESeqDataSetFromMatrix(countData = mydata_Ensemble,colData = as.data.frame(conditions), design = ~ conditions)

colData(mockRna.dse)$conditions <- factor(colData(mockRna.dse)$conditions, levels=c("control", "case"))
colData(test.dse)$conditions <- factor(colData(test.dse)$conditions, levels=c("control", "case"))

## Get a DESeqDataSet object for both datasets
mockRna.dse <- DESeq(mockRna.dse)
test.dse <- DESeq(test.dse)

## Write mockRna results to a report

report1 <- HTMLReport(shortName = 'RNAseq_analysis_with_DESeq2_Entrez.html',title = 'RNA-seq analysis of differential expression using DESeq2',reportDirectory = "./reports")
publish(mockRna.dse,report1, pvalueCutoff=0.05, annotation.db="org.Mm.eg.db", factor = colData(mockRna.dse)$conditions, reportDir="./reports") finish(report1) ## How to write test.dse results to a report? reportingtools deseq2 publish • 1.6k views ADD COMMENTlink modified 4.4 years ago by Federico Marini120 • written 4.7 years ago by noushin.farnoud0 hi Noushin, We try to discourage cross posting on multiple forums, as it scatters the answers and duplicates effort. link to another post: https://www.biostars.org/p/124153/, and at that site someone pointed to this earlier Bioc answer [ReportingTools] HTMLReport of DESeq2 results using Ensembl Gene Ids ADD REPLYlink written 4.7 years ago by Michael Love24k I don't see why this would bring confusion! I have not yet received a working answer on this. That reply was a link to a post that I had seen and tried before posting my question. It does not work if not all Ensembl genes have a unique corresponding Entrez id (which happens to be my case!). Publish function does not work when a dataset has NaN or repeated row names, and when such rows are eliminated from the DESeq dataset prior to 'publish', it generates a completely different error. So I am still hoping one posts a workaround for this problem. ADD REPLYlink written 4.6 years ago by noushin.farnoud0 Answer: HTMLReport of DESeq2 results with Ensembl Gene Ids 4 4.6 years ago by United States James W. MacDonald50k wrote: Mike didn't say it was confusing. He said it scatters answers and duplicates effort. You are asking people to give their time (for free) to help you figure out things that you are having problems with. In return, all you are being asked to do is to stick with one forum, so we can have a consistent set of questions and answers. Is that really too much to ask? In addition, the answer Mike pointed you to does work, and has nothing to do with Entrez IDs, because the annotation is done using biomaRt, annotating via the Ensembl IDs you have supplied. So whatever you saw and tried before was different from what Mike pointed you to. To be absolutely explicit: ## annotate using biomaRt ## note this is slightly different from what Mike pointed you to, as we ## are calling the IDs 'Ensembl', and are using mgi_symbol instead of hgnc_symbol add.anns <- function(df, mart, ...) { nm <- rownames(df) anns <- getBM( attributes = c("ensembl_gene_id", "mgi_symbol", "description"), filters = "ensembl_gene_id", values = nm, mart = mart) anns <- anns[match(nm, anns[, 1]), ] colnames(anns) <- c("Ensembl", "Gene Symbol", "Gene Description") df <- cbind(anns, df[, 2:nrow(df)]) rownames(df) <- nm df } #Add links to Ensembl.org, because that's how we roll. ensemblLinks <- function(df, ...){ naind <- is.na(df$Ensembl)
df$Ensembl <- hwrite(as.character(df$Ensembl), link = paste0("http://www.ensembl.org/Mus_musculus/Gene/Summary?g=",
as.character(df$Ensembl)), table = FALSE) df$Ensembl[naind] <- ""
return(df)
}
## assuming here that you have already created the test.dse object as before
temp <- HTMLReport("index","whatever")
publish(test.dse, temp, .modifyDF = list(add.anns, modifyReportDF, ensemblLinks), mart = mart, pvalueCutoff = 1, factor = colData(test.dse)$conditions) ## note that you have to use a large pvalueCutoff, because test.dse has large p-values finish(temp) browseURL("index.html")  ADD COMMENTlink written 4.6 years ago by James W. MacDonald50k Answer: HTMLReport of DESeq2 results with Ensembl Gene Ids 0 4.4 years ago by Germany Federico Marini120 wrote: Hi James, Hopping in and first of all thanking you for the nice alternative you propose here. My situation is that I have a multifactorial design for my data, and I would like to generate the reports that show the DE genes in all contrasts. My problem is that I have 3 categories for one factor (condition) (and 2 for the tissue factor, but that is no problem to obtain). To my understanding if I provide the factor as colData(dds)$condition

then I get the comparison between first and last one (as probably expected from the DESeq2 behaviour), also with the nice boxplots (for all 3 factors, actually).

I alternatively tried with starting from the DESeqResults object, but the error is

Fehler in .make.gene.plots(df, eSet, factor, figure.directory, ...) :
Can't find expression data for some features

and it goes away if I suppress the plots.

I tried already with some searching around, but was unsuccessful in finding *the* solution for my problem, so I thought I could be helped/and maybe helpful to others here. Suggestions?

Federico

hi Federico,

Could you make a new post? I think this might be best as a new topic.

Could you include:

* all the code you used

* the output of sessionInfo()

And tag it with reportingtools and deseq2?

Sure.

The discussion can go on here: HTMLReport with Ensembl IDs, with multifactorial design

Thanks Mike!