HTMLReport of DESeq2 results with Ensembl Gene Ids
2
0
Entering edit mode
@noushinfarnoud-6928
Last seen 6.2 years ago
United States

Hello All,

I would like to generate an HTML report from DESeq2 results of RNASeq analysis. I have followed the example in the ReportingTool documentation and have no problem with that, but unlike that example my deseq result dataset does not have Entrez IDs and instead it has Ensembl IDs. And I have not been able to adapt the report function for my dataset.


I also tried converting Ensembl to Entrez IDs prior to using the "publish" function (below), but the issue is that multiple of my dataset row names become NULL following Ensemble-to-Entrez conversion and that causes error in the downstream "publish" function (I can eliminate those, but prefer not to).

Below I have included an example where mockRnaSeqData has Entrez IDs and report1 is successfully created. The second dataset mydata_Ensemble is a subset of this data which has Ensembl Gene IDs. I greatly appreciate if you could tell me how you modify publish command to work with mydata_Ensemble?

Many Thanks
Noushin

rm(list=ls())
library(DESeq2)

data(mockRnaSeqData)
mockRnaSeqData <- mockRnaSeqData[1:5,]
## the second dataset which has arbitrary Ensembl gene IDs (instead of the Entrez IDs)
mydata_Ensemble <- mockRnaSeqData[1:5,]
row.names(mydata_Ensemble) <- c("ENSMUSG00000030359", "ENSMUSG00000020804", "ENSMUSG00000025375", "ENSMUSG00000015243", "ENSMUSG00000028125")

conditions <- c(rep("case",3), rep("control", 3))
mockRna.dse <- DESeqDataSetFromMatrix(countData = mockRnaSeqData,colData = as.data.frame(conditions), design = ~ conditions)

## repeat the same for the Ensembl-dataset example
test.dse <- DESeqDataSetFromMatrix(countData = mydata_Ensemble,colData = as.data.frame(conditions), design = ~ conditions)

colData(mockRna.dse)$conditions <- factor(colData(mockRna.dse)$conditions, levels=c("control", "case"))
colData(test.dse)$conditions <- factor(colData(test.dse)$conditions, levels=c("control", "case"))

## Get a DESeqDataSet object for both datasets
mockRna.dse <- DESeq(mockRna.dse)
test.dse <- DESeq(test.dse)

## Write mockRna results to a report

report1 <- HTMLReport(shortName = 'RNAseq_analysis_with_DESeq2_Entrez.html',title = 'RNA-seq analysis of differential expression using DESeq2',reportDirectory = "./reports")
publish(mockRna.dse,report1, pvalueCutoff=0.05, annotation.db="org.Mm.eg.db", factor = colData(mockRna.dse)$conditions, reportDir="./reports")
finish(report1)

## How to write test.dse results to a report?

 

reportingtools publish deseq2 • 2.0k views
ADD COMMENT
0
Entering edit mode

hi Noushin,

We try to discourage cross posting on multiple forums, as it scatters the answers and duplicates effort. link to another post: https://www.biostars.org/p/124153/, and at that site someone pointed to this earlier Bioc answer [ReportingTools] HTMLReport of DESeq2 results using Ensembl Gene Ids

ADD REPLY
0
Entering edit mode

I don't see why this would bring confusion! I have not yet received a working answer on this. That reply was a link to a post that I had seen and tried before posting my question. It does not work if not all Ensembl genes have a unique corresponding Entrez id (which happens to be my case!). 
Publish function does not work when a dataset has NaN or repeated row names, and when such rows are eliminated from the DESeq dataset prior to 'publish', it generates a completely different error. So I am still hoping one posts a workaround for this problem.

ADD REPLY
4
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

Mike didn't say it was confusing. He said it scatters answers and duplicates effort. You are asking people to give their time (for free) to help you figure out things that you are having problems with. In return, all you are being asked to do is to stick with one forum, so we can have a consistent set of questions and answers. Is that really too much to ask?

In addition, the answer Mike pointed you to does work, and has nothing to do with Entrez IDs, because the annotation is done using biomaRt, annotating via the Ensembl IDs you have supplied. So whatever you saw and tried before was different from what Mike pointed you to.

To be absolutely explicit:

## annotate using biomaRt
## note this is slightly different from what Mike pointed you to, as we
## are calling the IDs 'Ensembl', and are using mgi_symbol instead of hgnc_symbol
add.anns <- function(df, mart, ...) {
        nm <- rownames(df)
    anns <- getBM( attributes = c("ensembl_gene_id", "mgi_symbol", "description"), filters = "ensembl_gene_id", values = nm, mart = mart)
    anns <- anns[match(nm, anns[, 1]), ]
    colnames(anns) <- c("Ensembl", "Gene Symbol", "Gene Description")
    df <- cbind(anns, df[, 2:nrow(df)])
    rownames(df) <- nm
    df
}

#Add links to Ensembl.org, because that's how we roll.
ensemblLinks <- function(df, ...){
    naind <- is.na(df$Ensembl)
    df$Ensembl <- hwrite(as.character(df$Ensembl), link = paste0("http://www.ensembl.org/Mus_musculus/Gene/Summary?g=",
        as.character(df$Ensembl)), table = FALSE)
    df$Ensembl[naind] <- ""
    return(df)
}
## assuming here that you have already created the test.dse object as before
temp <- HTMLReport("index","whatever")
publish(test.dse, temp, .modifyDF = list(add.anns, modifyReportDF, ensemblLinks), mart = mart, pvalueCutoff = 1, factor = colData(test.dse)$conditions)
## note that you have to use a large pvalueCutoff, because test.dse has large p-values
finish(temp)
browseURL("index.html")

 

ADD COMMENT
0
Entering edit mode
@federico-marini-6465
Last seen 5 months ago
Germany

Hi James,

Hopping in and first of all thanking you for the nice alternative you propose here.

My situation is that I have a multifactorial design for my data, and I would like to generate the reports that show the DE genes in all contrasts. My problem is that I have 3 categories for one factor (condition) (and 2 for the tissue factor, but that is no problem to obtain).

To my understanding if I provide the factor as

colData(dds)$condition

then I get the comparison between first and last one (as probably expected from the DESeq2 behaviour), also with the nice boxplots (for all 3 factors, actually).

I alternatively tried with starting from the DESeqResults object, but the error is

Fehler in .make.gene.plots(df, eSet, factor, figure.directory, ...) :
  Can't find expression data for some features

and it goes away if I suppress the plots.

 

I tried already with some searching around, but was unsuccessful in finding *the* solution for my problem, so I thought I could be helped/and maybe helpful to others here. Suggestions?

Thanks in advance!

Federico

 

ADD COMMENT
0
Entering edit mode

hi Federico,

Could you make a new post? I think this might be best as a new topic. 

Could you include:

* all the code you used

* the output of sessionInfo()

And tag it with reportingtools and deseq2?

ADD REPLY
0
Entering edit mode

Sure.

The discussion can go on here: HTMLReport with Ensembl IDs, with multifactorial design

Thanks Mike!

 

 

ADD REPLY

Login before adding your answer.

Traffic: 232 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6