Using the plotting functions of clusterProfiler / enrichplot with enrichment results from other programs
2
3
Entering edit mode
@charlesfoster-17652
Last seen 51 minutes ago
Australia

Hi all,

I'm a big fan of the many plots produced by the clusterProfiler and enrichplot packages. The functions such as emapplot() require the enrichment results to have been generated using clusterProfiler (and a few other related packages, I think). However, I've carried out GO/KEGG enrichment analyses using GOSeq so I can account for any potential sequence length biases.

Is there a way to to use the various plotting functions with data from other programs like GOSeq? Can we convert the results into the format required for emapplot() etc? The results for GOSeq, at least, are just stored in a data frame with the usual columns (GO term, adjusted p-value, number of genes in each category etc.)

Alternatively, any other useful programs for plotting results would be appreciated.

Thanks!

clusterProfiler enrichplot • 5.8k views
ADD COMMENT
5
Entering edit mode
@charlesfoster-17652
Last seen 51 minutes ago
Australia

In case anyone is wondering in the future, I managed to solve the problem. The solution is a bit fiddly, but at least it works.

First, load the clusterProfiler package, then read in your GOSeq enrichment results into a new data frame (in this case it's called results_df). You need to format the dataframe so that the columns match those of an enricher() output. I won't go into how you do that as I'm sure others can figure it out, but the columns need to be:

colnames(results_df) <- c("ID","Description","GeneRatio","BgRatio","pvalue","p.adjust", "qvalue", "geneID","Count")

Next, make a new "enrichResult" object:

my_object <- new("enrichResult",
readable = FALSE,
result = results_df,
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
qvalueCutoff = 0.2,
organism = "UNKNOWN",
ontology = "UNKNOWN",
gene = DE_genes_vector,
keytype = "UNKNOWN",
universe = universe_vector,
gene2Symbol = character(0),
geneSets = geneSets)

where DE_genes is a vector of the names of your DE genes, universe_vector is a vector of the names of all genes that are annotated with GO terms (your enrichment universe). geneSets is a named list, where the names are enriched GO terms and the elements are DE genes annotated with that GO term.

Finally, you should be able to use my_object with all of the nice plotting functions, such as cnetplot(). To make sure, check the class of my_object:

class(my_object) [1] "enrichResult" attr(,"package") [1] "DOSE"

Hope this helps someone else!

ADD COMMENT
0
Entering edit mode

Dear Charles,

Thanks for your question and appreciate your solution.

I'm using genelist from DNA methylation results to search for their GO and KEGG pathways via DAVID web-based tools

Could you please advise the workable command with only genelist?

Really appreciate that. :)

Best wishes, WF

ADD REPLY
0
Entering edit mode

Hi WF,

First comment: DAVID is very out of date now, so you probably shouldn't use it. However, if you do still choose to use DAVID, run your DAVID analyses directly through the clusterProfiler package: https://guangchuangyu.github.io/2015/03/david-functional-analysis-with-clusterprofiler/. That way you don't have to do any data reformatting etc. to do the plots.

Charles

ADD REPLY
0
Entering edit mode

Dear Charles,

I am so impressed the way you handled this! Thank you! Could you teach me how you created geneSets list? Did you use getgo() function or anything else? Since now I am working with unsupported organism (Nicotiana benthaiana), I am not sure how to get GO terms and corresponding genes.

Thanks.

ADD REPLY
0
Entering edit mode

There are two broad steps you'll need to take..

  • Annotate genes with GO terms

You'll need to choose a method. For example, you could pay for the blast2go program, or use the Trinotate pipeline, or annotate against the closest reference genome using (e.g.) Blast. The latter option integrates nicely with biomaRt in R. For example: blast your de novo contigs against the Nicotiana attenuata reference genome from Ensembl to get Ensembl gene IDs. Then, you can run the following:

library(biomaRt)
ensembl <- useEnsemblGenomes(biomart = "plants_mart", dataset = "nattenuata_eg_gene")
genes <- readLines('gene_IDs.txt') # the ensembl genes you recovered, one per line

gene_go_mapping <- getBM(attributes = c('ensembl_gene_id', 'go_id'),
      filters = 'ensembl_gene_id',
      values = genes, 
      mart = ensembl)

You'll end up with a data frame where column1 has ensembl gene IDs, and column 2 has the GO terms annotated to those genes. It's a long-form data frame, so there will be a many-to-one mapping.

  • Run GO enrichment analysis

You'll need to use your mapped GO terms as input to a GO enrichment program. Since this post is about clusterProfiler, I assume you're interested in the plots it can generate. Therefore, the easiest option would be to follow the tutorials provided with clusterProfiler to run your enrichment analyses directly through that library (rather than reformat the results from other programs). I just chose to use goseq because it takes into account biases inherent in GO enrichment from RNAseq results, but the program can be hard to use with non-reference organisms.

ADD REPLY
0
Entering edit mode

enrichPlot requirements are rather low. All what it needs is to provide few columns named specifically. I student of mine wrote the first version of this script and I slightly modified it. It takes results from gProfiler and plot them in a clusterProfiler-like dotplot:

Sorry, my previous message when without code, or non properly formatted.

library(enrichplot)
library(DOSE)
library(ggplot2)
library(GOfuncR)
library(RColorBrewer)

# Set working directory for files
setwd("/Users/juanjovel/jj/data_analysis/coltonUnger/DE_analysis/mm10_plus_RNAspades_assembly/apeglm_shrinkage_240521/geneOntology")

# Import gProfiler results
infile <- 'LS1Tib_down_GOenrich.txt'
outfile <- gsub(".txt", "_reformated.txt", infile)

res <- read.table(infile, sep = '\t', header = T)

# Change adjust_p_value to 2 decimal places
res$p_value <- signif(res$p_value, digits = 3)

# Select only fields of interest
res <- res[c("source", "term_id", "term_name", "p_value", "query_size",
                   "intersection_size", "term_size", "effective_domain_size", "parents")]


res$GeneRatio = paste0(res$intersection_size,  "/", res$query_size)
res$BgRatio = paste0(res$term_size, "/", res$effective_domain_size)

# Rename columns
names(res) = c("Category", "ID", "Description", "p.adjust",
               "query_size", "Count", "term_size", "effective_domain_size",
               "geneID", "GeneRatio", "BgRatio")

# define as enrichResult object
res_enrich  = new("enrichResult", result = res)

# OPTIONAL
# Write .csv file with all GO info
write.csv(res_enrich, outfile)

# Filter the results for the BP category and define as enrichResult object
#res_enrich_BP <- res_enrich[res_enrich$Category == "GO:BP", ]
#res_enrich_BP  = new("enrichResult", result = res_enrich_BP)

pal1 <- colorRampPalette(c("red","white", "blue"))(n = 299)


# Create and save dotplot
plot <- dotplot(res_enrich, x = "GeneRatio",
                color = "p.adjust",
                showCategory = 10, size = NULL, split = NULL,
                font.size = 12) +
                # If no palette is specified, it will default to the usual re-blue
                # To use standard palettes uncomment next two lines
                scale_fill_distiller(palette = "Spectral") +
                scale_color_distiller(palette = "Spectral")
                # If you want to use a manual palette uncomment next two lines
                # and comment two lines above
                #scale_fill_gradientn(colors = pal1) +
                #scale_color_gradientn(colors = pal1)

# Print the plot
print(plot)
ADD REPLY

Login before adding your answer.

Traffic: 573 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6