Hi all,
I'm a big fan of the many plots produced by the clusterProfiler and enrichplot packages. Functions such as emapplot() require the enrichment results to have been generated using clusterProfiler (and a few other related packages, I think). However, I've carried out GO/KEGG enrichment analyses using GOSeq so I can account for any potential sequence length biases.
Is there a way to use the various plotting functions with data from other programs like GOSeq? Can we convert the results into the format required for emapplot() etc.? The results for GOSeq, at least, are just stored in a data frame with the usual columns (GO term, adjusted p-value, number of genes in each category, etc.).
Alternatively, any other useful programs for plotting results would be appreciated.
Thanks!
Dear Charles,
Thanks for your question; I appreciate your solution.
I'm using a gene list from DNA methylation results to search for GO terms and KEGG pathways via the DAVID web-based tools.
Could you please advise on a workable command that needs only a gene list?
Really appreciate that. :)
Best wishes, WF
Hi WF,
First comment: DAVID is very out of date now, so you probably shouldn't use it. However, if you do still choose to use DAVID, run your DAVID analyses directly through the clusterProfiler package: https://guangchuangyu.github.io/2015/03/david-functional-analysis-with-clusterprofiler/. That way you don't have to do any data reformatting etc. to do the plots.
Charles
Dear Charles,
I am so impressed by the way you handled this! Thank you! Could you teach me how you created the geneSets list? Did you use the getgo() function or something else? I am now working with an unsupported organism (Nicotiana benthamiana), so I am not sure how to get GO terms and their corresponding genes.
Thanks.
There are two broad steps you'll need to take.
First, annotate your sequences with GO terms. You'll need to choose a method. For example, you could pay for the blast2go program, or use the Trinotate pipeline, or annotate against the closest reference genome using (e.g.) Blast. The latter option integrates nicely with biomaRt in R. For example: blast your de novo contigs against the Nicotiana attenuata reference genome from Ensembl to get Ensembl gene IDs. Then you can query biomaRt for the GO terms attached to those IDs. You'll end up with a data frame where column 1 has Ensembl gene IDs and column 2 has the GO terms annotated to those genes. It's a long-form data frame, so each gene ID appears on one row per GO term annotated to it.
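As a rough sketch of that lookup (not the original snippet): the code below assumes the Ensembl Plants mart and a dataset name of nattenuata_eg_gene, which you should confirm with listDatasets(). The helper fetch_go_map() and the toy gene IDs are made up for illustration. The second half shows one way to turn the long-form table into a named gene-set list using base R only.

```r
# Hypothetical helper: look up GO terms for a vector of Ensembl gene IDs.
# Needs the biomaRt package and network access, so it is only defined here.
# ASSUMPTION: the dataset name below; check it with biomaRt::listDatasets(mart).
fetch_go_map <- function(gene_ids, dataset = "nattenuata_eg_gene") {
  mart <- biomaRt::useMart(biomart = "plants_mart",
                           dataset = dataset,
                           host    = "https://plants.ensembl.org")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "go_id"),
                 filters    = "ensembl_gene_id",
                 values     = gene_ids,
                 mart       = mart)
}

# Toy stand-in for the returned table (made-up gene IDs).
go_map <- data.frame(
  ensembl_gene_id = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  go_id           = c("GO:0006412", "GO:0008150", "GO:0006412", "", "GO:0008150"),
  stringsAsFactors = FALSE
)

# Genes without an annotation may come back with an empty go_id; drop those rows.
go_map <- go_map[go_map$go_id != "", ]

# Named list: one element per GO term, holding the genes annotated to it.
gene_sets <- split(go_map$ensembl_gene_id, go_map$go_id)

# Two-column term-to-gene table, the layout clusterProfiler::enricher()
# takes for its TERM2GENE argument (term first, gene second).
term2gene <- go_map[, c("go_id", "ensembl_gene_id")]
```

With real data you would replace the toy go_map with the result of fetch_go_map(your_gene_ids).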
Second, you'll need to use your mapped GO terms as input to a GO enrichment program. Since this post is about clusterProfiler, I assume you're interested in the plots it can generate. Therefore, the easiest option would be to follow the tutorials provided with clusterProfiler and run your enrichment analyses directly through that library (rather than reformat the results from other programs). I just chose to use goseq because it takes into account biases inherent in GO enrichment from RNA-seq results, but the program can be hard to use with non-reference organisms.
The requirements of enrichplot are rather low: all it needs is a few specifically named columns. A student of mine wrote the first version of this script and I slightly modified it. It takes results from gProfiler and plots them in a clusterProfiler-like dotplot:
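The script itself is not shown above, so what follows is only a minimal sketch of the idea, not the poster's code. It assumes a results table with the column names that gprofiler2's gost() returns (called with evcodes = TRUE so the intersection column is present), and the toy rows are invented. The helper to_enrich_table() is a made-up name. The renaming follows the approach described in the gprofiler2 documentation, and the plot is only drawn if DOSE and enrichplot are installed; it may need adjusting for your package versions.

```r
# Toy stand-in for gost(...)$result; all values are invented for illustration.
gp <- data.frame(
  term_id               = c("GO:0006412", "GO:0006950"),
  term_name             = c("translation", "response to stress"),
  p_value               = c(1e-4, 3e-2),
  query_size            = c(50, 50),
  intersection_size     = c(3, 2),
  term_size             = c(400, 900),
  effective_domain_size = c(20000, 20000),
  intersection          = c("g1,g2,g3", "g2,g4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename/reshape into the columns enrichplot reads.
to_enrich_table <- function(gp) {
  out <- data.frame(
    ID          = gp$term_id,
    Description = gp$term_name,
    GeneRatio   = paste0(gp$intersection_size, "/", gp$query_size),
    BgRatio     = paste0(gp$term_size, "/", gp$effective_domain_size),
    # gost() reports a single corrected p-value; it is reused for all three
    # columns here purely as a placeholder.
    pvalue      = gp$p_value,
    p.adjust    = gp$p_value,
    qvalue      = gp$p_value,
    geneID      = gsub(",", "/", gp$intersection),  # enrichplot splits on "/"
    Count       = gp$intersection_size,
    stringsAsFactors = FALSE
  )
  rownames(out) <- out$ID
  out
}

res <- to_enrich_table(gp)

# Wrap the table in an enrichResult object and plot it, if the packages exist.
if (requireNamespace("DOSE", quietly = TRUE) &&
    requireNamespace("enrichplot", quietly = TRUE)) {
  library(DOSE)  # provides the enrichResult class
  enr <- new("enrichResult", result = res)
  print(enrichplot::dotplot(enr))
}
```

The same reshaping should carry over to a goseq results data frame, as long as you can supply the gene IDs per term and the two ratio columns yourself.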
Sorry, my previous message went without code, or was not properly formatted.