Unable to generate a dotplot to view the enrichment analysis results (clusterProfiler/ enrichplot)
0
0
Entering edit mode
rama ▴ 10
@215babe0
Last seen 21 months ago
United Kingdom

Hello,

I am trying to generate a dotplot from the enrichment analysis using clusterprofiler, but I keep getting the same error. It worked a week before and isn't working anymore so I'm assuming R updated the function? Is anyone else facing the same issue? I looked up the error and it's usually for an avg function which doesn't help too much. I am following a tutorial listed on this page: https://learn.gencore.bio.nyu.edu/rna-seq-analysis/gene-set-enrichment-analysis/

organism = "org.At.tair.db"
BiocManager::install(organism, character.only = TRUE)
library(organism, character.only = TRUE)

sorted_dfx <- cluster_df$coef20
names(sorted_dfx) <- cluster_df$cluster_df
sorted_dfx <- sort(sorted_dfx, decreasing = TRUE)


keytypes(org.At.tair.db)
gse <- gseGO(geneList=sorted_dfx, 
             ont ="ALL", 
             keyType = "TAIR", 
             minGSSize = 3, 
             maxGSSize = 800, 
             pvalueCutoff = 0.05, 
             verbose = TRUE, 
             OrgDb = organism,
             by = "fgsea",
             pAdjustMethod = "none")


require(DOSE)
library(enrichplot)
library(DOSE)
edo2dot <- gseDO(sorted_dfx)
#barplot(gse, showCategory=10)
dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)


# THE ERROR
> dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)
Error in unique.default(x, nmax = nmax) : 
  unique() applies only to vectors

### SORTED DFX
> head(sorted_dfx)
AT2G19990 AT1G30760 AT4G15160 AT5G10510 AT3G10870 AT4G00080 
 4.979037  4.579719  4.236575  4.102173  4.017096  3.746760 

###GSE
> gse
#
# Gene Set Enrichment Analysis
#
#...@organism    Arabidopsis thaliana 
#...@setType     GOALL 
#...@keytype     TAIR 
#...@geneList    Named num [1:5772] 4.98 4.58 4.24 4.1 4.02 ...
 - attr(*, "names")= chr [1:5772] "AT2G19990" "AT1G30760" "AT4G15160" "AT5G10510" ...
#...nPerm    1e+05 
#...pvalues adjusted by 'none' with cutoff <0.05 
#...478 enriched terms found
'data.frame':   478 obs. of  12 variables:
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0009416" "GO:0009314" "GO:0097305" "GO:0010035" ...
 $ Description    : chr  "response to light stimulus" "response to radiation" "response to alcohol" "response to inorganic substance" ...
 $ setSize        : int  483 490 439 649 685 712 741 471 411 411 ...
 $ enrichmentScore: num  -0.562 -0.558 -0.559 -0.555 -0.537 ...
 $ NES            : num  -1.55 -1.54 -1.54 -1.54 -1.49 ...
 $ pvalue         : num  1e-05 1e-05 1e-05 1e-05 1e-05 ...
 $ p.adjust       : num  1e-05 1e-05 1e-05 1e-05 1e-05 ...
 $ qvalues        : num  0.000973 0.000973 0.000973 0.000973 0.000973 ...
 $ rank           : num  1450 1450 1627 1598 1627 ...
 $ leading_edge   : chr  "tags=45%, list=25%, signal=37%" "tags=44%, list=25%, signal=36%" "tags=54%, list=28%, signal=42%" "tags=50%, list=28%, signal=41%" ...
 $ core_enrichment: chr  "AT4G19230/AT3G03150/AT2G24100/AT2G18790/AT4G35510/AT2G02450/AT3G50830/AT2G17300/AT5G43260/AT1G25560/AT5G48150/A"| __truncated__ "AT4G19230/AT3G03150/AT2G24100/AT2G18790/AT4G35510/AT2G02450/AT3G50830/AT2G17300/AT5G43260/AT1G25560/AT5G48150/A"| __truncated__ "AT4G34990/AT3G51920/AT3G56880/AT3G19240/AT5G18310/AT5G39570/AT1G08800/AT1G35670/AT5G35735/AT1G64950/AT2G41230/A"| __truncated__ "AT5G39570/AT1G53790/AT2G22560/AT5G07800/AT5G04830/AT5G35735/AT1G64950/AT2G41230/AT1G08050/AT1G56160/AT1G16150/A"| __truncated__ ...
clusterProfiler enrichplot clusterpro doseR dose • 3.6k views
ADD COMMENT
0
Entering edit mode

At which step exactly do you get this error?

What is the output of head(sorted_dfx), and head(gse)?

Please note that your code is very confusing, because it seems you are working with data from Arabidopsis. Yet, your call to gseGO uses the argument OrgDb = organism. How is organism defined?

In the next part of your code you rather use the function gseDO on human data... Are you sure you intend do to this in the same analyses?

Lastly, check the help page of the function gseGO (type ?gseGO), because the use of the argument nPerm is not current anymore. Since functionality of the pakage fgsea is nowadays used for GSEA analyses, you should use the argument eps instead.

ADD REPLY
0
Entering edit mode

I apologize for the lack of clarity! I updated the post to reflect some of the questions you posed. I am trying to create the dotplot for arabidopsis genes and not for human data. I defined the organism at the start. Sorted_dfx is a numeric array with TAIR Ids of some genes ordered according to the logFC values. gseGO seems to be working so something seems to be wrong with the dotplot function. Is the dotplot working for you for sample data mentioned here: https://rdrr.io/bioc/enrichplot/man/dotplot.html

ADD REPLY
0
Entering edit mode

Yes, the dotplot worked for me with the sample data!

Yet, I can also not reproduce your error... though I noticed your object gse was not generated according to the code you posted! Because nPerm slot of gse is not empty...! #...nPerm 1e+05 whereas it should be empty when specifying the argument by = "fgsea" (which is actually the default setting) .

Anyway, below find some code + output of enrichGO and gseGO for Arabidsopsis. You should be able tor run everything yourselves as well, and modify settings according to your dataset and needs.

> # load required libraries
> library(clusterProfiler)
> library(org.At.tair.db)
> 
> # make up some data
> # extract all 27,416 TAIR ids from org.At.tair.db
> genes.all <- keys(org.At.tair.db)
> length(genes.all)
[1] 27416
> 
> # randomly select 1750 genes
> # these 1750 genes represent significantly regulated genes
> genes.selected <- base::sample(genes.all, 1750)
> 
> 
> # to get started, perform Gene Ontology ORA analysis
> # note that no significance cutoffs are applied (i.e Cutoff=1)
> # because I use random ids as input but still woud like to obtain results!
> # for GSEA see below
> results.GOBP.ora <- enrichGO(
+   gene=genes.selected,
+   OrgDb="org.At.tair.db",
+   keyType = "TAIR",
+   ont = "BP",
+   pvalueCutoff = 1,
+   pAdjustMethod = "none",
+   universe=genes.all,
+   qvalueCutoff = 1,
+   minGSSize = 10,
+   maxGSSize = 500,
+   readable = TRUE,
+   pool = FALSE)
> 
> #check
> as.data.frame(results.GOBP.ora)[1:4,]
                   ID
GO:0045292 GO:0045292
GO:1901606 GO:1901606
GO:0048646 GO:0048646
GO:0009065 GO:0009065
                                                        Description
GO:0045292                       mRNA cis splicing, via spliceosome
GO:1901606                       alpha-amino acid catabolic process
GO:0048646 anatomical structure formation involved in morphogenesis
GO:0009065            glutamine family amino acid catabolic process
           GeneRatio   BgRatio      pvalue    p.adjust    qvalue
GO:0045292    7/1662  31/26032 0.002889030 0.002889030 0.9764508
GO:1901606   13/1662  87/26032 0.003420239 0.003420239 0.9764508
GO:0048646   30/1662 278/26032 0.003501675 0.003501675 0.9764508
GO:0009065    5/1662  18/26032 0.004481046 0.004481046 0.9764508
                                                                                                                                                                                          geneID
GO:0045292                                                                                                                                                          NA/AT/AtELF5/NA/PRP4KA/NA/NA
GO:1901606                                                                                                             ALDH5F1/NA/ARGAH1/AtTAT1/PDH2/LIP1/LKR/GAD2/THA2/ISS1/CHY1/AGT2/DELTA-OAT
GO:0048646 TMKL1/AMT1;1/ANQ1/QRT3/PAPS1/AtMYB88/AJH1/ATLOX5/CuAOζ-zeta/AHL16/AtPLAIVB/HEC3/ALE2/CSI1/LACS2/BOP2/NA/ATMPK13/NA/ATHB-2/NA/GAT/AtMYB117/NA/CYP703/MEE48/LRS1/TCP3/ATBRM/CENP-C
GO:0009065                                                                                                                                                    ALDH5F1/ARGAH1/PDH2/GAD2/DELTA-OAT
           Count
GO:0045292     7
GO:1901606    13
GO:0048646    30
GO:0009065     5
> 
> # generate plots
> library(enrichplot)  #needed for function "pairwise_termsim"
> 
> barplot(results.GOBP.ora, showCategory=10)
> dotplot(results.GOBP.ora, showCategory=10)
> cnetplot(results.GOBP.ora, showCategory=10)
> heatplot(results.GOBP.ora, showCategory=10)
> emapplot( pairwise_termsim(results.GOBP.ora), node_scale=1.5,layout="kk" )
> 
>
>
>
> # now GSEA with GO categories
> # for GSEA a per-gene metric is required that is used for ranking.
> # assume dataset consists of the 27.4k TAIR ids.
> # make up fold changes (= ranking metric) for the TAIR ids ranging from -15 to 15.
> sorted_dfx <- runif(n = length(genes.all), min = -15, max = 15)
> names(sorted_dfx) <- genes.all 
> # for GSEA, sort on FC
> sorted_dfx <- sort(sorted_dfx, decreasing = TRUE)
> 
> #check
> head(sorted_dfx)
AT2G38740 AT2G35030 AT1G03060 AT3G21120 AT4G11290 AT5G12420 
 14.99932  14.99798  14.99436  14.99171  14.99072  14.98952 
> 
> # run GSEA with GO categories
> # again, no significant cutoff is used!
> results.GOALL.gsea <- gseGO(geneList=sorted_dfx, 
+   ont ="ALL", 
+   keyType = "TAIR", 
+   minGSSize = 3, 
+   maxGSSize = 800, 
+   pvalueCutoff = 1, 
+   verbose = TRUE, 
+   OrgDb = "org.At.tair.db",
+   by = "fgsea",
+   pAdjustMethod = "none")
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> # convert TAIR ids into Symbols
> results.GOALL.gsea <- setReadable(results.GOALL.gsea, "org.At.tair.db", "TAIR")
> 
> #check
> as.data.frame(results.GOALL.gsea)[1:4,]
           ONTOLOGY         ID
GO:0033962       BP GO:0033962
GO:0015086       MF GO:0015086
GO:0009979       MF GO:0009979
GO:0080102       MF GO:0080102
                                                     Description setSize
GO:0033962                                       P-body assembly      15
GO:0015086        cadmium ion transmembrane transporter activity      13
GO:0009979 16:0 monogalactosyldiacylglycerol desaturase activity       9
GO:0080102 3-methylthiopropyl glucosinolate S-oxygenase activity       3
           enrichmentScore       NES       pvalue     p.adjust   qvalues
GO:0033962       0.6494014  2.104057 0.0002644794 0.0002644794 0.9099991
GO:0015086       0.6615598  2.041589 0.0009006453 0.0009006453 0.9099991
GO:0009979      -0.7234794 -2.006594 0.0014432835 0.0014432835 0.9099991
GO:0080102      -0.9365204 -1.688523 0.0015708730 0.0015708730 0.9099991
           rank                   leading_edge
GO:0033962 4449 tags=53%, list=16%, signal=45%
GO:0015086 3331 tags=46%, list=12%, signal=41%
GO:0009979 7104 tags=78%, list=26%, signal=58%
GO:0080102  682 tags=100%, list=2%, signal=98%
                                           core_enrichment
GO:0033962  BCHA1/NA/LSM3B/AtPAT1H2/PAT1/emb1644/NA/DCP5-L
GO:0015086 ATNRAMP5/ATNRAMP2/ATNRAMP1/ATHMA2/ATHMA4/ATEIN2
GO:0009979                  NA/ADS2/NA/NA/ADS3/ADS1/ADS1.2
GO:0080102                                         FMO/FMO
> 
> 
> #generate plots
> library(ggplot2)  #needed for function "facet_grid"
> # The plot that gave an error with you:
> dotplot(results.GOALL.gsea, showCategory=10, split=".sign", font.size = 8) + facet_grid(.~.sign)
> 
> # other plots also work!
> cnetplot(results.GOALL.gsea, foldChange=sorted_dfx, showCategory=10,
+     cex_label_category = 0.8, cex_label_gene = 0.6)
> heatplot(results.GOALL.gsea, foldChange=sorted_dfx, showCategory=10)
> emapplot( pairwise_termsim(results.GOALL.gsea), node_scale=1.5,layout="kk" )
> 
>
> # Lastly, based on your code snippets you also tried to perform Disease Ontology (DO)
> # enrichment analysis. Since the diseases in DO are all human (and not plant!),
> # this does noty give any results, hence an error is reported! :-)
> # This is a different one than you reported, though...
> library(DOSE)
> edo2dot <- gseDO(sorted_dfx)
preparing geneSet collections...
--> Expected input gene ID: 3856,4316,8419,7422,5445,56717
Error in check_gene_id(geneList, geneSets) : 
  --> No gene can be mapped....
>

As example, the dotplot you tried to generate: enter image description here

Also the cnetplot from the GSEA run: enter image description here

ADD REPLY

Login before adding your answer.

Traffic: 451 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6