"No genes can be mapped...." using enrichGO in clusterProfiler
abby-s (United States)

I am trying to use enrichGO in clusterProfiler to identify enriched gene ontology (GO) terms for three gene clusters (diff_genes) relative to the full set of 7000 genes (all_genes). I get the warning "No gene can be mapped", which appears to relate to the format of the locus link IDs. However, I have already removed any IDs that have a different format. So far I have found only limited help on this topic, at the following links:

https://www.biostars.org/p/305258/ and https://github.com/YuLab-SMU/ReactomePA/issues/24
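A minimal base-R sketch of one way to do that format filtering. It assumes that valid IDs are TAIR AGI locus codes of the form "AT", chromosome (1-5, C or M), "G", five digits. The helper name keep_agi_ids() is hypothetical. Note that it drops transcript IDs such as AT1G06143.1 instead of trimming the suffix.

```r
# Keep only IDs that look like TAIR AGI locus codes, e.g. "AT1G06140".
# Assumed pattern: "AT", chromosome (1-5, C or M), "G", then five digits.
keep_agi_ids <- function(ids) {
  ids <- toupper(trimws(as.character(ids)))   # normalise case and stray spaces
  ids[grepl("^AT[1-5CM]G[0-9]{5}$", ids)]     # exact match, no suffixes allowed
}

ids <- c("AT1G06140", " at2g08986 ", "AT1G06143.1", "LOC123", "ATCG00490")
keep_agi_ids(ids)
#> [1] "AT1G06140" "AT2G08986" "ATCG00490"
```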

I have also tried raising the p- and q-value cutoffs just to see whether I could get any results, but I still get none. My locus link IDs do return genes in the Arabidopsis TAIR database (www.arabidopsis.org), some of which are uncharacterized proteins. So it's possible that my genes are simply not in org.At.tair.db, but that would be surprising for about 300 genes in total across the three clusters.
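One way to test that suspicion is to count how many of the IDs are keys of the annotation package. The sketch below is minimal. It assumes the real key set would come from keys(org.At.tair.db, keytype = "TAIR") (AnnotationDbi). A hypothetical three-ID stand-in replaces that key set so the sketch runs without Bioconductor, and the helper mapping_report() is only illustrative. Being a key is necessary but not sufficient, because enrichGO() only counts genes that also carry GO annotation.

```r
# Report which query IDs are present among the keys of an annotation package.
# In real use: valid_keys <- keys(org.At.tair.db, keytype = "TAIR")
mapping_report <- function(ids, valid_keys) {
  ids <- unique(ids)                 # count each ID once
  found <- ids %in% valid_keys
  list(n_query  = length(ids),
       n_mapped = sum(found),
       unmapped = ids[!found])
}

valid_keys <- c("AT1G06140", "AT1G06150", "AT1G08820")  # hypothetical stand-in
res <- mapping_report(c("AT1G06140", "AT1G06143", "AT1G06150", "AT1G06140"), valid_keys)
res$n_mapped
#> [1] 2
res$unmapped
#> [1] "AT1G06143"
```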

It seems I cannot attach a file, so I've pasted the contents of LocusLinkIDs-lightcyan.txt, one of the gene clusters, below. all_genes contains 7000 locus link IDs in the same format.

Any insight would be greatly appreciated!

library(org.At.tair.db)
library(clusterProfiler)

# list.files() takes a regular expression; the first file is read as the universe
genes_list <- list.files(pattern = "^LocusLinkIDs-")
genes_list

all_genes <- read.csv(genes_list[1], header = TRUE)

# Removed genes not having the TAIR format


for (m in 2:4) {

  diff_genes <- read.csv(genes_list[m], header = TRUE)

  GO_analysis <- enrichGO(gene = diff_genes,
                          universe = all_genes,
                          OrgDb = org.At.tair.db,  # contains the TAIR/Ensembl id to GO correspondence for A. thaliana
                          keyType = "TAIR",
                          ont = "ALL",             # "BP", "CC", "MF", or "ALL" for all three
                          pAdjustMethod = "BH",
                          pvalueCutoff = 0.05,
                          qvalueCutoff = 0.05,
                          readable = TRUE,
                          pool = TRUE)

  print(GO_analysis)  # values are not auto-printed inside a for loop

}

# Results
#--> No gene can be mapped....
#--> Expected input gene ID: AT5G44780,AT4G36600,AT3G57860,AT4G02020,AT5G35750,AT5G51170
#--> return NULL...

Contents of LocusLinkIDs-lightcyan.txt:
AT1G06140
AT1G06143
AT1G06150
AT1G06680
AT1G06740
AT1G08820
AT1G10000
AT1G10000
AT1G12470
AT1G15270
AT1G15890
AT1G15950
AT1G16330
AT1G20710
AT1G21110
AT1G21280
AT1G21500
AT1G22330
AT1G22340
AT1G22460
AT1G22650
AT1G22700
AT1G23060
AT1G27260
AT1G28695
AT1G29025
AT1G31150
AT1G31150
AT1G32550
AT1G32570
AT1G32700
AT1G33760
AT1G37020
AT1G40104
AT1G40104
AT1G40129
AT1G40390
AT1G40390
AT1G42190
AT1G42190
AT1G43760
AT1G43760
AT1G47056
AT1G47657
AT1G50440
AT1G50470
AT1G50750
AT1G50830
AT1G52300
AT1G52830
AT1G53300
AT1G53440
AT1G53470
AT1G55000
AT1G55150
AT1G55310
AT1G55330
AT1G55370
AT1G55380
AT1G56010
AT1G60720
AT1G63150
AT1G63280
AT1G63290
AT1G64480
AT1G65850
AT1G67110
AT1G70110
AT1G70230
AT1G70920
AT1G71470
AT1G71490
AT1G71520
AT1G71870
AT1G72860
AT1G75260
AT1G76480
AT2G07981
AT2G08986
AT2G08986
AT2G08986
AT2G08986
AT2G08986
AT2G10615
AT2G14000
AT2G14395
AT2G14765
AT2G20530
AT2G20550
AT2G20580
AT2G20585
AT2G20590
AT2G20850
AT2G20860
AT2G20870
AT2G21930
AT2G22950
AT2G25360
AT2G26060
AT2G26350
AT2G31305
AT2G33200
AT2G33470
AT2G41560
AT2G42390
AT2G43220
AT2G43220
AT2G43330
AT2G43370
AT2G46550
AT3G02810
AT3G03260
AT3G03272
AT3G03670
AT3G06280
AT3G06960
AT3G07450
AT3G08900
AT3G10990
AT3G11000
AT3G14970
AT3G16510
AT3G17410
AT3G24255
AT3G24255
AT3G26040
AT3G26500
AT3G26600
AT3G26720
AT3G28950
AT3G31910
AT3G42050
AT3G44300
AT3G44480
AT3G52105
AT3G52290
AT3G52310
AT3G52320
AT3G52440
AT3G52480
AT3G52530
AT3G52540
AT3G61400
AT4G00020
AT4G00020
AT4G00750
AT4G01360
AT4G01940
AT4G03740
AT4G04650
AT4G04690
AT4G08840
AT4G10200
AT4G11510
AT4G13050
AT4G13130
AT4G15690
AT4G15950
AT4G16410
AT4G17860
AT4G18395
AT4G20350
AT4G21110
AT4G22140
AT4G22190
AT4G22212
AT4G23260
AT4G24220
AT4G24972
AT4G25380
AT4G25910
AT4G26440
AT4G33310
AT4G33870
AT4G34170
AT4G35500
AT4G35910
AT4G38150
AT5G01470
AT5G06150
AT5G08630
AT5G10695
AT5G15750
AT5G16050
AT5G16060
AT5G16190
AT5G16230
AT5G16240
AT5G19010
AT5G19260
AT5G19470
AT5G19485
AT5G22794
AT5G28400
AT5G35090
AT5G38840
AT5G38900
AT5G39640
AT5G39770
AT5G39775
AT5G39775
AT5G43080
AT5G44620
AT5G44750
AT5G46210
AT5G50770
AT5G50800
AT5G52700
AT5G54820
AT5G55630
AT5G58660
AT5G59800
AT5G60290
Tags: R, TAIR, clusterProfiler, enrichGO, Arabidopsis
Guido Hooiveld (Wageningen University, Wageningen, the …)

It must be something on your end, because it works for me. I think it is due either to the wrong input or to the fact that you are 'looping'.
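To illustrate the looping point: inside a for loop, R does not auto-print a bare expression such as GO_analysis, and the variable is overwritten on every pass. The minimal sketch below keeps one result per cluster in a named list and prints each result explicitly. It uses a hypothetical helper run_per_cluster() and a trivial stand-in analysis (length) so that it runs without clusterProfiler.

```r
# Run one analysis per cluster and keep every result in a named list.
run_per_cluster <- function(clusters, analyse) {
  results <- lapply(clusters, analyse)   # one result per cluster, names kept
  for (nm in names(results)) {
    cat("== ", nm, " ==\n", sep = "")
    print(results[[nm]])                 # explicit print() works inside a loop
  }
  invisible(results)                     # return the list without re-printing it
}

# Hypothetical stand-ins: two small clusters and a trivial analysis.
clusters <- list(lightcyan = c("AT1G06140", "AT1G06150"), blue = c("AT1G08820"))
out <- run_per_cluster(clusters, analyse = length)
```

In real use, analyse could be something like function(g) enrichGO(gene = g, universe = all_genes, OrgDb = org.At.tair.db, keyType = "TAIR", ont = "ALL", pool = TRUE), with all_genes already a character vector.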

Please check that both all_genes and diff_genes are plain character vectors. Note that read.csv() returns a data.frame, not a character vector.
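The base-R sketch below shows the difference using a hypothetical four-line file written to a temporary path. read.csv() returns a data.frame. If a file holds only IDs and is read with header = TRUE, the first ID silently becomes the column name. Taking the first column with [[1]] (or simply using readLines()) yields the character vector that enrichGO() expects. Dropping duplicates with unique() is an optional extra.

```r
# Write a small example file (hypothetical content) to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c("AT1G06140", "AT1G06143", "AT1G10000", "AT1G10000"), path)

df <- read.csv(path, header = TRUE)
class(df)
#> [1] "data.frame"
names(df)        # with header = TRUE the first ID became the column name
#> [1] "AT1G06140"

ids <- read.csv(path, header = FALSE)[[1]]   # first column as a plain vector
ids <- unique(trimws(as.character(ids)))     # drop duplicates and stray spaces
class(ids)
#> [1] "character"
ids
#> [1] "AT1G06140" "AT1G06143" "AT1G10000"
```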

I used the 213 IDs you pasted above as the universe and the first 75 as the differentially expressed genes. Note the ('unrestricted') values of the arguments I used: pAdjustMethod = "none", pvalueCutoff = 1 and qvalueCutoff = 1. I used these to make sure I would get results!

library(org.At.tair.db)
library(clusterProfiler)

# all_genes: character vector holding the 213 IDs pasted in the question
diff_genes <- all_genes[1:75]

class(all_genes)
#[1] "character"
class(diff_genes)
#[1] "character"
length(all_genes)
#[1] 213
length(diff_genes)
#[1] 75


head(all_genes)
#[1] "AT1G06140" "AT1G06143" "AT1G06150" "AT1G06680" "AT1G06740" "AT1G08820"
head(diff_genes)
#[1] "AT1G06140" "AT1G06143" "AT1G06150" "AT1G06680" "AT1G06740" "AT1G08820" 
GO_analysis <- enrichGO(gene = diff_genes, 
                            universe = all_genes, 
                            OrgDb = org.At.tair.db,  # contains the TAIR/Ensembl id to GO correspondence for A. thaliana
                            keyType = "TAIR",
                            ont = "ALL",              # "BP", "CC", "MF", or "ALL" for all three
                            pAdjustMethod = "none",
                            pvalueCutoff = 1, 
                            qvalueCutoff = 1,
                            readable = TRUE, 
                            pool = TRUE)


head(GO_analysis)
           ONTOLOGY         ID                                     Description
GO:0140110       MF GO:0140110                transcription regulator activity
GO:0003700       MF GO:0003700       DNA-binding transcription factor activity
GO:0019438       BP GO:0019438          aromatic compound biosynthetic process
GO:0005634       CC GO:0005634                                         nucleus
GO:0044271       BP GO:0044271 cellular nitrogen compound biosynthetic process
GO:0098772       MF GO:0098772                    molecular function regulator
           GeneRatio BgRatio      pvalue    p.adjust    qvalue
GO:0140110      9/66  12/192 0.003754295 0.003754295 0.4880583
GO:0003700      8/66  11/192 0.008948761 0.008948761 0.5765610
GO:0019438     11/66  20/192 0.038110509 0.038110509 0.5765610
GO:0005634     34/66  82/192 0.051575960 0.051575960 0.5765610
GO:0044271     11/66  21/192 0.057470018 0.057470018 0.5765610
GO:0098772     10/66  19/192 0.067841184 0.067841184 0.5765610
           geneID
GO:0140110 EMB1444/WOX10/NA/NA/ERF022/IAA6/anac021/ATHB18/NA
GO:0003700 EMB1444/WOX10/NA/ERF022/IAA6/anac021/ATHB18/NA
GO:0019438 EMB1444/ATCCR1/WOX10/IGMT3/NA/NA/ERF022/IAA6/anac021/CYP735A2/NA
GO:0005634 EMB1444/NA/CYCB3;1/WOX10/IGMT3/NA/NA/NA/A/N-InvD/PYG7/MDP40/NA/NA/NA/NA/ERF022/NA/NA/NA/NA/NA/NA/IAA6/TTL1/MSL4/AtRH20/At-SCL33/NA/anac021/NA/AXY4/ATHB18/NA/NA
GO:0044271 EMB1444/NA/WOX10/NA/NA/ERF022/NA/IAA6/anac021/CYP735A2/NA
GO:0098772 EMB1444/CYCB3;1/WOX10/NA/NA/ERF022/IAA6/anac021/ATHB18/NA
           Count
GO:0140110     9
GO:0003700     8
GO:0019438    11
GO:0005634    34
GO:0044271    11
GO:0098772    10