Question

Non-model organism , CluserProfiler message "No gene can be mapped"

0

Entering edit mode

yuhang • 0

@1b2a94a4

Last seen 22 months ago

Hong Kong

Hi, I am using clusterProfiler to do an enrichment analysis. However, I get the warning ' No gene can be mapped.... --> Expected input gene ID: ND6,ND4,COX3,ND1,ND5,ATP6 --> return NULL...'. I am studying a non-model organism, the 'Vulpes lagopus'. My script is as follows,

mycounts<-read.csv("-B5_Counts-Arctic-fat.csv",row.names = 1)

head(mycounts)
dim(mycounts)
library(dplyr)
library(DESeq2)
mycounts <- mycounts %>%mutate_if(is.factor, ~as.numeric(as.character(.)))
mymeta<-read.csv("-B5_mymeta.csv",stringsAsFactors = T)
mycountsGroups <- mymeta$Group
colData <- data.frame(row.names=colnames(mycounts),
                      group_list=mycountsGroups)
dds <- DESeqDataSetFromMatrix(countData = mycounts,
                              colData = colData,
                              design = ~group_list,
                              tidy = F)
keep <- rowSums(counts(dds)>10) >4
dds.keep <- dds[keep, ]
dim(dds.keep)
degobj <- DESeq(dds.keep);degobj
countdds <- counts(degobj, normalized = T) 
deg.res1 <- results(degobj, contrast=c("group_list","HFB","LFB"), lfcThreshold = 0,
                    pAdjustMethod = 'fdr')
dim(deg.res1)
apply(deg.res1,2, function(x) sum(is.na(x)))
deg.res1$padj[is.na(deg.res1$padj)] <- 1
table(is.na(deg.res1$padj))
summary(deg.res1) 
library(dplyr)
res_1<-data.frame(deg.res1)
res_1 %>%
  mutate(group = case_when(log2FoldChange >0.5 & padj <= 0.05 ~ "UP",log2FoldChange <-0.5 & padj <= 0.05 ~ "DOWN",TRUE ~ "NOT_CHANGE")) -> res_2
table(res_2$group)
upsig <- res_2[grep(pattern = "UP",res_2[,7]),]
downsig <- res_2[grep(pattern = "DOWN",res_2[,7]),]
deg <- rbind(upsig,downsig)
# non-model, but have the genome
BiocManager::install("AnnotationHub")
BiocManager::install("biomaRt")
# load package
library(AnnotationHub)
library(biomaRt)
# make a orgDb
hub <- AnnotationHub::AnnotationHub()
# 
query(hub, "Vulpes lagopus")
Vulpes.OrgDb <- hub[["AH96456"]]
columns(Vulpes.OrgDb)
head(keys(Vulpes.OrgDb,keytype = "ACCNUM"))
head(keys(Vulpes.OrgDb,keytype = "GOALL"))
library(clusterProfiler)
df2 <- bitr(row.names(upsig), fromType = "SYMBOL",
            toType = c("ENTREZID"),
            OrgDb = Vulpes.OrgDb) 
genelist <- df2$ENTREZID
enrich.go.BP = enrichGO(gene       =genelist,
                        OrgDb = Vulpes.OrgDb,
                        keyType      = 'ENTREZID',ont= "CC",
                        pvalueCutoff = 0.05,
                        pAdjustMethod = "BH",
                        qvalueCutoff = 0.05,
                        )

However, I always got the warning as mentioned above.

Looking forward to some help! Thank you!

organism non-model • 2.2k views

ADD COMMENT • link updated 3.3 years ago by James W. MacDonald 68k • written 3.3 years ago by yuhang • 0

0

Entering edit mode

Please format your post so that the code / output you provide is human readable. Use the buttons blockquote and/or Code sample for this.

ADD REPLY • link 3.3 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

mycounts<-read.csv("-B5_Counts-Arctic-fat.csv",row.names = 1)
head(mycounts)
dim(mycounts)
library(dplyr) 
library(DESeq2) 
mycounts <- mycounts %>%mutate_if(is.factor, ~as.numeric(as.character(.)))
mymeta<-read.csv("-B5_mymeta.csv",stringsAsFactors = T)
mycountsGroups <- mymeta$Group
colData <- data.frame(row.names=colnames(mycounts), group_list=mycountsGroups) 
dds <- DESeqDataSetFromMatrix(countData = mycounts, colData = colData, design = ~group_list, tidy = F) 
keep <- rowSums(counts(dds)>10) >4
dds.keep <- dds[keep, ]
 dim(dds.keep)
degobj <- DESeq(dds.keep);degobj 
countdds <- counts(degobj, normalized = T) 
deg.res1 <- results(degobj, contrast=c("group_list","HFB","LFB"), lfcThreshold = 0, pAdjustMethod = 'fdr') 
dim(deg.res1)
 apply(deg.res1,2, function(x) sum(is.na(x))) 
deg.res1$padj[is.na(deg.res1$padj)] <- 1 
table(is.na(deg.res1$padj)) 
summary(deg.res1) 
library(dplyr)
 res_1<-data.frame(deg.res1)
 res_1 %>% mutate(group = case_when(log2FoldChange >0.5 & padj <= 0.05 ~ "UP",log2FoldChange <-0.5 & padj <= 0.05 ~ "DOWN",TRUE ~ "NOT_CHANGE")) -> res_2 
table(res_2$group) 
upsig <- res_2[grep(pattern = "UP",res_2[,7]),] 
downsig <- res_2[grep(pattern = "DOWN",res_2[,7]),] 
deg <- rbind(upsig,downsig)
BiocManager::install("AnnotationHub")
BiocManager::install("biomaRt")
# load package
library(AnnotationHub)
library(biomaRt)
# make a orgDb
hub <- AnnotationHub::AnnotationHub()
query(hub, "Vulpes lagopus")
Vulpes.OrgDb <- hub[["AH96456"]]
columns(Vulpes.OrgDb)
library(clusterProfiler)
df2 <- bitr(row.names(upsig), fromType = "SYMBOL",
            toType = c("ENTREZID"),
            OrgDb = Vulpes.OrgDb) 
genelist <- df2$ENTREZID
enrich.go.BP = enrichGO(gene       =genelist,
                        OrgDb = Vulpes.OrgDb,
                        keyType      = 'ENTREZID',ont= "BP",
                        pvalueCutoff = 0.05,
                        pAdjustMethod = "BH",
                        qvalueCutoff = 0.05,
                        )


--> No gene can be mapped....
--> Expected input gene ID: 23629751,23629755,23629750,23629748,23629754,23629750
--> return NULL...

ADD REPLY • link updated 3.3 years ago by James W. MacDonald 68k • written 3.3 years ago by yuhang • 0

0

Entering edit mode

Please show an example of your genelist. It would also be helpful if you could share row.names(upsig)[1:5] so that we could reproduce your issue.

ADD REPLY • link 3.3 years ago Basti ▴ 780

0

Entering edit mode

> row.names(upsig)[1:5]

[1] "ZMYND15" "ZBTB32" "WNT4" "WDFY4" "VMP1"

ADD REPLY • link 3.3 years ago yuhang • 0

0

Entering edit mode

Thanks for your advice, I have rewrote the code.

ADD REPLY • link 3.3 years ago yuhang • 0

0

Entering edit mode

Great, thanks!

You posted a lot of code... To start troubleshooting: what are the outputs from:

row.names(upsig)[1:5]
head(df2)
head(deg)
head(genelist)
class(genelist)

ADD REPLY • link 3.3 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

Thanks！

 > row.names(upsig)[1:5]
    [1] "ZMYND15" "ZBTB32"  "WNT4"    "WDFY4"   "VMP1" 
    > head(df2)
       SYMBOL  ENTREZID
    1 ZMYND15 121500402
    2  ZBTB32 121484911
    3    WNT4 121498205
    4   WDFY4 121484518
    5    VMP1 121473000
    6    VCAN 121489248
    > head(deg)
              baseMean log2FoldChange     lfcSE     stat       pvalue
    ZMYND15   570.1401      0.8550006 0.2218661 3.853679 0.0001163560
    ZBTB32    229.8653      1.7846968 0.5289862 3.373806 0.0007413652
    WNT4      130.0891      1.3260271 0.3984658 3.327832 0.0008752472
    WDFY4    1550.4838      0.8511954 0.2691608 3.162405 0.0015647180
    VMP1     5758.9703      0.3475775 0.1014625 3.425674 0.0006132768
    VCAN    11576.0138      1.4005786 0.3950761 3.545085 0.0003924862
                   padj group
    ZMYND15 0.006223121    UP
    ZBTB32  0.023868556    UP
    WNT4    0.026927607    UP
    WDFY4   0.040697786    UP
    VMP1    0.020593861    UP
    VCAN    0.014342618    UP
    > head(genelist)
    [1] "121500402" "121484911" "121498205" "121484518" "121473000" "121489248"
    > class(genelist)
    [1] "character"

ADD REPLY • link 3.3 years ago yuhang • 0

score 1 · Answer 1 · 2022-09-28

While the output to your query to enrichGO has 'bp' in the name, you are actually querying on 'CC'. And that's a pretty sparse thing to query on. Also, you are using an old version of R/Bioc, so you should upgrade

> library(AnnotationHub)
.> hub <- AnnotationHub()
snapshotDate(): 2022-04-25
> Vulpes.OrgDb <- hub[["AH96456"]]
Error: AH96456 is an OrgDb resource.
  orgDb resources are generated for specific biocversions.
  Requested resource works with biocversion: 3.14
  To find a resource appropriate for your biocversion try the following query:
      query(ah,'org.Vulpes_lagopus.eg.sqlite')

> Vulpes.OrgDb <- hub[["AH101318"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache

## Now for some inside baseball...
## we first get a connection to the underlying SQLite DB

> library(DBI)
> con <- dbconn(Vulpes.OrgDb)

## Now do a query to get all the GO CC terms (and ancestors) for each gene

> z <- dbGetQuery(con, "select GID, GO from go_cc_all inner join genes using('_id');")
> head(z)
       GID         GO
1 23629747 GO:0016021
2 23629747 GO:0005743
3 23629747 GO:0070469
4 23629747 GO:0005575
5 23629747 GO:0016020
6 23629747 GO:0031224

## How large is this data.frame?
> dim(z)
[1] 283   2

## Ooof. How many unique Gene IDs?
> length(unique(z[,1]))
[1] 13

## double Ooof.
> unique(z[,1])
 [1] "23629747" "23629748" "23629749" "23629750" "23629751" "23629752"
 [7] "23629753" "23629754" "23629755" "23629756" "23629757" "23629758"
[13] "23629759"

## what about BP?
> z <- dbGetQuery(con, "select GID, GO from go_bp_all inner join genes using('_id');")
> dim(z)
[1] 197   2
> length(unique(z[,1]))
[1] 9

Long story short, the annotation package has vanishingly small numbers of genes that are annotated to any GO terms, so you are not going to be able to get reasonable results. You might be able to use blast2go to get a better mapping, but IIRC, that costs money unless there's a free trial period. And I don't use clusterProfiler, so I have no idea if you can feed in a data.frame of ID -> GO mappings. But you could use kegga from the limma package if you had such a thing.