Hello. I am doing ensemble gene set testing with EGSEA (Ensemble of Gene Set Enrichment Analyses) on edgeR output. My edgeR object (the dge variable) is annotated with Ensembl gene IDs. After the voom transformation, I need to build an index for each gene set collection using the EGSEA indexing functions (buildCustomIdx), which rely on Entrez gene IDs only. I tried the following, but ran into the error below, which seems to indicate that many of my Ensembl IDs (around 50% of my genes) have no corresponding Entrez IDs:
> v <- voom(dge, design, plot=FALSE)
> library(EGSEA)
> library(EGSEAdata)
> egsea.data("human")
> info = egsea.data("human", returnInfo = TRUE)
> gsets = list(info$msigdb$info$collections[c(3, 6)])
> gs.annots <- buildCustomIdx(geneIDs=v$genes$gene.id, gsets=gsets, species = "human")
Error in data.frame(ID = paste0(label, seq(1, length(gsets.idx))), GeneSet = gsets.names) : arguments imply differing number of rows: 2, 0
The class of v$genes$gene.id is "character" and the class of gsets = list(info$msigdb$info$collections[c(3, 6)]) is "list".
I know that different databases use different gene nomenclature, so I do not expect every gene to have an annotation. I also tried converting my Ensembl gene IDs to Entrez gene IDs with the biomaRt library as shown below, but I got a similar error:
> library(biomaRt)
> v$genes <- gsub('\\..+$', '', v$genes$gene.id)
> ensembl.genes <- v$genes
> mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
> genes <- getBM(filters = "ensembl_gene_id", attributes = c("ensembl_gene_id","entrezgene_id"), values = ensembl.genes, mart = mart)
> library(EGSEA)
> entrez_id <- data.frame(genes$entrezgene_id)
> gs.annots = buildIdx(entrezIDs= entrez_id, species="human", msigdb.gsets=c("c2", "c5"), go.part = TRUE)
Error in data.frame(ID = gsets.ids, GeneSet = gsets.names, NumGenes = paste0(sapply(gsets, : arguments imply differing number of rows: 0, 1
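To make the goal concrete, here is a minimal base-R sketch of the input I think I need to end up with. It assumes (my reading of the documentation, not something I have confirmed) that the entrezIDs argument wants a plain character vector of unique, non-missing Entrez IDs in the same order as the rows of the expression object. The mapping table and the versioned IDs below are toy values standing in for the getBM() result and for v$genes$gene.id; the last Ensembl ID is made up to represent an unmapped gene.

```r
# Toy stand-in for the getBM() result; column names mirror the attributes requested above.
map <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048",
                      "ENSG00000012048", "ENSG00000999999"),
  entrezgene_id   = c(7157L, 672L, 672L, NA),
  stringsAsFactors = FALSE
)

# Toy versioned IDs, standing in for v$genes$gene.id.
gene.id <- c("ENSG00000141510.17", "ENSG00000012048.23", "ENSG00000999999.1")

# Strip the version suffix so the IDs match the Ensembl gene ID format in the map.
ensembl <- sub("\\..*$", "", gene.id)

# One Entrez ID per gene, in the original gene order (first match wins).
entrez <- map$entrezgene_id[match(ensembl, map$ensembl_gene_id)]

# Keep only genes with a mapped Entrez ID that has not already been used.
keep <- !is.na(entrez) & !duplicated(entrez)

# Plain character vector, rather than a data.frame.
entrez.ids <- as.character(entrez[keep])
```

If this is the right shape, I assume the same keep vector would also have to be used to subset the rows of v, so that the expression matrix and the ID vector stay aligned.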
I have searched a lot, but could not find how to fix this error. I would highly appreciate any help with building an index from my list of Ensembl gene IDs so that I can run EGSEA. Many thanks.