toTable(org.Dr.egGO) into a data.frame with unique gene_id and all ensembl_ids in one column
1
0
Entering edit mode
@mehmet-ilyas-cosacak-9020
Germany/Dresden/ CRTD - DZNE

Hi,

I am trying to build a data frame with one row per gene_id from toTable(org.Dr.egENSEMBL2EG).

For example, I want to collapse the following rows into a single row:

         gene_id         ensembl_id

16939 100000783 ENSDARG00000093071
16940 100000783 ENSDARG00000103015
16941 100000783 ENSDARG00000086233
16942 100000783 ENSDARG00000099123
16943 100000783 ENSDARG00000086304
16944 100000783 ENSDARG00000086591
16945 100000783 ENSDARG00000051736

so that the result looks like this:

           gene_id  ensembl_id

1       100000783 "ENSDARG00000093071,ENSDARG00000103015,ENSDARG00000086233,ENSDARG00000099123,ENSDARG00000086304,ENSDARG00000086591,ENSDARG00000051736"

 

My code is below, but it takes a long time to build the data.frame I want.

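library(org.Dr.eg.db)
nDf <- toTable(org.Dr.egENSEMBL2EG)
# Split into the first row per gene_id and the remaining (duplicated) rows
d <- duplicated(nDf[,1])
nDb <- nDf[!d,]
tmp1 <- nDf[d,]
# For each unique gene_id, append the ensembl_ids from its duplicated rows
for(i in seq_len(nrow(nDb))){
    idxs <- which(tmp1[,1] == nDb[i,1])
    nDb[i,2] <- paste(c(nDb[i,2], tmp1[idxs,2]), collapse = ",")
}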

best,

ilyas.
 

@james-w-macdonald-5106
United States

You should be using mapIds for this sort of thing. There are any number of ways to do what you want, and there are probably better ways to present the data than a comma-separated vector.

> z <- mapIds(org.Dr.eg.db, keys(org.Dr.eg.db), "ENSEMBL", "ENTREZID",multiVals="CharacterList")
> zz <- DataFrame(ENTREZID = names(z), ENSEMBL = z)
> zz
DataFrame with 37241 rows and 2 columns
             ENTREZID            ENSEMBL
          <character>    <CharacterList>
30037           30037 ENSDARG00000021948
30038           30038 ENSDARG00000010770
30065           30065 ENSDARG00000101744
30066           30066 ENSDARG00000077840
30067           30067 ENSDARG00000019588
...               ...                ...
105751184   105751184                 NA
105751185   105751185                 NA
106023290   106023290                 NA
106144553   106144553                 NA
106144554   106144554                 NA

> zzz <- data.frame(ENTREZID = names(z), ENSEMBL = sapply(z, paste, collapse = ", "))
> head(zzz)
      ENTREZID            ENSEMBL
30037    30037 ENSDARG00000021948
30038    30038 ENSDARG00000010770
30065    30065 ENSDARG00000101744
30066    30066 ENSDARG00000077840
30067    30067 ENSDARG00000019588
30068    30068 ENSDARG00000104702
> head(zzz[sapply(z, length) > 1,])
      ENTREZID                                                    ENSEMBL
30163    30163                     ENSDARG00000079402, ENSDARG00000045011
30217    30217                     ENSDARG00000097238, ENSDARG00000089087
30478    30478                     ENSDARG00000009702, ENSDARG00000101628
30491    30491                     ENSDARG00000087359, ENSDARG00000052207
30593    30593                     ENSDARG00000086522, ENSDARG00000090237
30597    30597 ENSDARG00000089475, ENSDARG00000089124, ENSDARG00000088330
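
If you would rather start from the toTable() output in the question, the same collapse can probably be done in base R without a loop. This is only a sketch: the toy data frame stands in for toTable(org.Dr.egENSEMBL2EG) (the IDs are copied from this thread), and the names nDf, ens and collapsed are just illustrative.

```r
# Toy stand-in for toTable(org.Dr.egENSEMBL2EG); IDs copied from this thread
nDf <- data.frame(
  gene_id    = c("100000783", "100000783", "100000783", "30037"),
  ensembl_id = c("ENSDARG00000093071", "ENSDARG00000103015",
                 "ENSDARG00000086233", "ENSDARG00000021948"),
  stringsAsFactors = FALSE
)

# Split ensembl_id by gene_id, then paste each group into a single string
ens <- vapply(split(nDf$ensembl_id, nDf$gene_id),
              paste, character(1), collapse = ",")

# One row per gene_id, all of its ensembl_ids in one comma-separated column
collapsed <- data.frame(gene_id = names(ens), ensembl_id = unname(ens),
                        stringsAsFactors = FALSE)
collapsed
```

split() groups the whole column in one pass, so the table is not rescanned for every gene the way the which() call inside the loop does.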

Thank you very much, James! Sometimes I need a data.frame or an input file laid out as above. For topGO, for example, I need an input file with ensembl_id in the first column and all the go_ids in the second column. That is one reason I am trying to learn a quicker way to build a data.frame that collects multiple mappings in one column.

best,

ilyas.
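
For the topGO case mentioned above, a minimal sketch of writing such a file might look like the following. It assumes the layout that topGO's readMappings() reads by default as far as I know (gene ID, a tab, then comma-separated GO IDs), so please check that against the topGO documentation. The table go_long and the gene-to-GO pairings in it are made up for illustration and are not real annotations.

```r
# Hypothetical long-format table: one row per (ensembl_id, go_id) pair.
# The pairings are invented for illustration; they are NOT real annotations.
go_long <- data.frame(
  ensembl_id = c("ENSDARG00000021948", "ENSDARG00000021948",
                 "ENSDARG00000010770"),
  go_id      = c("GO:0008150", "GO:0003674", "GO:0005575"),
  stringsAsFactors = FALSE
)

# Collapse to one string per gene, dropping repeated GO IDs within a gene
go_by_gene <- vapply(split(go_long$go_id, go_long$ensembl_id),
                     function(x) paste(unique(x), collapse = ","),
                     character(1))

# Write "gene<TAB>GO1,GO2,..." lines, with no header, quotes or row names
map_file <- tempfile(fileext = ".map")
write.table(data.frame(names(go_by_gene), go_by_gene),
            file = map_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

With real data, go_long would come from an annotation table instead, and the file could then be handed to readMappings(map_file).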
