org.Hs.eg.db output has more rows than the input: how do I combine the input and output?
dp0618 • 0
@dp0618-8713

I'd like to combine data2 with the output of select() on org.Hs.eg.db, but the output has 3 more rows than the input, so cbind() fails. Do you have any suggestions for fixing this?

head(data2)

   genes shp400_FC     T58A_FC      omo_FC
1   AVIL  2.730086 -1.10721312 -1.21584380
2   BAI2  3.104302 -2.17959085  0.16769160
3    CA9 -3.208643  0.03214854 -1.05244810
5   CNN3  2.121578 -0.36076018  0.53284659
6  CPNE6  4.477493 -0.39318830 -1.10612350
7 DNAH17  4.196555 -0.43432671  0.02131942

genes2 <- as.character(data2$genes)

entrez <- select(org.Hs.eg.db, keys = genes2, columns=c("ENTREZID"), 
                 keytype="SYMBOL")

head(entrez)

  SYMBOL ENTREZID
1   AVIL    10677
2   BAI2     <NA>
3    CA9      768
4   CNN3     1266
5  CPNE6     9362
6 DNAH17     8632

data2 <- cbind(data2, entrez)

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 13128, 13131
org.Hs.eg.db annotation bioconductor • 1.2k views
Aaron Lun ★ 28k
@alun

The extra rows are probably due to one-to-many mappings between SYMBOL and ENTREZID, i.e., some gene symbols map to multiple Entrez IDs, and select() returns one row per mapping. I'm not aware of any way to coerce a 1:1 mapping from select(), though I think the development version will at least tell you if there are 1-to-many mappings. For your current problem, you can resolve this by picking the first Entrez ID for each gene symbol:

pick.first <- entrez[match(genes2, entrez$SYMBOL),]
cbind(data2, pick.first)

Alternatively, you can subset entrez with !duplicated(entrez$SYMBOL), which should give the same result as long as genes2 itself contains no repeated symbols.
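
To see why both approaches collapse the table to one row per symbol, here is a minimal sketch in base R. The gene names, fold changes and IDs are made-up stand-ins for data2 and the select() output, not real annotation results.

```r
# Toy stand-ins for data2 and the select() output (illustrative values only).
data2 <- data.frame(genes = c("GENE_A", "GENE_B", "GENE_C"),
                    FC = c(1.5, -2.0, 0.3),
                    stringsAsFactors = FALSE)
genes2 <- as.character(data2$genes)

# Hypothetical 1:many result: GENE_B maps to two IDs, GENE_C to none.
entrez <- data.frame(SYMBOL = c("GENE_A", "GENE_B", "GENE_B", "GENE_C"),
                     ENTREZID = c("101", "202", "203", NA),
                     stringsAsFactors = FALSE)

# Strategy 1: match() returns the index of the first hit for each key,
# so the result has exactly one row per element of genes2, in input order.
pick.first <- entrez[match(genes2, entrez$SYMBOL), ]

# Strategy 2: drop every row whose SYMBOL has already been seen.
dedup <- entrez[!duplicated(entrez$SYMBOL), ]

# With unique input symbols, both strategies keep the same IDs.
stopifnot(nrow(pick.first) == nrow(data2),
          identical(pick.first$ENTREZID, dedup$ENTREZID))

# Row counts now agree, so cbind() succeeds.
combined <- cbind(data2, pick.first)
```

Note that picking the first ID is arbitrary; if the choice matters for your analysis, inspect the duplicated symbols before discarding the other mappings.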


mapIds() implements this and other strategies.
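
As a rough sketch of that suggestion, mapIds() takes a multiVals argument that controls how one-to-many mappings are resolved; "first" keeps the first ID for each key. The small data2 below reuses a few rows from the question, and the assumption is that org.Hs.eg.db is installed.

```r
library(org.Hs.eg.db)  # depends on AnnotationDbi, which provides mapIds()

# A few rows from the question, rebuilt here so the example is self-contained.
data2 <- data.frame(genes = c("AVIL", "BAI2", "CA9", "CNN3"),
                    shp400_FC = c(2.730086, 3.104302, -3.208643, 2.121578),
                    stringsAsFactors = FALSE)
genes2 <- as.character(data2$genes)

# One value per key; symbols with no mapping come back as NA.
ids <- mapIds(org.Hs.eg.db,
              keys = genes2,
              column = "ENTREZID",
              keytype = "SYMBOL",
              multiVals = "first")

# The result is a named vector aligned with genes2, so it can be
# attached directly as a new column.
data2$ENTREZID <- unname(ids)
```

Other multiVals settings include "list" (keep every ID), "filter" (drop keys with multiple IDs) and "asNA" (set such keys to NA).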


Thanks! It works now
