Question

Annotating RNAseq with GO (gene ontology) returns #NA for (nearly) all Entrez ids.

0

Entering edit mode

michael.willcockson • 0

@michaelwillcockson-8375

Last seen 8.8 years ago

United States

Hi all,

I've been following the RNAseqGene vignette and am having trouble adding annotations for Gene Ontology using the org.Mm.eg.db. Adding gene symbols and Entrez id's worked fine but nearly all GO id's return as NA whether I used the ensembl ID's or entrez ID's as the database key.

Thanks for the help.

Michael

convertIDs <- function( ids, from, to, db, ifMultiple=c("putNA", "useFirst")) {
stopifnot( inherits( db, "AnnotationDb" ) )
ifMultiple <- match.arg( ifMultiple )
suppressWarnings( selRes <- AnnotationDbi::select(
db, keys=ids, keytype=from, columns=c(from,to) ) )
if ( ifMultiple == "putNA" ) {
duplicatedIds <- selRes[ duplicated( selRes[,1] ), 1 ]
selRes <- selRes[ ! selRes[,1] %in% duplicatedIds, ]
}
return( selRes[ match( ids, selRes[,1] ), 2 ] )
}

# These first two worked
res_bfue_cfue$hgnc_symbol <- convertIDs(row.names(res_bfue_cfue), "ENSEMBLTRANS", "SYMBOL", org.Mm.eg.db)
res_bfue_cfue$entrezgene <- convertIDs(row.names(res_bfue_cfue), "ENSEMBLTRANS", "ENTREZID", org.Mm.eg.db)

# Both of these failed
res_bfue_cfue$GO <- convertIDs(row.names(res_bfue_cfue), "ENSEMBLTRANS", "ENTREZID", org.Mm.eg.db)
res_bfue_cfue$GO <- convertIDs(res_bfue_cfue$entrezgene, "ENTREZID", "GO", org.Mm.egGO)

> head(res_bfue_cfue$entrezgene)
[1] "14679" "54192" "15417" "12544" "16002" "11818"
> head(res_bfue_cfue$GO)
[1] NA NA NA NA NA NA

org.mm.eg.db annotation rnaseqgene • 1.8k views

ADD COMMENT • link updated 8.8 years ago by James W. MacDonald 65k • written 8.8 years ago by michael.willcockson • 0

score 3 · Accepted Answer · 2015-07-13

You have made a mistake, and have a misunderstanding. Here is the code I am talking about:

# Both of these failed
res_bfue_cfue$GO <- convertIDs(row.names(res_bfue_cfue), "ENSEMBLTRANS", "ENTREZID", org.Mm.eg.db)
res_bfue_cfue$GO <- convertIDs(res_bfue_cfue$entrezgene, "ENTREZID", "GO", org.Mm.egGO)

The first row above is just doing the same thing as

res_bfue_cfue$entrezgene <- convertIDs(row.names(res_bfue_cfue), "ENSEMBLTRANS", "ENTREZID", org.Mm.eg.db)

So if it worked once, it will work again. That is sort of a mistake, I suppose, but it isn't the one I am talking about. The second line has one error, and one misunderstanding on your part. First the error. In all other calls to convertIDs(), the last argument you passed in was org.Mm.eg.db, but in the second row you have substituted org.Mm.egGO, which is a completely different thing. In fact that won't even work:

> convertIDs(c("14679", "54192", "15417", "12544", "16002", "11818"), "ENTREZID", "GOID", org.Mm.egGO)
Error: inherits(db, "AnnotationDb") is not TRUE

So the code you have supplied isn't what you originally ran, because it would have returned an error. Your misunderstanding has to do with the argument 'ifMultiple'. The default is 'putNA', which I can translate to "If a given input ID has more than one output ID appended to it, then just return NA". So you get exactly what you should expect (except for the fact that this is supposed to be a convenience function to do stuff, that comes without an instruction manual, so maybe you shouldn't expect it, but I do).

So if you run it correctly, you get this:

> convertIDs(c("14679", "54192", "15417", "12544", "16002", "11818"), "ENTREZID", "GO", org.Mm.eg.db)
[1] NA NA NA NA NA NA

Which is as you should expect, because you are implicitly using 'putNA' for 'ifMultiple'. You could instead do this

> convertIDs(c("14679", "54192", "15417", "12544", "16002", "11818"), "ENTREZID", "GO", org.Mm.eg.db, "useFirst")
[1] "GO:0000166" "GO:0005215" "GO:0003677" "GO:0000727" "GO:0001503"
[6] "GO:0001937"

But that is still not quite right, IMO, because you are just choosing the first of all available mappings. The 'real' way to do this sort of thing is using select(), which is what the authors of that workflow were trying to avoid, as explaining how to annotate is beyond the scope of the workflow.

> z <- select(org.Mm.eg.db, c("14679", "54192", "15417", "12544", "16002", "11818"), "GOID","ENTREZID")

> head(z)
  ENTREZID EVIDENCE ONTOLOGY       GOID
1    14679      IEA       MF GO:0000166
2    14679      IBA       MF GO:0003924
3    14679      TAS       MF GO:0003924
4    14679      IBA       MF GO:0004871
5    14679      IPI       MF GO:0005515
6    14679      IEA       MF GO:0005525
> nrow(z)
[1] 142
> table(z$ENTREZID)
11818 12544 14679 15417 16002 54192
   31    19    33    16    39     4

But this is sort of inconvenient for annotating a matrix of data, where you have one row per gene, and you have upwards of 39 GOIDs per gene, which is why convertIDs() only gives you the choice of 'putNA' and 'useFirst'.

Long story short, if you want to annotate data in more than a rudimentary way, you will have to learn how to use the AnnotationDbi package, which has its own workflow.