"adding gene annotations" in edgeR doesn't work with select()
Entering edit mode
mjavad2012 • 0
Last seen 11 months ago

I'm trying to add gene annotation to my DGEList object in edgeR package but it giving me error as following:

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

Here is the code I'm using:

genes <- select(org.Hs.eg.db, keys=row.names(x), columns=c("SYMBOL", "TXCHROM"), keytype="GENEID")

I think the problem is that I don't know which Human db can cover my geneIDs, I've tried several different dbs but none of the could give me gene symbols. Here is the row names (geneIDs) which I honestly do not even know what keytype they are:

> head(geneid)
[1] "ENSCGRG00001004858" "ENSCGRG00001007305" "ENSCGRG00001010983" "ENSCGRG00001012374" "ENSCGRG00001013169" "ENSCGRG00001013944"

Any help would be appreciated, MJ

edgeR DEG • 373 views
Entering edit mode
Last seen 8 hours ago
United States

It's not clear if you know this or not, but those are Ensembl gene IDs for Chinese hamster. The annotation package you are trying to use is for Humans, which you are aware of, and the fact that there is an 'eg' in the name is intended to indicate that the central key for the underlying database is Entrez Gene, which is what NCBI used to call their Gene IDs.

So you are trying to use an NCBI-based annotation of human genes to annotate Chinese hamster data for which you have Ensembl IDs, which obviously isn't going to work. If you have Ensembl IDs, you should do all your annotations within the EBI/EBML world, rather than trying to do NCBI -> Ensembl mappings. That means you use biomaRt. The trick here is that you don't know a priori what arguments you need to use for various functions, so you need to figure out what you need.

## load biomaRt and get a 'base' mart. I use the useast mirror because I'm in the Eastern part of the US
> library(biomaRt)
> mart <- useEnsembl("ensembl", mirror = "useast")
## get a listing of the available data sets
> z <- listDatasets(mart)

## what are the mart names for Chinese hamster?
> z[grep("Chinese hamster", z[,2]),]
                    dataset                                  description
32 cgchok1gshd_gene_ensembl Chinese hamster CHOK1GS genes (CHOK1GS_HDv1)
33    cgcrigri_gene_ensembl    Chinese hamster CriGri genes (CriGri_1.0)
35      cgpicr_gene_ensembl     Chinese hamster PICR genes (CriGri-PICR)
33   CriGri_1.0
35  CriGri-PICR

## I know from doing a search at ensembl.org that these are CHOK1GS gene IDs, so we use that
> mart <- useEnsembl("ensembl", "cgchok1gshd_gene_ensembl", mirror = "useast")
> ids <- c("ENSCGRG00001004858", "ENSCGRG00001007305", "ENSCGRG00001010983", "ENSCGRG00001012374", "ENSCGRG00001013169", "ENSCGRG00001013944")

## now we need to know what to look for.

> z <- listAttributes(mart)
> head(z)
                           name                  description         page
1               ensembl_gene_id               Gene stable ID feature_page
2       ensembl_gene_id_version       Gene stable ID version feature_page
3         ensembl_transcript_id         Transcript stable ID feature_page
4 ensembl_transcript_id_version Transcript stable ID version feature_page
5            ensembl_peptide_id            Protein stable ID feature_page
6    ensembl_peptide_id_version    Protein stable ID version feature_page

## I know that ensembl_gene_id is what those IDs are, because I've done this before. 
## We want to get the gene symbols though
> z[grep("symbol", z$description, ignore.case = TRUE),]
                name                description         page
45        mgi_symbol                 MGI symbol feature_page
65 uniprot_gn_symbol UniProtKB Gene Name symbol feature_page

## when you query biomaRt you always want to ask for the thing you are querying on, as the results
## are returned in random order, and you need to be able to re-sort. Plus if there isn't a return value you need
## to have a blank returned.

## the arguments for getBM are, in essence, 'what I want', 'what I have', 'the IDs themselves', 'the mart object'
## so for the first argument (the attributes argument, if you care to know), we ask for both what you want,
## and what you already have.

> getBM(c("mgi_symbol","ensembl_gene_id"), "ensembl_gene_id", ids, mart)
  mgi_symbol    ensembl_gene_id
1            ENSCGRG00001004858
2      Gpm6a ENSCGRG00001007305
3      Wdr17 ENSCGRG00001010983
4     Spata4 ENSCGRG00001012374
5       Asb5 ENSCGRG00001013169
6      Spcs3 ENSCGRG00001013944

I'll leave it up to you to figure out how to re-order those data to match your DGEList rownames, and note that you should read the biomaRt vignette and help pages if you have questions.

Entering edit mode

I should also note that mgi_symbol worked, but that was probably happenstance. I know from long experience that what you really want are the external_gene_name, but that's not what you would naively search for.

> getBM(c("external_gene_name","ensembl_gene_id", "mgi_symbol"), "ensembl_gene_id", ids, mart)
Cache found
  external_gene_name    ensembl_gene_id mgi_symbol
1                    ENSCGRG00001004858           
2              Gpm6a ENSCGRG00001007305      Gpm6a
3              Wdr17 ENSCGRG00001010983      Wdr17
4             Spata4 ENSCGRG00001012374     Spata4
5               Asb5 ENSCGRG00001013169       Asb5
6              Spcs3 ENSCGRG00001013944      Spcs3
Entering edit mode

Thank you so much James. I have just a quick question. As, somebody else done the read alignment to genome, transcript assembly and featureCount, do you think they used Chinese hamster annotation file instead of Human annotation?? OR it's possible to have Chinese hamster and Human annotation the same gene naming.


Entering edit mode

I have no idea. Can you not ask whomever did the alignment?

Entering edit mode

Thank you James. Is it also possible to extract the chromosome names?

Best, MJ

Entering edit mode

Yes. At this point you should read the vignette.


Login before adding your answer.

Traffic: 290 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6