Question: "adding gene annotations" in edgeR doesn't work with select()
gravatar for mjavad2012
4 weeks ago by
mjavad20120 wrote:

I'm trying to add gene annotation to my DGEList object in edgeR package but it giving me error as following:

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

Here is the code I'm using:

genes <- select(, keys=row.names(x), columns=c("SYMBOL", "TXCHROM"), keytype="GENEID")

I think the problem is that I don't know which Human db can cover my geneIDs, I've tried several different dbs but none of the could give me gene symbols. Here is the row names (geneIDs) which I honestly do not even know what keytype they are:

> head(geneid)
[1] "ENSCGRG00001004858" "ENSCGRG00001007305" "ENSCGRG00001010983" "ENSCGRG00001012374" "ENSCGRG00001013169" "ENSCGRG00001013944"

Any help would be appreciated, MJ

edger deg • 70 views
ADD COMMENTlink modified 4 weeks ago by James W. MacDonald51k • written 4 weeks ago by mjavad20120
Answer: "adding gene annotations" in edgeR doesn't work with select()
gravatar for James W. MacDonald
4 weeks ago by
United States
James W. MacDonald51k wrote:

It's not clear if you know this or not, but those are Ensembl gene IDs for Chinese hamster. The annotation package you are trying to use is for Humans, which you are aware of, and the fact that there is an 'eg' in the name is intended to indicate that the central key for the underlying database is Entrez Gene, which is what NCBI used to call their Gene IDs.

So you are trying to use an NCBI-based annotation of human genes to annotate Chinese hamster data for which you have Ensembl IDs, which obviously isn't going to work. If you have Ensembl IDs, you should do all your annotations within the EBI/EBML world, rather than trying to do NCBI -> Ensembl mappings. That means you use biomaRt. The trick here is that you don't know a priori what arguments you need to use for various functions, so you need to figure out what you need.

## load biomaRt and get a 'base' mart. I use the useast mirror because I'm in the Eastern part of the US
> library(biomaRt)
> mart <- useEnsembl("ensembl", mirror = "useast")
## get a listing of the available data sets
> z <- listDatasets(mart)

## what are the mart names for Chinese hamster?
> z[grep("Chinese hamster", z[,2]),]
                    dataset                                  description
32 cgchok1gshd_gene_ensembl Chinese hamster CHOK1GS genes (CHOK1GS_HDv1)
33    cgcrigri_gene_ensembl    Chinese hamster CriGri genes (CriGri_1.0)
35      cgpicr_gene_ensembl     Chinese hamster PICR genes (CriGri-PICR)
33   CriGri_1.0
35  CriGri-PICR

## I know from doing a search at that these are CHOK1GS gene IDs, so we use that
> mart <- useEnsembl("ensembl", "cgchok1gshd_gene_ensembl", mirror = "useast")
> ids <- c("ENSCGRG00001004858", "ENSCGRG00001007305", "ENSCGRG00001010983", "ENSCGRG00001012374", "ENSCGRG00001013169", "ENSCGRG00001013944")

## now we need to know what to look for.

> z <- listAttributes(mart)
> head(z)
                           name                  description         page
1               ensembl_gene_id               Gene stable ID feature_page
2       ensembl_gene_id_version       Gene stable ID version feature_page
3         ensembl_transcript_id         Transcript stable ID feature_page
4 ensembl_transcript_id_version Transcript stable ID version feature_page
5            ensembl_peptide_id            Protein stable ID feature_page
6    ensembl_peptide_id_version    Protein stable ID version feature_page

## I know that ensembl_gene_id is what those IDs are, because I've done this before. 
## We want to get the gene symbols though
> z[grep("symbol", z$description, = TRUE),]
                name                description         page
45        mgi_symbol                 MGI symbol feature_page
65 uniprot_gn_symbol UniProtKB Gene Name symbol feature_page

## when you query biomaRt you always want to ask for the thing you are querying on, as the results
## are returned in random order, and you need to be able to re-sort. Plus if there isn't a return value you need
## to have a blank returned.

## the arguments for getBM are, in essence, 'what I want', 'what I have', 'the IDs themselves', 'the mart object'
## so for the first argument (the attributes argument, if you care to know), we ask for both what you want,
## and what you already have.

> getBM(c("mgi_symbol","ensembl_gene_id"), "ensembl_gene_id", ids, mart)
  mgi_symbol    ensembl_gene_id
1            ENSCGRG00001004858
2      Gpm6a ENSCGRG00001007305
3      Wdr17 ENSCGRG00001010983
4     Spata4 ENSCGRG00001012374
5       Asb5 ENSCGRG00001013169
6      Spcs3 ENSCGRG00001013944

I'll leave it up to you to figure out how to re-order those data to match your DGEList rownames, and note that you should read the biomaRt vignette and help pages if you have questions.

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by James W. MacDonald51k

I should also note that mgi_symbol worked, but that was probably happenstance. I know from long experience that what you really want are the external_gene_name, but that's not what you would naively search for.

> getBM(c("external_gene_name","ensembl_gene_id", "mgi_symbol"), "ensembl_gene_id", ids, mart)
Cache found
  external_gene_name    ensembl_gene_id mgi_symbol
1                    ENSCGRG00001004858           
2              Gpm6a ENSCGRG00001007305      Gpm6a
3              Wdr17 ENSCGRG00001010983      Wdr17
4             Spata4 ENSCGRG00001012374     Spata4
5               Asb5 ENSCGRG00001013169       Asb5
6              Spcs3 ENSCGRG00001013944      Spcs3
ADD REPLYlink written 4 weeks ago by James W. MacDonald51k

Thank you so much James. I have just a quick question. As, somebody else done the read alignment to genome, transcript assembly and featureCount, do you think they used Chinese hamster annotation file instead of Human annotation?? OR it's possible to have Chinese hamster and Human annotation the same gene naming.


ADD REPLYlink written 4 weeks ago by mjavad20120

I have no idea. Can you not ask whomever did the alignment?

ADD REPLYlink written 4 weeks ago by James W. MacDonald51k

Thank you James. Is it also possible to extract the chromosome names?

Best, MJ

ADD REPLYlink written 4 weeks ago by mjavad20120

Yes. At this point you should read the vignette.

ADD REPLYlink written 4 weeks ago by James W. MacDonald51k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 422 users visited in the last hour