Errors in chromosome annotation-biomaRt
landenshanie ▴ 40
@landenshanie-23570
Last seen 6 months ago
Australia

I am using the newest rat genome assembly (version 8) to align my reads in an RNA-seq experiment. When I map my gene IDs to chromosome names with biomaRt, many of the genes are wrongly annotated (for example, Ddx3y is annotated to chromosome 13 when it should be on chromosome Y, Xist is NA when it should be on X, Dusp1 is NA when it should be on chromosome 5, etc.).

This is my code:


library(biomaRt)

# Connect to the Ensembl rat gene dataset
mart <- useMart(biomart = "ensembl", dataset = "rnorvegicus_gene_ensembl")

# Look up Entrez annotation and chromosome by gene symbol
gene_ids <- rownames(dds)
biomart_anno <- getBM(attributes = c("entrezgene_description",
                                     "entrezgene_accession",
                                     "entrezgene_id",
                                     "chromosome_name"),
                      filters = "external_gene_name",
                      values = gene_ids,
                      mart = mart)

What is the best way to access up-to-date chromosome annotations (a different package, a manually updated list)?

Thanks, Shanie

org.Rn.eg.db biomaRt • 530 views
@james-w-macdonald-5106
Last seen 3 days ago
United States

If you are just using gene symbols for the lookup and you are trying to get NCBI IDs, then I wouldn't use biomaRt: it is based on Ensembl IDs, so you would be mapping across annotation services, which you should probably seek to avoid. Instead, you can use NCBI/UCSC-based mappings.

> library(Rattus.norvegicus)
## default is rn5, so switch to rn7
> library(TxDb.Rnorvegicus.UCSC.rn7.refGene)
> TxDb(Rattus.norvegicus) <- TxDb.Rnorvegicus.UCSC.rn7.refGene
## map stuff
> select(Rattus.norvegicus, c("Xist","Dusp1","Ddx3y"), c("GENENAME","ENTREZID","SYMBOL", "TXCHROM"), "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
  SYMBOL  ENTREZID                       GENENAME TXCHROM
1   Xist 100911498 X inactive specific transcript    chrX
2  Dusp1    114856 dual specificity phosphatase 1   chr10
3  Ddx3y 100312982  DEAD box helicase 3, Y-linked    chrY

I should also point out that biomaRt just does a straight SQL-type query, so there is no reason to expect the results to come back in the same order as your inputs. You should always ask for the input data (gene symbols in this case) to be returned as well, so you can reorder.
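As a minimal sketch of that reordering step, the base R function match() can line the returned rows up with the input order. The data frame below is a made-up stand-in for a getBM() result (the symbols and chromosomes are placeholders, not real annotations), and it assumes "external_gene_name" was included in the requested attributes.

```r
# Hypothetical input symbols, in the order of the count matrix rows
gene_ids <- c("GeneA", "GeneB", "GeneC", "GeneD")

# Made-up stand-in for a getBM() result: rows come back in arbitrary
# order, and symbols with no hit are simply absent
biomart_anno <- data.frame(
  external_gene_name = c("GeneC", "GeneA", "GeneB"),
  chromosome_name    = c("X", "1", "2"),
  stringsAsFactors   = FALSE
)

# For each input symbol, find its row in the result (NA if it is missing)
idx <- match(gene_ids, biomart_anno$external_gene_name)

# Reorder to the input order; unmatched symbols become all-NA rows
anno_ordered <- biomart_anno[idx, ]
anno_ordered$external_gene_name <- gene_ids  # keep the symbol on unmatched rows
rownames(anno_ordered) <- NULL
```

Note that match() only picks the first hit per symbol, so a symbol that maps to several rows would need to be handled separately (for example by checking for duplicates first).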


Thank you. I ran your exact code and it is correct for Xist; however, Ddx3y is still wrongly annotated to chr13 and, as in your results, Dusp1 is wrongly annotated to chr10 (it should be 5). Moreover, I only get three genes annotated to the Y chromosome from my whole RNA-seq experiment.
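One way to gauge how widespread this is would be to tally the annotated genes per chromosome and list the symbols that came back without one. This is only a rough sketch on a made-up table: the column names `symbol` and `chrom` and all values are placeholders, not output from the code above.

```r
# Made-up annotation table; symbols and chromosomes are placeholders
anno <- data.frame(
  symbol = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  chrom  = c("chr1", "chrY", NA, "chr1", NA),
  stringsAsFactors = FALSE
)

# Genes per chromosome, with unannotated genes counted as their own group
chrom_counts <- table(anno$chrom, useNA = "ifany")
print(chrom_counts)

# Symbols that came back without a chromosome assignment
unannotated <- anno$symbol[is.na(anno$chrom)]
print(unannotated)
```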
