Question: Extracting all possible annotations between two genomic coordinates
K (United States) wrote 9 months ago:

Hello,

I have a list of several regions, each defined by a start and an end base pair. E.g. Chr: 17, BasePair1: 26804211, BasePair2: 26818676

I am looking to find all possible annotations, including gene names, known SNPs, microRNAs etc., between these two base pair locations. The region need not be exonic; it can be anywhere in the genome.

I'm trying to get a comprehensive understanding of each region from its annotation.

I can create a GRanges object from my input and start with "TxDb.Hsapiens.UCSC.hg19.knownGene" to get gene names.

Can anyone suggest similar packages for getting other annotation information?

Thanks, K

 

 

Mike Smith (EMBL Heidelberg / de.NBI) wrote 9 months ago:

You could try using the biomaRt package to query the various Ensembl databases.

For example, to find genes in a region, and their GC content, you can do something like this:

library(biomaRt)
# Connect to the Ensembl genes mart for human
genes_mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# Query genes in the region, given as "chromosome:start-end"
getBM(mart = genes_mart,
      attributes = c('ensembl_gene_id', 'percentage_gc_content'),
      filters = c('chromosomal_region'),
      values = "17:26804211-26818676")
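
If you have many regions rather than one, the region strings can be built from a table of coordinates. Below is a minimal sketch in base R, assuming a data frame with the hypothetical column names chr, start and end; the second region is made up for illustration. As far as I know the chromosomal_region filter accepts a character vector of such strings, but check the biomaRt documentation for your version.

```r
# Hypothetical input: one row per region of interest.
# The first row is the region from the question; the second is illustrative only.
regions <- data.frame(
  chr   = c("17", "chr1"),
  start = c(26804211, 1000000),
  end   = c(26818676, 1100000),
  stringsAsFactors = FALSE
)

# Build "chromosome:start-end" strings, the format used in the query above.
# sprintf("%.0f", ...) avoids scientific notation such as "1e+06".
make_region_strings <- function(df) {
  chrom <- sub("^chr", "", df$chr, ignore.case = TRUE)  # Ensembl names have no "chr" prefix
  paste0(chrom, ":", sprintf("%.0f", df$start), "-", sprintf("%.0f", df$end))
}

region_strings <- make_region_strings(regions)
print(region_strings)
```

The resulting vector could then be passed as values = region_strings in the getBM() call, in place of the single hard-coded string.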

To retrieve SNP information on a region, you have to query a different dataset, but it would be something like this:

snp_mart = useMart(biomart = "ENSEMBL_MART_SNP", dataset="hsapiens_snp",
                  host = "asia.ensembl.org")
getBM(mart = snp_mart,
      attributes = c('refsnp_id', 'allele'),
      filters = c('chromosomal_region'),
      values = "17:26804211-26818676")

Your example region appears to be in the centromere of chromosome 17, so I wouldn't expect to find much annotation there. If that just happens to be an unfortunate example, this approach should still be useful for other regions. I would recommend reading the biomaRt vignette to get a better idea of what you can do with the package, and looking at the listAttributes() function to see what information is available for a particular dataset.
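
The attribute table returned by listAttributes() is long, so it helps to search it by keyword. Here is a rough sketch, assuming the table has 'name' and 'description' columns as listAttributes() output normally does. The helper search_mart_table() and the small stand-in table are hypothetical and only there so the example runs without a network connection. Recent biomaRt versions also appear to ship a searchAttributes() function that does something similar.

```r
# Keep rows whose name or description matches a keyword (case-insensitive).
# `tbl` is assumed to look like the output of listAttributes(mart).
search_mart_table <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
         grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Stand-in table so the sketch runs offline (contents are illustrative).
# With a live connection, pass listAttributes(genes_mart) instead.
example_tbl <- data.frame(
  name        = c("ensembl_gene_id", "percentage_gc_content", "mirbase_id"),
  description = c("Gene stable ID", "Gene % GC content", "miRBase ID"),
  stringsAsFactors = FALSE
)

mirna_hits <- search_mart_table(example_tbl, "mirbase")
print(mirna_hits)
```

Any attribute names found this way can be added to the attributes argument of getBM().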

 


Quick follow-up question:

I believe the command below connects to the latest genome reference, GRCh38.

genes_mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")

Could you tell me how to switch this to the GRCh37 version of Ensembl?

Reply written 9 months ago by K

I believe this is it:

grch37 <- useEnsembl(biomart = "ensembl", GRCh = 37, dataset = "hsapiens_gene_ensembl")

Reply written 9 months ago by K

Yes, that should do it. You can also do this with the standard useMart() function by specifying one of the Ensembl archives as the host, which is a little more flexible than useEnsembl(). For GRCh37 you would use:

useMart(biomart = 'ENSEMBL_MART_ENSEMBL', 
        dataset="hsapiens_gene_ensembl", 
        host = "grch37.ensembl.org")

You could also query the oldest archive available (from May 2009) with:

useMart(biomart = 'ENSEMBL_MART_ENSEMBL', 
        dataset="hsapiens_gene_ensembl", 
        host = "may2009.archive.ensembl.org")
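
To confirm which assembly a given connection actually serves, you can look at the 'version' column of listDatasets(). This is a small sketch, assuming the table has 'dataset', 'description' and 'version' columns; the helper assembly_of(), the name grch37_mart and the stand-in table are hypothetical, and the table values are illustrative rather than copied from a live server.

```r
# Return the assembly string reported for one dataset.
# `datasets` is assumed to look like the output of listDatasets(mart).
assembly_of <- function(datasets, dataset = "hsapiens_gene_ensembl") {
  version <- as.character(datasets$version[datasets$dataset == dataset])
  if (length(version) != 1) {
    stop("dataset not found exactly once: ", dataset)
  }
  version
}

# Stand-in table so the sketch runs offline (values are illustrative).
example_datasets <- data.frame(
  dataset     = c("hsapiens_gene_ensembl", "mmusculus_gene_ensembl"),
  description = c("Human genes (GRCh37.p13)", "Mouse genes (GRCm38.p2)"),
  version     = c("GRCh37.p13", "GRCm38.p2"),
  stringsAsFactors = FALSE
)

human_assembly <- assembly_of(example_datasets)
print(human_assembly)

# With a live connection, something like:
#   grch37_mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
#                          host = "grch37.ensembl.org")
#   assembly_of(listDatasets(grch37_mart))
```

Newer biomaRt releases may expect the host to be given with an "https://" prefix, so adjust the host string if the connection is refused.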

Reply written 9 months ago by Mike Smith

K (United States) wrote 9 months ago:

Thank you! This is helpful. I will start with biomaRt; that's a good starting point.
