Search
Question: Get cytoband from refsnp ID using biomaRt in R
0
gravatar for anajacintafernandes
10 weeks ago by
CBMR, Portugal
anajacintafernandes0 wrote:

Hi, 

Do any of you know how to get the cytoband location out of a refsnp_id through biomaRt?

I'm currently using the following Marts:

hsapiens <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

I know that the attribute band only exists for the hsapiens mart.

Do you know of a way I can get it for snps? 

Thanks for your time.

Ana

ADD COMMENTlink modified 10 weeks ago by Mike Smith2.1k • written 10 weeks ago by anajacintafernandes0
1
gravatar for Mike Smith
10 weeks ago by
Mike Smith2.1k
EMBL Heidelberg / de.NBI
Mike Smith2.1k wrote:

I think if you're lucky enough that your SNPs fall within genes you can do this with the getLDS() function.  In this example we'll used three SNPs based identified by rsID:

library(biomaRt)
hsapiens <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

snps <-  c('rs905327004', 'rs945943648', 'rs765779967')

Then we query both datasets and ask for the chromosomal location from the variation dataset and the cytoband from the genes dataset:

getLDS(attributes = c('refsnp_id',"chr_name", "chrom_start", "chrom_end"),
       filters = 'snp_filter',
       values = snps,
       mart = snpmart,
       attributesL = c('band'),
       filtersL = '',
       valuesL = '',
       martL = hsapiens, bmHeader = FALSE)
    refsnp_id chr_name chrom_start chrom_end   band
1 rs765779967       16    60000059  60000059    q21
2 rs905327004        1       10199     10199 p36.33

However you can see there's only two results.  The missing SNP is intergenic, and I think because the genes dataset is gene-centric it doesn't find any results.  This is obviously not very satisfactory, it feels like there should be a straightforward solution to getting the cytoband from a coordinate.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by Mike Smith2.1k

How about this function, which combines biomaRt and AnnotationHub to get the results.  

library(AnnotationHub)
library(GenomicRanges)
library(biomaRt)
library(dplyr)

snp2cyto <- function(snp_ids) {
  
  ## query ensembl snp mart to find coordinates for each SNP
  snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
  snp_locs <- getBM(attributes = c('refsnp_id', "chr_name", 
                                   "chrom_start", "chrom_end"), 
                    filters = 'snp_filter',
                    values = snps,
                    mart = snpmart)
  
  ## load annotation hub and select hg38 genome cytobands
  ah <- AnnotationHub()
  cyto_hg38 <- query(ah, 'AHCytoBands')[['AH53178']]
  
  ## add "chr" to names, and create GRanges for all SNPs
  snp_chrom <- paste0('chr', snp_locs[,2])
  snp_granges <- GRanges(seqnames = snp_chrom, 
                         ranges = IRanges(start = snp_locs[,3], 
                                          end = snp_locs[,4]))
  
  ## match SNPs to cyto bands
  hits <- cyto_hg38[findOverlaps(query = snp_granges, 
                            subject = cyto_hg38, 
                            select = "first")]
  
  ## return values
  data_frame(snp_id = snp_locs[,1],
             chrom = snp_chrom,
             cyto_band = mcols(hits)$name)
  
}

Then if we try the same set of SNPs as before:

snps <-  c('rs905327004', 'rs945943648', 'rs765779967')
snp2cyto(snps)
# A tibble: 3 x 3
       snp_id chrom cyto_band
        <chr> <chr>    <fctr>
1 rs765779967 chr16       q21
2 rs905327004  chr1    p36.33
3 rs945943648  chr8     p23.1

I'm sure there's an AnnotationHub only way of doing it, but that seems to get where I think you need to be.

ADD REPLYlink modified 9 weeks ago • written 10 weeks ago by Mike Smith2.1k

Thanks, this is amazing!

Just a few comments:

1) I needed to remove CHR annotations, otherwise I would get an error. 

snp_locs <- snp_locs[grep("CHR", snp_locs$chr_name, invert = TRUE),]

2) Typo error :p

 ## match SNPs to cyto bands
  hits <- cyto[findOverlaps(query = snp_granges, 
                            subject = cyto_hg38, 
                            select = "first")]

 

it should be 

hits <- cyto_hg38[findOverlaps(query = snp_granges, 
                            subject = cyto_hg38, 
                            select = "first")]

 

 

Thank you so much for your help!!

 

 

ADD REPLYlink written 9 weeks ago by anajacintafernandes0

Glad it works for you.  I've updated my post to fix the typo, this shows why you should clear your environment before testing a solution!

Can you give an rsID example that produces the CHR annotations?  I guess that doesn't show up with my 3 examples, so I'm just curious to know where they're coming from.  I presume it's a SNP on on an unplaced contig?

ADD REPLYlink written 9 weeks ago by Mike Smith2.1k

Here's an example of a SNP that get the CHR annotation:  rs7931342

snp_locs
    refsnp_id           chr_name chrom_start chrom_end
1  rs7931342                 11    69227030  69227030
2  rs7931342 CHR_HSCHR11_1_CTG3    69230683  69230683

then, when you move to hits you get the following error:

Error: subscript contains NAs
In addition: Warning message:
In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrCHR_HSCHR11_1_CTG3
  - in 'y': chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, chr1_GL383518v1_alt, chr1_GL383519v1_alt, chr1_GL383520v2_alt, chr1_KI270706v1_random, chr1_KI270707v1_random, chr1_KI270708v1_random, chr1_KI270709v1_random, chr1_KI270710v1_random, chr1_KI270711v1_random, chr1_KI270712v1_random, chr1_KI270713v1_random, chr1_KI270714v1_random, chr1_KI270759v1_alt, chr1_KI270760v1_alt, chr1_KI270761v1_alt, chr1_KI270762v1_alt, chr1_KI270763v1_alt, chr1_KI270764v1_alt, chr1_KI270765v1_alt, chr1_KI270766v1_alt, chr1_KI270892v1_alt, chr2_GL383521v1_alt, chr2_GL383522v1_alt, chr2_GL582966v2_alt, chr2_KI270715v1_random, chr2_KI270716v1_random, chr2_KI270767v1_alt, chr2_KI270768v1_alt, chr2_KI270769v1_alt, chr2_KI270770v1_alt, chr2_KI270771v1_alt, chr2_KI270772v1_alt, chr2_KI270773v1_alt, chr [... truncated]

 

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by anajacintafernandes0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 167 users visited in the last hour