Get cytoband from refsnp ID using biomaRt in R
1
0
Entering edit mode
@anajacintafernandes-11899
Last seen 6.6 years ago
CBMR, Portugal

Hi, 

Do any of you know how to get the cytoband location out of a refsnp_id through biomaRt?

I'm currently using the following Marts:

hsapiens <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

I know that the attribute band only exists for the hsapiens mart.

Do you know of a way I can get it for snps? 

Thanks for your time.

Ana

biomart R • 2.8k views
ADD COMMENT
1
Entering edit mode
Mike Smith ★ 6.6k
@mike-smith
Last seen 22 hours ago
EMBL Heidelberg

I think if you're lucky enough that your SNPs fall within genes you can do this with the getLDS() function.  In this example we'll used three SNPs based identified by rsID:

library(biomaRt)
hsapiens <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

snps <-  c('rs905327004', 'rs945943648', 'rs765779967')

Then we query both datasets and ask for the chromosomal location from the variation dataset and the cytoband from the genes dataset:

getLDS(attributes = c('refsnp_id',"chr_name", "chrom_start", "chrom_end"),
       filters = 'snp_filter',
       values = snps,
       mart = snpmart,
       attributesL = c('band'),
       filtersL = '',
       valuesL = '',
       martL = hsapiens, bmHeader = FALSE)
    refsnp_id chr_name chrom_start chrom_end   band
1 rs765779967       16    60000059  60000059    q21
2 rs905327004        1       10199     10199 p36.33

However you can see there's only two results.  The missing SNP is intergenic, and I think because the genes dataset is gene-centric it doesn't find any results.  This is obviously not very satisfactory, it feels like there should be a straightforward solution to getting the cytoband from a coordinate.

ADD COMMENT
0
Entering edit mode

How about this function, which combines biomaRt and AnnotationHub to get the results.  

library(AnnotationHub)
library(GenomicRanges)
library(biomaRt)
library(dplyr)

snp2cyto <- function(snp_ids) {
  
  ## query ensembl snp mart to find coordinates for each SNP
  snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
  snp_locs <- getBM(attributes = c('refsnp_id', "chr_name", 
                                   "chrom_start", "chrom_end"), 
                    filters = 'snp_filter',
                    values = snps,
                    mart = snpmart)
  
  ## load annotation hub and select hg38 genome cytobands
  ah <- AnnotationHub()
  cyto_hg38 <- query(ah, 'AHCytoBands')[['AH53178']]
  
  ## add "chr" to names, and create GRanges for all SNPs
  snp_chrom <- paste0('chr', snp_locs[,2])
  snp_granges <- GRanges(seqnames = snp_chrom, 
                         ranges = IRanges(start = snp_locs[,3], 
                                          end = snp_locs[,4]))
  
  ## match SNPs to cyto bands
  hits <- cyto_hg38[findOverlaps(query = snp_granges, 
                            subject = cyto_hg38, 
                            select = "first")]
  
  ## return values
  data_frame(snp_id = snp_locs[,1],
             chrom = snp_chrom,
             cyto_band = mcols(hits)$name)
  
}

Then if we try the same set of SNPs as before:

snps <-  c('rs905327004', 'rs945943648', 'rs765779967')
snp2cyto(snps)
# A tibble: 3 x 3
       snp_id chrom cyto_band
        <chr> <chr>    <fctr>
1 rs765779967 chr16       q21
2 rs905327004  chr1    p36.33
3 rs945943648  chr8     p23.1

I'm sure there's an AnnotationHub only way of doing it, but that seems to get where I think you need to be.

ADD REPLY
0
Entering edit mode

Thanks, this is amazing!

Just a few comments:

1) I needed to remove CHR annotations, otherwise I would get an error. 

snp_locs <- snp_locs[grep("CHR", snp_locs$chr_name, invert = TRUE),]

2) Typo error :p

 ## match SNPs to cyto bands
  hits <- cyto[findOverlaps(query = snp_granges, 
                            subject = cyto_hg38, 
                            select = "first")]

 

it should be 

hits <- cyto_hg38[findOverlaps(query = snp_granges, 
                            subject = cyto_hg38, 
                            select = "first")]

 

 

Thank you so much for your help!!

 

 

ADD REPLY
0
Entering edit mode

Glad it works for you.  I've updated my post to fix the typo, this shows why you should clear your environment before testing a solution!

Can you give an rsID example that produces the CHR annotations?  I guess that doesn't show up with my 3 examples, so I'm just curious to know where they're coming from.  I presume it's a SNP on on an unplaced contig?

ADD REPLY
0
Entering edit mode

Here's an example of a SNP that get the CHR annotation:  rs7931342

snp_locs
    refsnp_id           chr_name chrom_start chrom_end
1  rs7931342                 11    69227030  69227030
2  rs7931342 CHR_HSCHR11_1_CTG3    69230683  69230683

then, when you move to hits you get the following error:

Error: subscript contains NAs
In addition: Warning message:
In .Seqinfo.mergexy(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:
  - in 'x': chrCHR_HSCHR11_1_CTG3
  - in 'y': chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, chr1_GL383518v1_alt, chr1_GL383519v1_alt, chr1_GL383520v2_alt, chr1_KI270706v1_random, chr1_KI270707v1_random, chr1_KI270708v1_random, chr1_KI270709v1_random, chr1_KI270710v1_random, chr1_KI270711v1_random, chr1_KI270712v1_random, chr1_KI270713v1_random, chr1_KI270714v1_random, chr1_KI270759v1_alt, chr1_KI270760v1_alt, chr1_KI270761v1_alt, chr1_KI270762v1_alt, chr1_KI270763v1_alt, chr1_KI270764v1_alt, chr1_KI270765v1_alt, chr1_KI270766v1_alt, chr1_KI270892v1_alt, chr2_GL383521v1_alt, chr2_GL383522v1_alt, chr2_GL582966v2_alt, chr2_KI270715v1_random, chr2_KI270716v1_random, chr2_KI270767v1_alt, chr2_KI270768v1_alt, chr2_KI270769v1_alt, chr2_KI270770v1_alt, chr2_KI270771v1_alt, chr2_KI270772v1_alt, chr2_KI270773v1_alt, chr [... truncated]

 

ADD REPLY

Login before adding your answer.

Traffic: 553 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6