I think if you're lucky enough that your SNPs fall within genes you can do this with the getLDS() function. In this example we'll used three SNPs based identified by rsID:
library(biomaRt)
hsapiens <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
snps <- c('rs905327004', 'rs945943648', 'rs765779967')
Then we query both datasets and ask for the chromosomal location from the variation dataset and the cytoband from the genes dataset:
getLDS(attributes = c('refsnp_id',"chr_name", "chrom_start", "chrom_end"),
filters = 'snp_filter',
values = snps,
mart = snpmart,
attributesL = c('band'),
filtersL = '',
valuesL = '',
martL = hsapiens, bmHeader = FALSE)
refsnp_id chr_name chrom_start chrom_end band
1 rs765779967 16 60000059 60000059 q21
2 rs905327004 1 10199 10199 p36.33
However you can see there's only two results. The missing SNP is intergenic, and I think because the genes dataset is gene-centric it doesn't find any results. This is obviously not very satisfactory, it feels like there should be a straightforward solution to getting the cytoband from a coordinate.
How about this function, which combines biomaRt and AnnotationHub to get the results.
library(AnnotationHub) library(GenomicRanges) library(biomaRt) library(dplyr) snp2cyto <- function(snp_ids) { ## query ensembl snp mart to find coordinates for each SNP snpmart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp") snp_locs <- getBM(attributes = c('refsnp_id', "chr_name", "chrom_start", "chrom_end"), filters = 'snp_filter', values = snps, mart = snpmart) ## load annotation hub and select hg38 genome cytobands ah <- AnnotationHub() cyto_hg38 <- query(ah, 'AHCytoBands')[['AH53178']] ## add "chr" to names, and create GRanges for all SNPs snp_chrom <- paste0('chr', snp_locs[,2]) snp_granges <- GRanges(seqnames = snp_chrom, ranges = IRanges(start = snp_locs[,3], end = snp_locs[,4])) ## match SNPs to cyto bands hits <- cyto_hg38[findOverlaps(query = snp_granges, subject = cyto_hg38, select = "first")] ## return values data_frame(snp_id = snp_locs[,1], chrom = snp_chrom, cyto_band = mcols(hits)$name) }Then if we try the same set of SNPs as before:
snps <- c('rs905327004', 'rs945943648', 'rs765779967') snp2cyto(snps)# A tibble: 3 x 3 snp_id chrom cyto_band <chr> <chr> <fctr> 1 rs765779967 chr16 q21 2 rs905327004 chr1 p36.33 3 rs945943648 chr8 p23.1I'm sure there's an AnnotationHub only way of doing it, but that seems to get where I think you need to be.
Thanks, this is amazing!
Just a few comments:
1) I needed to remove CHR annotations, otherwise I would get an error.
snp_locs <- snp_locs[grep("CHR", snp_locs$chr_name, invert = TRUE),]2) Typo error :p
## match SNPs to cyto bands hits <- cyto[findOverlaps(query = snp_granges, subject = cyto_hg38, select = "first")]it should be
hits <- cyto_hg38[findOverlaps(query = snp_granges, subject = cyto_hg38, select = "first")]Thank you so much for your help!!
Glad it works for you. I've updated my post to fix the typo, this shows why you should clear your environment before testing a solution!
Can you give an rsID example that produces the CHR annotations? I guess that doesn't show up with my 3 examples, so I'm just curious to know where they're coming from. I presume it's a SNP on on an unplaced contig?
Here's an example of a SNP that get the CHR annotation: rs7931342
then, when you move to
hits you get the following error: