From location to get the SNP rs ID

0

Entering edit mode

Fabrice Tourre ▴ 970

@fabrice-tourre-4394

Last seen 9.6 years ago

Dear list, I have a list of genome positions. Is there is a package to get the snp name at this position? It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608 Thank you.

• 4.6k views

ADD COMMENT • link updated 10.9 years ago by Hervé Pagès 16k • written 10.9 years ago by Fabrice Tourre ▴ 970

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 1 day ago

Seattle, WA, United States

Hi Fabrice, With the loc2rsid() function (see code at the bottom) and using SNPlocs.Hsapiens.dbSNP.20100427 (I don't have SNPlocs.Hsapiens.dbSNP.20120608 installed at the moment): mylocs <- GRanges(Rle(c("ch22", "chM", "ch21", "ch22"), c(2, 2, 1, 1)), IRanges(c(200, 16369861, 146, 73, 9411609, 16051107), width=1)) > mylocs GRanges with 6 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] ch22 [ 200, 200] * [2] ch22 [16369861, 16369861] * [3] chM [ 146, 146] * [4] chM [ 73, 73] * [5] ch21 [ 9411609, 9411609] * [6] ch22 [16051107, 16051107] * --- seqlengths: ch21 ch22 chM NA NA NA Then: library(SNPlocs.Hsapiens.dbSNP.20100427) > loc2rsid(mylocs) CharacterList of length 6 [[1]] character(0) [[2]] rs75258394 rs78180314 [[3]] character(0) [[4]] character(0) [[5]] rs76676778 [[6]] rs6518357 You need to be careful to use the same chromosome naming convention as the SNPlocs package (which uses dbSNP naming convention). For example in the 'mylocs' object, the mitochondrial chromosome is named chM but in the SNPlocs package it's chMT: > getSNPcount() ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 ch10 1369185 1422439 1186807 1191984 1064540 1040203 977275 919207 740516 859433 ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 ch20 855225 822825 615382 543041 499101 556394 464686 482359 375259 450685 ch21 ch22 chX chY chMT 253480 244482 445596 53908 667 This is why the locations on chM are not mapped to anything. But after renaming: > seqlevels(mylocs)[3] <- "chMT" > seqlevels(mylocs) [1] "ch21" "ch22" "chMT" > loc2rsid(mylocs) CharacterList of length 6 [[1]] character(0) [[2]] rs75258394 rs78180314 [[3]] rs72619361 [[4]] rs3087742 [[5]] rs76676778 [[6]] rs6518357 they get mapped. The CharacterList object returned by loc2rsid() can be set as a metadata column of the input GRanges object: > mcols(mylocs)$rsid <- loc2rsid(mylocs) > mylocs GRanges with 6 ranges and 1 metadata column: seqnames ranges strand | rsid <rle> <iranges> <rle> | <characterlist> [1] ch22 [ 200, 200] * | [2] ch22 [16369861, 16369861] * | rs75258394,rs78180314 [3] chMT [ 146, 146] * | rs72619361 [4] chMT [ 73, 73] * | rs3087742 [5] ch21 [ 9411609, 9411609] * | rs76676778 [6] ch22 [16051107, 16051107] * | rs6518357 --- seqlengths: ch21 ch22 chMT NA NA NA Hope this helps, H. ## Maps the genomic locations in 'locs' to the corresponding rs ids. ## 'locs' must be a GRanges object where all ranges are of with 1. ## Because dbSNP can assign distinct ids to SNPs located at the same ## position, the mapping is a 1-to-many relationship. ## Returns a CharacterList of the same length as 'locs'. loc2rsid <- function(locs) { if (!is(locs, "GRanges")) stop("'locs' must be a GRanges object") if (!all(width(locs) == 1L)) stop("all ranges in 'locs' must be of width 1") common_seqlevels <- intersect(seqlevels(locs), names(getSNPcount())) if (length(common_seqlevels) == 0L) stop("chromosome names (a.k.a. seqlevels) in 'locs' don't seem to ", "be\n compatible with the chromosome names in the SNPlocs ", "package. Maybe they\n use a different naming convention? ", "If that's the case then you first need\n to rename the ", "seqlevels in 'locs'. See '?seqlevels' for how to do this.") f <- as.factor(seqnames(locs)) locs_by_chrom <- split(start(locs), f) rsids_by_chrom <- lapply(seq_along(locs_by_chrom), function(i) { seqname <- levels(f)[i] locs2 <- locs_by_chrom[[i]] ans2 <- vector("list", length=length(locs2)) if (length(locs2) == 0L || !(seqname %in% common_seqlevels)) return(ans2) snplocs <- getSNPlocs(seqname, as.GRanges=FALSE) hits <- findMatches(locs2, snplocs$loc) rsids <- paste0("rs", snplocs$RefSNP_id[subjectHits(hits)]) q_hits <- queryHits(hits) tmp <- split(rsids, q_hits) ans2[as.integer(names(tmp))] <- tmp ans2 }) CharacterList(unsplit(rsids_by_chrom, f)) } PS: This functionality will make it to the SNPlocs packages (in this form or another). On 05/30/2013 09:54 PM, Fabrice Tourre wrote: > Dear list, > > I have a list of genome positions. Is there is a package to get the > snp name at this position? > > It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608 > > Thank you. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD COMMENT • link 10.9 years ago Hervé Pagès 16k

0

Entering edit mode

Hervé Pagès, It helps. Thank you very much. But loc2rsid seems does not in version SNPlocs.Hsapiens.dbSNP.2010042 and SNPlocs.Hsapiens.dbSNP.20120608. On Fri, May 31, 2013 at 3:42 AM, Hervé Pagès <hpages at="" fhcrc.org=""> wrote: > Hi Fabrice, > > With the loc2rsid() function (see code at the bottom) and using > SNPlocs.Hsapiens.dbSNP.20100427 (I don't have > SNPlocs.Hsapiens.dbSNP.20120608 installed at the moment): > > > mylocs <- GRanges(Rle(c("ch22", "chM", "ch21", "ch22"), c(2, 2, 1, 1)), > IRanges(c(200, 16369861, 146, 73, 9411609, 16051107), > width=1)) > > > mylocs > GRanges with 6 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] ch22 [ 200, 200] * > [2] ch22 [16369861, 16369861] * > [3] chM [ 146, 146] * > [4] chM [ 73, 73] * > [5] ch21 [ 9411609, 9411609] * > [6] ch22 [16051107, 16051107] * > --- > seqlengths: > ch21 ch22 chM > NA NA NA > > Then: > > library(SNPlocs.Hsapiens.dbSNP.20100427) > > > loc2rsid(mylocs) > CharacterList of length 6 > [[1]] character(0) > [[2]] rs75258394 rs78180314 > [[3]] character(0) > [[4]] character(0) > [[5]] rs76676778 > [[6]] rs6518357 > > You need to be careful to use the same chromosome naming convention as > the SNPlocs package (which uses dbSNP naming convention). For example > in the 'mylocs' object, the mitochondrial chromosome is named chM but > in the SNPlocs package it's chMT: > > > getSNPcount() > ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 > ch10 > 1369185 1422439 1186807 1191984 1064540 1040203 977275 919207 740516 > 859433 > ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 > ch20 > 855225 822825 615382 543041 499101 556394 464686 482359 375259 > 450685 > ch21 ch22 chX chY chMT > 253480 244482 445596 53908 667 > > This is why the locations on chM are not mapped to anything. But > after renaming: > > > seqlevels(mylocs)[3] <- "chMT" > > seqlevels(mylocs) > [1] "ch21" "ch22" "chMT" > > > loc2rsid(mylocs) > CharacterList of length 6 > [[1]] character(0) > [[2]] rs75258394 rs78180314 > [[3]] rs72619361 > [[4]] rs3087742 > [[5]] rs76676778 > [[6]] rs6518357 > > they get mapped. > > The CharacterList object returned by loc2rsid() can be set as a > metadata column of the input GRanges object: > > > mcols(mylocs)$rsid <- loc2rsid(mylocs) > > mylocs > GRanges with 6 ranges and 1 metadata column: > seqnames ranges strand | rsid > <rle> <iranges> <rle> | <characterlist> > [1] ch22 [ 200, 200] * | > [2] ch22 [16369861, 16369861] * | rs75258394,rs78180314 > [3] chMT [ 146, 146] * | rs72619361 > [4] chMT [ 73, 73] * | rs3087742 > [5] ch21 [ 9411609, 9411609] * | rs76676778 > [6] ch22 [16051107, 16051107] * | rs6518357 > --- > seqlengths: > ch21 ch22 chMT > NA NA NA > > Hope this helps, > H. > > > ## Maps the genomic locations in 'locs' to the corresponding rs ids. > ## 'locs' must be a GRanges object where all ranges are of with 1. > ## Because dbSNP can assign distinct ids to SNPs located at the same > ## position, the mapping is a 1-to-many relationship. > ## Returns a CharacterList of the same length as 'locs'. > loc2rsid <- function(locs) > { > if (!is(locs, "GRanges")) > stop("'locs' must be a GRanges object") > if (!all(width(locs) == 1L)) > stop("all ranges in 'locs' must be of width 1") > common_seqlevels <- intersect(seqlevels(locs), names(getSNPcount())) > if (length(common_seqlevels) == 0L) > stop("chromosome names (a.k.a. seqlevels) in 'locs' don't seem to ", > "be\n compatible with the chromosome names in the SNPlocs ", > "package. Maybe they\n use a different naming convention? ", > "If that's the case then you first need\n to rename the ", > "seqlevels in 'locs'. See '?seqlevels' for how to do this.") > f <- as.factor(seqnames(locs)) > locs_by_chrom <- split(start(locs), f) > rsids_by_chrom <- lapply(seq_along(locs_by_chrom), > function(i) > { > seqname <- levels(f)[i] > locs2 <- locs_by_chrom[[i]] > ans2 <- vector("list", length=length(locs2)) > if (length(locs2) == 0L || !(seqname %in% common_seqlevels)) > return(ans2) > snplocs <- getSNPlocs(seqname, as.GRanges=FALSE) > hits <- findMatches(locs2, snplocs$loc) > rsids <- paste0("rs", snplocs$RefSNP_id[subjectHits(hits)]) > q_hits <- queryHits(hits) > tmp <- split(rsids, q_hits) > ans2[as.integer(names(tmp))] <- tmp > ans2 > }) > CharacterList(unsplit(rsids_by_chrom, f)) > } > > PS: This functionality will make it to the SNPlocs packages (in this > form or another). > > > > On 05/30/2013 09:54 PM, Fabrice Tourre wrote: >> >> Dear list, >> >> I have a list of genome positions. Is there is a package to get the >> snp name at this position? >> >> It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608 >> >> Thank you. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319

ADD REPLY • link 10.9 years ago Fabrice Tourre ▴ 970

0

Entering edit mode

On 05/31/2013 03:31 AM, Fabrice Tourre wrote: > Hervé Pagès, > > It helps. Thank you very much. > > But loc2rsid seems does not in version SNPlocs.Hsapiens.dbSNP.2010042 > and SNPlocs.Hsapiens.dbSNP.20120608. Nope. That's why I sent you the code ;-) H. > > On Fri, May 31, 2013 at 3:42 AM, Hervé Pagès <hpages at="" fhcrc.org=""> wrote: >> Hi Fabrice, >> >> With the loc2rsid() function (see code at the bottom) and using >> SNPlocs.Hsapiens.dbSNP.20100427 (I don't have >> SNPlocs.Hsapiens.dbSNP.20120608 installed at the moment): >> >> >> mylocs <- GRanges(Rle(c("ch22", "chM", "ch21", "ch22"), c(2, 2, 1, 1)), >> IRanges(c(200, 16369861, 146, 73, 9411609, 16051107), >> width=1)) >> >> > mylocs >> GRanges with 6 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] ch22 [ 200, 200] * >> [2] ch22 [16369861, 16369861] * >> [3] chM [ 146, 146] * >> [4] chM [ 73, 73] * >> [5] ch21 [ 9411609, 9411609] * >> [6] ch22 [16051107, 16051107] * >> --- >> seqlengths: >> ch21 ch22 chM >> NA NA NA >> >> Then: >> >> library(SNPlocs.Hsapiens.dbSNP.20100427) >> >> > loc2rsid(mylocs) >> CharacterList of length 6 >> [[1]] character(0) >> [[2]] rs75258394 rs78180314 >> [[3]] character(0) >> [[4]] character(0) >> [[5]] rs76676778 >> [[6]] rs6518357 >> >> You need to be careful to use the same chromosome naming convention as >> the SNPlocs package (which uses dbSNP naming convention). For example >> in the 'mylocs' object, the mitochondrial chromosome is named chM but >> in the SNPlocs package it's chMT: >> >> > getSNPcount() >> ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 >> ch10 >> 1369185 1422439 1186807 1191984 1064540 1040203 977275 919207 740516 >> 859433 >> ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 >> ch20 >> 855225 822825 615382 543041 499101 556394 464686 482359 375259 >> 450685 >> ch21 ch22 chX chY chMT >> 253480 244482 445596 53908 667 >> >> This is why the locations on chM are not mapped to anything. But >> after renaming: >> >> > seqlevels(mylocs)[3] <- "chMT" >> > seqlevels(mylocs) >> [1] "ch21" "ch22" "chMT" >> >> > loc2rsid(mylocs) >> CharacterList of length 6 >> [[1]] character(0) >> [[2]] rs75258394 rs78180314 >> [[3]] rs72619361 >> [[4]] rs3087742 >> [[5]] rs76676778 >> [[6]] rs6518357 >> >> they get mapped. >> >> The CharacterList object returned by loc2rsid() can be set as a >> metadata column of the input GRanges object: >> >> > mcols(mylocs)$rsid <- loc2rsid(mylocs) >> > mylocs >> GRanges with 6 ranges and 1 metadata column: >> seqnames ranges strand | rsid >> <rle> <iranges> <rle> | <characterlist> >> [1] ch22 [ 200, 200] * | >> [2] ch22 [16369861, 16369861] * | rs75258394,rs78180314 >> [3] chMT [ 146, 146] * | rs72619361 >> [4] chMT [ 73, 73] * | rs3087742 >> [5] ch21 [ 9411609, 9411609] * | rs76676778 >> [6] ch22 [16051107, 16051107] * | rs6518357 >> --- >> seqlengths: >> ch21 ch22 chMT >> NA NA NA >> >> Hope this helps, >> H. >> >> >> ## Maps the genomic locations in 'locs' to the corresponding rs ids. >> ## 'locs' must be a GRanges object where all ranges are of with 1. >> ## Because dbSNP can assign distinct ids to SNPs located at the same >> ## position, the mapping is a 1-to-many relationship. >> ## Returns a CharacterList of the same length as 'locs'. >> loc2rsid <- function(locs) >> { >> if (!is(locs, "GRanges")) >> stop("'locs' must be a GRanges object") >> if (!all(width(locs) == 1L)) >> stop("all ranges in 'locs' must be of width 1") >> common_seqlevels <- intersect(seqlevels(locs), names(getSNPcount())) >> if (length(common_seqlevels) == 0L) >> stop("chromosome names (a.k.a. seqlevels) in 'locs' don't seem to ", >> "be\n compatible with the chromosome names in the SNPlocs ", >> "package. Maybe they\n use a different naming convention? ", >> "If that's the case then you first need\n to rename the ", >> "seqlevels in 'locs'. See '?seqlevels' for how to do this.") >> f <- as.factor(seqnames(locs)) >> locs_by_chrom <- split(start(locs), f) >> rsids_by_chrom <- lapply(seq_along(locs_by_chrom), >> function(i) >> { >> seqname <- levels(f)[i] >> locs2 <- locs_by_chrom[[i]] >> ans2 <- vector("list", length=length(locs2)) >> if (length(locs2) == 0L || !(seqname %in% common_seqlevels)) >> return(ans2) >> snplocs <- getSNPlocs(seqname, as.GRanges=FALSE) >> hits <- findMatches(locs2, snplocs$loc) >> rsids <- paste0("rs", snplocs$RefSNP_id[subjectHits(hits)]) >> q_hits <- queryHits(hits) >> tmp <- split(rsids, q_hits) >> ans2[as.integer(names(tmp))] <- tmp >> ans2 >> }) >> CharacterList(unsplit(rsids_by_chrom, f)) >> } >> >> PS: This functionality will make it to the SNPlocs packages (in this >> form or another). >> >> >> >> On 05/30/2013 09:54 PM, Fabrice Tourre wrote: >>> >>> Dear list, >>> >>> I have a list of genome positions. Is there is a package to get the >>> snp name at this position? >>> >>> It is the reverse function as rsid2loc in SNPlocs.Hsapiens.dbSNP.20120608 >>> >>> Thank you. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 10.9 years ago Hervé Pagès 16k

Login before adding your answer.