VariantAnnotation and distance to genes
1
0
Entering edit mode
Gad Abraham ▴ 20
@gad-abraham-5935
Last seen 10.4 years ago
Hi, I'm using VariantAnnotation, which is a very nice package, to annotate some SNPs, based on http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps- and-indels-in-bioconductor . For SNPs annotated as intergenic and therefore falling between two genes (PRECEDEID and FOLLOWID), is there a way of getting the actual distance between the SNPs and these two genes? I'm using: head(m, 3) RS Chr Pos 1 rs2187668 chr6 32713862 2 rs9357152 chr6 32772938 3 rs3099844 chr6 31556955 target <- with(m, GRanges(seqnames=Rle(Chr), ranges=IRanges(Pos, end=Pos, names=RS), strand=Rle(strand("*")) ) ) loc <- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene, AllVariants()) names(loc) <- NULL out <- as.data.frame(loc) head(out, 3) names seqnames start end LOCATION GENEID PRECEDEID FOLLOWID GeneSymbol PrecedeSymbol FollowSymbol 1 rs2187668 chr6 32713862 32713862 intron 3117 <na> <na> HLA-DQA1 <na> <na> 4 rs9357152 chr6 32772938 32772938 intergenic <na> 3119 3117 <na> HLA-DQB1 HLA-DQA1 5 rs3099844 chr6 31556955 31556955 intergenic <na> 4277 352961 <na> MICB HCG26 I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA- DQA1. Thanks, Gad [[alternative HTML version deleted]]
VariantAnnotation VariantAnnotation VariantAnnotation VariantAnnotation • 1.9k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 10 hours ago
United States
Hi Gad, On 5/12/2013 11:41 PM, Gad Abraham wrote: > Hi, > > I'm using VariantAnnotation, which is a very nice package, to annotate some > SNPs, based on > http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps- and-indels-in-bioconductor > . > > For SNPs annotated as intergenic and therefore falling between two genes > (PRECEDEID and FOLLOWID), is there a way of getting the actual distance > between the SNPs and these two genes? > > I'm using: > > head(m, 3) > RS Chr Pos > 1 rs2187668 chr6 32713862 > 2 rs9357152 chr6 32772938 > 3 rs3099844 chr6 31556955 > > target<- with(m, > GRanges(seqnames=Rle(Chr), > ranges=IRanges(Pos, end=Pos, names=RS), > strand=Rle(strand("*")) > ) > ) > > loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene, > AllVariants()) > names(loc)<- NULL > out<- as.data.frame(loc) > > head(out, 3) > names seqnames start end LOCATION GENEID PRECEDEID FOLLOWID > GeneSymbol PrecedeSymbol FollowSymbol > 1 rs2187668 chr6 32713862 32713862 intron 3117<na> > <na> HLA-DQA1<na> <na> > 4 rs9357152 chr6 32772938 32772938 intergenic<na> 3119 > 3117<na> HLA-DQB1 HLA-DQA1 > 5 rs3099844 chr6 31556955 31556955 intergenic<na> 4277 > 352961<na> MICB HCG26 > > I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA- DQA1. There might be a far superior way to do this, but one way is to assume that cds == genes and go from there: > gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene, columns=c("cds_id","gene_id")) > ind <- sapply(values(gns)$gene_id, function(x) any(x %in% c("3117","3119"))) > sum(ind) [1] 28 > gns[ind,] GRanges with 28 ranges and 2 metadata columns: seqnames ranges strand | cds_id gene_id <rle> <iranges> <rle> | <integer> <characterlist> [1] chr6 [32605236, 32605317] + | 73476 3117 [2] chr6 [32609087, 32609335] + | 73477 3117 [3] chr6 [32609749, 32610030] + | 73478 3117 [4] chr6 [32609749, 32610164] + | 73479 3117 [5] chr6 [32610387, 32610541] + | 73480 3117 ... ... ... ... ... ... ... [24] chr6_qbl_hap6 [3860872, 3860982] - | 233860 3119 [25] chr6_qbl_hap6 [3861407, 3861780] - | 233861 3119 [26] chr6_qbl_hap6 [3861499, 3861780] - | 233862 3119 [27] chr6_qbl_hap6 [3864670, 3864939] - | 233863 3119 [28] chr6_qbl_hap6 [3866378, 3866486] - | 233864 3119 Note that these two genes also map to the extra Chr6 haplotypes. > z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6", IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] Warning message: In .local(x, y, ...) : The behavior of distance() has changed in Bioconductor 2.12. See ?distance for details. > colnames(z) <- c(colnames(z)[1:5], "Distance") > z seqnames start width cds_id gene_id Distance 1 chr6 32605236 82 73476 3117 167620 2 chr6 32609087 249 73477 3117 163602 3 chr6 32609749 282 73478 3117 162907 4 chr6 32609749 416 73479 3117 162773 5 chr6 32610387 155 73480 3117 162396 6 chr6 32628013 14 78928 3119 144911 7 chr6 32628636 22 78929 3119 144280 8 chr6 32629124 111 78930 3119 143703 9 chr6 32629125 110 78931 3119 143703 10 chr6 32629744 282 78932 3119 142912 11 chr6 32629951 44 78933 3119 142943 12 chr6 32632575 244 78934 3119 140119 13 chr6 32632575 270 78935 3119 140093 14 chr6 32634276 109 78936 3119 138553 15 chr6_cox_hap2 4073284 14 227891 3119 NA 16 chr6_cox_hap2 4074344 185 227892 3119 NA 17 chr6_cox_hap2 4074418 111 227893 3119 NA 18 chr6_cox_hap2 4074953 374 227894 3119 NA 19 chr6_cox_hap2 4075045 282 227895 3119 NA 20 chr6_cox_hap2 4078216 270 227896 3119 NA 21 chr6_cox_hap2 4079924 109 227897 3119 NA 22 chr6_qbl_hap6 3859738 14 233858 3119 NA 23 chr6_qbl_hap6 3860798 185 233859 3119 NA 24 chr6_qbl_hap6 3860872 111 233860 3119 NA 25 chr6_qbl_hap6 3861407 374 233861 3119 NA 26 chr6_qbl_hap6 3861499 282 233862 3119 NA 27 chr6_qbl_hap6 3864670 270 233863 3119 NA 28 chr6_qbl_hap6 3866378 109 233864 3119 NA Best, Jim > > Thanks, > Gad > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
Hello, In the devel branch I've added a distance() method that takes a GRanges and TxDb for this exact purpose. An example is at the bottom of the ?locateVariants man page, I've pasted the main steps below. If you're using devel, you may want to give the new distance() a try. If in release then Jim's solution should work for you. Valerie ## start example from man page: ## Using the 'vcf' and 'txdb' from ?locateVariants man page: region <- IntergenicVariants(upstream=70000, downstream=70000) loc_int <- locateVariants(vcf, txdb, region) ## Each variant can have multiple flanking id's so first ## expand PRECEDEID and the corresponding variant ranges. p_ids <- unlist(loc_int$PRECEDEID, use.names=FALSE) exp_ranges <- rep(loc_int, elementLengths(loc_int$PRECEDEID)) ## Provide the variant GRanges as 'x' and the TxDb annotation as 'y' ## to distance(). The 'id' and 'type' arguments describe an id found ## in the TxDb. See ?'nearest-methods' for details on the method. p_dist <- distance(exp_ranges, txdb, id=p_ids, type="gene") head(p_dist) ## Expanded view of ranges, gene id and distance: exp_ranges$PRECEDE_DIST <- p_dist exp_ranges ## Collapsed view of ranges, gene id and distance: loc_int$PRECEDE_DIST <- relist(p_dist, loc_int$PRECEDEID) loc_int ## end example On 05/13/2013 08:48 AM, James W. MacDonald wrote: > Hi Gad, > > On 5/12/2013 11:41 PM, Gad Abraham wrote: >> Hi, >> >> I'm using VariantAnnotation, which is a very nice package, to annotate >> some >> SNPs, based on >> http://adairama.wordpress.com/2013/02/15/functionally-annotate- snps-and-indels-in-bioconductor >> >> . >> >> For SNPs annotated as intergenic and therefore falling between two genes >> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance >> between the SNPs and these two genes? >> >> I'm using: >> >> head(m, 3) >> RS Chr Pos >> 1 rs2187668 chr6 32713862 >> 2 rs9357152 chr6 32772938 >> 3 rs3099844 chr6 31556955 >> >> target<- with(m, >> GRanges(seqnames=Rle(Chr), >> ranges=IRanges(Pos, end=Pos, names=RS), >> strand=Rle(strand("*")) >> ) >> ) >> >> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene, >> AllVariants()) >> names(loc)<- NULL >> out<- as.data.frame(loc) >> >> head(out, 3) >> names seqnames start end LOCATION GENEID PRECEDEID >> FOLLOWID >> GeneSymbol PrecedeSymbol FollowSymbol >> 1 rs2187668 chr6 32713862 32713862 intron 3117<na> >> <na> HLA-DQA1<na> <na> >> 4 rs9357152 chr6 32772938 32772938 intergenic<na> 3119 >> 3117<na> HLA-DQB1 HLA-DQA1 >> 5 rs3099844 chr6 31556955 31556955 intergenic<na> 4277 >> 352961<na> MICB HCG26 >> >> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA- DQA1. > > There might be a far superior way to do this, but one way is to assume > that cds == genes and go from there: > > > gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene, > columns=c("cds_id","gene_id")) > > ind <- sapply(values(gns)$gene_id, function(x) any(x %in% > c("3117","3119"))) > > sum(ind) > [1] 28 > > gns[ind,] > GRanges with 28 ranges and 2 metadata columns: > seqnames ranges strand | cds_id gene_id > <rle> <iranges> <rle> | <integer> <characterlist> > [1] chr6 [32605236, 32605317] + | 73476 3117 > [2] chr6 [32609087, 32609335] + | 73477 3117 > [3] chr6 [32609749, 32610030] + | 73478 3117 > [4] chr6 [32609749, 32610164] + | 73479 3117 > [5] chr6 [32610387, 32610541] + | 73480 3117 > ... ... ... ... ... ... ... > [24] chr6_qbl_hap6 [3860872, 3860982] - | 233860 > 3119 > [25] chr6_qbl_hap6 [3861407, 3861780] - | 233861 > 3119 > [26] chr6_qbl_hap6 [3861499, 3861780] - | 233862 > 3119 > [27] chr6_qbl_hap6 [3864670, 3864939] - | 233863 > 3119 > [28] chr6_qbl_hap6 [3866378, 3866486] - | 233864 > 3119 > > Note that these two genes also map to the extra Chr6 haplotypes. > > > z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6", > IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] > Warning message: > In .local(x, y, ...) : > The behavior of distance() has changed in Bioconductor 2.12. See > ?distance for details. > > colnames(z) <- c(colnames(z)[1:5], "Distance") > > z > seqnames start width cds_id gene_id Distance > 1 chr6 32605236 82 73476 3117 167620 > 2 chr6 32609087 249 73477 3117 163602 > 3 chr6 32609749 282 73478 3117 162907 > 4 chr6 32609749 416 73479 3117 162773 > 5 chr6 32610387 155 73480 3117 162396 > 6 chr6 32628013 14 78928 3119 144911 > 7 chr6 32628636 22 78929 3119 144280 > 8 chr6 32629124 111 78930 3119 143703 > 9 chr6 32629125 110 78931 3119 143703 > 10 chr6 32629744 282 78932 3119 142912 > 11 chr6 32629951 44 78933 3119 142943 > 12 chr6 32632575 244 78934 3119 140119 > 13 chr6 32632575 270 78935 3119 140093 > 14 chr6 32634276 109 78936 3119 138553 > 15 chr6_cox_hap2 4073284 14 227891 3119 NA > 16 chr6_cox_hap2 4074344 185 227892 3119 NA > 17 chr6_cox_hap2 4074418 111 227893 3119 NA > 18 chr6_cox_hap2 4074953 374 227894 3119 NA > 19 chr6_cox_hap2 4075045 282 227895 3119 NA > 20 chr6_cox_hap2 4078216 270 227896 3119 NA > 21 chr6_cox_hap2 4079924 109 227897 3119 NA > 22 chr6_qbl_hap6 3859738 14 233858 3119 NA > 23 chr6_qbl_hap6 3860798 185 233859 3119 NA > 24 chr6_qbl_hap6 3860872 111 233860 3119 NA > 25 chr6_qbl_hap6 3861407 374 233861 3119 NA > 26 chr6_qbl_hap6 3861499 282 233862 3119 NA > 27 chr6_qbl_hap6 3864670 270 233863 3119 NA > 28 chr6_qbl_hap6 3866378 109 233864 3119 NA > > Best, > > Jim > > > > >> >> Thanks, >> Gad >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Hi, On Mon, May 13, 2013 at 8:48 AM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > Hi Gad, > > > On 5/12/2013 11:41 PM, Gad Abraham wrote: >> >> Hi, >> >> I'm using VariantAnnotation, which is a very nice package, to annotate >> some >> SNPs, based on >> >> http://adairama.wordpress.com/2013/02/15/functionally-annotate- snps-and-indels-in-bioconductor >> . >> >> For SNPs annotated as intergenic and therefore falling between two genes >> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance >> between the SNPs and these two genes? >> >> I'm using: >> >> head(m, 3) >> RS Chr Pos >> 1 rs2187668 chr6 32713862 >> 2 rs9357152 chr6 32772938 >> 3 rs3099844 chr6 31556955 >> >> target<- with(m, >> GRanges(seqnames=Rle(Chr), >> ranges=IRanges(Pos, end=Pos, names=RS), >> strand=Rle(strand("*")) >> ) >> ) >> >> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene, >> AllVariants()) >> names(loc)<- NULL >> out<- as.data.frame(loc) >> >> head(out, 3) >> names seqnames start end LOCATION GENEID PRECEDEID >> FOLLOWID >> GeneSymbol PrecedeSymbol FollowSymbol >> 1 rs2187668 chr6 32713862 32713862 intron 3117<na> >> <na> HLA-DQA1<na> <na> >> 4 rs9357152 chr6 32772938 32772938 intergenic<na> 3119 >> 3117<na> HLA-DQB1 HLA-DQA1 >> 5 rs3099844 chr6 31556955 31556955 intergenic<na> 4277 >> 352961<na> MICB HCG26 >> >> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA- DQA1. > > > There might be a far superior way to do this, but one way is to assume that > cds == genes and go from there: > >> gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene, >> columns=c("cds_id","gene_id")) >> ind <- sapply(values(gns)$gene_id, function(x) any(x %in% >> c("3117","3119"))) >> sum(ind) > [1] 28 >> gns[ind,] > GRanges with 28 ranges and 2 metadata columns: > seqnames ranges strand | cds_id > gene_id > <rle> <iranges> <rle> | <integer> <characterlist> > [1] chr6 [32605236, 32605317] + | 73476 > 3117 > [2] chr6 [32609087, 32609335] + | 73477 > 3117 > [3] chr6 [32609749, 32610030] + | 73478 > 3117 > [4] chr6 [32609749, 32610164] + | 73479 > 3117 > [5] chr6 [32610387, 32610541] + | 73480 > 3117 > ... ... ... ... ... ... > ... > [24] chr6_qbl_hap6 [3860872, 3860982] - | 233860 > 3119 > [25] chr6_qbl_hap6 [3861407, 3861780] - | 233861 > 3119 > [26] chr6_qbl_hap6 [3861499, 3861780] - | 233862 > 3119 > [27] chr6_qbl_hap6 [3864670, 3864939] - | 233863 > 3119 > [28] chr6_qbl_hap6 [3866378, 3866486] - | 233864 > 3119 > > Note that these two genes also map to the extra Chr6 haplotypes. > >> z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6", >> IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] > Warning message: > In .local(x, y, ...) : > The behavior of distance() has changed in Bioconductor 2.12. See ?distance > for details. >> colnames(z) <- c(colnames(z)[1:5], "Distance") Saving one line of code makes it twice as nice for half the price: R> z <- cbind(as.data.frame(gns[ind,]), Distance=distance(GRanges("chr6", IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] R> head(z) seqnames start width cds_id gene_id Distance 1| chr6 32605236 82 73476 3117 167620 2| chr6 32609087 249 73477 3117 163602 3| chr6 32609749 282 73478 3117 162907 4| chr6 32609749 416 73479 3117 162773 5| chr6 32610387 155 73480 3117 162396 6| chr6 32628013 14 78928 3119 144911 This time, it's free ... this time. -steve -- Steve Lianoglou Computational Biologist Department of Bioinformatics and Computational Biology Genentech
ADD REPLY
0
Entering edit mode
On 5/13/2013 12:23 PM, Steve Lianoglou wrote: > Hi, > > On Mon, May 13, 2013 at 8:48 AM, James W. MacDonald<jmacdon at="" uw.edu=""> wrote: >> Hi Gad, >> >> >> On 5/12/2013 11:41 PM, Gad Abraham wrote: >>> Hi, >>> >>> I'm using VariantAnnotation, which is a very nice package, to annotate >>> some >>> SNPs, based on >>> >>> http://adairama.wordpress.com/2013/02/15/functionally-annotate- snps-and-indels-in-bioconductor >>> . >>> >>> For SNPs annotated as intergenic and therefore falling between two genes >>> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance >>> between the SNPs and these two genes? >>> >>> I'm using: >>> >>> head(m, 3) >>> RS Chr Pos >>> 1 rs2187668 chr6 32713862 >>> 2 rs9357152 chr6 32772938 >>> 3 rs3099844 chr6 31556955 >>> >>> target<- with(m, >>> GRanges(seqnames=Rle(Chr), >>> ranges=IRanges(Pos, end=Pos, names=RS), >>> strand=Rle(strand("*")) >>> ) >>> ) >>> >>> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene, >>> AllVariants()) >>> names(loc)<- NULL >>> out<- as.data.frame(loc) >>> >>> head(out, 3) >>> names seqnames start end LOCATION GENEID PRECEDEID >>> FOLLOWID >>> GeneSymbol PrecedeSymbol FollowSymbol >>> 1 rs2187668 chr6 32713862 32713862 intron 3117<na> >>> <na> HLA-DQA1<na> <na> >>> 4 rs9357152 chr6 32772938 32772938 intergenic<na> 3119 >>> 3117<na> HLA-DQB1 HLA-DQA1 >>> 5 rs3099844 chr6 31556955 31556955 intergenic<na> 4277 >>> 352961<na> MICB HCG26 >>> >>> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA-DQA1. >> >> There might be a far superior way to do this, but one way is to assume that >> cds == genes and go from there: >> >>> gns<- cds(TxDb.Hsapiens.UCSC.hg19.knownGene, >>> columns=c("cds_id","gene_id")) >>> ind<- sapply(values(gns)$gene_id, function(x) any(x %in% >>> c("3117","3119"))) >>> sum(ind) >> [1] 28 >>> gns[ind,] >> GRanges with 28 ranges and 2 metadata columns: >> seqnames ranges strand | cds_id >> gene_id >> <rle> <iranges> <rle> |<integer> <characterlist> >> [1] chr6 [32605236, 32605317] + | 73476 >> 3117 >> [2] chr6 [32609087, 32609335] + | 73477 >> 3117 >> [3] chr6 [32609749, 32610030] + | 73478 >> 3117 >> [4] chr6 [32609749, 32610164] + | 73479 >> 3117 >> [5] chr6 [32610387, 32610541] + | 73480 >> 3117 >> ... ... ... ... ... ... >> ... >> [24] chr6_qbl_hap6 [3860872, 3860982] - | 233860 >> 3119 >> [25] chr6_qbl_hap6 [3861407, 3861780] - | 233861 >> 3119 >> [26] chr6_qbl_hap6 [3861499, 3861780] - | 233862 >> 3119 >> [27] chr6_qbl_hap6 [3864670, 3864939] - | 233863 >> 3119 >> [28] chr6_qbl_hap6 [3866378, 3866486] - | 233864 >> 3119 >> >> Note that these two genes also map to the extra Chr6 haplotypes. >> >>> z<- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6", >>> IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] >> Warning message: >> In .local(x, y, ...) : >> The behavior of distance() has changed in Bioconductor 2.12. See ?distance >> for details. >>> colnames(z)<- c(colnames(z)[1:5], "Distance") > Saving one line of code makes it twice as nice for half the price: > > R> z<- cbind(as.data.frame(gns[ind,]), > Distance=distance(GRanges("chr6", IRanges(start=32772938, > end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] > > R> head(z) > seqnames start width cds_id gene_id Distance > 1| chr6 32605236 82 73476 3117 167620 > 2| chr6 32609087 249 73477 3117 163602 > 3| chr6 32609749 282 73478 3117 162907 > 4| chr6 32609749 416 73479 3117 162773 > 5| chr6 32610387 155 73480 3117 162396 > 6| chr6 32628013 14 78928 3119 144911 > > This time, it's free ... this time. Ugh. And you know these corporate types really mean it when they say it's gonna cost you next time. I am preparing one arm and one leg for removal and shipment, to speed payment when required. ;-D Jim > > -steve > > -- > Steve Lianoglou > Computational Biologist > Department of Bioinformatics and Computational Biology > Genentech -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Thanks James, Valerie, and Steve. I was tripped up by the distance() function in BioC 2.11, which requires both arguments to have same length, but upgrading to 2.12 solved that. Cheers, Gad On 14 May 2013 02:23, Steve Lianoglou <lianoglou.steve@gene.com> wrote: > Hi, > > On Mon, May 13, 2013 at 8:48 AM, James W. MacDonald <jmacdon@uw.edu> > wrote: > > Hi Gad, > > > > > > On 5/12/2013 11:41 PM, Gad Abraham wrote: > >> > >> Hi, > >> > >> I'm using VariantAnnotation, which is a very nice package, to annotate > >> some > >> SNPs, based on > >> > >> > http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps- and-indels-in-bioconductor > >> . > >> > >> For SNPs annotated as intergenic and therefore falling between two genes > >> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance > >> between the SNPs and these two genes? > >> > >> I'm using: > >> > >> head(m, 3) > >> RS Chr Pos > >> 1 rs2187668 chr6 32713862 > >> 2 rs9357152 chr6 32772938 > >> 3 rs3099844 chr6 31556955 > >> > >> target<- with(m, > >> GRanges(seqnames=Rle(Chr), > >> ranges=IRanges(Pos, end=Pos, names=RS), > >> strand=Rle(strand("*")) > >> ) > >> ) > >> > >> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene, > >> AllVariants()) > >> names(loc)<- NULL > >> out<- as.data.frame(loc) > >> > >> head(out, 3) > >> names seqnames start end LOCATION GENEID PRECEDEID > >> FOLLOWID > >> GeneSymbol PrecedeSymbol FollowSymbol > >> 1 rs2187668 chr6 32713862 32713862 intron 3117<na> > >> <na> HLA-DQA1<na> <na> > >> 4 rs9357152 chr6 32772938 32772938 intergenic<na> 3119 > >> 3117<na> HLA-DQB1 HLA-DQA1 > >> 5 rs3099844 chr6 31556955 31556955 intergenic<na> 4277 > >> 352961<na> MICB HCG26 > >> > >> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA-DQA1. > > > > > > There might be a far superior way to do this, but one way is to assume > that > > cds == genes and go from there: > > > >> gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene, > >> columns=c("cds_id","gene_id")) > >> ind <- sapply(values(gns)$gene_id, function(x) any(x %in% > >> c("3117","3119"))) > >> sum(ind) > > [1] 28 > >> gns[ind,] > > GRanges with 28 ranges and 2 metadata columns: > > seqnames ranges strand | cds_id > > gene_id > > <rle> <iranges> <rle> | <integer> <characterlist> > > [1] chr6 [32605236, 32605317] + | 73476 > > 3117 > > [2] chr6 [32609087, 32609335] + | 73477 > > 3117 > > [3] chr6 [32609749, 32610030] + | 73478 > > 3117 > > [4] chr6 [32609749, 32610164] + | 73479 > > 3117 > > [5] chr6 [32610387, 32610541] + | 73480 > > 3117 > > ... ... ... ... ... ... > > ... > > [24] chr6_qbl_hap6 [3860872, 3860982] - | 233860 > > 3119 > > [25] chr6_qbl_hap6 [3861407, 3861780] - | 233861 > > 3119 > > [26] chr6_qbl_hap6 [3861499, 3861780] - | 233862 > > 3119 > > [27] chr6_qbl_hap6 [3864670, 3864939] - | 233863 > > 3119 > > [28] chr6_qbl_hap6 [3866378, 3866486] - | 233864 > > 3119 > > > > Note that these two genes also map to the extra Chr6 haplotypes. > > > >> z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6", > >> IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] > > Warning message: > > In .local(x, y, ...) : > > The behavior of distance() has changed in Bioconductor 2.12. See > ?distance > > for details. > >> colnames(z) <- c(colnames(z)[1:5], "Distance") > > Saving one line of code makes it twice as nice for half the price: > > R> z <- cbind(as.data.frame(gns[ind,]), > Distance=distance(GRanges("chr6", IRanges(start=32772938, > end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)] > > R> head(z) > seqnames start width cds_id gene_id Distance > 1| chr6 32605236 82 73476 3117 167620 > 2| chr6 32609087 249 73477 3117 163602 > 3| chr6 32609749 282 73478 3117 162907 > 4| chr6 32609749 416 73479 3117 162773 > 5| chr6 32610387 155 73480 3117 162396 > 6| chr6 32628013 14 78928 3119 144911 > > This time, it's free ... this time. > > -steve > > -- > Steve Lianoglou > Computational Biologist > Department of Bioinformatics and Computational Biology > Genentech > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6