curious behavior from VariantAnnotation 1.5.36
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.6 years ago
United States
I was reannotating some discriminating loci today and found this odd case: a site is in the 3' UTR of two transcripts for a gene, and ThreeUTRVariants() finds it, but AllVariants() does not.

Code to reproduce the result at an example locus:

packageVersion('VariantAnnotation')
## [1] '1.5.36'
packageVersion('Homo.sapiens')
## [1] '1.0.0'

## test case for bioc-list:
library(Homo.sapiens)
library(VariantAnnotation)
EIF2C3.site <- GRanges(c('chr1'), IRanges(start=36521401, width=1))
EIF2C3.site %over% transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)[[EIF2C3]]
## TRUE

allVars <- locateVariants(EIF2C3.site,
                          TxDb.Hsapiens.UCSC.hg19.knownGene,
                          AllVariants())
length(allVars)
## 0

threePrimeVars <- locateVariants(EIF2C3.site,
                                 TxDb.Hsapiens.UCSC.hg19.knownGene,
                                 ThreeUTRVariants())
length(threePrimeVars)
## 2

Why aren't the 3' UTR matches included in the AllVariants() results? That doesn't seem like the expected behavior, at least not to me.

R> sessionInfo()
R Under development (unstable) (2013-02-13 r61937)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices datasets  utils     methods
[8] base

other attached packages:
 [1] FDb.InfiniumMethylation.hg19_1.0.6
 [2] BSgenome.Hsapiens.UCSC.hg19_1.3.19
 [3] BSgenome_1.27.1
 [4] Homo.sapiens_1.0.0
 [5] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
 [6] org.Hs.eg.db_2.8.0
 [7] GO.db_2.8.0
 [8] RSQLite_0.11.2
 [9] DBI_0.2-5
[10] OrganismDbi_1.1.13
[11] VariantAnnotation_1.5.36
[12] Rsamtools_1.11.16
[13] Biostrings_2.27.11
[14] GenomicFeatures_1.11.8
[15] AnnotationDbi_1.21.10
[16] Biobase_2.19.2
[17] chromophobe_0.50
[18] pheatmap_0.7.4
[19] ggplot2_0.9.3
[20] reshape2_1.2.2
[21] GenomicRanges_1.11.29
[22] IRanges_1.17.31
[23] BiocGenerics_0.5.6
[24] BiocInstaller_1.9.6
[25] gtools_2.7.0
[26] devtools_1.1

loaded via a namespace (and not attached):
 [1] biomaRt_2.15.0     bitops_1.0-5       colorspace_1.2-1   dichromat_2.0-0
 [5] digest_0.6.2       evaluate_0.4.3     graph_1.37.5       grid_3.0.0
 [9] gtable_0.1.2       httr_0.2           labeling_0.1       lattice_0.20-13
[13] MASS_7.3-23        Matrix_1.0-10      memoise_0.1        munsell_0.4
[17] plyr_1.8           proto_0.3-10       RBGL_1.35.0        RColorBrewer_1.0-5
[21] RCurl_1.95-3       rtracklayer_1.19.9 scales_0.2.3       stats4_3.0.0
[25] stringr_0.6.2      tcltk_3.0.0        tools_3.0.0        whisker_0.1
[29] XML_3.95-0.1       zlibbioc_1.5.0
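The expectation behind the question can be written down as a check: every transcript reported by ThreeUTRVariants() should also appear as a 3' UTR hit under AllVariants(). A minimal sketch follows, assuming a recent VariantAnnotation in which the result columns are named LOCATION and TXID (the 1.5.x series may have used different names). `utr3_covered` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: TRUE when every transcript reported by
# ThreeUTRVariants() also appears as a "threeUTR" hit under AllVariants().
utr3_covered <- function(site, txdb) {
  all_hits <- VariantAnnotation::locateVariants(
    site, txdb, VariantAnnotation::AllVariants())
  utr3_hits <- VariantAnnotation::locateVariants(
    site, txdb, VariantAnnotation::ThreeUTRVariants())
  # Assumed column names (LOCATION, TXID), as in recent releases.
  all_utr3_tx <- all_hits$TXID[as.character(all_hits$LOCATION) == "threeUTR"]
  all(utr3_hits$TXID %in% all_utr3_tx)
}

# Only run the check when the Bioconductor packages are installed.
pkgs <- c("VariantAnnotation", "TxDb.Hsapiens.UCSC.hg19.knownGene")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
  site <- GenomicRanges::GRanges(
    "chr1", IRanges::IRanges(start = 36521401, width = 1))
  print(utr3_covered(site, txdb))
}
```

With the behavior described in the post, a check like this would be expected to fail on 1.5.36, since AllVariants() returned nothing at this site.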
Tim Triche ★ 4.2k
Somehow I omitted a line. It should go immediately after library(VariantAnnotation), i.e.

library(Homo.sapiens)
library(VariantAnnotation)

## this following line is the one that went missing
EIF2C3 <- unlist(mget('EIF2C3', org.Hs.egSYMBOL2EG))

EIF2C3.site <- GRanges(c('chr1'), IRanges(start=36521401, width=1))
EIF2C3.site %over% transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)[[EIF2C3]]

## ...the rest is as before
Hi Tim,

Thanks for reporting this. There was a typo in the method called for 3' UTRs in the AllVariants() code. Now fixed in 1.5.38 and 1.4.9.

library(Homo.sapiens)
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

site <- GRanges(c('chr1'), IRanges(start=36521401, width=1))
EIF2C3 <- unlist(mget('EIF2C3', org.Hs.egSYMBOL2EG))

all <- locateVariants(site, txdb, AllVariants())
three <- locateVariants(site, txdb, ThreeUTRVariants())

> identical(all, three)
[1] TRUE

Valerie
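To tell whether an installed copy already carries the fix, its version can be compared with the two patched versions named in the reply. A minimal sketch in base R, assuming that 1.5.x is the development line and that anything older belongs to the 1.4.x release line; `has_allvariants_fix` is a hypothetical helper.

```r
# Hypothetical helper: does this VariantAnnotation version include the
# AllVariants() 3' UTR fix (1.5.38 on the devel line, 1.4.9 on release)?
has_allvariants_fix <- function(version) {
  v <- package_version(version)
  # Assumption: versions from 1.5.0 onward follow the devel-line threshold.
  if (v >= "1.5.0") v >= "1.5.38" else v >= "1.4.9"
}

has_allvariants_fix("1.5.36")  # the version in the original report
has_allvariants_fix("1.5.38")  # the first patched devel version

# For the installed package:
# has_allvariants_fix(packageVersion("VariantAnnotation"))
```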
Hi Valerie,

Thanks much for the fix!

Best,

--t