VariantAnnotation - Accessing specific genomic ranges without needing a tabix file
@fong-chun-chan-5706
Hi,

I have a question about accessing particular ranges of a VCF file that I've loaded into R.

The VCF file I've loaded is relatively small (~8 MB), and I would like to take subsets of it that fall into particular genomic ranges. The tutorial shows how I can use the "param" argument of readVcf() for this, but that requires me to generate a tabix index for the VCF first. I was under the impression that tabix indexing is mainly useful for big VCF files.

Since I've already been able to load the entire VCF file into memory, I would expect to be able to subset it directly by a set of genomic ranges, but I don't see an easy way to do so. Is there no way to do this without first generating a tabix file?

Thanks,

R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8 LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] VariantAnnotation_1.8.10 Rsamtools_1.14.2 Biostrings_2.30.1
GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.6
BiocGenerics_0.8.0 reshape2_1.2.2 ggplot2_0.9.3.1
dplyr_0.1 mclust_4.2 stringr_0.6.2
plyr_1.8 xtable_1.7-1
[15] fields_6.9.1 maps_2.3-6 spam_0.40-0
knitr_1.5 argparse_0.5.3 proto_0.3-10
vimcom.plus_0.9-92 setwidth_1.0-3 colorout_0.9-9

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 assertthat_0.1 Biobase_2.22.0
biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0
colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0
digest_0.6.4 evaluate_0.5.1 formatR_0.10
GenomicFeatures_1.14.2 getopt_1.20.0 gtable_0.1.2
[16] labeling_0.2 MASS_7.3-29 munsell_0.4.2
RColorBrewer_1.0-5 Rcpp_0.10.6 RCurl_1.95-4.1
rjson_0.2.13 RSQLite_0.11.4 rtracklayer_1.22.3
scales_0.2.3 stats4_3.0.2 tools_3.0.2
XML_3.98-1.1 zlibbioc_1.8.0
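For comparison, the tabix-based route mentioned in the question looks roughly like the sketch below. It assumes VariantAnnotation is installed and uses the chr22.vcf.gz example file bundled with that package, which is already bgzip-compressed and has a matching .tbi index; the coordinates in myRanges are illustrative only. The which argument of ScanVcfParam() is what restricts the import to the requested ranges, and it is this step that relies on the index.

```r
## Sketch of the tabix-based import, assuming VariantAnnotation (and its
## dependency Rsamtools) is installed.
suppressPackageStartupMessages(library(VariantAnnotation))

## Example file shipped with VariantAnnotation; it already has a .tbi index.
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

## Illustrative regions on chromosome 22; seqnames must match the file ("22").
myRanges <- GRanges("22", IRanges(start = c(50301422, 50989541),
                                  end   = c(50312106, 51001328)))

## 'which' limits the import to records overlapping myRanges; this is the
## part that needs the tabix index.
param <- ScanVcfParam(which = myRanges)
vcfRegion <- readVcf(fl, "hg19", param = param)

## For your own uncompressed VCF ("my.vcf" is a hypothetical name), the
## index would have to be built first with Rsamtools:
##   bgz <- bgzip("my.vcf", "my.vcf.gz")
##   indexTabix(bgz, format = "vcf")
```

Reading only the requested ranges from disk pays off for large files; for a file of a few megabytes, importing everything once and subsetting in memory is usually simpler.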
@michael-lawrence-3846
United States
subsetByOverlaps(vcf, myRanges)

On Wed, Feb 5, 2014 at 2:22 PM, Fong Chun Chan <fongchun@alumni.ubc.ca> wrote:
> Is there no way to do this without having to first generate a tabix file?
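A minimal sketch of this answer in context, assuming VariantAnnotation is installed. It reads the whole file once (the package's bundled chr22.vcf.gz stands in for your own file) and then subsets the in-memory VCF object, so no tabix query is involved. myRanges holds illustrative coordinates; its seqnames must use the same naming style as the VCF ("22" here, not "chr22"), otherwise nothing will overlap. The last step shows an equivalent form that builds a logical index with overlapsAny() on the variants' row ranges.

```r
## In-memory subsetting of an already-loaded VCF, assuming VariantAnnotation
## is installed. The index is not used after the full import.
suppressPackageStartupMessages(library(VariantAnnotation))

## Read the whole (small) file once; substitute the path to your own VCF.
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## Illustrative regions of interest; seqnames must match seqlevels(vcf).
myRanges <- GRanges("22", IRanges(start = c(50301422, 50989541),
                                  end   = c(50312106, 51001328)))

## Keep only the variants whose ranges overlap any of myRanges.
vcfSub <- subsetByOverlaps(vcf, myRanges)

## Equivalent: a logical row index computed from the variants' ranges.
keep <- overlapsAny(rowRanges(vcf), myRanges)
vcfSub2 <- vcf[keep, ]
```

Both forms return a VCF object with the same columns (samples) as the original, so downstream accessors such as geno() and info() work unchanged on the subset. In older releases, such as the one in the question, the row-range accessor was named rowData() rather than rowRanges().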
Perfect, thanks!

On Wed, Feb 5, 2014 at 2:46 PM, Michael Lawrence <lawrence.michael@gene.com> wrote:
> subsetByOverlaps(vcf, myRanges)