[VariantAnnotation] subsetting VCF objects
@paul-theodor-pyl-5014
Last seen 9.6 years ago
Hi all,

I am reading in some .vcf files with the readVcf function and realized that I cannot subset the resulting VCF objects if the info field is empty; see the example below.

Is there a workaround except for loading the info at least partially?

Thanks,
Paul

The Example:

> vcf_full = readVcf("test.vcf.gz", "hg19")
> vcf_no_info = readVcf("test.vcf.gz", "hg19", param = ScanVcfParam(geno=c("GT","GQ"), fixed="ALT", info=NA))
> vcf_full
class: VCF
dim: 71128 2
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(22): AC AF ... SB STR
geno(5): AD DP GQ GT PL
rownames(71128): rs62224610 rs141578542 ... 22:51243743 22:51244332
rowData values names(1): paramRangeID
colnames(2): sample_one sample_two
colData names(1): Samples
> vcf_no_info
class: VCF
dim: 71128 2
genome: hg19
exptData(1): header
fixed(2): REF ALT
info(0):
geno(2): GQ GT
rownames(71128): rs62224610 rs141578542 ... 22:51243743 22:51244332
rowData values names(1): paramRangeID
colnames(2): sample_one sample_two
colData names(1): Samples
> vcf_full[1:10]
class: VCF
dim: 10 2
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(22): AC AF ... SB STR
geno(5): AD DP GQ GT PL
rownames(10): rs62224610 rs141578542 ... 22:16058463 rs149413786
rowData values names(1): paramRangeID
colnames(2): sample_one sample_two
colData names(1): Samples
> vcf_no_info[1:10]
Error in slot(x, "info")[i, , drop = FALSE] :
  selecting rows: subscript contains NAs or out of bounds indices
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] VariantAnnotation_1.4.3 Rsamtools_1.10.2 Biostrings_2.26.2
[4] GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.20.2 Biobase_2.18.0 biomaRt_2.14.0
 [4] bitops_1.0-5 BSgenome_1.26.1 compiler_2.15.2
 [7] DBI_0.2-5 GenomicFeatures_1.10.0 parallel_2.15.2
[10] RCurl_1.95-3 RSQLite_0.11.2 rtracklayer_1.18.0
[13] stats4_2.15.2 tools_2.15.2 XML_3.95-0.1
[16] zlibbioc_1.4.0
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi Paul,

Thanks for the bug report. Now fixed in VariantAnnotation 1.5.14 in devel and 1.4.5 in release. These versions will be available Friday (11/16) 9am PST or immediately from svn.

Valerie

On 11/14/12 05:41, Paul Theodor Pyl wrote:
> Hi all,
>
> I am reading in some .vcf files with the readVcf function and realized
> that I cannot subset the resulting VCF objects if the info field is
> empty, see example below.
>
> Is there a workaround except for loading the info at least partially?
>
> Thanks,
> Paul
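
For anyone checking whether an installed copy already includes this fix, a minimal base-R sketch follows. The helper name `has_subset_fix` is hypothetical and not part of VariantAnnotation; it only compares a version string against the two fixed versions named in the reply (1.4.5 for release, 1.5.14 for devel), and it assumes that every version after 1.5.14 also carries the fix.

```r
# Hypothetical helper (not part of VariantAnnotation): returns TRUE when the
# given version is at or past the fixed versions named in the reply above.
# Assumption: 1.4.x is the release branch, and 1.5.x or later is devel or newer.
has_subset_fix <- function(v) {
  v <- package_version(v)
  if (v >= "1.5.0") {
    v >= "1.5.14"   # devel branch fix, assumed to be kept in later versions
  } else {
    v >= "1.4.5"    # release branch fix
  }
}

has_subset_fix("1.4.3")   # FALSE: the version shown in the sessionInfo() output
has_subset_fix("1.4.5")   # TRUE

# With the package installed, the same check could be run as:
# has_subset_fix(packageVersion("VariantAnnotation"))
```

If the check returns FALSE, updating the package is probably the simplest route; the question itself suggests that reading at least one info field also avoids the error.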