Subsetting "sites only" VCF objects
1
0
Entering edit mode
@richard-pearson-5255
Last seen 9.6 years ago
Hi It is wonderful that we can create subsets of VariantAnnotation VCF objects using [ but I have found that this doesn't work for VCFs that are "sites only", i.e. have no information in geno(vcf): > geno(vcf) SimpleList of length 0 > passVcf <- vcf[values(fixed(vcf))[["FILTER"]] == "PASS", ] Error in colData(x)[j, , drop = FALSE] : selecting rows: subscript out of bounds In these cases I can create subsets, e.g. using: passVcf <- VCF( rowData = rowData(vcf)[values(fixed(vcf))[["FILTER"]] == "PASS"], colData = colData(vcf), exptData = exptData(vcf), fixed = values(fixed(vcf))[values(fixed(vcf))[["FILTER"]] == "PASS", -(1)], info = values(info(vcf))[values(fixed(vcf))[["FILTER"]] == "PASS", -(1)] ) But it would be great if I could also do this using [. Any chance this functionality could be included in a future version of VariantAnnotation? Thanks Richard > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] VEDA_0.0.1 ggplot2_0.9.1 VariantAnnotation_1.2.10 Rsamtools_1.8.6 Biostrings_2.24.1 GenomicRanges_1.8.12 IRanges_1.14.4 BiocGenerics_0.2.0 malariagen_0.0.1 loaded via a namespace (and not attached): [1] AnnotationDbi_1.18.1 Biobase_2.16.0 biomaRt_2.12.0 bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 DBI_0.2-5 dichromat_1.2-4 digest_0.5.2 GenomicFeatures_1.8.2 [11] grid_2.15.0 labeling_0.1 lattice_0.20-6 MASS_7.3-20 Matrix_1.0-6 memoise_0.1 munsell_0.3 plyr_1.7.1 proto_0.3-9.2 RColorBrewer_1.0-5 [21] RCurl_1.91-1 reshape2_1.2.1 RSQLite_0.11.1 rtracklayer_1.16.3 scales_0.2.1 snpStats_1.6.0 splines_2.15.0 stats4_2.15.0 stringr_0.6.1 survival_2.36-14 [31] tools_2.15.0 XML_3.9-4 zlibbioc_1.2.0
VariantAnnotation VariantAnnotation VariantAnnotation VariantAnnotation • 1.4k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.2 years ago
United States
Hi Richard, Thanks for reporting this bug. The problem was that the VCF class could not be subset by row when the object had no columns. This has been fixed for both SummarizedExperiment and the VCF class. fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") vcf <- readVcf(fl, "hg19") > dim(vcf) [1] 5 3 ## remove 'geno' and 'colData' to give the object 0 columns geno(vcf) <- SimpleList() colData(vcf) <- DataFrame() > dim(vcf) [1] 5 0 > vcf[1:3, ] class: VCF dim: 3 0 genome: hg19 exptData(1): header fixed(4): REF ALT QUAL FILTER info(6): NS DP ... DB H2 geno(0): rownames(3): rs6054257 20:17330 rs6040355 rowData values names(1): paramRangeID colnames: NULL colData names(0): Fixes have been made in the following versions, Release : GenomicRanges 1.8.13 VariantAnnotation 1.2.11 Devel : GenomicRanges 1.9.56 VariantAnnotation 1.3.24 Valerie On 08/24/2012 04:10 AM, Richard Pearson wrote: > Hi > > It is wonderful that we can create subsets of VariantAnnotation VCF > objects using [ but I have found that this doesn't work for VCFs that > are "sites only", i.e. have no information in geno(vcf): > > geno(vcf) > SimpleList of length 0 > > passVcf <- vcf[values(fixed(vcf))[["FILTER"]] == "PASS", ] > Error in colData(x)[j, , drop = FALSE] : > selecting rows: subscript out of bounds > > In these cases I can create subsets, e.g. using: > passVcf <- VCF( > rowData = rowData(vcf)[values(fixed(vcf))[["FILTER"]] == "PASS"], > colData = colData(vcf), > exptData = exptData(vcf), > fixed = values(fixed(vcf))[values(fixed(vcf))[["FILTER"]] == > "PASS", -(1)], > info = values(info(vcf))[values(fixed(vcf))[["FILTER"]] == "PASS", > -(1)] > ) > > But it would be great if I could also do this using [. Any chance this > functionality could be included in a future version of VariantAnnotation? > > Thanks > > Richard > > > sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C > LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 > LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 > LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] VEDA_0.0.1 ggplot2_0.9.1 VariantAnnotation_1.2.10 > Rsamtools_1.8.6 Biostrings_2.24.1 GenomicRanges_1.8.12 > IRanges_1.14.4 BiocGenerics_0.2.0 malariagen_0.0.1 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.18.1 Biobase_2.16.0 biomaRt_2.12.0 > bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1 > DBI_0.2-5 dichromat_1.2-4 digest_0.5.2 > GenomicFeatures_1.8.2 > [11] grid_2.15.0 labeling_0.1 lattice_0.20-6 > MASS_7.3-20 Matrix_1.0-6 memoise_0.1 > munsell_0.3 plyr_1.7.1 proto_0.3-9.2 RColorBrewer_1.0-5 > [21] RCurl_1.91-1 reshape2_1.2.1 RSQLite_0.11.1 > rtracklayer_1.16.3 scales_0.2.1 snpStats_1.6.0 > splines_2.15.0 stats4_2.15.0 stringr_0.6.1 > survival_2.36-14 > [31] tools_2.15.0 XML_3.9-4 zlibbioc_1.2.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 768 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6