writeVcf bug
1
0
Entering edit mode
@richard-pearson-5255
Last seen 9.6 years ago
Hi I've noticed a bug with writeVcf when trying to write out relatively simple vcf files, e.g. a file containing only GT in the FORMAT fields. The following is a (hopefully) reproducible example based on examples in ?writeVcf library(VariantAnnotation) fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") out1.vcf <- tempfile() out2.vcf <- tempfile() in1 <- readVcf(fl, "hg19") writeVcf(in1, out1.vcf) in2 <- readVcf(fl, "hg19", ScanVcfParam(geno="GT")) writeVcf(in2, out2.vcf) This last line gives me: Error in row(genoMat) : a matrix-like object is required as argument to 'row' Looking at the following code snippet from .makeVcfGeno, it seems genoMat is first getting created as a matrix, but then changed to something that is not a matrix by "genoMat <- genoMatFlat". The error then comes from the "row(genoMat)" on the last line shown. I don't really understand what is going on in this code so can't offer a patch I'm afraid. genoMat <- matrix(unlist(as.list(geno), use.names = FALSE, recursive = FALSE), nsub * nrec, length(geno)) genoMatFlat <- as.character(unlist(genoMat)) genoMatFlat[is.na(genoMatFlat)] <- "." if (is.list(genoMat)) { genoMatList <- relist(genoMatFlat, PartitioningByEnd(genoMat)) genoMatFlat <- .pasteCollapse(genoMatList, ",") genoMat <- matrix(genoMatFlat, nrow(genoMat), ncol(genoMat)) } else genoMat <- genoMatFlat formatMatPerSub <- matrix(rep(t(formatMat), nsub), nsub * nrec, length(geno), byrow = TRUE) keep <- !is.na(formatMatPerSub) genoListBySub <- seqsplit(genoMat[keep], row(genoMat)[keep]) FWIW, as a temporary fix I made an extra, slightly more complex FORMAT field using the following: geno(in2)[["DUMMY"]] <- matrix(sapply(as.vector(geno(in2)[["GT"]]), function(x) list(c(x, x))), ncol=ncol(geno(in2)[["GT"]]), dimnames=dimnames(geno(in2)[["GT"]])) writeVcf(in2, out2.vcf) I've checked and it seems this bug is also present in latest devel. Any chance of a fix? Thanks Richard > sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] VariantAnnotation_1.4.5 Rsamtools_1.10.2 Biostrings_2.26.2 GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0 devtools_0.8 BiocInstaller_1.8.3 loaded via a namespace (and not attached): [1] AnnotationDbi_1.20.3 Biobase_2.18.0 biomaRt_2.14.0 bitops_1.0-5 BSgenome_1.26.1 DBI_0.2-5 digest_0.6.0 evaluate_0.4.2 GenomicFeatures_1.10.1 httr_0.2 [11] memoise_0.1 parallel_2.15.1 plyr_1.7.1 RCurl_1.95-3 RSQLite_0.11.2 rtracklayer_1.18.1 stats4_2.15.1 stringr_0.6.1 tools_2.15.1 whisker_0.1 [21] XML_3.95-0.1 zlibbioc_1.4.0
• 1.0k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States
I am pretty sure this bug has been fixed in devel. Would you mind pasting your devel session info? Thanks, Michael On Thu, Nov 29, 2012 at 8:44 AM, Richard Pearson <rpearson@well.ox.ac.uk>wrote: > Hi > > I've noticed a bug with writeVcf when trying to write out relatively > simple vcf files, e.g. a file containing only GT in the FORMAT fields. The > following is a (hopefully) reproducible example based on examples in > ?writeVcf > > library(VariantAnnotation) > fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") > out1.vcf <- tempfile() > out2.vcf <- tempfile() > in1 <- readVcf(fl, "hg19") > writeVcf(in1, out1.vcf) > in2 <- readVcf(fl, "hg19", ScanVcfParam(geno="GT")) > writeVcf(in2, out2.vcf) > > This last line gives me: > > Error in row(genoMat) : > a matrix-like object is required as argument to 'row' > > Looking at the following code snippet from .makeVcfGeno, it seems genoMat > is first getting created as a matrix, but then changed to something that is > not a matrix by "genoMat <- genoMatFlat". The error then comes from the > "row(genoMat)" on the last line shown. I don't really understand what is > going on in this code so can't offer a patch I'm afraid. > > genoMat <- matrix(unlist(as.list(geno), use.names = FALSE, > recursive = FALSE), nsub * nrec, length(geno)) > genoMatFlat <- as.character(unlist(genoMat)) > genoMatFlat[is.na(genoMatFlat)**] <- "." > if (is.list(genoMat)) { > genoMatList <- relist(genoMatFlat, PartitioningByEnd(genoMat)) > genoMatFlat <- .pasteCollapse(genoMatList, ",") > genoMat <- matrix(genoMatFlat, nrow(genoMat), ncol(genoMat)) > } > else genoMat <- genoMatFlat > formatMatPerSub <- matrix(rep(t(formatMat), nsub), nsub * > nrec, length(geno), byrow = TRUE) > keep <- !is.na(formatMatPerSub) > genoListBySub <- seqsplit(genoMat[keep], row(genoMat)[keep]) > > FWIW, as a temporary fix I made an extra, slightly more complex FORMAT > field using the following: > > geno(in2)[["DUMMY"]] <- matrix(sapply(as.vector(geno(**in2)[["GT"]]), > function(x) list(c(x, x))), ncol=ncol(geno(in2)[["GT"]]), > dimnames=dimnames(geno(in2)[["**GT"]])) > writeVcf(in2, out2.vcf) > > I've checked and it seems this bug is also present in latest devel. Any > chance of a fix? > > Thanks > > Richard > > > sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 > LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 > LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 > LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] VariantAnnotation_1.4.5 Rsamtools_1.10.2 Biostrings_2.26.2 > GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0 > devtools_0.8 BiocInstaller_1.8.3 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.20.3 Biobase_2.18.0 biomaRt_2.14.0 > bitops_1.0-5 BSgenome_1.26.1 DBI_0.2-5 digest_0.6.0 > evaluate_0.4.2 GenomicFeatures_1.10.1 httr_0.2 > [11] memoise_0.1 parallel_2.15.1 plyr_1.7.1 > RCurl_1.95-3 RSQLite_0.11.2 rtracklayer_1.18.1 stats4_2.15.1 > stringr_0.6.1 tools_2.15.1 whisker_0.1 > [21] XML_3.95-0.1 zlibbioc_1.4.0 > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Michael My devel sessionInfo is below. It seems biocLite has given me version 1.5.17of the package, but I see that .makeVcfGenolooks different in the latest from SVN (1.5.19) - so I'm sure you're right and I just need to wait for things to filter through. Apologies for the noise. Every time I upgrade VariantAnnotation it is an even better package than before - many thanks for all the hard work! Richard > sessionInfo() R Under development (unstable) (2012-11-27 r61172) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] VariantAnnotation_1.5.17 Rsamtools_1.11.11 Biostrings_2.27.7 GenomicRanges_1.11.15 IRanges_1.17.15 BiocGenerics_0.5.1 devtools_0.8 BiocInstaller_1.9.4 loaded via a namespace (and not attached): [1] AnnotationDbi_1.21.7 Biobase_2.19.1 biomaRt_2.15.0 bitops_1.0-5 BSgenome_1.27.1 DBI_0.2-5 digest_0.6.0 evaluate_0.4.2 GenomicFeatures_1.11.5 httr_0.2 [11] memoise_0.1 plyr_1.7.1 RCurl_1.95-3 RSQLite_0.11.2 rtracklayer_1.19.6 stats4_2.16.0 stringr_0.6.1 tools_2.16.0 whisker_0.1 XML_3.95-0.1 [21] zlibbioc_1.5.0 On 29/11/2012 17:22, Michael Lawrence wrote: > I am pretty sure this bug has been fixed in devel. Would you mind > pasting your devel session info? > > Thanks, > Michael > > > On Thu, Nov 29, 2012 at 8:44 AM, Richard Pearson > <rpearson@well.ox.ac.uk <mailto:rpearson@well.ox.ac.uk="">> wrote: > > Hi > > I've noticed a bug with writeVcf when trying to write out > relatively simple vcf files, e.g. a file containing only GT in the > FORMAT fields. The following is a (hopefully) reproducible example > based on examples in ?writeVcf > > library(VariantAnnotation) > fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") > out1.vcf <- tempfile() > out2.vcf <- tempfile() > in1 <- readVcf(fl, "hg19") > writeVcf(in1, out1.vcf) > in2 <- readVcf(fl, "hg19", ScanVcfParam(geno="GT")) > writeVcf(in2, out2.vcf) > > This last line gives me: > > Error in row(genoMat) : > a matrix-like object is required as argument to 'row' > > Looking at the following code snippet from .makeVcfGeno, it seems > genoMat is first getting created as a matrix, but then changed to > something that is not a matrix by "genoMat <- genoMatFlat". The > error then comes from the "row(genoMat)" on the last line shown. I > don't really understand what is going on in this code so can't > offer a patch I'm afraid. > > genoMat <- matrix(unlist(as.list(geno), use.names = FALSE, > recursive = FALSE), nsub * nrec, length(geno)) > genoMatFlat <- as.character(unlist(genoMat)) > genoMatFlat[is.na <http: is.na="">(genoMatFlat)] <- "." > if (is.list(genoMat)) { > genoMatList <- relist(genoMatFlat, PartitioningByEnd(genoMat)) > genoMatFlat <- .pasteCollapse(genoMatList, ",") > genoMat <- matrix(genoMatFlat, nrow(genoMat), ncol(genoMat)) > } > else genoMat <- genoMatFlat > formatMatPerSub <- matrix(rep(t(formatMat), nsub), nsub * > nrec, length(geno), byrow = TRUE) > keep <- !is.na <http: is.na="">(formatMatPerSub) > genoListBySub <- seqsplit(genoMat[keep], row(genoMat)[keep]) > > FWIW, as a temporary fix I made an extra, slightly more complex > FORMAT field using the following: > > geno(in2)[["DUMMY"]] <- > matrix(sapply(as.vector(geno(in2)[["GT"]]), function(x) list(c(x, > x))), ncol=ncol(geno(in2)[["GT"]]), > dimnames=dimnames(geno(in2)[["GT"]])) > writeVcf(in2, out2.vcf) > > I've checked and it seems this bug is also present in latest > devel. Any chance of a fix? > > Thanks > > Richard > > > sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 > LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 > LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] VariantAnnotation_1.4.5 Rsamtools_1.10.2 Biostrings_2.26.2 > GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0 > devtools_0.8 BiocInstaller_1.8.3 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.20.3 Biobase_2.18.0 biomaRt_2.14.0 > bitops_1.0-5 BSgenome_1.26.1 DBI_0.2-5 digest_0.6.0 > evaluate_0.4.2 GenomicFeatures_1.10.1 httr_0.2 > [11] memoise_0.1 parallel_2.15.1 plyr_1.7.1 > RCurl_1.95-3 RSQLite_0.11.2 rtracklayer_1.18.1 > stats4_2.15.1 stringr_0.6.1 tools_2.15.1 whisker_0.1 > [21] XML_3.95-0.1 zlibbioc_1.4.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 703 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6