rsamtools scanBcf segfault
0
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi Rob, I wanted to point out some new functionality in Rsamtools (in devel) for reading vcf files, scanVcfHeader scanVcf unpackVcf and one function in the VariantAnnotation package (also in devel), readVcf scanVcf returns a list of the 9 elements in a VCF file and unpackVcf further parses the INFO and GENO fields. readVcf() in VariantAnnotation puts the data into a SummarizedExperiment object. Feedback on these functions is welcome. Valerie On 07/28/11 06:36, Martin Morgan wrote: > On 07/28/2011 12:08 AM, Rob Syme wrote: >>> Thanks Rob for the nice reproducible example. The problem is when >>> you open >>> the BcfFile; 'mode' should be 'rb' to indicate that you want to read a >>> binary file. Or leave it unspecified and the heuristic will do the >>> right >>> thing. > > Thanks that'll be corrected. > >> >> Thanks Martin, the manual at >> http://www.bioconductor.org/packages/2.8/bioc/manuals/Rsamtools/man /Rsamtools.pdf >> >> includes a minor error that threw me off. The "Usage" section is >> correct (and I should have read this more closely), but in the >> "Arguments" section, the mode argument is described as: >> "A character(1) vector; mode="rw" indicates a binary (BCF) ?le, >> mode="r" a text (VCF) ?le" >> when it should probably be >> "A character(1) vector; mode="rb" indicates a binary (BCF) ?le, >> mode="r" a text (VCF) ?le" >> >> >> I have another error - and I've had a closer read of the manual this >> time ;) >> I have generated a filtered vcf using freebayes, but the following >> commands throw an error: >> >> wget >> http://gist.github.com/raw/1111101/f8629dfa20b15893684c78a586f321db c65c6b6c/subset.vcf >> grep -v "^#" subset.vcf | cut -f 1 | sort -u> subset.dict >> bcftools view -bSD subset.dict subset.vcf> subset.bcf >> bcftools index subset.bcf >> wget >> http://gist.github.com/raw/1111101/5684d61541b39fb8c02e765d5f7891d9 d122244e/scanBcf_error.R >> Rscript scanBcf_error.R >> >> # This gives the error: >> Error: scanBcf: failed to find fmt encoded as '21057' >> path:<<redacted>>/subset.bcf >> Execution halted > > The problem is that the current implementation isn't really general, > it implements the formats supported (computed on, rather than just > echoed) by bcftools -- PL, DP, GQ, SP, GT, GL -- whereas vcf is really > much more flexible than that. This is cryptically documented on > ?scanBcf, but I'd like to make this more flexible and will work on this. > >> # Calling scanBcfHeader also gives an error: >> scanBcfHeader(bcf) >> Error in mapply(f, ..., SIMPLIFY = FALSE) : >> 'names' attribute [5] must be the same length as the vector [4] > > This has been addressed previously in the devel branch. > > Thanks Rob for the reports. > > Martin > >> >> >> I can't see anything wrong with the VCF, am I missing something >> obvious again? > >
VariantAnnotation Rsamtools VariantAnnotation VariantAnnotation Rsamtools VariantAnnotation • 947 views
ADD COMMENT

Login before adding your answer.

Traffic: 691 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6