rsamtools scanBcf segfault
Rob Syme ▴ 20
@rob-syme-4776
Last seen 2.8 years ago
Canada
Hi all,

I'm looking to read in a bcf file, but R keeps segfaulting out. I'd love advice or pointers if anyone has seen this before.

Environment:
    Rsamtools version 1.4.2
    R version 2.13.1 (2011-07-08)

Code to reproduce is at gist.github.com/1109210, reproduced below:

    wget 08ae7fe9927/subset.vcf
    grep -v "^#" subset.vcf | cut -f 1 | sort -u > subset.dict
    bcftools view -D subset.dict -bS subset.vcf > subset.bcf
    bcftools index subset.bcf
    wget 3cdd942c083/segfault.R
    Rscript segfault.R
    ## END BASH

I get:

    $ Rscript segfault.R
    ...
     *** caught segfault ***
    address (nil), cause 'memory not mapped'

    Traceback:
     1: .Call(func, .extptr(file), space, tmpl)
     2: doTryCatch(return(expr), name, parentenv, handler)
     3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
     4: tryCatchList(expr, classes, parentenv, handlers)
     5: tryCatch({ .Call(func, .extptr(file), space, tmpl)}, error = function(err) { stop("scanBcf: ", conditionMessage(err), "\n path: ", path(file), call. = FALSE)})
     6: .io_bcf(.scan_bcf, file, ..., param = param)
     7: .local(file, ...)
     8: scanBcf(bcf, param = param)
     9: scanBcf(bcf, param = param)
    aborting ...
    Segmentation fault
@martin-morgan-1513
Last seen 2 days ago
United States
On 07/27/2011 05:17 AM, Rob Syme wrote:
> Hi all,
> I'm looking to read in a bcf file, but R keeps segfaulting out. I'd
> love advice or pointers if anyone has seen this before.
> Environment:
> Rsamtools version 1.4.2
> R version 2.13.1 (2011-07-08)
>
> Code to reproduce is gist.github.com/1109210 reproduced below:
>
> wget 6608ae7fe9927/subset.vcf
> grep -v "^#" subset.vcf | cut -f 1 | sort -u > subset.dict
> bcftools view -D subset.dict -bS subset.vcf > subset.bcf
> bcftools index subset.bcf
> wget 813cdd942c083/segfault.R
> Rscript segfault.R
> ## END BASH
> I get:

Thanks Rob for the nice reproducible example. The problem is when you open the BcfFile; 'mode' should be 'rb' to indicate that you want to read a binary file. Or leave it unspecified and the heuristic will do the right thing. I'll try to get a fix in so that R doesn't segfault.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
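For later readers, here is a minimal sketch of the fix Martin describes: open the BcfFile with mode "rb" for a binary BCF file and "r" for a text VCF file. The helper `bcf_mode` is a hypothetical name introduced only for this illustration, and picking the mode from the file extension is an assumption about what "the heuristic" does. The read itself is guarded, so the snippet does nothing if Rsamtools or `subset.bcf` is not available.

```r
# Hypothetical helper (not part of Rsamtools): choose the open mode from the
# file extension. "rb" = binary BCF, "r" = text VCF, as named in the reply.
bcf_mode <- function(path) {
  if (grepl("\\.bcf$", path, ignore.case = TRUE)) "rb" else "r"
}

path <- "subset.bcf"  # file produced by the bcftools commands in the question

# Only attempt the read when Rsamtools and the file are actually present.
if (requireNamespace("Rsamtools", quietly = TRUE) && file.exists(path)) {
  bcf <- Rsamtools::BcfFile(path, mode = bcf_mode(path))
  open(bcf)
  res <- Rsamtools::scanBcf(bcf)
  close(bcf)
  str(res, max.level = 1)  # top-level structure of the returned list
}
```

Leaving `mode` out of the `BcfFile()` call, as Martin suggests, should amount to the same thing for a file named `subset.bcf`.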
> Thanks Rob for the nice reproducible example. The problem is when you open
> the BcfFile; 'mode' should be 'rb' to indicate that you want to read a
> binary file. Or leave it unspecified and the heuristic will do the right
> thing.

Thanks Martin, the manual at
http://www.bioconductor.org/packages/2.8/bioc/manuals/Rsamtools/man/Rsamtools.pdf
includes a minor error that threw me off. The "Usage" section is correct (and I should have read this more closely), but in the "Arguments" section, the mode argument is described as:

    "A character(1) vector; mode="rw" indicates a binary (BCF) file, mode="r" a text (VCF) file"

when it should probably be

    "A character(1) vector; mode="rb" indicates a binary (BCF) file, mode="r" a text (VCF) file"

I have another error - and I've had a closer read of the manual this time ;)
I have generated a filtered vcf using freebayes, but the following commands throw an error:

    wget http://gist.github.com/raw/1111101/f8629dfa20b15893684c78a586f321dbc65c6b6c/subset.vcf
    grep -v "^#" subset.vcf | cut -f 1 | sort -u > subset.dict
    bcftools view -bSD subset.dict subset.vcf > subset.bcf
    bcftools index subset.bcf
    wget http://gist.github.com/raw/1111101/5684d61541b39fb8c02e765d5f7891d9d122244e/scanBcf_error.R
    Rscript scanBcf_error.R

    # This gives the error:
    Error: scanBcf: failed to find fmt encoded as '21057'
      path: <<redacted>>/subset.bcf
    Execution halted

    # Calling scanBcfHeader also gives an error:
    scanBcfHeader(bcf)
    Error in mapply(f, ..., SIMPLIFY = FALSE) :
      'names' attribute [5] must be the same length as the vector [4]

I can't see anything wrong with the VCF, am I missing something obvious again?
On 07/28/2011 12:08 AM, Rob Syme wrote:
>> Thanks Rob for the nice reproducible example. The problem is when you open
>> the BcfFile; 'mode' should be 'rb' to indicate that you want to read a
>> binary file. Or leave it unspecified and the heuristic will do the right
>> thing.
>
> Thanks Martin, the manual at
> http://www.bioconductor.org/packages/2.8/bioc/manuals/Rsamtools/man/Rsamtools.pdf
> includes a minor error that threw me off. The "Usage" section is
> correct (and I should have read this more closely), but in the
> "Arguments" section, the mode argument is described as:
> "A character(1) vector; mode="rw" indicates a binary (BCF) file,
> mode="r" a text (VCF) file"
> when it should probably be
> "A character(1) vector; mode="rb" indicates a binary (BCF) file,
> mode="r" a text (VCF) file"

Thanks, that'll be corrected.

> I have another error - and I've had a closer read of the manual this time ;)
> I have generated a filtered vcf using freebayes, but the following
> commands throw an error:
>
> wget http://gist.github.com/raw/1111101/f8629dfa20b15893684c78a586f321dbc65c6b6c/subset.vcf
> grep -v "^#" subset.vcf | cut -f 1 | sort -u > subset.dict
> bcftools view -bSD subset.dict subset.vcf > subset.bcf
> bcftools index subset.bcf
> wget http://gist.github.com/raw/1111101/5684d61541b39fb8c02e765d5f7891d9d122244e/scanBcf_error.R
> Rscript scanBcf_error.R
>
> # This gives the error:
> Error: scanBcf: failed to find fmt encoded as '21057'
> path: <<redacted>>/subset.bcf
> Execution halted

The problem is that the current implementation isn't really general: it implements only the formats supported (computed on, rather than just echoed) by bcftools, namely PL, DP, GQ, SP, GT and GL, whereas vcf is really much more flexible than that. This is cryptically documented on ?scanBcf, but I'd like to make this more flexible and will work on this.

> # Calling scanBcfHeader also gives an error:
> scanBcfHeader(bcf)
> Error in mapply(f, ..., SIMPLIFY = FALSE) :
> 'names' attribute [5] must be the same length as the vector [4]

This has been addressed previously in the devel branch. Thanks Rob for the reports.

Martin

> I can't see anything wrong with the VCF, am I missing something obvious again?
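Since the scanBcf in this release only handles the FORMAT fields Martin lists (PL, DP, GQ, SP, GT, GL), one way to anticipate the "failed to find fmt" error is to check the text VCF for other FORMAT keys before converting it. The following is a rough base-R sketch, not anything from Rsamtools: the function name `unsupported_formats` and the example lines are made up for illustration, and it assumes a tab-delimited VCF with FORMAT in column 9.

```r
# FORMAT keys that, per the reply above, this version of scanBcf handles.
supported <- c("PL", "DP", "GQ", "SP", "GT", "GL")

# Hypothetical helper: return the FORMAT keys used in a VCF's data lines
# that are not in `supported`.
unsupported_formats <- function(vcf_lines, supported) {
  body <- vcf_lines[!grepl("^#", vcf_lines)]       # drop header lines
  fields <- strsplit(body, "\t", fixed = TRUE)
  # Column 9 of a VCF data line is FORMAT, a colon-separated list of keys.
  fmt <- vapply(fields,
                function(x) if (length(x) >= 9) x[9] else NA_character_,
                character(1))
  keys <- unique(unlist(strsplit(fmt[!is.na(fmt)], ":", fixed = TRUE)))
  setdiff(keys, supported)
}

# Made-up example lines (not taken from the subset.vcf in this thread).
example <- c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
  "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ:DP:RO:AO\t0/1:30:12:6:6"
)
print(unsupported_formats(example, supported))
```

On a real file the call would be `unsupported_formats(readLines("subset.vcf"), supported)`; any keys it reports are candidates for the field that scanBcf could not decode.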