Rsamtools failing to parse PEDIGREE header
@jonathan-ellis-7560

Version 1.18.3 of Rsamtools introduced support for parsing a PEDIGREE header line from a BCF file in the function .bcfHeaderAsSimpleList; however, I'm no longer able to parse any of my VCFs that contain a PEDIGREE header. My VCFs have header lines such as

##PEDIGREE=<Derived=ID1,Original=ID2>

If I use the function readVcf (from the VariantAnnotation package), I get the following error:

Error in FUN(X[[1L]], ...) : subscript out of bounds

traceback() shows that the error originates in .bcfHeaderAsSimpleList. Is this a bug in Rsamtools, or am I misunderstanding how the PEDIGREE header should be used? (I took the PEDIGREE line directly from the VCF file format specification.)

Regards,

Jonathan

@valerie-obenchain-4275

Hi Jonathan,

Thanks for reporting the bug. The fix is in Rsamtools 1.19.51 and VariantAnnotation 1.13.48.

PEDIGREE was being treated like INFO and FORMAT, whose parsing depends on the presence of the ID, Number, Type and Description fields. PEDIGREE doesn't have these fields and should not have been included in that group; I'm not sure why I did that. It is now parsed correctly into meta(VCFHeader)$PEDIGREE.
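To make the distinction concrete, here is a minimal base-R sketch that splits a structured `##KEY=<...>` header line into key/value pairs. The helper `parseStructuredLine` is hypothetical and is not how Rsamtools implements this; it also assumes no commas inside quoted values. A PEDIGREE line yields only `Derived` and `Original`, so code that looks up INFO/FORMAT-style fields such as `ID` by name fails. Indexing a named vector with a missing name via `[[` gives the same "subscript out of bounds" message reported above, although whether Rsamtools failed in exactly this way is an assumption.

```r
# Hypothetical helper (not the Rsamtools implementation): split the body of
# a "##KEY=<k1=v1,k2=v2,...>" header line into a named character vector.
# Assumes no commas appear inside quoted values.
parseStructuredLine <- function(line) {
    line <- sub("^##", "", line)                  # drop leading "##"
    key <- sub("=<.*$", "", line)                 # e.g. "PEDIGREE"
    body <- sub("^[^<]*<(.*)>$", "\\1", line)     # text between < and >
    fields <- strsplit(body, ",", fixed = TRUE)[[1]]
    vals <- sub("^[^=]*=", "", fields)            # value after first "="
    names(vals) <- sub("=.*$", "", fields)        # key before first "="
    list(key = key, fields = vals)
}

ped <- parseStructuredLine("##PEDIGREE=<Derived=ID1,Original=ID2>")
ped$key      # "PEDIGREE"
ped$fields   # named vector: Derived = "ID1", Original = "ID2"

# INFO/FORMAT-style parsing expects these fields; PEDIGREE has none of them.
infoFields <- c("ID", "Number", "Type", "Description")
infoFields %in% names(ped$fields)   # FALSE FALSE FALSE FALSE

# Looking up a required field that is absent raises the familiar error.
msg <- tryCatch(ped$fields[["ID"]], error = function(e) conditionMessage(e))
msg          # "subscript out of bounds"
```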

Let me know if you run into other problems.

Valerie

Hi Valerie,

Thanks for your reply. I installed 1.19.51 and it does parse the PEDIGREE header as you said. Out of interest, I also added two SAMPLE header lines, since SAMPLE also seems to have been added recently to .bcfHeaderAsSimpleList. The two lines I added were:

##SAMPLE=<id=lc1_blood,genomes=germline,mixture=1.,description="patient germline="" genome"="">
##SAMPLE=<id=lc1_tumour_a,genomes=germline;tumour,mixture=.3;.7,description="patient germline="" genome;patient="" tumour="" genome"="">

These are taken directly from the VCF file format specification. Rsamtools does not appear to parse these headers correctly. If I run:

x <- meta(header(readVcf("test.vcf", "hg19")))
x$SAMPLE

I get:

DataFrame with 2 rows and 1 column
                                                       Description
                                                       <character>
Blood,Genomes                              Patient germline genome
TissueSample,Genomes Patient germline genome;Patient tumour genome

whereas I would expect a DataFrame with 3 columns.

Regards,

Jonathan

On Wed, 2015-04-08 at 03:14 +0000, Valerie Obenchain [bioc] wrote:
> Thanks for reporting the bug. The fix is in Rsamtools 1.19.51 and
> VariantAnnotation 1.13.48.

I think there are a couple of problems here. SAMPLE was not being parsed correctly, and it also looks like your SAMPLE lines are not valid. Per the spec, the value of 'Description' should be enclosed in double quotes, with a semicolon separating the two values. Maybe the quotes got mangled, or this was a cut-and-paste error? If not, please show me where the lines came from.

I took these sample lines

##SAMPLE=<ID=Blood,Genomes=Germline,Mixture=1.,Description="Patient germline genome">
##SAMPLE=<ID=TissueSample,Genomes=Germline;Tumor,Mixture=.3;.7,Description="Patient germline genome;Patient tumor genome">

from page 18 of the VCF 4.2 spec, http://samtools.github.io/hts-specs/VCFv4.2.pdf, and added them to the extdata/ex2.vcf sample file in VariantAnnotation.

With Rsamtools 1.19.52 and VariantAnnotation 1.13.48:

> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> hdr <- scanVcfHeader(fl)
> meta(hdr)$SAMPLE
DataFrame with 2 rows and 3 columns
                    Genomes     Mixture
                <character> <character>
Blood              Germline          1.
TissueSample Germline;Tumor       .3;.7
                                              Description
                                              <character>
Blood                             Patient germline genome
TissueSample Patient germline genome;Patient tumor genome
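The enclosing quotes matter because Genomes, Mixture and Description can each hold several semicolon-separated entries (one per genome), and a quoted Description may in principle contain commas. A minimal base-R sketch of a quote-aware parser follows; `splitOutsideQuotes` and `parseSampleLine` are hypothetical helpers for illustration, not the Rsamtools code.

```r
# Hypothetical helper (not part of Rsamtools): split a header body on commas
# that are outside double quotes, so a quoted Description may contain commas.
splitOutsideQuotes <- function(body) {
    chars <- strsplit(body, "")[[1]]
    out <- character(0)
    cur <- ""
    inQuotes <- FALSE
    for (ch in chars) {
        if (ch == "\"") {
            inQuotes <- !inQuotes          # toggle on each double quote
            cur <- paste0(cur, ch)
        } else if (ch == "," && !inQuotes) {
            out <- c(out, cur)             # comma outside quotes ends a field
            cur <- ""
        } else {
            cur <- paste0(cur, ch)
        }
    }
    c(out, cur)
}

# Hypothetical helper: parse one ##SAMPLE=<...> line into a named list,
# splitting each value on ";" (one entry per genome in the mixture).
parseSampleLine <- function(line) {
    body <- sub("^##SAMPLE=<(.*)>$", "\\1", line)
    fields <- splitOutsideQuotes(body)
    keys <- sub("=.*$", "", fields)
    vals <- sub("^[^=]*=", "", fields)
    vals <- gsub("^\"|\"$", "", vals)      # drop enclosing quotes
    res <- strsplit(vals, ";", fixed = TRUE)
    names(res) <- keys
    res
}

s <- parseSampleLine(paste0(
    "##SAMPLE=<ID=TissueSample,Genomes=Germline;Tumor,Mixture=.3;.7,",
    "Description=\"Patient germline genome;Patient tumor genome\">"))
s$Genomes      # "Germline" "Tumor"
s$Mixture      # ".3" ".7"
s$Description  # "Patient germline genome" "Patient tumor genome"
```

Tracking the quote state character by character keeps the sketch dependency-free; a production parser would also need to handle escaped quotes.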

Valerie

Yes, that appears to be a cut and paste error: my actual VCF does contain correctly formatted lines. Sorry about that. Thanks for the fixes.

Jonathan
