Feature request in readVcf
@sean-davis-490
Last seen 4 months ago
United States
Hi, Val.

Is there any interest in simply ignoring unknown INFO and GENOTYPE fields when parsing VCF files, perhaps by issuing a warning instead of an error? There are LOTS of malformed VCF files out there. In some cases they are not usable, but in this case they would be perfectly usable if these unknown fields were simply ignored.

> dat = readVcf('tmp.gatk.vcf',genome='hg19')
Error: scanVcf: record 22 INFO 'KGPilot123' not found
  path: /Volumes/CCRBioinfo/projects/RosenbergImmuneStudy/staging/tmp.gatk.vcf

Thanks,
Sean
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.7 years ago
United States
+1

thanks,
--t

On Fri, Sep 21, 2012 at 10:51 AM, Sean Davis <sdavis2@mail.nih.gov> wrote:
> Hi, Val.
> [...]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
A model is a lie that helps you see the truth.
Howard Skipper
@michael-lawrence-3846
Last seen 2.4 years ago
United States
Just in case you don't know, it is possible to work around this using the "info" slot of ScanVcfParam. But yes, having a simple option to ignore unknown INFO and FORMAT fields would be convenient. The same should also apply to FILTER and ALT (in the case of structural variants).

Michael
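For readers who want to try the ScanVcfParam workaround mentioned here, a minimal sketch follows. It assumes the VariantAnnotation package is installed and that the VCF header declares at least one INFO and one FORMAT field; `read_declared_fields` is a hypothetical helper name, not something the package provides. Whether this avoids the error for a particular file and package version is untested here.

```r
library(VariantAnnotation)

# Read a VCF while requesting only the INFO and FORMAT fields that the
# header declares, so undeclared fields in the records are never asked for.
# NOTE: `read_declared_fields` is a hypothetical name used for illustration.
read_declared_fields <- function(path, genome = "hg19") {
  hdr <- scanVcfHeader(path)
  param <- ScanVcfParam(
    info = rownames(info(hdr)),  # INFO IDs listed in the header
    geno = rownames(geno(hdr))   # FORMAT IDs listed in the header
  )
  # Caveat: an empty character vector means "all fields" to ScanVcfParam,
  # so a header with no INFO/FORMAT lines would not be restricted.
  readVcf(path, genome = genome, param = param)
}
```

Something like `dat <- read_declared_fields('tmp.gatk.vcf')` would then stand in for the plain `readVcf()` call in the original post. To keep only a handful of fields, a hand-written character vector can be passed to `info=` instead.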
On 09/22/2012 04:54 AM, Michael Lawrence wrote:
> Just in case you don't know, it is possible to work around this using the
> "info" slot of ScanVcfParam. But yes having a simple option to ignore
> unknown INFO and FORMAT fields would be convenient. The same should also
> apply to FILTER and ALT (in the case of structural variants).

Unknown INFO and FORMAT fields now (v. 1.3.30) generate a warning and don't get parsed. It seems that FILTER and ALT don't currently get checked; are you looking for a warning on invalid values?

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
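To illustrate the warn-and-skip behaviour described above, here is a toy sketch in base R. It is not the package's parser: `parse_info` and its arguments are hypothetical, and it handles only a single INFO string checked against a vector of header-declared IDs.

```r
# Toy illustration only: parse one VCF INFO string against the INFO IDs
# declared in the header. Undeclared keys are dropped with a warning
# instead of stopping the parse. `parse_info` is a hypothetical name.
parse_info <- function(info, declared) {
  if (identical(info, ".")) return(list())  # "." marks a missing INFO column
  entries <- strsplit(info, ";", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", entries)
  # Flag entries (no "=") get "TRUE"; key=value entries keep the raw string.
  values <- ifelse(grepl("=", entries, fixed = TRUE),
                   sub("^[^=]*=", "", entries), "TRUE")
  unknown <- !(keys %in% declared)
  if (any(unknown)) {
    warning("INFO field(s) not found in header, ignored: ",
            paste(unique(keys[unknown]), collapse = ", "), call. = FALSE)
  }
  setNames(as.list(values[!unknown]), keys[!unknown])
}
```

Calling `parse_info("DP=14;KGPilot123;AF=0.5", c("DP", "AF"))` keeps DP and AF as strings and warns about KGPilot123, which mirrors the idea of keeping the record usable while flagging the undeclared field.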
On Sat, Sep 22, 2012 at 9:16 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
> unknown INFO and FORMAT fields now (v. 1.3.30) generate a warning and
> don't get parsed. Seems like FILTER and ALT don't currently get checked;
> are you looking for a warning on invalid value?

Sure, for the sake of consistency with the INFO/FORMAT fields.

Michael