readVcf bgzip error
Davis, Brian ▴ 40
@davis-brian-5165
Last seen 9.6 years ago
I'm seeing an error when I read in a compressed VCF, but not when I read in the uncompressed VCF. Can anyone point me in the right direction to figure out what I'm doing wrong? I've tried this on 3 different VCFs and get the same error each time, although a different record fails in each.

> # read in a complete file
> fl <- "first10K.vcf"
> vcf <- readVcf(fl, "hg19")
> vcf
class: VCF
dim: 9934 998
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(39): NS DP ... HD HP
geno(6): GT VR ... GQ FT
rownames(9934): 1:69270 1:69360 ... 1:19597392 1:19597396
rowData values names(1): paramRangeID
colnames(998): A00003 A00057 ... A16457 ''
colData names(1): Samples
>
> # now try again but compress it first
> fl <- "first10K.vcf"
> compressVcf <- bgzip(fl, tempfile())
> idx <- indexTabix(compressVcf, "vcf")
> tab <- TabixFile(compressVcf, idx)
> vcf <- readVcf(tab, "hg19")
Error: scanVcf: record 4370 INFO '0/0:.:130:131:.:.' not found
  path: C:\Users\bdavis2\AppData\Local\Temp\RtmpwrXcST\file1dc84cff4177
>
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] VariantAnnotation_1.2.10 Rsamtools_1.8.6 Biostrings_2.24.1 GenomicRanges_1.8.13
[5] IRanges_1.14.4 BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.18.1 Biobase_2.16.0 biomaRt_2.12.0 bitops_1.0-4.1
 [5] BSgenome_1.24.0 DBI_0.2-5 GenomicFeatures_1.8.3 grid_2.15.1
 [9] lattice_0.20-10 Matrix_1.0-9 RCurl_1.95-1.1 RSQLite_0.11.2
[13] rtracklayer_1.16.3 snpStats_1.6.0 splines_2.15.1 stats4_2.15.1
[17] survival_2.36-14 tools_2.15.1 XML_3.95-0.1 zlibbioc_1.2.0

Brian
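One way to tell a file problem from an installation problem is to run the same compress, index, and read steps on a file known to be well formed. The sketch below wraps the steps from the question in a helper; the name `compare_plain_and_bgzip` is hypothetical and not part of VariantAnnotation or Rsamtools, and it assumes both packages are installed.

```r
# Hypothetical helper (not from the thread): read a VCF both ways and
# return the dimensions of each result so they can be compared.
compare_plain_and_bgzip <- function(fl, genome = "hg19") {
  # Read the uncompressed file directly.
  plain <- VariantAnnotation::readVcf(fl, genome)

  # Same steps as in the question: compress, index, read via a TabixFile.
  zipped <- Rsamtools::bgzip(fl, tempfile())
  idx <- Rsamtools::indexTabix(zipped, format = "vcf")
  tab <- Rsamtools::TabixFile(zipped, idx)
  compressed <- VariantAnnotation::readVcf(tab, genome)

  list(plain = dim(plain), compressed = dim(compressed))
}
```

As a usage example, assuming the `ex2.vcf` example file is present in your installed VariantAnnotation, `compare_plain_and_bgzip(system.file("extdata", "ex2.vcf", package = "VariantAnnotation"))` should return two matching dimension vectors. If that works but your own file fails, the file is the more likely suspect; if both fail, look at the installation.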
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi Brian,

I'm not sure what's going on here. Can you point me to where you got this file or is it small enough to send?

Valerie
Valerie,

This is human subject data, so I'll have to work on getting permissions to share on my end. In the meantime I'll try to reproduce with 1000 Genomes data.

Brian
Brian,

Thanks for sending the sample data. I've tested the file on Windows and am not able to reproduce the error. It looks like you've got some old packages in your installation. A new version of Bioconductor (2.11) was just released last week. To update your packages you can follow the instructions here,

http://bioconductor.org/install/

Let me know if you still get the error after updating.

Valerie

Output for testing in devel:

## Read in the uncompressed file
> library(VariantAnnotation)
> fl <- "vcftest.vcf"
> vcf <- readVcf(fl, "hg19")
> vcf
class: VCF
dim: 700 997
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(39): NS DP ... HD HP
geno(6): GT VR ... GQ FT
rownames(700): 1:6711146 1:6711190 ... 1:9662200 1:9662234
rowData values names(1): paramRangeID
colnames(997): A00003 A00057 ... A16420 A16457
colData names(1): Samples

## Compress and read in
> compressVcf <- bgzip(fl, tempfile())
> idx <- indexTabix(compressVcf, "vcf")
> tab <- TabixFile(compressVcf, idx)
> cmp <- readVcf(tab, "hg19")
> cmp
class: VCF
dim: 700 997
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(39): NS DP ... HD HP
geno(6): GT VR ... GQ FT
rownames(700): 1:6711146 1:6711190 ... 1:9662200 1:9662234
rowData values names(1): paramRangeID
colnames(997): A00003 A00057 ... A16420 A16457
colData names(1): Samples

> sessionInfo()
R Under development (unstable) (2012-10-05 r60879)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] VariantAnnotation_1.5.3 Rsamtools_1.11.1 Biostrings_2.27.2
[4] GenomicRanges_1.11.2 IRanges_1.17.2 BiocGenerics_0.5.

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.21.2 Biobase_2.19.0 biomaRt_2.15.0
 [4] bitops_1.0-4.1 BSgenome_1.27.1 DBI_0.2-5
 [7] GenomicFeatures_1.11.1 parallel_2.16.0 RCurl_1.95-1.1
[10] RSQLite_0.11.2 rtracklayer_1.19.0 stats4_2.16.0
[13] tools_2.16.0 XML_3.95-0.1 zlibbioc_1.5.0
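The two sessionInfo() listings in this thread report different versions of the same attached packages. The base-R sketch below lines them up; the version strings are copied from the listings above, while the helper name `split_versions` is made up for illustration. It only shows which packages were behind the devel versions Valerie tested with, not which one caused the error.

```r
# "name_version" strings copied from the two sessionInfo() listings in this thread.
reported <- c("VariantAnnotation_1.2.10", "Rsamtools_1.8.6", "Biostrings_2.24.1",
              "GenomicRanges_1.8.13", "IRanges_1.14.4")
working  <- c("VariantAnnotation_1.5.3", "Rsamtools_1.11.1", "Biostrings_2.27.2",
              "GenomicRanges_1.11.2", "IRanges_1.17.2")

# Hypothetical helper: split "name_version" into a character vector of
# versions, named by package.
split_versions <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)
  versions <- vapply(parts, function(p) p[length(p)], character(1))
  names(versions) <- vapply(parts,
                            function(p) paste(p[-length(p)], collapse = "_"),
                            character(1))
  versions
}

old <- split_versions(reported)
new <- split_versions(working)
shared <- intersect(names(old), names(new))

# compareVersion() returns -1 when its first argument is the older version.
behind <- shared[mapply(utils::compareVersion, old[shared], new[shared]) < 0]

print(data.frame(package = shared, reported = old[shared],
                 working = new[shared], row.names = NULL))
```

On a live installation, `packageVersion("VariantAnnotation")` returns the installed version directly, so the strings do not have to be copied by hand.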
Valerie,

Thanks for looking into this. I'm new to Bioconductor, but not R. I hadn't realized update.packages() didn't update Bioconductor. Anyway, this did indeed fix my problem.

Again, thanks.

Brian
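For readers arriving at this thread today: the install page linked above now describes the BiocManager package, which did not exist when this thread was written in 2012. The sketch below is a minimal wrapper under that assumption; the function name `update_bioc` is hypothetical, and it needs network access when called.

```r
# Hypothetical wrapper (not from the thread); BiocManager postdates this 2012 exchange.
update_bioc <- function(ask = TRUE) {
  # Install BiocManager from CRAN if it is not already available.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  message("Bioconductor version: ", as.character(BiocManager::version()))
  # With no package names, install() offers to update out-of-date packages.
  BiocManager::install(ask = ask)
}
```

Calling `update_bioc()` in an interactive session prompts before updating anything. `BiocManager::valid()` can be run afterwards to list packages that are still out of date or too new for the Bioconductor version in use.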