Help needed! What's wrong with VariantAnnotation and TCGA VCFs?
ying chen ▴ 340
@ying-chen-5085
Last seen 10.2 years ago
Hi guys, sorry to bother you again.

I am new to the VariantAnnotation package and keep getting some weird errors when testing with TCGA VCFs.

> start.loc <- 55086725
> end.loc <- 55275031
> test.gr <- GRanges("7", IRanges(start.loc, end.loc))
> file <- system.file("vcf", "NA06985_17.vcf.gz", package = "cgdv17")
> params <- ScanVcfParam(which=test.gr)
> vcf <- readVcf(file, "hg19", params)

## the above ran successfully with the vcf coming with the VariantAnnotation package
## the following tests the same code with the TCGA vcf

> dir()
[1] "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz"
[2] "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz.tbi"
> vcf <- readVcf("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz", "hg19", params)
Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
  invalid rownames length
> hdr <- scanVcfHeader("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz")
Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
  invalid rownames length

I looked at the TCGA vcf and the chr7-sub.vcf included with the VariantAnnotation package and could not tell what went wrong, except that the TCGA vcf text file has several "PRIMARY" entries.

Any suggestions? Thanks a lot for the help!

Ying

The following is the header and first 2 lines of the TCGA vcf:

##fileformat=VCFv4.0
##fileDate=20110203
##center=UCSC
##source="bambam pipeline v1.1"
##reference=<ID=NCBI-human-build36,Source="ftp://genome.wustl.edu/pub/reference//NCBI-human-build36/all_sequences.bam">
##phasing=none
##INDIVIDUAL=TCGA-AF-3913
##SAMPLE=<ID=NORMAL,Individual="TCGA-AF-3913",Description="Normal sample",File="/cluster/depot/read/exome/TCGA-AF-3913-11A-01W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",Platform="Illumina",Source="dbGaP",Accession="SRS131301">
##SAMPLE=<ID=PRIMARY,Individual="TCGA-AF-3913",Description="Primary tumor",File="/cluster/depot/read/exome/TCGA-AF-3913-01A-02W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",Platform="Illumina",Source="dbGaP",Accession="SRS131293">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 130">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation in primary">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth for all samples">
##INFO=<ID=DEL,Number=1,Type=Integer,Description="Deletion x bps away">
##INFO=<ID=INS,Number=1,Type=Integer,Description="Insertion x bps away">
##INFO=<ID=VT,Number=1,Type=String,Description="Somatic variant type">
##INFO=<ID=PROTCH,Number=1,Type=String,Description="Protein change due to somatic variant">
##INFO=<ID=SS,Number=1,Type=Integer,Description="Somatic status of sample">
##FILTER=<ID=q10,Description="Genotype quality < 10">
##FILTER=<ID=blq,Description="Position overlaps 1000 Genomes Project mapping quality blacklist">
##FILTER=<ID=bldp,Description="Position overlap 1000 Genomes Project depth blacklist">
##FILTER=<ID=ma,Description="Position in germline has 2+ support for 2+ alleles">
##FILTER=<ID=idl10,Description="Position is within 10 bases of an indel">
##FILTER=<ID=idls5,Description="Less than 5 reads supporting indel in appropriate tissue">
##FILTER=<ID=fa20,Description="Fraction of alt below 20% of reads">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">
##FORMAT=<ID=BQ,Number=1,Type=Integer,Description="Average base quality">
##FORMAT=<ID=FA,Number=1,Type=Float,Description="Fraction of reads supporting alt">
##tcgaversion=1.0
##vcfProcessLog=<InputVCF=</inside/grotto/bambam/coad_read/exome/TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf>,InputVCFSource=<bambam>,InputVCFVer=<1.1>,InputVCFParam=<exome>>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL PRIMARY
1 4770 . A G 28 bldp;blq SS=1;VT=SNP;DB;DP=7 GT:DP:BQ:FA 0/1:3:36:0.333 0/1:4:36:0.5
1 4793

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] splines stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] cgdv17_0.0.20            TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.0
 [3] GenomicFeatures_1.12.0   GGtools_4.8.0
 [5] GGBase_3.22.0            snpStats_1.10.0
 [7] Matrix_1.0-12            lattice_0.20-15
 [9] survival_2.37-4          org.Hs.eg.db_2.9.0
[11] RSQLite_0.11.2           DBI_0.2-5
[13] AnnotationDbi_1.22.1     Biobase_2.20.0
[15] VariantAnnotation_1.6.1  Rsamtools_1.12.0
[17] Biostrings_2.28.0        GenomicRanges_1.12.1
[19] IRanges_1.18.0           BiocGenerics_0.6.0
[21] BiocInstaller_1.10.0

loaded via a namespace (and not attached):
 [1] annotate_1.38.0 biomaRt_2.16.0 bit_1.1-10 bitops_1.0-5 BSgenome_1.28.0
 [6] ff_2.2-11 genefilter_1.42.0 grid_3.0.0 RCurl_1.95-4.1 rtracklayer_1.20.0
[11] tools_3.0.0 XML_3.96-1.1 xtable_1.7-1 zlibbioc_1.6.0
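When two VCF headers are hard to compare by eye, as described in the question, one option is to tabulate the meta-information keys (the part between `##` and the first `=`) and see which keys appear in the failing file only. The sketch below is a base R illustration, not part of VariantAnnotation: `meta_keys` is a hypothetical helper name, and the `header` vector holds abbreviated stand-ins for the lines posted above.

```r
# Hypothetical helper: count VCF meta-information lines by their "##key=" prefix.
meta_keys <- function(lines) {
  meta <- lines[startsWith(lines, "##")]      # keep only "##" meta lines
  keys <- sub("^##([^=]+)=.*$", "\\1", meta)  # text between "##" and the first "="
  table(keys)
}

# Abbreviated stand-ins for header lines; "..." marks content left out here.
header <- c(
  "##fileformat=VCFv4.0",
  "##INDIVIDUAL=TCGA-AF-3913",
  "##SAMPLE=<ID=NORMAL,...>",
  "##SAMPLE=<ID=PRIMARY,...>",
  "##INFO=<ID=DB,...>",
  "##vcfProcessLog=<InputVCF=<...>,...>",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tPRIMARY"
)

print(meta_keys(header))
```

For a real file, the lines could probably be obtained with something like `readLines(gzfile(path), n = 200)`, where `path` is a placeholder; running `meta_keys` on both the working and the failing file and comparing the names of the two tables then shows which keys are unique to one of them.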
@martin-morgan-1513
Last seen 4 months ago
United States
On 04/11/2013 07:55 PM, ying chen wrote:
>> vcf <- readVcf("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz", "hg19", params)
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
>   invalid rownames length
>> hdr <- scanVcfHeader("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz")
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
>   invalid rownames length
>
> I looked at the TCGA vcf and the chr7-sub.vcf included with VariantAnnotation package and could not tell what went wrong, except that TCGA vcf text file has several "PRIMARY" entries.
>
> Any suggestion?

Sorry, this was a bug in _Rsamtools_; it is fixed in version 1.12.1, which will be available Saturday after 10am Seattle time. The problems are with the ##SAMPLE lines and the ##vcfProcessLog line.

Martin
-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
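Since the reply ties the fix to Rsamtools 1.12.1, and the sessionInfo() in the question lists Rsamtools_1.12.0, a quick version comparison shows whether a given session already has the fix. This is a minimal base R sketch; `has_fix` and `fixed_version` are hypothetical names introduced for illustration only.

```r
# Minimum Rsamtools version containing the fix, as stated in the reply above.
fixed_version <- "1.12.1"

# Hypothetical helper: TRUE when `installed` is at least `required`.
has_fix <- function(installed, required = fixed_version) {
  utils::compareVersion(as.character(installed), required) >= 0
}

print(has_fix("1.12.0"))  # the version listed in the question's sessionInfo()
print(has_fix("1.12.1"))  # the version named in the reply

# In a session with Rsamtools installed, the live check would be:
# has_fix(utils::packageVersion("Rsamtools"))
```

If the check comes back FALSE, updating Rsamtools through the usual Bioconductor installation route for that release should bring in the fixed version once it is available.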