Help needed! What's wrong with VariantAnnotation and TCGA VCFs?


ying chen ▴ 340

@ying-chen-5085

Last seen 10.6 years ago

Hi guys, sorry to bother you again.

I am new to the VariantAnnotation package and keep getting some weird errors when testing with TCGA VCFs.

> start.loc <- 55086725
> end.loc <- 55275031
> test.gr <- GRanges("7", IRanges(start.loc, end.loc))
> file <- system.file("vcf", "NA06985_17.vcf.gz", package = "cgdv17")
> params <- ScanVcfParam(which=test.gr)
> vcf <- readVcf(file, "hg19", params)

## the above ran successfully with the vcf coming with the VariantAnnotation package
## the following tests the same code with the TCGA vcf

> dir()
[1] "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz"     "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz.tbi"
> vcf <- readVcf("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz", "hg19", params)
Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
  invalid rownames length
> hdr <- scanVcfHeader("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz")
Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
  invalid rownames length
>

I looked at the TCGA VCF and the chr7-sub.vcf included with the VariantAnnotation package and could not tell what went wrong, except that the TCGA VCF text file has several "PRIMARY" entries.

Any suggestions?

Thanks a lot for the help!

Ying

The following is the header and the first 2 lines of the TCGA VCF:

##fileformat=VCFv4.0
##fileDate=20110203
##center=UCSC
##source="bambam pipeline v1.1"
##reference=<id=ncbi-human-build36,source="ftp://genome.wustl.edu/pub/reference//NCBI-human-build36/all_sequences.bam">
##phasing=none
##INDIVIDUAL=TCGA-AF-3913
##SAMPLE=<id=normal,individual="tcga-af-3913",description="normal sample",file="/cluster/depot/read/exome/TCGA-AF-3913-11A-01W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",platform="Illumina",source="dbGaP",accession="SRS131301">
##SAMPLE=<id=primary,individual="tcga-af-3913",description="primary tumor",file="/cluster/depot/read/exome/TCGA-AF-3913-01A-02W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",platform="Illumina",source="dbGaP",accession="SRS131293">
##INFO=<id=db,number=0,type=flag,description="dbsnp membership, build 130">
##INFO=<id=somatic,number=0,type=flag,description="somatic mutation in primary">
##INFO=<id=dp,number=1,type=integer,description="total read depth for all samples">
##INFO=<id=del,number=1,type=integer,description="deletion x bps away">
##INFO=<id=ins,number=1,type=integer,description="insertion x bps away">
##INFO=<id=vt,number=1,type=string,description="somatic variant type">
##INFO=<id=protch,number=1,type=string,description="protein change due to somatic variant">
##INFO=<id=ss,number=1,type=integer,description="somatic status of sample">
##FILTER=<id=q10,description="genotype quality < 10">
##FILTER=<id=blq,description="position overlaps 1000 genomes project mapping quality blacklist">
##FILTER=<id=bldp,description="position overlap 1000 genomes project depth blacklist">
##FILTER=<id=ma,description="position in germline has 2+ support for 2+ alleles">
##FILTER=<id=idl10,description="position is within 10 bases of an indel">
##FILTER=<id=idls5,description="less than 5 reads supporting indel in appropriate tissue">
##FILTER=<id=fa20,description="fraction of alt below 20% of reads">
##FORMAT=<id=gt,number=1,type=string,description="genotype">
##FORMAT=<id=dp,number=1,type=integer,description="read depth">
##FORMAT=<id=bq,number=1,type=integer,description="average base quality">
##FORMAT=<id=fa,number=1,type=float,description="fraction of reads supporting alt">
##tcgaversion=1.0
##vcfProcessLog=<inputvcf=</inside/grotto/bambam/coad_read/exome/tcga-af-3913_w_illuminaga-dnaseq_exome.vcf>,InputVCFSource=<bambam>,InputVCFVer=<1.1>,InputVCFParam=<exome>>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL PRIMARY
1 4770 . A G 28 bldp;blq SS=1;VT=SNP;DB;DP=7 GT:DP:BQ:FA 0/1:3:36:0.333 0/1:4:36:0.5
1 4793

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] cgdv17_0.0.20           TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.0
 [3] GenomicFeatures_1.12.0  GGtools_4.8.0
 [5] GGBase_3.22.0           snpStats_1.10.0
 [7] Matrix_1.0-12           lattice_0.20-15
 [9] survival_2.37-4         org.Hs.eg.db_2.9.0
[11] RSQLite_0.11.2          DBI_0.2-5
[13] AnnotationDbi_1.22.1    Biobase_2.20.0
[15] VariantAnnotation_1.6.1 Rsamtools_1.12.0
[17] Biostrings_2.28.0       GenomicRanges_1.12.1
[19] IRanges_1.18.0          BiocGenerics_0.6.0
[21] BiocInstaller_1.10.0

loaded via a namespace (and not attached):
 [1] annotate_1.38.0    biomaRt_2.16.0     bit_1.1-10         bitops_1.0-5       BSgenome_1.28.0
 [6] ff_2.2-11          genefilter_1.42.0  grid_3.0.0         RCurl_1.95-4.1     rtracklayer_1.20.0
[11] tools_3.0.0        XML_3.96-1.1       xtable_1.7-1       zlibbioc_1.6.0
>

VariantAnnotation genomes • 2.2k views

updated 12.0 years ago by Martin Morgan 25k • written 12.0 years ago by ying chen ▴ 340


Martin Morgan 25k

@martin-morgan-1513

Last seen 9 weeks ago

United States

On 04/11/2013 07:55 PM, ying chen wrote:
> Hi guys, sorry to bother you again.
>
> I am new to VariantAnnotation package and keep having some weird errors when testing with TCGA vcfs.
>
>> start.loc <- 55086725
>> end.loc <- 55275031
>> test.gr <- GRanges("7", IRanges(start.loc, end.loc))
>> file <- system.file("vcf", "NA06985_17.vcf.gz", package = "cgdv17")
>> params <- ScanVcfParam(which=test.gr)
>> vcf <- readVcf(file, "hg19", params)
>
> ## the above run successful with the vcf coming with the VariantAnnotation package
> ## the following tests the same code with TCGA vcf
>
>> dir()
> [1] "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz"     "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz.tbi"
>> vcf <- readVcf("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz", "hg19", params)
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
>   invalid rownames length
>> hdr <- scanVcfHeader("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz")
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
>   invalid rownames length
>>
>
> I looked at the TCGA vcf and the chr7-sub.vcf included with VariantAnnotation package and could not tell what went wrong, except that TCGA vcf text file has several "PRIMARY" entries.
>
> Any suggestion?

Sorry, this was a bug in _Rsamtools_; it is fixed in version 1.12.1, which will be available Saturday after 10am Seattle time. The problems are with the ##SAMPLE lines and the ##vcfProcessLog line.

Martin

> Thanks a lot for the help!
>
> Ying

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
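For anyone hitting the same error, here is a minimal base-R sketch of the two checks the reply implies: whether the Rsamtools version reported in the question predates the fixed release, and which header lines are the ones named as problematic. The helper `header_lines_for` is hypothetical (it is not part of Rsamtools or VariantAnnotation), and the header below is an abbreviated stand-in modelled on the one posted in the question, not the full file.

```r
# Version strings taken from the thread: sessionInfo() in the question
# reports Rsamtools 1.12.0; the reply names 1.12.1 as the fixed version.
installed_version <- "1.12.0"
fixed_version <- "1.12.1"

# compareVersion() returns -1 when the first version is older than the second.
needs_update <- utils::compareVersion(installed_version, fixed_version) < 0
print(needs_update)

# Hypothetical helper: return the "##" meta-information lines whose key
# (the text between "##" and the first "=") is one of `keys`.
header_lines_for <- function(lines, keys = c("SAMPLE", "vcfProcessLog")) {
  meta <- lines[startsWith(lines, "##")]
  key <- sub("^##([^=]+)=.*$", "\\1", meta)
  meta[key %in% keys]
}

# Abbreviated, illustrative header (assumption: shortened from the post).
header <- c(
  "##fileformat=VCFv4.0",
  "##INDIVIDUAL=TCGA-AF-3913",
  "##SAMPLE=<id=normal,description=\"normal sample\">",
  "##SAMPLE=<id=primary,description=\"primary tumor\">",
  "##tcgaversion=1.0",
  "##vcfProcessLog=<inputvcf=<...>,InputVCFSource=<bambam>>",
  "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL PRIMARY"
)

suspect <- header_lines_for(header)
print(suspect)
```

With a real file, the input lines could come from something like `readLines(gzfile(path), n = 100)`, assuming the compressed VCF is readable as ordinary gzip on your platform. This only surfaces the lines for inspection; it does not work around the parser bug, for which updating Rsamtools is the fix described above.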

12.0 years ago • Martin Morgan 25k
