Entering edit mode
ying chen
▴
340
@ying-chen-5085
Last seen 10.2 years ago
Hi guys, sorry to bother you again.
I am new to VariantAnnotation package and keep having some weird
errors when testing with TCGA vcfs.
> start.loc <- 55086725
> end.loc <- 55275031
> test.gr <- GRanges("7", IRanges(start.loc, end.loc))
> file <- system.file("vcf", "NA06985_17.vcf.gz", package = "cgdv17")
> params <- ScanVcfParam(which=test.gr)
> vcf <- readVcf(file, "hg19", params)
## the above run successful with the vcf coming with the
VariantAnnotation package
## the following tests the same code with TCGA vcf
> dir()
[1] "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz" "TCGA-AF-
3913_W_IlluminaGA-DNASeq_exome.vcf.gz.tbi"
> vcf <- readVcf("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz",
"hg19", params)
Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
invalid rownames length
> hdr <- scanVcfHeader("TCGA-AF-3913_W_IlluminaGA-
DNASeq_exome.vcf.gz")
Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
invalid rownames length
>
I looked at the TCGA vcf and the chr7-sub.vcf included with
VariantAnnotation package and could not tell what want wrong, except
that TCGA vcf text file has several "PRIMARY" entries.
Any suggestion?
Thanks a lot for the help!
Ying
The following is the header and first 2 lines of the TCGA vcf
##fileformat=VCFv4.0
##fileDate=20110203
##center=UCSC
##source="bambam pipeline v1.1"
##reference=<id=ncbi-human- build36,source="<a href=" ftp:="" genome.wustl.edu="" pub="" reference="" NCBI-human-"="" rel="nofollow">ftp://genome.wustl.edu/pub/reference//NCBI-human-
build36/all_sequences.bam">
##phasing=none
##INDIVIDUAL=TCGA-AF-3913
##SAMPLE=<id=normal,individual="tcga-af-3913",description="normal sample",file="/cluster/depot/read/exome/TCGA-AF-3913-11A-01W-1073
-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING" ,platform="Illumina" ,s="" ource="dbGaP" ,accession="SRS131301">
##SAMPLE=<id=primary,individual="tcga-af-3913",description="primary tumor",file="/cluster/depot/read/exome/TCGA-AF-3913-01A-02W-1073
-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING" ,platform="Illumina" ,s="" ource="dbGaP" ,accession="SRS131293">
##INFO=<id=db,number=0,type=flag,description="dbsnp membership,="" build="" 130"="">
##INFO=<id=somatic,number=0,type=flag,description="somatic mutation="" in="" primary"="">
##INFO=<id=dp,number=1,type=integer,description="total read="" depth="" for="" all="" samples"="">
##INFO=<id=del,number=1,type=integer,description="deletion x="" bps="" away"="">
##INFO=<id=ins,number=1,type=integer,description="insertion x="" bps="" away"="">
##INFO=<id=vt,number=1,type=string,description="somatic variant="" type"="">
##INFO=<id=protch,number=1,type=string,description="protein change="" due="" to="" somatic="" variant"="">
##INFO=<id=ss,number=1,type=integer,description="somatic status="" of="" sample"="">
##FILTER=<id=q10,description="genotype quality="" <="" 10"="">
##FILTER=<id=blq,description="position overlaps="" 1000="" genomes="" project="" mapping="" quality="" blacklist"="">
##FILTER=<id=bldp,description="position overlap="" 1000="" genomes="" project="" depth="" blacklist"="">
##FILTER=<id=ma,description="position in="" germline="" has="" 2+="" support="" for="" 2+="" alleles"="">
##FILTER=<id=idl10,description="position is="" within="" 10="" bases="" of="" an="" indel"="">
##FILTER=<id=idls5,description="less than="" 5="" reads="" supporting="" indel="" in="" appropriate="" tissue"="">
##FILTER=<id=fa20,description="fraction of="" alt="" below="" 20%="" of="" reads"="">
##FORMAT=<id=gt,number=1,type=string,description="genotype">
##FORMAT=<id=dp,number=1,type=integer,description="read depth"="">
##FORMAT=<id=bq,number=1,type=integer,description="average base="" quality"="">
##FORMAT=<id=fa,number=1,type=float,description="fraction of="" reads="" supporting="" alt"="">
##tcgaversion=1.0
##vcfProcessLog=<inputvcf=< inside="" grotto="" bambam="" coad_read="" exome="" tcga-="" af-3913_w_illuminaga-dnaseq_exome.vcf="">,InputVCFSource=<bambam>,InputVC
FVer=<1.1>,InputVCFParam=<exome>>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL PRIMARY
1 4770 . A G 28 bldp;blq SS=1;VT=SNP;DB;DP=7 GT:DP:BQ:FA
0/1:3:36:0.333 0/1:4:36:0.5
1 4793
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] splines stats4 parallel stats graphics grDevices utils
datasets methods base
other attached packages:
[1] cgdv17_0.0.20
TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.0
[3] GenomicFeatures_1.12.0 GGtools_4.8.0
[5] GGBase_3.22.0 snpStats_1.10.0
[7] Matrix_1.0-12 lattice_0.20-15
[9] survival_2.37-4 org.Hs.eg.db_2.9.0
[11] RSQLite_0.11.2 DBI_0.2-5
[13] AnnotationDbi_1.22.1 Biobase_2.20.0
[15] VariantAnnotation_1.6.1 Rsamtools_1.12.0
[17] Biostrings_2.28.0 GenomicRanges_1.12.1
[19] IRanges_1.18.0 BiocGenerics_0.6.0
[21] BiocInstaller_1.10.0
loaded via a namespace (and not attached):
[1] annotate_1.38.0 biomaRt_2.16.0 bit_1.1-10
bitops_1.0-5 BSgenome_1.28.0
[6] ff_2.2-11 genefilter_1.42.0 grid_3.0.0
RCurl_1.95-4.1 rtracklayer_1.20.0
[11] tools_3.0.0 XML_3.96-1.1 xtable_1.7-1
zlibbioc_1.6.0
>
[[alternative HTML version deleted]]