Hello all I just tried to use with my data PGA package and i had an error in dbCreator:
library(BSgenome.Hsapiens.UCSC.hg19)
dbfile <- dbCreator(gtfFile=gtffile,vcfFile=vcffile,bedFile=bedfile,annotation_path=annotation_path,outfile_name=outfile_name,genome=Hsapiens,outdir=outfile_path)
Error in y[, z] : subscript out of bounds
this is the vcf header:
##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM POS ID REF ALT QUAL
chr1 10043516 . A G 0.01
chr1 10044378 . A G 0.01
the gtf header:
iarc@iarc-H8QG6:/New_data/pancreatic/new_data$ head -20 new_panc/merged_asm/merged.gtf
1 Cufflinks exon 852260 857889 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
1 Cufflinks exon 857991 858025 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
1 Cufflinks exon 878110 878438 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
1 Cufflinks exon 878633 878757 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "2"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
bedfile header:
1 12697 13221 JUNC_0 0 + 12697 13221 255,0,0 2 50,50 0,575
1 14829 14970 JUNC_1 1 - 14829 14970 255,0,0 2 50,50 0,192
1 14829 15021 JUNC_2 0 - 14829 15021 255,0,0 2 50,50 0,243
1 15012 25233 JUNC_3 0 + 15012 25233 255,0,0 2 50,50 0,10272
1 15038 15796 JUNC_4 3 - 15038 15796 255,0,0 2 50,50 0,809
annotation prepared with PrepareAnnotationRefseq2
any thoughts?
Thank you for the patience to format it! and it passed straight from me this incompatibility of chr within files!! lol Thanks I fixed merged gtf and the bed file with chr but still I am getting the same error!
No problem. Okay, the next problem is that your VCF is corrupt and is not valid. A VCF should have the following mandatory headers:
You need to go back a few steps and determine how this corrupt VCF was created