Hello, I am new to Bioconductor and to genetics in general, so sorry if this is an obvious question. I am trying to get started with the SomaticSignatures package. I have several TCGA subjects, each with a .vcf file and a .bam sequence file. I understand I need to load both the sequence and the reference to calculate signatures.
First, I converted the .bam file to .fa and generated an index using samtools. Then I loaded the data as follows:
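```r
library(Rsamtools); library(VariantAnnotation); library(SomaticSignatures)
# FaFile() is from Rsamtools, readVcfAsVRanges() from VariantAnnotation
# and mutationContext() from SomaticSignatures
fa_A <- FaFile("sub1.fa")
dat <- readVcfAsVRanges("sub1.vcf", fa_A)
vr_A <- mutationContext(dat, fa_A)
```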
The last line returns:

```
Error in value[[3L]](cond) : record 1 (1:12837280-12837282) failed
  file: sub1.fa
```
As a starting point, can anyone tell me what this error means?
Are you sure you have the reference genome in your BAM file? BAM files normally contain sequencing reads and the positions they align to in a reference, but not the reference sequence itself.
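To see why the lookup fails, here is a toy base-R illustration (not SomaticSignatures code) of what a k = 3 context lookup needs from the reference: the bases at pos - 1, pos and pos + 1 on a sequence whose name matches the VCF chromosome. The helper `get_context()` and the toy sequences are hypothetical. The range in the error, 1:12837280-12837282, appears to be exactly such a 3-base window around position 12837281 on a sequence named "1", which a FASTA of reads converted from a BAM would not contain.

```r
# Toy illustration only; get_context() is a hypothetical helper, not a
# SomaticSignatures function. A named character vector stands in for a
# reference genome: the names are sequence (chromosome) names.
get_context <- function(ref, chrom, pos, k = 3) {
  flank <- (k - 1) %/% 2
  # The VCF's chromosome name must exist in the reference
  if (!chrom %in% names(ref)) {
    stop(sprintf("sequence '%s' not found in reference", chrom))
  }
  start <- pos - flank
  end <- pos + flank
  # The whole window must lie inside that sequence
  if (start < 1 || end > nchar(ref[[chrom]])) {
    stop(sprintf("window %s:%d-%d lies outside sequence '%s'",
                 chrom, start, end, chrom))
  }
  substr(ref[[chrom]], start, end)
}

# A "reference" with a chromosome named "1"
toy_ref <- c("1" = "ACGTTGCA")
get_context(toy_ref, "1", 4)  # returns "GTT"

# A FASTA made from reads is keyed by read names, so the same lookup fails
toy_reads <- c("read_0001" = "ACGTTG", "read_0002" = "TTGCAA")
res <- try(get_context(toy_reads, "1", 4), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

The real package does this lookup against a BSgenome or an indexed FASTA of the genome, but the requirement is the same: the reference must contain the chromosome and coordinates named in the VCF.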
Given that the TCGA samples are human, you can probably use the BSgenome.Hsapiens.UCSC.hg19 reference package, as in section 4.2 of the SomaticSignatures vignette.

Thanks for the reply, it was helpful. I was able to get some basic code working using the BSgenome reference package. The SomaticSignatures vignette states that a FASTA file can be used just as naturally, and I have one for each subject, so I was hoping to incorporate that. Maybe I can figure out how to include the reference sequence in the FASTA file. Thanks again for your help.