I am working with the analysis results of an exome sequencing project (3 patients with small cell lung cancer, each with paired cancer and normal samples; genomic DNA captured using the Agilent in-solution enrichment methodology; paired-end 75-base massively parallel sequencing on Illumina HiSeq4000).
Both for the alignment of the fastq files and for the variant calling procedure, the following reference genome from GENCODE was selected and used:
https://www.gencodegenes.org/releases/current.html (Genome sequence (GRCh38.p12)-Regions-ALL-fasta format)
For my next step, I want to use the R package MutationalPatterns to import the resulting VCF files and inspect common mutational patterns (for SNPs). However, this package relies on the R package BSgenome for loading the reference genome,
which for human includes the options "BSgenome.Hsapiens.NCBI.GRCh38" and "BSgenome.Hsapiens.UCSC.hg38".
Thus, my crucial question is:
Is it possible to somehow install, modify and/or use the GENCODE genome as a reference genome in the BSgenome R package, with this structure, so that I can use it as the reference for my VCF files and proceed to compare the mutational patterns (SNPs) of these samples, as described in the MutationalPatterns vignette?
Or, alternatively, could I still use one of these two options? For example the NCBI reference genome, since the corresponding UCSC one has had no updates since 2013? With this approach, however, could I introduce considerable bias, perhaps from different annotations regarding the genomic coordinates?
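
For context, here is a minimal sketch of the import step I have in mind if I fall back on one of the existing packages. It assumes the UCSC hg38 package (chosen here only because GENCODE also names the primary chromosomes "chr1", "chr2", and so on); the VCF file names and sample names are hypothetical placeholders, not my actual files.

```r
library(MutationalPatterns)

# Assumption: use the UCSC hg38 BSgenome package as the reference.
ref_genome <- "BSgenome.Hsapiens.UCSC.hg38"
library(ref_genome, character.only = TRUE)

# Hypothetical file and sample names, one somatic VCF per patient.
vcf_files <- c("patient1_somatic.vcf",
               "patient2_somatic.vcf",
               "patient3_somatic.vcf")
sample_names <- c("patient1", "patient2", "patient3")

# Import the VCFs as GRanges objects, matched against the chosen genome.
vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)

# Count the 96 trinucleotide mutation types per sample.
mut_mat <- mut_matrix(vcf_list = vcfs, ref_genome = ref_genome)
```

As far as I understand, the import step keeps only the autosomes and sex chromosomes by default, so my concern is mainly whether the coordinates on these primary chromosomes are identical between the GENCODE FASTA and the BSgenome package.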