Hi there,
I would like to use MutationalPatterns package to extract mutation signatures for the fission yeast Schizosaccharomyces pombe (Spombe). However, as its genome is not in the BSgenome package, I had to include this genome following the How to Forge a BSgenome Data Package vignette (https://www.bioconductor.org/packages//2.7/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf).
This step was successful. However, I got stuck at the first step of analysis when I load my vcfs into a GRangesList.
Here is the code I used:
##Load MutationalPatterns package
library(MutationalPatterns)
###Call the library
library(BSgenome.Spombe.Pombase.ASM294v2)
ref_genome_pombe <- "BSgenome.Spombe.Pombase.ASM294v2"
library(ref_genome_pombe, character.only = TRUE)
##List the vcf files
vcf_files <- list.files(path = "~/my/path/to/my/vcfs/", pattern = ".vcf$", full.names = TRUE)
###Define the sample names
sample_names <- c("sample1","sample2","sample3","sample4")
##Load all of this into a GRangesList
vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome_pombe)
And here is the error I got:
Error in seqlevelsStyle(seqlevels(x)) :
The style does not have a compatible entry for the species supported by Seqname. Please see genomeStyles() for supported species/style
I have tried to find information about seqlevelsStyle but as I am starting in bioinformatics I feel that this is maybe out of my field of competences and I need help to solve this. I wonder if the problem comes from the package I installed myself. Just in case, here is the code I used for that process. NB: I have 6 .fa files named SP_chr01.fa --> 06 located into seqs folder.
##data.frame for seed
pombe_dataframe <- data.frame(
Package = "BSgenome.Spombe.Pombase.ASM294v2",
Title = "Full genome sequence for Schizosaccharomyces pombe (Pombase version ASM294v2)",
Description = "Full genome sequence for Schizosaccharomyces pombe (fission yeast) as provided by Pombase (ASM294v2)",
Version = "2.0",
organism = "Schizosaccharomyces pombe",
common_name = "Fission yeast",
provider = "Pombase",
provider_version = "ASM294v2",
release_date = "19th November 2007",
release_name = "ASM294v2",
source_url = "ftp://ftp.pombase.org/pombe/genome_sequence_and_features/genome_sequence/",
organism_biocview = "Schizosaccharomyces_pombe",
BSgenomeObjname = "Spombe",
seqnames = 'paste0("SP_chr", sprintf("%02d", 1:6))',
seqs_srcdir = "~/my/path/to/seqs"
)
##create the dcf file
write.dcf(pombe_dataframe, file = "pombe_seed.dcf", append = FALSE, useBytes = FALSE, indent = 0.1*getOption("width"), width = 0.9*getOption("width"), keep.white = NULL)
##Forge the target package; lead the way to the .dcf file
##Load the BSgenome package
library(BSgenome)
##Forge
forgeBSgenomeDataPkg("~/my/path/to/pombe_seed.dcf")
##Build the package
##Switch off RStudio. Open terminal. Go to the directory where the package is being stored. Then write "R CMD build BSgenome.Spombe.Pombase.ASM294v2" and enter. Check if the package is ok by writing "R CMD check Sgenome.Spombe.Pombase.ASM294v2" and enter. If everything is fine, proceed to installation with "R CMD install BSgenome.Spombe.Pombase.ASM294v2"
Thank you very very much in advance for your help, and sorry in advance if my question is a bit stupid.
Thank you very much for your reply and your help! I have incorporated your comments and I am now facing an other issue. But first, about what I have done.
1) As you suggested first I made sure the chromosome nomenclature used on the VCF file and on the BSgenome package that I forged are the same.
Here is the seed I used to forge the genome package:
Then I ran MutationalPatterns (with no further modification) and I get this error:
2) I tried as you suggested to modify readvcfsas_grange to accept a BS genome object in addition to the name of the BSgenome package and unfortunately I get the same error.
Any idea on where I am doing something wrong? Thank you in advance, Sibyl
As mentioned by Herve Pages, the
read_vcfs_as_granges
function tries to ensure that the chromosome naming style is the same between the vcfs and the BSgenome object. This went wrong, but it seems that you have already fixed this. After that the function also tries to filter for a seqlevel group. By default this isauto+sex
. By usinggroup="none"
in your call toread_vcfs_as_granges
, this filtering will be skipped, which should hopefully solve your issue. Cheers, FreekHi Hervé Pagès, I'll ensure that the next version of the package checks whether the call to
seqlevelsStyle
actually works. I really like thegetBSgenome
function, so I'll make sure that will also be included.