Error in seqlevelsStyle after creating a BSgenome package
1
0
Entering edit mode
@sibylbertrand-23174
Last seen 2.5 years ago

Hi there,

I would like to use MutationalPatterns package to extract mutation signatures for the fission yeast Schizosaccharomyces pombe (Spombe). However, as its genome is not in the BSgenome package, I had to include this genome following the How to Forge a BSgenome Data Package vignette (https://www.bioconductor.org/packages//2.7/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf).

This step was successful. However, I got stuck at the first step of analysis when I load my vcfs into a GRangesList.

Here is the code I used:

##Load MutationalPatterns package
library(MutationalPatterns)

###Call the library
library(BSgenome.Spombe.Pombase.ASM294v2)
ref_genome_pombe <- "BSgenome.Spombe.Pombase.ASM294v2"
library(ref_genome_pombe, character.only = TRUE)

##List the vcf files
vcf_files <- list.files(path = "~/my/path/to/my/vcfs/", pattern = ".vcf\$", full.names = TRUE)

###Define the sample names
sample_names <- c("sample1","sample2","sample3","sample4")

##Load all of this into a GRangesList


And here is the error I got:

Error in seqlevelsStyle(seqlevels(x)) :
The style does not have a compatible entry for the species supported by Seqname. Please see genomeStyles() for supported species/style



I have tried to find information about seqlevelsStyle but as I am starting in bioinformatics I feel that this is maybe out of my field of competences and I need help to solve this. I wonder if the problem comes from the package I installed myself. Just in case, here is the code I used for that process. NB: I have 6 .fa files named SP_chr01.fa --> 06 located into seqs folder.

##data.frame for seed
pombe_dataframe <- data.frame(
Package = "BSgenome.Spombe.Pombase.ASM294v2",
Title = "Full genome sequence for Schizosaccharomyces pombe (Pombase version ASM294v2)",
Description = "Full genome sequence for Schizosaccharomyces pombe (fission yeast) as provided by Pombase (ASM294v2)",
Version = "2.0",
organism = "Schizosaccharomyces pombe",
common_name = "Fission yeast",
provider = "Pombase",
provider_version = "ASM294v2",
release_date = "19th November 2007",
release_name = "ASM294v2",
source_url = "ftp://ftp.pombase.org/pombe/genome_sequence_and_features/genome_sequence/",
organism_biocview = "Schizosaccharomyces_pombe",
BSgenomeObjname = "Spombe",
seqnames = 'paste0("SP_chr", sprintf("%02d", 1:6))',
seqs_srcdir = "~/my/path/to/seqs"
)

##create the dcf file
write.dcf(pombe_dataframe, file = "pombe_seed.dcf", append = FALSE, useBytes = FALSE, indent = 0.1*getOption("width"), width = 0.9*getOption("width"), keep.white = NULL)

##Forge the target package; lead the way to the .dcf file
library(BSgenome)

##Forge
forgeBSgenomeDataPkg("~/my/path/to/pombe_seed.dcf")

##Build the package
##Switch off RStudio. Open terminal. Go to the directory where the package is being stored. Then write "R CMD build BSgenome.Spombe.Pombase.ASM294v2" and enter. Check if the package is ok by writing "R CMD check Sgenome.Spombe.Pombase.ASM294v2" and enter. If everything is fine, proceed to installation with "R CMD install BSgenome.Spombe.Pombase.ASM294v2"


Thank you very very much in advance for your help, and sorry in advance if my question is a bit stupid.

bsgenome MutationalPatterns • 814 views
0
Entering edit mode
@herve-pages-1542
Last seen 6 hours ago
Seattle, WA, United States

Note that since you're getting this error while using the MutationalPatterns package, it would have made sense to also tag your post with MutationalPatterns. That would have notified the MutationalPatterns maintainers and they would have been able to help.

The man page for read_vcfs_as_granges says:

In addition to loading the files, this function applies the same seqlevel style to the GRanges objects as the reference genome passed in the 'genome' parameter.

It could be a little bit clearer but the way I understand this is that the BSgenome passed to the genome argument is expected to use the same naming convention (e.g. UCSC or NCBI) for the chromosomes as in the VCF files. And if it's not the case, then read_vcfs_as_granges will try to change the chromosome names of one object to match the naming convention used by the other object. It will do this by calling seqlevelsStyle() defined in the GenomeInfoDb package.

Unfortunately, in your case, the internal call to seqlevelsStyle() fails (and I wish the error message raised by seqlevelsStyle was a little bit less cryptic). My guess is that it fails because it doesn't know how to change the naming style.

seqlevelsStyle() uses some heuristic to detect/change the naming style that is far from being perfect. In particular it only works for the most well-known styles like UCSC, NCBI, and Ensembl, and not for all organisms supported by these organizations. These are known issues and there is some work in progress for improving seqlevelsStyle(). In any case, when using it in package code like in MutationalPatterns::read_vcfs_as_granges, package authors should not assume that it will always work and ideally their code should be able to handle a possible failure in a graceful manner.

Anyway one workaround for your particular use case is that you forge BSgenome.Spombe.Pombase.ASM294v2 again and make sure that it uses the same chromosome naming convention as in your VCF files. Not the most convenient solution, sorry.

One problem with read_vcfs_as_granges is that it seems to only accept the name of a BSgenome package, but not a BSgenome object. If that was the case, you wouldn't need to re-forge BSgenome.Spombe.Pombase.ASM294v2. Instead you could just load the package and rename the chromosomes in the BSgenome object before passing it to read_vcfs_as_granges.

Note that it's very easy to modify read_vcfs_as_granges to accept a BSgenome object in addition to the name of a BSgenome package. They just need to replace this line:

    ref_genome <- base::get(genome)


with:

    ref_genome <- getBSgenome(genome)


If you want to try this you need to modify the source of the MutationalPatterns package and re-install it. If all goes well you could submit this change to the MutationalPatterns authors for their consideration (ideally via a PR on GitHub if the package is maintained there).

Cheers,

H.

0
Entering edit mode

1) As you suggested first I made sure the chromosome nomenclature used on the VCF file and on the BSgenome package that I forged are the same.

Here is the seed I used to forge the genome package:

   ##dataframe for seed
pombe_dataframe <- data.frame (
Package = "BSgenome.Spombe.Pombase.ASM294v2",
Title = "Full genome sequence for Schizosaccharomyces pombe (Pombase version ASM294v2)",
Description = "Full genome sequence for Schizosaccharomyces pombe (fission yeast) as provided by Pombase ASM294v2)",
Version = "2.0",
Author = "The Bioconductor Dev Team",
Maintainer = "Bioconductor Package Maintainer <maintainer@bioconductor.org>",
organism = "Schizosaccharomyces pombe",
common_name = "Fission yeast",
provider = "Pombase",
provider_version = "ASM294v2",
release_date = "19th November 2007",
release_name = "ASM294v2",
source_url = "ftp://ftp.pombase.org/pombe/genome_sequence_and_features/genome_sequence/",
organism_biocview = "Schizosaccharomyces_pombe",
BSgenomeObjname = "Spombe",
seqnames = 'paste0("chr", c("I", "II", "III", "MT", "AB325691"), sep = "")',
seqs_srcdir = "~/path/to/the/seqs")


Then I ran MutationalPatterns (with no further modification) and I get this error:

Error in FUN(X[[i]], ...) :
Error in extractSeqlevelsByGroup(species = ref_organism, style = ref_style,  :
The style specified by 'UCSC' does not have a compatible entry for the species Schizosaccharomyces pombe


2) I tried as you suggested to modify readvcfsas_grange to accept a BS genome object in addition to the name of the BSgenome package and unfortunately I get the same error.

Any idea on where I am doing something wrong? Thank you in advance, Sibyl

0
Entering edit mode

As mentioned by Herve Pages, the read_vcfs_as_granges function tries to ensure that the chromosome naming style is the same between the vcfs and the BSgenome object. This went wrong, but it seems that you have already fixed this. After that the function also tries to filter for a seqlevel group. By default this is auto+sex. By using group="none" in your call to read_vcfs_as_granges, this filtering will be skipped, which should hopefully solve your issue. Cheers, Freek

0
Entering edit mode

Hi Hervé Pagès, I'll ensure that the next version of the package checks whether the call to seqlevelsStyle actually works. I really like the getBSgenome function, so I'll make sure that will also be included.