Question: Error in creating BSGenome package
zen0 wrote (5 weeks ago):

Hello,

I am creating a BSgenome package for Solanum lycopersicum using the seed file below. This package is very new to me:

Package: BSgenome.Slycopersicum.SGN.SL3.00
Title: Full genome sequences for Solanum lycopersicum (SGN version SL3.00)
Description: Full genome sequences for Solanum lycopersicum (tomato) as provided by SGN (v3.0, 2017) and stored in Solanum lycopersicum genome browser
Version: 3.00
organism: Solanum lycopersicum
common_name: Tomato
provider: SGN
provider_version: SL3.00
release_date: Feb. 2017
release_name: SL3.00
source_url: https://solgenomics.net/organism/Solanum_lycopersicum/genome/
organism_biocview: Solanum_lycopersicum
BSgenomeObjname: Slycopersicum
seqnames: paste("chr", c(1:12, "Un", paste(c(1:12, "Un"), "_random", sep=""))
seqs_srcdir:/ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/


But I am getting this error:

Error in Biobase::createPackage(x@Package, destdir, template_path, symvals) : directory './BSgenome.Slycopersicum.SGN.SL3.00' exists; use unlink=TRUE to remove it, or choose another destination directory


Thank you for helping me!

modified 4 weeks ago by James W. MacDonald51k • written 5 weeks ago by zen0
Answer: Error in creating BSGenome package
James W. MacDonald wrote (4 weeks ago):

Looking at the vignette for this package, I see how you might be confused. So, here's a step-by-step.

1.) Download the genome FASTA (S_lycopersicum_chromosomes.3.00.fa, or its tar.gz archive) from ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/.

2.) If you downloaded the tar.gz file, run tar xvfz S_lycopersicum_chromosomes.3.00.fa.tar.gz. Otherwise, download the full FASTA file directly.

3.) In R, after loading BSgenome do

> fasta.seqlengths("S_lycopersicum_chromosomes.3.00.fa")
SL3.0ch00 SL3.0ch01 SL3.0ch02 SL3.0ch03 SL3.0ch04 SL3.0ch05 SL3.0ch06
 20852292  98455869  55977580  72290146  66557038  66723567  49794276
SL3.0ch07 SL3.0ch08 SL3.0ch09 SL3.0ch10 SL3.0ch11 SL3.0ch12
 68175699  65987440  72906345  65633393  56597135  68126176


4.) Note that

> paste0("SL3.0ch", sprintf("%02d", 0:12))
 [1] "SL3.0ch00" "SL3.0ch01" "SL3.0ch02" "SL3.0ch03" "SL3.0ch04" "SL3.0ch05"
 [7] "SL3.0ch06" "SL3.0ch07" "SL3.0ch08" "SL3.0ch09" "SL3.0ch10" "SL3.0ch11"
[13] "SL3.0ch12"


generates the same chromosome names. This is important!
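
A quick way to confirm this before writing the seed file is to compare the two sets of names directly. The following is only a sketch: fasta_names is typed in by hand from the fasta.seqlengths() output in step 3 (in a real session you would take names() of that result), and compare_seqnames is a hypothetical helper, not something BSgenome provides.

```r
# Names as reported by fasta.seqlengths() in step 3 (copied by hand here).
fasta_names <- c("SL3.0ch00", "SL3.0ch01", "SL3.0ch02", "SL3.0ch03",
                 "SL3.0ch04", "SL3.0ch05", "SL3.0ch06", "SL3.0ch07",
                 "SL3.0ch08", "SL3.0ch09", "SL3.0ch10", "SL3.0ch11",
                 "SL3.0ch12")

# The R expression intended for the 'seqnames' field of the seed file.
seed_names <- paste0("SL3.0ch", sprintf("%02d", 0:12))

# Hypothetical helper: report names that appear on one side only.
compare_seqnames <- function(seed, fasta) {
  list(missing_from_fasta = setdiff(seed, fasta),
       missing_from_seed  = setdiff(fasta, seed))
}

mismatch <- compare_seqnames(seed_names, fasta_names)
print(mismatch)

# TRUE only when both the names and their order agree.
print(identical(seed_names, fasta_names))
```

If either element of mismatch is non-empty, the seqnames expression needs adjusting before forging.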

5.) For FASTA files, you need one FASTA file per chromosome (it says so in the vignette).

> z <- readDNAStringSet("S_lycopersicum_chromosomes.3.00.fa")
> dir.create("S_lycopersicum_chromosomes.3.00")
> for(i in seq_along(z)) writeXStringSet(z[i], paste0("S_lycopersicum_chromosomes.3.00/", gsub("\\s+", "", names(z)[i], perl = TRUE), ".fa"))
> dir("S_lycopersicum_chromosomes.3.00/")
 [1] "SL3.0ch00.fa" "SL3.0ch01.fa" "SL3.0ch02.fa" "SL3.0ch03.fa" "SL3.0ch04.fa"
 [6] "SL3.0ch05.fa" "SL3.0ch06.fa" "SL3.0ch07.fa" "SL3.0ch08.fa" "SL3.0ch09.fa"
[11] "SL3.0ch10.fa" "SL3.0ch11.fa" "SL3.0ch12.fa"


6.) Now you need a seed file. It should look like this:

Package: BSgenome.Slycopersicum.SGN.SL3
Title: Full genome sequences for Solanum lycopersicum (SGN version 3)
Description: Full genome sequences for Solanum lycopersicum as provided by SGN.
Version: 0.0.1
Suggests: GenomicFeatures
organism: Solanum lycopersicum
common_name: Tomato
provider: SGN
provider_version: SL3.00
release_date: Feb 2017
release_name: SL3.00
source_url: ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/
organism_biocview: Solanum_lycopersicum
BSgenomeObjname: Slycopersicum
SrcDataFiles: S_lycopersicum_chromosomes.3.00.fa from ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/
seqs_srcdir: C:/Users/jmacdon/Desktop/S_lycopersicum_chromosomes.3.00
seqnames: paste0("SL3.0ch", sprintf("%02d", 0:12))


EDIT: Note that the last line contains the same R code that generates the chromosome names in step 4! Also, the seed file is a plain text file, which I saved on my computer as "Slycopersicum-seed".

ALSO NOTE THAT the seqs_srcdir has to point to the directory that you put your FASTA files in! Mine points to a dir on my computer, so don't use that.
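
Before forging, it may help to check that the directory really contains one file per sequence name. The sketch below assumes the files are named <seqname>.fa, as in step 5; missing_seq_files is a hypothetical helper, and the demonstration uses empty stand-in files in a temporary directory rather than real FASTA data.

```r
# Hypothetical helper: which per-chromosome files are missing from seqs_srcdir?
# Assumes files are named <seqname><suffix>, e.g. SL3.0ch00.fa as in step 5.
missing_seq_files <- function(seqs_srcdir, seqnames, suffix = ".fa") {
  paths <- file.path(seqs_srcdir, paste0(seqnames, suffix))
  basename(paths[!file.exists(paths)])
}

# Self-contained demonstration: a temporary directory with empty stand-in
# files for the first 12 of the 13 sequence names.
demo_dir <- file.path(tempdir(), "seqs_srcdir_demo")
dir.create(demo_dir, showWarnings = FALSE)
seqnames <- paste0("SL3.0ch", sprintf("%02d", 0:12))
invisible(file.create(file.path(demo_dir, paste0(seqnames[1:12], ".fa"))))

# Reports the one file that was deliberately left out.
print(missing_seq_files(demo_dir, seqnames))
```

On a real setup you would pass your own seqs_srcdir and expect an empty result.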

7.) Build and install

> forgeBSgenomeDataPkg("Slycopersicum-seed")
Creating package in ./BSgenome.Slycopersicum.SGN.SL3
<snip>
Writing all sequences to './BSgenome.Slycopersicum.SGN.SL3/inst/extdata/single_sequences.2bit' ... DONE
> install.packages("BSgenome.Slycopersicum.SGN.SL3/", repos = NULL, type = "source")
## I'm on Windows so I need to say 'source'
<snip>
* DONE (BSgenome.Slycopersicum.SGN.SL3)
> library(BSgenome.Slycopersicum.SGN.SL3)
> ls(2)
[1] "BSgenome.Slycopersicum.SGN.SL3" "Slycopersicum"
> Slycopersicum
Tomato genome:
# organism: Solanum lycopersicum (Tomato)
# provider: SGN
# provider version: SL3.00
# release date: Feb 2017
# release name: SL3.00
# 13 sequences:
#   SL3.0ch00 SL3.0ch01 SL3.0ch02 SL3.0ch03 SL3.0ch04 SL3.0ch05 SL3.0ch06
#   SL3.0ch07 SL3.0ch08 SL3.0ch09 SL3.0ch10 SL3.0ch11 SL3.0ch12
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)


Et voila!

Thank you so much for such a detailed step by step explanation. Although I got a few warnings, it worked and the package is loaded.

Hi Jim,

"Looking at the vignette for this package, I see how you might be confused."

I'd love to improve the vignette so if you could provide more details about what you find confusing that would be great. Thanks!

H.

Hi Herve,

I think the confusing part is that there isn't a basic overview to get people oriented. All they need are two files; a genome and a text file that describes it. The tricky part is the acceptable format of the genome and the seed file.

The vignette is complete as is, but it can be TL;DR if the end user already has a genome in an acceptable format.

For example, if an end user has a 2bit file, they don't need to know anything more about the genome, and can go on to generating the seed file. The same is true if they have a multi-chromosome FASTA file. It's only tricky if they have something different (like the OP) and have to either convert to a multi-chromosome FASTA file or 2bit. If the vignette were HTML you could say what they need, and if they have the 2bit or multi-chromosome FASTA file, provide a link to go to the seed file section. If they don't, then provide a link to go to a section that has more information about generating the correct format for the genome.

It's a bit different for the seed file. You have a whole section that shows all the fields that people could use, and what goes in each field. If you just want to build a basic package and don't need to get fancy, the easiest thing to do is just copy an existing seed file and modify to suit (which is what I did). If the vignette just said to do that, provided code to copy an existing file to the working directory and gave a basic idea of what should go into the fields, that might be sufficient for most. You could then have a link that takes people to the more detailed description of all the fields, for those who want or need to include more detail.
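
As a rough illustration of that copy-and-modify step, here is a sketch in base R. copy_seed_template is a hypothetical helper, and the extdata/GentlemanLab location of the example seed files inside the installed BSgenome package is an assumption; system.file() simply returns "" if that directory (or the package) is not present.

```r
# Hypothetical helper: copy the first seed file matching 'pattern' from
# 'seed_dir' to 'dest'. Refuses to overwrite an existing file.
copy_seed_template <- function(seed_dir, pattern, dest) {
  candidates <- list.files(seed_dir, pattern = pattern, full.names = TRUE)
  if (length(candidates) == 0) {
    stop("no seed file matching '", pattern, "' in ", seed_dir)
  }
  if (!file.copy(candidates[1], dest, overwrite = FALSE)) {
    stop("could not copy to ", dest)
  }
  dest
}

# Assumed location of the example seed files shipped with BSgenome.
seed_dir <- system.file("extdata", "GentlemanLab", package = "BSgenome")
if (nzchar(seed_dir)) {
  print(list.files(seed_dir, pattern = "-seed$"))
  # copy_seed_template(seed_dir, "-seed$", "Slycopersicum-seed")
}
```

After copying, the fields are edited by hand in a text editor, as in the seed file shown in step 6.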

Thanks for the feedback. Very useful. I'll work on that.

H.

Answer: Error in creating BSGenome package
James W. MacDonald wrote (5 weeks ago):

So there are three parts to that error message. The first part tells you which function raised the error:

Error in Biobase::createPackage(x@Package, destdir, template_path, symvals) :


And the second part explains what the problem is

directory './BSgenome.Slycopersicum.SGN.SL3.00' exists


And the third part gives you a couple of helpful suggestions

use unlink=TRUE to remove it, or choose another destination directory


The idea is that you would read that and it would be self-explanatory, and you would then make changes and go ahead with what you are doing. But evidently it wasn't self-explanatory? Can you say what was confusing, so perhaps we could improve?

Thank you! I am not sure where this directory exists. I checked the available genomes and it is not there.

When you make a BSgenome package, you are generating everything required for the package installation in your working directory. That means an actual directory called BSgenome.Slycopersicum.SGN.SL3.00, containing a bunch of subdirectories and whatnot. You can then install that package and use it. Presumably you have read the vignette?

What R is telling you is that you have already run forgeBSgenomeDataPkg, and you have generated the package, and you can now install. Which is also described in the vignette.

If you don't know where the directory is, it's in your working directory! Or maybe you passed a different directory using the destdir argument? Probably not, in which case you can use getwd() to find the current working directory.
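
If the goal is to forge again from a corrected seed file, the old directory has to go first. A minimal sketch in base R, assuming the package was forged into the working directory; remove_forged_dir is a hypothetical helper, and it deletes the directory permanently, so check the path before running it.

```r
# Hypothetical helper: delete a previously forged package directory, if any.
# Returns TRUE when a directory was found and removed, FALSE otherwise.
remove_forged_dir <- function(pkg_name, destdir = getwd()) {
  pkg_dir <- file.path(destdir, pkg_name)
  if (!dir.exists(pkg_dir)) {
    return(FALSE)
  }
  unlink(pkg_dir, recursive = TRUE)
  !dir.exists(pkg_dir)
}

# The working directory, which is where the forged package directory
# ends up unless a different destination was given.
print(getwd())

# Uncomment to remove the stale directory, then forge again:
# remove_forged_dir("BSgenome.Slycopersicum.SGN.SL3.00")
```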

Update: I have got this:

R CMD INSTALL BSgenome.Slycopersicum.SGN.SL3.00_3.00.tar.gz
* installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
* installing *source* package ‘BSgenome.Slycopersicum.SGN.SL3.00’ ...
** using staged installation
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Warning: package ‘S4Vectors’ was built under R version 3.6.1
Warning: package ‘IRanges’ was built under R version 3.6.1
Warning: package ‘GenomicRanges’ was built under R version 3.6.1
Warning: package ‘rtracklayer’ was built under R version 3.6.1
** testing if installed package can be loaded from final location
Warning: package ‘S4Vectors’ was built under R version 3.6.1
Warning: package ‘IRanges’ was built under R version 3.6.1
Warning: package ‘GenomicRanges’ was built under R version 3.6.1
Warning: package ‘rtracklayer’ was built under R version 3.6.1
** testing if installed package keeps a record of temporary installation path
* DONE (BSgenome.Slycopersicum.SGN.SL3.00)


When I tried to load it into R for plotKaryotype, I got:

Error in is(genome, "GRanges") : object 'Slycopersicum' not found


Not sure what's going on exactly but the fact that you have

seqs_srcdir:/ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/


in your seed file does not look good. As explained in the vignette, the seqs_srcdir folder must be local:

So we assume that you've downloaded the sequence data files and that they are now located in a folder on your machine. From now on, we'll refer to this folder as the seqs_srcdir folder.

So I'm surprised that the forging step (i.e. forgeBSgenomeDataPkg("path/to/your/seed")) worked. Did it?
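
One way to check this up front is to read the field back out of the seed file and test it. This is only a sketch: it assumes the seed file uses the "Field: value" layout shown in this thread, which base R's read.dcf() can parse, and check_seqs_srcdir is a hypothetical helper, not part of BSgenome.

```r
# Hypothetical helper: read the 'seqs_srcdir' field from a seed file and
# report whether it names an existing local directory.
check_seqs_srcdir <- function(seed_file) {
  seed <- read.dcf(seed_file, fields = "seqs_srcdir")
  path <- unname(seed[1, "seqs_srcdir"])
  list(path = path,
       looks_like_url = !is.na(path) && grepl("://", path, fixed = TRUE),
       exists_locally = !is.na(path) && dir.exists(path))
}

# Demonstration with two throwaway seed files: one pointing at a local
# directory, one pointing at the FTP URL from the original seed file.
good_seed <- tempfile("good-seed")
writeLines(c("Package: BSgenome.Demo",
             paste0("seqs_srcdir: ", tempdir())), good_seed)

bad_seed <- tempfile("bad-seed")
writeLines(c("Package: BSgenome.Demo",
             "seqs_srcdir: /ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/"),
           bad_seed)

good <- check_seqs_srcdir(good_seed)
bad <- check_seqs_srcdir(bad_seed)
print(good$exists_locally)
print(bad$looks_like_url)
```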

Yes, forging did not give any error. But it still is not loading.

This most likely means that you haven't loaded the package. OR it may be that the object is actually called Slycopersicum.SGN.SL3 or some such. You can tell by doing

library(BSgenome.Slycopersicum.SGN.SL3.00)
ls(2)


As an example

> library(BSgenome.Scerevisiae.UCSC.sacCer1)
> ls(2)
[1] "BSgenome.Scerevisiae.UCSC.sacCer1" "Scerevisiae"


So I now know the nickname for this object is Scerevisiae.
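
Since ls(2) depends on which package happens to sit second on the search path, a slightly more defensive variant is to ask for the package by name. A sketch in base R, with list_attached_objects as a hypothetical helper; it is demonstrated on the stats package so that it runs anywhere, and the forged package name would be substituted in practice.

```r
# Hypothetical helper: attach a package and list what it puts on the
# search path. Stops early if the package is not installed.
list_attached_objects <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package '", pkg, "' is not installed in any library in .libPaths()")
  }
  library(pkg, character.only = TRUE)
  ls(paste0("package:", pkg))
}

# Demonstrated with a package that ships with R.
print(head(list_attached_objects("stats")))

# For the forged package, substitute its name:
# list_attached_objects("BSgenome.Slycopersicum.SGN.SL3.00")
```

If the call stops with the "not installed" message, the installation went to a library that this R session does not search.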

Thank you! But it does not give any nickname:

ls(2)
character(0)


I think it's not loaded properly.