Error when creating a BSgenome package
zen • 0
@zen-22107

Hello,

I am creating a BSgenome package for Solanum lycopersicum using the seed file below (this package is very new to me):

Package: BSgenome.Slycopersicum.SGN.SL3.00
Title: Full genome sequences for Solanum lycopersicum (SGN version SL3.00)
Description: Full genome sequences for Solanum lycopersicum (tomato) as provided by SGN (v3.0, 2017) and stored in Solanum lycopersicum genome browser
Version: 3.00
organism: Solanum lycopersicum
common_name: Tomato
provider: SGN
provider_version: SL3.00
release_date: Feb. 2017
release_name: SL3.00
source_url: https://solgenomics.net/organism/Solanum_lycopersicum/genome/
organism_biocview: Solanum_lycopersicum
BSgenomeObjname: Slycopersicum
seqnames: paste("chr", c(1:12, "Un", paste(c(1:12, "Un"), "_random", sep=""))
seqs_srcdir:/ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/

But I am getting this error:

Error in Biobase::createPackage(x@Package, destdir, template_path, symvals) : directory './BSgenome.Slycopersicum.SGN.SL3.00' exists; use unlink=TRUE to remove it, or choose another destination directory

Thank you for helping me!

@james-w-macdonald-5106

Looking at the vignette for this package, I see how you might be confused. So, here's a step-by-step.

1.) Download the fasta file. As in, like, download it to your computer.

2.) If you downloaded the tar.gz file, do tar xvfz S_lycopersicum_chromosomes.3.00.fa.tar.gz. Otherwise get the full fasta to begin with.

3.) In R, after loading BSgenome do

> fasta.seqlengths("S_lycopersicum_chromosomes.3.00.fa")
SL3.0ch00 SL3.0ch01 SL3.0ch02 SL3.0ch03 SL3.0ch04 SL3.0ch05 SL3.0ch06
 20852292  98455869  55977580  72290146  66557038  66723567  49794276
SL3.0ch07 SL3.0ch08 SL3.0ch09 SL3.0ch10 SL3.0ch11 SL3.0ch12
 68175699  65987440  72906345  65633393  56597135  68126176

4.) Note that

> paste0("SL3.0ch", sprintf("%02d", 0:12))
 [1] "SL3.0ch00" "SL3.0ch01" "SL3.0ch02" "SL3.0ch03" "SL3.0ch04" "SL3.0ch05"
 [7] "SL3.0ch06" "SL3.0ch07" "SL3.0ch08" "SL3.0ch09" "SL3.0ch10" "SL3.0ch11"
[13] "SL3.0ch12"

generates the same chromosome names. This is important!
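
One way to confirm the match before writing the seed file is to compare the two sets of names directly. This is only a minimal sketch in base R: `fasta_names` is hard-coded from the output in step 3 (in a live session it could come from names(fasta.seqlengths(...)) instead), and all variable names are illustrative.

```r
# Names as reported by fasta.seqlengths() in step 3 (hard-coded for illustration).
fasta_names <- c("SL3.0ch00", "SL3.0ch01", "SL3.0ch02", "SL3.0ch03",
                 "SL3.0ch04", "SL3.0ch05", "SL3.0ch06", "SL3.0ch07",
                 "SL3.0ch08", "SL3.0ch09", "SL3.0ch10", "SL3.0ch11",
                 "SL3.0ch12")

# The expression intended for the 'seqnames' field of the seed file.
seed_names <- paste0("SL3.0ch", sprintf("%02d", 0:12))

# Names present in one set but not the other; both should be empty.
missing_from_fasta <- setdiff(seed_names, fasta_names)
missing_from_seed  <- setdiff(fasta_names, seed_names)

names_match <- length(missing_from_fasta) == 0 && length(missing_from_seed) == 0
names_match
```

If either setdiff() result is non-empty, it lists exactly which names need fixing in the seqnames expression.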

5.) For FASTA files, you need one FASTA file per chromosome (it says so in the vignette).

> z <- readDNAStringSet("S_lycopersicum_chromosomes.3.00.fa")
> dir.create("S_lycopersicum_chromosomes.3.00")
> for(i in seq_along(z)) writeXStringSet(z[i], paste0("S_lycopersicum_chromosomes.3.00/", gsub("\\s+", "", names(z)[i], perl = TRUE), ".fa"))
> dir("S_lycopersicum_chromosomes.3.00/")
 [1] "SL3.0ch00.fa" "SL3.0ch01.fa" "SL3.0ch02.fa" "SL3.0ch03.fa" "SL3.0ch04.fa"
 [6] "SL3.0ch05.fa" "SL3.0ch06.fa" "SL3.0ch07.fa" "SL3.0ch08.fa" "SL3.0ch09.fa"
[11] "SL3.0ch10.fa" "SL3.0ch11.fa" "SL3.0ch12.fa"
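
Before forging, it may be worth checking that every sequence name has a matching file in that directory. The helper below is a hypothetical base R sketch (it is not part of BSgenome) and assumes the files are named "<seqname>.fa", as in the listing above; the demonstration uses a temporary directory rather than the real data.

```r
# Hypothetical helper: report which expected per-chromosome FASTA files are
# missing from a directory. File names are assumed to be "<seqname><suffix>".
missing_fasta_files <- function(srcdir, seqnames, suffix = ".fa") {
  expected <- file.path(srcdir, paste0(seqnames, suffix))
  basename(expected[!file.exists(expected)])
}

# Demonstration in a temporary directory where two of three files are present.
demo_dir <- file.path(tempdir(), "fasta_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("SL3.0ch00.fa", "SL3.0ch01.fa"))))

demo_missing <- missing_fasta_files(demo_dir,
                                    c("SL3.0ch00", "SL3.0ch01", "SL3.0ch02"))
demo_missing
```

For the real data, the call would take the directory created in this step and the same seqnames expression that goes into the seed file; an empty result means nothing is missing.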

6.) Now you need a seed file. It should look like this:

Package: BSgenome.Slycopersicum.SGN.SL3
Title: Full genome sequences for Solanum lycopersicum (SGN version 3)
Description: Full genome sequences for Solanum lycopersicum as provided by SGN.
Version: 0.0.1
Suggests: GenomicFeatures
organism: Solanum lycopersicum
common_name: Tomato
provider: SGN
provider_version: SL3.00
release_date: Feb 2017
release_name: SL3.00
source_url: ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/
organism_biocview: Solanum_lycopersicum
BSgenomeObjname: Slycopersicum
SrcDataFiles: S_lycopersicum_chromosomes.3.00.fa from ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/
seqs_srcdir: C:/Users/jmacdon/Desktop/S_lycopersicum_chromosomes.3.00
seqnames: paste0("SL3.0ch", sprintf("%02d", 0:12))

EDIT: Note that the last line is the same R code that generated the chromosome names in step 4! Also, the seed file is a plain text file, which I saved on my computer as "Slycopersicum-seed".

ALSO NOTE THAT the seqs_srcdir has to point to the directory that you put your FASTA files in! Mine points to a dir on my computer, so don't use that.
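
Since a seed file uses the same "Field: value" layout as a DESCRIPTION file, base R's read.dcf() can read it, which allows a quick sanity check of seqs_srcdir before forging. This is a sketch under that assumption; `check_seed_srcdir` is a hypothetical helper, not a BSgenome function, and the demonstration writes throwaway seed files to a temporary location.

```r
# Hypothetical helper: read a seed file and check that seqs_srcdir is a
# local directory that exists (not a URL).
check_seed_srcdir <- function(seed_path) {
  seed <- read.dcf(seed_path)                 # one-row character matrix
  if (!"seqs_srcdir" %in% colnames(seed)) {
    return("seqs_srcdir field is missing")
  }
  srcdir <- unname(seed[1, "seqs_srcdir"])
  if (grepl("^/?(ftp|https?)://", srcdir)) {
    return("seqs_srcdir looks like a URL; it must be a local folder")
  }
  if (!dir.exists(srcdir)) {
    return("seqs_srcdir does not exist on this machine")
  }
  "ok"
}

# Demonstration with two throwaway seed files.
seed_file <- tempfile("demo-seed")
local_dir <- normalizePath(tempdir(), winslash = "/")

writeLines(c("Package: BSgenome.Demo",
             paste("seqs_srcdir:", local_dir)), seed_file)
local_result <- check_seed_srcdir(seed_file)

writeLines(c("Package: BSgenome.Demo",
             "seqs_srcdir: ftp://ftp.example.org/genome/"), seed_file)
remote_result <- check_seed_srcdir(seed_file)

c(local = local_result, remote = remote_result)
```

The URL pattern also tolerates a leading slash, which is how the value appears in the seed file from the original question.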

7.) Build and install

> forgeBSgenomeDataPkg("Slycopersicum-seed")
Creating package in ./BSgenome.Slycopersicum.SGN.SL3 
Loading 'SL3.0ch00' sequence from FASTA file 'C:/Users/jmacdon/Desktop/S_lycopersicum_chromosomes.3.00/SL3.0ch00.fa' ... DONE
<snip>
Writing all sequences to './BSgenome.Slycopersicum.SGN.SL3/inst/extdata/single_sequences.2bit' ... DONE
> install.packages("BSgenome.Slycopersicum.SGN.SL3/", repos = NULL, type = "source") 
## I'm on Windows so I need to say 'source'
<snip>
* DONE (BSgenome.Slycopersicum.SGN.SL3)
> library(BSgenome.Slycopersicum.SGN.SL3)
> ls(2)
[1] "BSgenome.Slycopersicum.SGN.SL3" "Slycopersicum"                 
> Slycopersicum
Tomato genome:
# organism: Solanum lycopersicum (Tomato)
# provider: SGN
# provider version: SL3.00
# release date: Feb 2017
# release name: SL3.00
# 13 sequences:
#   SL3.0ch00 SL3.0ch01 SL3.0ch02 SL3.0ch03 SL3.0ch04 SL3.0ch05 SL3.0ch06
#   SL3.0ch07 SL3.0ch08 SL3.0ch09 SL3.0ch10 SL3.0ch11 SL3.0ch12          
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)

Et voila!


Thank you so much for such a detailed step by step explanation. Although I got a few warnings, it worked and the package is loaded.


Hi Jim,

"Looking at the vignette for this package, I see how you might be confused."

I'd love to improve the vignette so if you could provide more details about what you find confusing that would be great. Thanks!

H.


Hi Herve,

I think the confusing part is that there isn't a basic overview to get people oriented. All they need are two files: a genome and a text file that describes it. The tricky part is the acceptable format of the genome and the seed file.

The vignette is complete as is, but it can be TL;DR if the end user already has a genome in an acceptable format.

For example, if an end user has a 2bit file, they don't need to know anything more about the genome, and can go on to generating the seed file. The same is true if they have a multi-chromosome FASTA file. It's only tricky if they have something different (like the OP) and have to convert to either a multi-chromosome FASTA file or 2bit. If the vignette were HTML, you could state what they need and, if they already have a 2bit or multi-chromosome FASTA file, provide a link to the seed file section. If they don't, provide a link to a section with more information about generating the correct format for the genome.

It's a bit different for the seed file. You have a whole section that shows all the fields that people could use, and what goes in each field. If you just want to build a basic package and don't need to get fancy, the easiest thing to do is just copy an existing seed file and modify to suit (which is what I did). If the vignette just said to do that, provided code to copy an existing file to the working directory and gave a basic idea of what should go into the fields, that might be sufficient for most. You could then have a link that takes people to the more detailed description of all the fields, for those who want or need to include more detail.
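
As a rough illustration of the "copy an existing seed file and modify it" approach, here is a minimal base R sketch. `copy_seed_template` is a hypothetical helper; the location of the example seeds (extdata/GentlemanLab inside the BSgenome package) is an assumption that may not hold for every release, since newer versions may keep them in BSgenomeForge instead.

```r
# Hypothetical helper: copy one seed file from a template directory to a new
# name, refusing to overwrite a file that already exists.
copy_seed_template <- function(template_dir, template_name, new_name,
                               dest = getwd()) {
  src <- file.path(template_dir, template_name)
  if (!file.exists(src)) stop("template seed not found: ", src)
  target <- file.path(dest, new_name)
  ok <- file.copy(src, target, overwrite = FALSE)
  if (!ok) stop("could not copy to ", target, " (does it already exist?)")
  target
}

# Assumption: example seed files live under extdata/GentlemanLab in BSgenome.
# If the package or the folder is absent, this simply lists nothing.
if (requireNamespace("BSgenome", quietly = TRUE)) {
  seed_dir <- system.file("extdata", "GentlemanLab", package = "BSgenome")
  print(head(list.files(seed_dir, pattern = "-seed$")))
}
```

Once a suitable template shows up in that listing, copy_seed_template(seed_dir, "<template name>", "Slycopersicum-seed") would place a copy in the working directory, ready to be edited field by field.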


Thanks for the feedback. Very useful. I'll work on that.

H.

@james-w-macdonald-5106

So there are three parts to that error message. The first part tells you which function raised the error:

Error in Biobase::createPackage(x@Package, destdir, template_path, symvals) :

And the second part explains what the problem is

directory './BSgenome.Slycopersicum.SGN.SL3.00' exists

And the third part gives you a couple of helpful suggestions

use unlink=TRUE to remove it, or choose another destination directory

The idea is that you would read that and it would be self-explanatory, and you would then make changes and go ahead with what you are doing. But evidently it wasn't self-explanatory? Can you say what was confusing, so perhaps we could improve?


Thank you! I am not sure where this directory exists. I checked the available genomes and it is not there.


When you make a BSgenome package, you are generating everything required for the package installation in your working directory. Like an actual directory called BSgenome.Slycopersicum.SGN.SL3.00, that contains a bunch of subdirectories and whatnot. You can then install that package and use it. Presumably you have read the vignette?

What R is telling you is that you have already run forgeBSgenomeDataPkg, and you have generated the package, and you can now install. Which is also described in the vignette.

If you don't know where the directory is, it's in your working directory! Or maybe you passed a different directory using the destdir argument? Probably not, in which case you can use getwd() to find out what the current working directory is.
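
To make that concrete, here is a minimal base R sketch for finding the leftover directory and, if you really want to forge again from scratch, deleting it yourself. `clear_forge_target` is a hypothetical helper; whether the unlink=TRUE option mentioned in the error message can be passed through forgeBSgenomeDataPkg is not something this sketch assumes, so it removes the directory directly instead.

```r
# Hypothetical helper: report whether the package directory that forging would
# create already exists under 'destdir', and optionally delete it.
clear_forge_target <- function(pkgname, destdir = ".", remove = FALSE) {
  target <- file.path(destdir, pkgname)
  if (!dir.exists(target)) return("absent")
  if (!remove) return("exists")
  unlink(target, recursive = TRUE)   # deletes the previously forged package
  if (dir.exists(target)) stop("could not remove ", target)
  "removed"
}

# Demonstration in a temporary directory standing in for the working directory.
demo_dest <- tempfile("forge_demo")
dir.create(demo_dest)
dir.create(file.path(demo_dest, "BSgenome.Slycopersicum.SGN.SL3.00"))

state_before  <- clear_forge_target("BSgenome.Slycopersicum.SGN.SL3.00", demo_dest)
state_removed <- clear_forge_target("BSgenome.Slycopersicum.SGN.SL3.00", demo_dest,
                                    remove = TRUE)
state_after   <- clear_forge_target("BSgenome.Slycopersicum.SGN.SL3.00", demo_dest)
c(state_before, state_removed, state_after)
```

In a real session, leaving destdir at "." checks the directory reported by getwd(). Only use remove = TRUE if the existing directory is not a package you still want to install.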


Update: I have got this:

R CMD INSTALL BSgenome.Slycopersicum.SGN.SL3.00_3.00.tar.gz
* installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
* installing *source* package ‘BSgenome.Slycopersicum.SGN.SL3.00’ ...
** using staged installation
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Warning: package ‘S4Vectors’ was built under R version 3.6.1
Warning: package ‘IRanges’ was built under R version 3.6.1
Warning: package ‘GenomicRanges’ was built under R version 3.6.1
Warning: package ‘rtracklayer’ was built under R version 3.6.1
** testing if installed package can be loaded from final location
Warning: package ‘S4Vectors’ was built under R version 3.6.1
Warning: package ‘IRanges’ was built under R version 3.6.1
Warning: package ‘GenomicRanges’ was built under R version 3.6.1
Warning: package ‘rtracklayer’ was built under R version 3.6.1
** testing if installed package keeps a record of temporary installation path
* DONE (BSgenome.Slycopersicum.SGN.SL3.00)

When I tried to use it in R with plotKaryotype, I got:

Error in is(genome, "GRanges") : object 'Slycopersicum' not found

Not sure what's going on exactly but the fact that you have

seqs_srcdir:/ftp://ftp.solgenomics.net/tomato_genome/assembly/build_3.00/

in your seed file does not look good. As explained in the vignette, the seqs_srcdir folder must be local:

So we assume that you've downloaded the sequence data files and that they are now located in a folder on your machine. From now on, we'll refer to this folder as the seqs_srcdir folder.

So I'm surprised that the forging step (i.e. forgeBSgenomeDataPkg("path/to/your/seed")) worked. Did it?


Yes, forging did not give any error. But it still is not loading.


This most likely means that you haven't loaded the package. OR it may be that the object is actually called Slycopersicum.SGN.SL3 or some such. You can tell by doing

library(BSgenome.Slycopersicum.SGN.SL3.00)
ls(2)

As an example

> library(BSgenome.Scerevisiae.UCSC.sacCer1)
> ls(2)
 [1] "BSgenome.Scerevisiae.UCSC.sacCer1" "Scerevisiae"

So I now know the nickname for this object is Scerevisiae
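
Note that ls(2) lists whatever currently sits at position 2 of the search path, which is the most recently attached package only if library() succeeded and nothing else was attached afterwards. A slightly more robust check is to ask for the package by name. The sketch below is base R only; `attached_objects` is a hypothetical helper, and 'stats' is used purely as a stand-in for a package that is known to be attached.

```r
# Hypothetical helper: list the objects of an attached package by name, or
# return an empty vector (with a message) if it is not on the search path.
attached_objects <- function(pkgname) {
  entry <- paste0("package:", pkgname)
  if (!entry %in% search()) {
    message(pkgname, " is not attached; did library(", pkgname, ") succeed?")
    return(character(0))
  }
  ls(entry)
}

library(stats)  # stand-in for a package known to be installed

stats_objects <- attached_objects("stats")
head(stats_objects)

# A package that has not been attached in this session gives an empty result.
not_attached <- attached_objects("BSgenome.Slycopersicum.SGN.SL3.00")
not_attached
```

An empty result for the BSgenome package would point to a problem with attaching it, rather than with the name of the genome object.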


Thank you! But it does not give any nickname:

ls(2)
character(0)

I think it's not loaded properly.


Upvoting just because of how nicely you wrote the comment. Other people need to do this more often :-)
