Forging a BSgenome data package: seed file issue
n.mary • 0
@615fff84
France

Hello,

I would like to create a BSgenome package for Apis mellifera (honey bee) from the latest assembly (Amel_HAv3.1). The current Apis mellifera package is "BSgenome.Amellifera.BeeBase.assembly4", from 2005.

Here is my directory, with one .fasta file per chromosome plus the seed file:

ls /media/nmary/DONNEES/Abeille/Temp/
BSgenome.Amellifera.NCBI.Amel-HAv3.1.seed  chrLG15.fa.gz  chrLG5.fa.gz
chrLG10.fa.gz                              chrLG16.fa.gz  chrLG6.fa.gz
chrLG11.fa.gz                              chrLG1.fa.gz   chrLG7.fa.gz
chrLG12.fa.gz                              chrLG2.fa.gz   chrLG8.fa.gz
chrLG13.fa.gz                              chrLG3.fa.gz   chrLG9.fa.gz
chrLG14.fa.gz                              chrLG4.fa.gz

Here is my seed file:

cat /media/nmary/DONNEES/Abeille/Temp/BSgenome.Amellifera.NCBI.Amel-HAv3.1.seed 
Package: BSgenome.Amellifera.NCBI.Amel-HAv3-1
Title: Full genome sequences for Apis mellifera (NCBI version Amel.HAv3.1)
Description: Full genome sequences for Apis mellifera (Honey Bee) as provided by NCBI (2018-09-10) and stored in Biostrings objects.
Version: 0.1
License: Artistic-2.0
Author: Nicolas Mary
Maintainer: Mary <nicolas.mary@envt.fr>
organism: Apis mellifera
genome: Amel-HAv3-1
common_name: Honey Bee
provider: NCBI
release_date: Sept. 2018
source_url: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/invertebrate/Apis_mellifera/latest_assembly_versions/GCF_003254395.2_Amel_HAv3.1/GCF_003254395.2_Amel_HAv3.1_assembly_structure/Primary_Assembly/assembled_chromosomes/FASTA/
organism_biocview: Apis_mellifera
BSgenomeObjname: Amellifera
seqs_srcdir: /media/nmary/DONNEES/Abeille/Temp/

I run into an error when I call forgeBSgenomeDataPkg:

forgeBSgenomeDataPkg("/media/nmary/DONNEES/Abeille/Temp/BSgenome.Amellifera.NCBI.Amel-HAv3.1.seed")
Error in makeS4FromList("BSgenomeDataPkgSeed", x) : 
  some names in 'x' are not valid BSgenomeDataPkgSeed slots (Genome)

I tried removing the "genome" field, but it still does not work:

forgeBSgenomeDataPkg("/media/nmary/DONNEES/Abeille/Temp/BSgenome.Amellifera.NCBI.Amel-HAv3.1.seed")
Error in forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,  : 
  values for symbols PROVIDERVERSION, RELEASENAME are not single strings

If anyone can help... thanks in advance.

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] BSgenome_1.52.0      rtracklayer_1.44.4   Biostrings_2.52.0   
 [4] XVector_0.24.0       GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 
 [7] IRanges_2.18.3       S4Vectors_0.22.1     BiocGenerics_0.30.0 
[10] LEA_2.6.0            hexbin_1.28.1        snpStats_1.34.0     
[13] Matrix_1.2-18        survival_3.2-7       ade4_1.7-16         
[16] sp_1.4-4             apcluster_1.4.8     

loaded via a namespace (and not attached):
 [1] Biobase_2.44.0              splines_3.6.1              
 [3] shiny_1.5.0                 assertthat_0.2.1           
 [5] GenomeInfoDbData_1.2.1      Rsamtools_2.0.3            
 [7] yaml_2.2.1                  progress_1.2.2             
 [9] ggrepel_0.8.2               pillar_1.4.6               
[11] lattice_0.20-41             glue_1.4.2                 
[13] digest_0.6.27               promises_1.1.1             
[15] colorspace_2.0-0            htmltools_0.5.0            
[17] httpuv_1.5.4                plyr_1.8.6                 
[19] XML_3.99-0.3                pkgconfig_2.0.3            
[21] zlibbioc_1.30.0             purrr_0.3.4                
[23] xtable_1.8-4                scales_1.1.1               
[25] later_1.1.0.1               BiocParallel_1.18.1        
[27] tibble_3.0.4                generics_0.1.0             
[29] ggplot2_3.3.2               ellipsis_0.3.1             
[31] SummarizedExperiment_1.14.1 cli_2.1.0                  
[33] magrittr_1.5                crayon_1.3.4               
[35] mime_0.9                    fansi_0.4.1                
[37] nlme_3.1-150                MASS_7.3-53                
[39] tools_3.6.1                 prettyunits_1.1.1          
[41] hms_0.5.3                   matrixStats_0.57.0         
[43] lifecycle_0.2.0             stringr_1.4.0              
[45] munsell_0.5.0               DelayedArray_0.10.0        
[47] compiler_3.6.1              rlang_0.4.8                
[49] grid_3.6.1                  RCurl_1.98-1.2             
[51] ggridges_0.5.2              rstudioapi_0.12            
[53] igraph_1.2.6                bitops_1.0-6               
[55] gtable_0.3.0                codetools_0.2-18           
[57] reshape2_1.4.4              R6_2.5.0                   
[59] GenomicAlignments_1.20.1    knitr_1.30                 
[61] dplyr_1.0.2                 fastmap_1.0.1              
[63] seqinr_4.2-4                ape_5.4-1                  
[65] stringi_1.5.3               Rcpp_1.0.5                 
[67] vctrs_0.3.4                 xfun_0.19                  
[69] tidyselect_1.1.0
seedfiles BSgenome • 1.7k views
@james-w-macdonald-5106
United States

You should upgrade to the current release - your version is two releases behind the times. Anyway, you are missing some of the required fields, which is what is causing the error.
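To see which fields are absent before calling forgeBSgenomeDataPkg, something like the sketch below may help. It uses only base R (`read.dcf`, since seed files are in DCF format). The helper name `missing_seed_fields` is hypothetical, and the default field list is an assumption based on the errors in the question for BSgenome 1.52, not an official list. The demo file is a shortened stand-in for the poster's seed.

```r
# Hypothetical helper: report which expected fields a seed file lacks.
# The default 'required' list is an assumption drawn from this thread
# (BSgenome 1.52 complained about provider_version and release_name);
# adjust it to whatever your BSgenome version expects.
missing_seed_fields <- function(seed_path,
                                required = c("Package", "Title", "Description",
                                             "Version", "organism", "common_name",
                                             "provider", "provider_version",
                                             "release_date", "release_name",
                                             "BSgenomeObjname", "seqs_srcdir")) {
  seed <- read.dcf(seed_path)  # character matrix, one column per "field: value"
  setdiff(required, colnames(seed))
}

# Shortened stand-in for the seed file in the question ("genome" already removed).
demo_seed <- tempfile(fileext = ".seed")
writeLines(c(
  "Package: BSgenome.Amellifera.NCBI.AmelHAv3.1",
  "Title: Full genome sequences for Apis mellifera",
  "Description: Demo seed with shortened values",
  "Version: 0.1",
  "organism: Apis mellifera",
  "common_name: Honey Bee",
  "provider: NCBI",
  "release_date: Sept. 2018",
  "BSgenomeObjname: Amellifera",
  "seqs_srcdir: /media/nmary/DONNEES/Abeille/Temp/"
), demo_seed)

print(missing_seed_fields(demo_seed))
```

For the stand-in file this should list provider_version and release_name, the same two fields the second error message names. On a newer BSgenome you would pass your own `required` vector, for example with `genome` in place of those two.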

The simplest thing to do is to just grab an existing seed file from the BSgenome package, and fill in the fields with your data. That way you know you will have a functional seed right from the beginning. You can do something like

# Copy a seed file shipped with BSgenome into the working directory as "newseed"
file.copy(system.file("extdata", "GentlemanLab",
                      "BSgenome.Amellifera.BeeBase.assembly4-seed",
                      package = "BSgenome"), "newseed")

The seed file looks like this:

Package: BSgenome.Amellifera.BeeBase.assembly4
Title: Full genome sequences for Apis mellifera (BeeBase assembly4)
Description: Full genome sequences for Apis mellifera (Honey Bee) as provided by BeeBase (assembly4, Feb. 2008) and stored in Biostrings objects.
Version: 1.4.2
organism: Apis mellifera
common_name: Honey Bee
provider: BeeBase
provider_version: assembly4
release_date: Feb. 2008
release_name: assembly4
source_url: NA
organism_biocview: Apis_mellifera
BSgenomeObjname: Amellifera
seqnames: paste("Group", 1:16, sep="")
mseqnames: "GroupUn"
PkgExamples: genome$Group1  # same as genome[["Group1"]]
seqs_srcdir: /fh/fast/morgan_m/BioC/BSgenomeForge/srcdata/BSgenome.Amellifera.BeeBase.assembly4/seqs

That file will now be in your current working directory, and you can edit it to suit. But note that this file still isn't right for the current release: you will need to rename the provider_version field to genome and delete the release_name line, which is deprecated (leaving it in only produces a warning).
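As a rough illustration, an edited seed for the new assembly might end up looking like the fragment below. This is a sketch, not a tested seed. The `seqnames` and `seqfiles_suffix` values are assumptions inferred from the file names in the question (chrLG1.fa.gz to chrLG16.fa.gz), and the package name avoids hyphens because R package names may contain only letters, digits and dots. On BSgenome 1.52, the error in the question suggests provider_version and release_name are expected in place of genome.

```
Package: BSgenome.Amellifera.NCBI.AmelHAv3.1
Title: Full genome sequences for Apis mellifera (NCBI version Amel_HAv3.1)
Description: Full genome sequences for Apis mellifera (Honey Bee) as provided by NCBI (Amel_HAv3.1, Sept. 2018) and stored in Biostrings objects.
Version: 0.1
organism: Apis mellifera
common_name: Honey Bee
provider: NCBI
genome: Amel_HAv3.1
release_date: Sept. 2018
source_url: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/invertebrate/Apis_mellifera/latest_assembly_versions/GCF_003254395.2_Amel_HAv3.1/GCF_003254395.2_Amel_HAv3.1_assembly_structure/Primary_Assembly/assembled_chromosomes/FASTA/
organism_biocview: Apis_mellifera
BSgenomeObjname: Amellifera
seqnames: paste0("chrLG", 1:16)
seqs_srcdir: /media/nmary/DONNEES/Abeille/Temp/
seqfiles_suffix: .fa.gz
```

The forge step builds each FASTA file name from a sequence name plus the prefix and suffix fields, so `seqnames` and `seqfiles_suffix` have to match the files actually present in `seqs_srcdir`.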

@herve-pages-1542
Seattle, WA, United States

Hi Mary,

BSgenome.Amellifera.BeeBase.assembly4 is very old indeed. Time for an update. I'll put together BSgenome.Amellifera.UppsalaUniversity.AmelHAv3.1. Should become available in the next 24h or so.

Best,

H.

Edit: The name of the new package will be BSgenome.Amellifera.NCBI.AmelHAv3.1, not BSgenome.Amellifera.UppsalaUniversity.AmelHAv3.1.


BSgenome.Amellifera.NCBI.AmelHAv3.1 is now available:

library(BiocManager)
install("BSgenome.Amellifera.NCBI.AmelHAv3.1")

library(BSgenome.Amellifera.NCBI.AmelHAv3.1)
seqinfo(BSgenome.Amellifera.NCBI.AmelHAv3.1)
# Seqinfo object with 177 sequences (1 circular) from Amel_HAv3.1 genome:
#   seqnames    seqlengths isCircular      genome
#   Group1        27754200      FALSE Amel_HAv3.1
#   Group2        16089512      FALSE Amel_HAv3.1
#   Group3        13619445      FALSE Amel_HAv3.1
#   Group4        13404451      FALSE Amel_HAv3.1
#   Group5        13896941      FALSE Amel_HAv3.1
#   ...                ...        ...         ...
#   GroupUN_104       4407      FALSE Amel_HAv3.1
#   GroupUN_213       3840      FALSE Amel_HAv3.1
#   GroupUN_76        3691      FALSE Amel_HAv3.1
#   GroupUN_232       2376      FALSE Amel_HAv3.1
#   GroupUN_163       2302      FALSE Amel_HAv3.1

Cheers,

H.


Thank you very much!
