Hello,
Using "forgeBSgenomeDataPkg" I get an error I cannot solve. The problem seems to occur when the 2bit file is created. I have write permission in the destdir and enough disk space on my drive. The sequence data are located in a directory mounted from a Linux server, but I don't think this is the problem, since the FASTA files were loaded.
This is my seed-file:
Package: BSgenome.Hsapiens.NCBI.GRCh38.p11
Title: Homo Sapiens full genome for RRBS (NCBI version GRCh38.p11)
Description: Homo Sapiens full genome as provided by NCBI
organism: Homo sapiens
common_name: Human
provider: NCBI
provider_version: GRCh38.p11
release_date: 14/06/17
release_name: GRCh38.p11
source_url: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.37_GRCh38.p11/
organism_biocview: Homo_sapiens
BSgenomeObjname: Hsapiens
seqnames: c("NT_187687.1","NT_113949.2","NC_012920.1","RRBS_spike-in_SQ6hmC","RRBS_spike-in_SQ1hmC","RRBS_spike-in_SQC","RRBS_spike-in_SQmC","RRBS_spike-in_SQfC","RRBS_spike-in_DC_100","RRBS_spike-in_CC_100")
circ_seqs: "NC_012920.1"
seqs_srcdir: /home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/seqs_srcdir
seqfiles_suffix: .fa
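Judging from the log output further down, each entry in seqnames is looked up as a file named <seqs_srcdir>/<seqname><seqfiles_suffix>. The following base-R sketch shows that mapping and a way to list any files that are missing. The helper names expected_fasta_paths and missing_fasta_files are hypothetical (they are not part of BSgenome), and the empty default prefix is an assumption based on this seed file having no seqfiles_prefix field.

```r
# Hypothetical helper (not part of BSgenome): build the per-sequence FASTA
# paths implied by the seqs_srcdir, seqnames and seqfiles_suffix fields.
# Assumption: no seqfiles_prefix is set, so the prefix defaults to "".
expected_fasta_paths <- function(seqs_srcdir, seqnames,
                                 suffix = ".fa", prefix = "") {
  file.path(seqs_srcdir, paste0(prefix, seqnames, suffix))
}

# Hypothetical helper: return the subset of paths that do not exist on disk.
missing_fasta_files <- function(paths) {
  paths[!file.exists(paths)]
}
```

For example, missing_fasta_files(expected_fasta_paths(seqs_srcdir, seqnames)) returns an empty character vector when every sequence has its file, where seqs_srcdir and seqnames are the values from the seed file.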
This is the command I was running:
> forgeBSgenomeDataPkg("/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/DESTCRIPTION", destdir = "/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/", verbose = TRUE)
This is the end of the stdout I get:
...
Loading 'RRBS_spike-in_DC_100' sequence from FASTA file '/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/seqs_srcdir/RRBS_spike-in_DC_100.fa' ... DONE
Loading 'RRBS_spike-in_CC_100' sequence from FASTA file '/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/seqs_srcdir/RRBS_spike-in_CC_100.fa' ... DONE
Writing all sequences to '/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package//BSgenome.Hsapiens.NCBI.GRCh38.p11/inst/extdata/single_sequences.2bit' ...
error in .TwoBits_export(mapply(.DNAString_to_twoBit, object, seqnames), :
UCSC library operation failed
In addition: There were 17 warnings (use warnings() to see them)
The created output directory contains only these four items: the DESCRIPTION and NAMESPACE files and the R/ and man/ directories. No genome was created.
> traceback()
15: .Call(TwoBits_write, object, con)
14: .TwoBits_export(mapply(.DNAString_to_twoBit, object, seqnames),
twoBitPath(path(con)))
13: .local(object, con, format, ...)
12: export(object, FileForFormat(con, format), ...)
11: export(object, FileForFormat(con, format), ...)
10: export(seqs, dest_filepath, format = "2bit")
9: export(seqs, dest_filepath, format = "2bit")
8: .forgeTwobitFileFromFastaFiles(seqnames, prefix, suffix, seqs_srcdir,
seqs_destdir, verbose = verbose)
7: forgeSeqFiles(.seqnames, mseqnames = .mseqnames, seqfile_name = x@seqfile_name,
prefix = x@seqfiles_prefix, suffix = x@seqfiles_suffix, seqs_srcdir = seqs_srcdir,
seqs_destdir = seqs_destdir, ondisk_seq_format = x@ondisk_seq_format,
verbose = verbose)
6: forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,
verbose = verbose)
5: forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,
verbose = verbose)
4: forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,
verbose = verbose)
3: forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir,
verbose = verbose)
2: forgeBSgenomeDataPkg("/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/DESTCRIPTION",
destdir = "/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/",
verbose = TRUE)
1: forgeBSgenomeDataPkg("/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/DESTCRIPTION",
destdir = "/home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/",
verbose = TRUE)
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
[6] LC_MESSAGES=de_DE.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome_1.44.1 rtracklayer_1.36.4 Biostrings_2.44.2 XVector_0.16.0 GenomicRanges_1.28.5 GenomeInfoDb_1.12.2 IRanges_2.10.3
[8] S4Vectors_0.14.4 BiocGenerics_0.22.0
loaded via a namespace (and not attached):
[1] lattice_0.20-35 matrixStats_0.52.2 XML_3.98-1.9 Rsamtools_1.28.0 GenomicAlignments_1.12.2
[6] bitops_1.0-6 grid_3.4.1 zlibbioc_1.22.0 Matrix_1.2-10 BiocParallel_1.10.1
[11] tools_3.4.1 Biobase_2.36.2 RCurl_1.95-4.8 DelayedArray_0.2.7 compiler_3.4.1
[16] SummarizedExperiment_1.6.4 GenomeInfoDbData_0.99.0
I appreciate any hint! Thanks in advance!
Lena
Hi Lena,
Not sure what's going on. But I wonder what these RRBS_spike-in_* sequences are and where you downloaded the RRBS_spike-in_*.fa files from. The source_url field you provide points to a folder that contains the GCF_000001405.37_GRCh38.p11_genomic.fna.gz file (this is a FASTA file that contains all the genomic sequences). This suggests that you downloaded this file to get the sequences, but that file doesn't seem to contain the RRBS_spike-in_* sequences. Puzzling! Furthermore, your seqnames field only contains 10 sequences but the GRCh38.p11 assembly has 578 sequences! See: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.37_GRCh38.p11/GCF_000001405.37_GRCh38.p11_assembly_report.txt
Note that the official assembly report for GRCh38.p11 doesn't contain any RRBS_spike-in_* sequences either.
All this is somewhat confusing and suggests that you might actually be trying to do something quite different from forging a BSgenome package for GRCh38.p11. It would help if you could clarify what exactly you are trying to do.
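The comparison between the seed file's seqnames and the assembly report can be sketched in base R as follows. This assumes the report has been downloaded locally, that it is tab-separated with header lines starting with '#', and that the RefSeq accession sits in column 7; the column position and both helper names are assumptions to be checked against the actual file.

```r
# Hypothetical helper: read an NCBI assembly report and return the accession
# column. Assumptions: tab-separated data lines, header lines start with '#',
# RefSeq accessions are in column 7.
read_report_accessions <- function(path, accn_col = 7L) {
  report <- read.delim(path, header = FALSE, comment.char = "#",
                       quote = "", stringsAsFactors = FALSE)
  report[[accn_col]]
}

# Hypothetical helper: seed-file seqnames that are absent from the report.
seqnames_not_in_report <- function(seed_seqnames, report_accessions) {
  setdiff(seed_seqnames, report_accessions)
}
```

Calling length() on the result of read_report_accessions() gives the number of sequences listed in the report, and seqnames_not_in_report() lists the seed-file names (such as the RRBS_spike-in_* entries) that the report does not know about.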
FWIW note that if, instead of using one FASTA file per sequence (like you seem to be doing), you use a single FASTA file that contains all the sequences (e.g. GCF_000001405.37_GRCh38.p11_genomic.fna.gz), then you don't need to specify the seqnames field. Note however that GCF_000001405.37_GRCh38.p11_genomic.fna.gz cannot be used as-is and requires the following massage before it can be used to forge the BSgenome package:
This will take a few minutes. Then all you need to do is move the GRCh38.p11.2bit file to /home/lmoebus/mountrz_work_zfs/references/BSgenome_R_package/seqs_srcdir/ and replace the seqnames and seqfiles_suffix fields of your seed file with:
Hope this helps,
H.
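The conversion step mentioned in the reply above (from the single NCBI FASTA file to GRCh38.p11.2bit) could look roughly like the sketch below. It assumes Biostrings and rtracklayer are installed (both appear in the sessionInfo() output earlier in the thread) and that the only clean-up needed is trimming each FASTA header down to its accession. The helper names clean_seqnames and fasta_to_2bit are hypothetical, and this is not necessarily the code H. posted.

```r
# Hypothetical helper: keep only the text before the first space of each
# FASTA header, e.g. "NC_012920.1 Homo sapiens ..." becomes "NC_012920.1".
clean_seqnames <- function(x) {
  sub(" .*$", "", x)
}

# Hypothetical helper: load a (possibly gzipped) multi-sequence FASTA file,
# shorten the sequence names, and write everything to a single 2bit file.
# Assumption: trimming the names is the only massage required.
fasta_to_2bit <- function(fasta, out_2bit) {
  dna <- Biostrings::readDNAStringSet(fasta)
  names(dna) <- clean_seqnames(names(dna))
  # Same export() call form that appears in the traceback above.
  rtracklayer::export(dna, out_2bit, format = "2bit")
  invisible(out_2bit)
}
```

It would be called as fasta_to_2bit("GCF_000001405.37_GRCh38.p11_genomic.fna.gz", "GRCh38.p11.2bit"). With a single 2bit file in seqs_srcdir, the seed file would presumably carry a seqfile_name: GRCh38.p11.2bit field in place of seqnames and seqfiles_suffix; seqfile_name is the slot visible in the traceback, but the exact field list should be checked against the BSgenomeForge vignette.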