Hi. I want to use QDNAseq to look for copy number changes in a Plasmodium falciparum sequence. I have made a BSgenome package from the sequence, which looks OK. It gave 2 Notes and 2 Warnings when I ran R CMD check, but it loads successfully:
> library("BSgenome.Pfalciparum3D7.PlasmoDB.3D7v3")
> pfg <- BSgenome.Pfalciparum3D7.PlasmoDB.3D7v3
> seqinfo(pfg)
Seqinfo object with 16 sequences (1 circular) from Pf3D7v3 genome:
  seqnames   seqlengths isCircular  genome
  chrom04       1200490      FALSE Pf3D7v3
  chrom05       1343557      FALSE Pf3D7v3 
...
> head(pfg[['chrom01']])
  6-letter "DNAString" instance
seq: TGAACC
But the QDNAseq createBins() function returns an empty data frame:
> pfBins10k <- createBins(pfg, 10)
Creating bins of 10 kbp for genome pfg
> pfBins10k
[1] chromosome start      end        bases      gc
<0 rows> (or 0-length row.names)
createBins() worked when I tested it with BSgenome.Celegans.UCSC.ce2, so I think the problem must be in the way I forged my package, but I can't find it. Any advice?
Thank you, Jocelyn
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] BSgenome.Pfalciparum3D7.PlasmoDB.3D7v3_0.1-0 QDNAseq_1.4.1
 [3] BSgenome_1.36.3                              rtracklayer_1.28.6
 [5] Biostrings_2.36.2                            XVector_0.8.0
 [7] GenomicRanges_1.20.5                         GenomeInfoDb_1.4.1
 [9] IRanges_2.2.5                                S4Vectors_0.6.3
[11] BiocGenerics_0.14.0

Problem solved, sort of: it worked when I set ignoreMitochondria=FALSE:
> pfBins10k <- createBins(pfg, 10, ignoreMitochondria=FALSE)
Creating bins of 10 kbp for genome pfg
    Processing chrom04 ...
    ...
    Processing chrom11 ...
    Processing chromMito ...
    Processing chromApico ...
> str(pfBins10k)
'data.frame': 2342 obs. of 5 variables:
 $ chromosome: chr "om04" "om04" "om04" "om04" ...
 $ start     : num 1 10001 20001 30001 40001 ...
 $ end       : num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+05 ...
 $ bases     : num 100 100 100 100 100 100 100 100 100 100 ...
 $ gc        : num 33 32.5 30.3 29.7 22.2 ...

Follow-up question: how is mitochondrial status determined?
I can also see that I chose poor names for the chromosomes (they come out truncated to "om04" in the chromosome column above), so I will change them to more standardised ones. I used 'chrom' because I thought 'chr' could be confused with 'character'. The original FASTA file calls them "Pf3D7_04_v3", "PFC10_API_IRAB", etc., and I thought the underscores might be the problem.
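For reference, here is a minimal sketch of the kind of renaming I have in mind, using base R only. The target names ("chr4", "chrAPI") and the helper standardise_name() are my own hypothetical choices, not anything QDNAseq or BSgenome prescribes, and only the two FASTA headers quoted above are covered; any other header would need its own rule.

```r
# FASTA headers as quoted in the post; other headers are not handled here.
fasta_names <- c("Pf3D7_04_v3", "PFC10_API_IRAB")

# Hypothetical helper: map original FASTA headers to "chr"-style names.
standardise_name <- function(x) {
  out <- x

  # Nuclear chromosomes: "Pf3D7_04_v3" -> "chr4" (leading zero dropped).
  nuclear_pattern <- "^Pf3D7_([0-9]+)_v3$"
  is_nuclear <- grepl(nuclear_pattern, x)
  out[is_nuclear] <- paste0(
    "chr", as.integer(sub(nuclear_pattern, "\\1", x[is_nuclear]))
  )

  # Organelles: explicit lookup table. "chrAPI" is an assumed target name.
  lookup <- c(PFC10_API_IRAB = "chrAPI")
  hit <- x %in% names(lookup)
  out[hit] <- unname(lookup[x[hit]])

  # Anything unmatched is returned unchanged so it can be spotted easily.
  out
}

new_names <- standardise_name(fasta_names)
print(new_names)
```

The resulting vector could then be used as the sequence names when re-forging the package; whether this fixes the empty result with the default ignoreMitochondria=TRUE is something I still need to test.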