Hello,
I encountered a rare problem with segment() in the DNAcopy package. Specifically I see artifactual segments with ouptut$loc.start greater than output$end.start and output$num.mark=0 and output$seg.mean=NA.
In my case, it seems to be caused by "chromosomes" (contigs) with empty copy number data point (NAs). Here is a minimal example to try and illustrate the problem:
> segment(CNA(genomdat=c(1,2,1,NA,3),chrom=c(rep("chr1",3),"chr2","chr3"),maploc=c(1:3,1,1))) Analyzing: Sample.1 Call: segment(x = CNA(genomdat = c(1, 2, 1, NA, 3), chrom = c(rep("chr1", 3), "chr2", "chr3"), maploc = c(1:3, 1, 1))) ID chrom loc.start loc.end num.mark seg.mean 1 Sample.1 chr1 1 3 3 1.3333 2 Sample.1 chr1 1 3 0 NA 3 Sample.1 chr3 1 1 1 3.0000
So the second segment in the output is assigned to chr1 and has num.mark of 0. It should be omitted I believe. SegRows in segment() output would also show a problem as startRow is greater than endRow for that second segment (with real data I have also seen examples where, in addition, loc.start would be greater than loc.end, which is not the case here).
A simple workaround is to eliminate segments with num.mark=0 and seg.mean=NA. However, it would be great to fix the segment() function directly (if needed). Thanks!
> sessionInfo() R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS release 6.8 (Final) Matrix products: default BLAS: /home/akuhn/R/R-3.4.1/lib/libRblas.so LAPACK: /home/akuhn/R/R-3.4.1/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] DNAcopy_1.50.1 loaded via a namespace (and not attached): [1] compiler_3.4.1
I stand corrected. I'm sure though there are other corner cases where it doesn't work with NAs, since I've stumbled across this "negative segment sizes with NAs" a few times before. But you are probably right that this is a bug worth fixing.