Search
Question: wrong output of DNAcopy::segment() for chr. with empty copy number data
0
gravatar for Alexandre Kuhn
3 months ago by
Alexandre Kuhn20 wrote:

Hello,

I encountered a rare problem with segment() in the DNAcopy package. Specifically I see artifactual segments with ouptut$loc.start greater than output$end.start and output$num.mark=0 and output$seg.mean=NA.

In my case, it seems to be caused by "chromosomes" (contigs) with empty copy number data point (NAs). Here is a minimal example to try and illustrate the problem:

 

> segment(CNA(genomdat=c(1,2,1,NA,3),chrom=c(rep("chr1",3),"chr2","chr3"),maploc=c(1:3,1,1)))

Analyzing: Sample.1 

Call:

segment(x = CNA(genomdat = c(1, 2, 1, NA, 3), chrom = c(rep("chr1", 

    3), "chr2", "chr3"), maploc = c(1:3, 1, 1)))


        ID chrom loc.start loc.end num.mark seg.mean

1 Sample.1  chr1         1       3        3   1.3333

2 Sample.1  chr1         1       3        0       NA

3 Sample.1  chr3         1       1        1   3.0000

 

So the second segment in the output is assigned to chr1 and has num.mark of 0. It should be omitted I believe. SegRows in segment() output would also show a problem as startRow is greater than endRow for that second segment (with real data I have also seen examples where, in addition, loc.start would be greater than loc.end, which is not the case here).

A simple workaround is to eliminate segments with num.mark=0 and seg.mean=NA. However, it would be great to fix the segment() function directly (if needed). Thanks!

 

> sessionInfo()

R version 3.4.1 (2017-06-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: CentOS release 6.8 (Final)


Matrix products: default

BLAS: /home/akuhn/R/R-3.4.1/lib/libRblas.so

LAPACK: /home/akuhn/R/R-3.4.1/lib/libRlapack.so


locale:

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              

 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    

 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   

 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 

 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       


attached base packages:

[1] stats     graphics  grDevices utils     datasets  methods   base     


other attached packages:

[1] DNAcopy_1.50.1


loaded via a namespace (and not attached):

[1] compiler_3.4.1

 

ADD COMMENTlink modified 12 weeks ago • written 3 months ago by Alexandre Kuhn20
0
gravatar for markus.riester
3 months ago by
markus.riester20 wrote:

I think you answered your question, segment() doesn't like NAs anywhere. 

ADD COMMENTlink written 3 months ago by markus.riester20
0
gravatar for Alexandre Kuhn
12 weeks ago by
Alexandre Kuhn20 wrote:

It does not seem to be true. segment() handles NAs correctly if there are other, non-NA values on the same chromosome:

> segment(CNA(genomdat=c(1,2,1,NA,2,3),chrom=c(rep("chr1",3),rep("chr2",2),"chr3"),maploc=c(1:3,1:2,1)))

Analyzing: Sample.1 

Call:

segment(x = CNA(genomdat = c(1, 2, 1, NA, 2, 3), chrom = c(rep("chr1", 

    3), rep("chr2", 2), "chr3"), maploc = c(1:3, 1:2, 1)))


        ID chrom loc.start loc.end num.mark seg.mean

1 Sample.1  chr1         1       3        3   1.3333

2 Sample.1  chr2         2       2        1   2.0000

3 Sample.1  chr3         1       1        1   3.0000

 

However, the (problematic) case where read count data are NAs for the entire contig length does occur in real world analyses (and, in particular these days with draft genome assemblies that may have numerous, smaller contigs). It would thus be helpful to users if segment() would handle this case properly.

ADD COMMENTlink written 12 weeks ago by Alexandre Kuhn20

I stand corrected. I'm sure though there are other corner cases where it doesn't work with NAs, since I've stumbled across this "negative segment sizes with NAs" a few times before. But you are probably right that this is a bug worth fixing.

 

ADD REPLYlink modified 12 weeks ago • written 12 weeks ago by markus.riester20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 171 users visited in the last hour