Question

"Some normalization factors are zero" error on cn.mops

Stephen Piccolo

I'm applying cn.mops to about 10 exome sequencing samples. getSegmentReadCountsFromBAM runs without problems, but when I run exomecn.mops, I get this error: "Some normalization factors are zero! Remove samples or chromosomes for which the average read count is zero, e.g. chromosome Y." I modified my GenomicRanges object to exclude chrY, and I removed every region and sample in my read count object that has zero reads on average. I still get the error. However, if I remove two samples with low overall read counts, it works fine.

Am I missing something? Or are there specific criteria I could use to identify samples or regions that should be excluded?

sessionInfo()

R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] magrittr_1.5 dplyr_0.4.2 cn.mops_1.14.1
[4] GenomicRanges_1.20.5 GenomeInfoDb_1.4.1 IRanges_2.2.5
[7] S4Vectors_0.6.2 Biobase_2.28.0 BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 Rsamtools_1.20.4 Biostrings_2.36.1 assertthat_0.1
[5] bitops_1.0-6 R6_2.1.0 DBI_0.3.1 zlibbioc_1.14.0
[9] XVector_0.8.0 tools_3.2.1

score 1 · Answer 1 · 2015-08-06

Günter Klambauer

Hello Stephen,

Thanks for using cn.mops! By default, the normalization estimates a size factor for each sample from its median read count. So please check the read counts PER SAMPLE (columns of the read count matrix), not the average read counts PER REGION (rows of the read count matrix).

If a sample has a median read count of 0 (i.e., more than 50% of its regions have zero reads), this may also indicate low sample quality; for exome sequencing in particular, this would be unusual.
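To make the row-versus-column distinction concrete, here is a minimal Python sketch on a toy matrix (rows are regions, columns are samples). It is an illustration only, not cn.mops code: the function names `sample_medians`, `median_size_factors`, and `zero_median_samples` are hypothetical, and dividing each sample's median by the mean of all medians is an assumed normalization chosen for the example, not necessarily the exact formula cn.mops uses.

```python
from statistics import median


def sample_medians(counts):
    """Median read count per sample, i.e. per column of the region x sample matrix."""
    n_samples = len(counts[0])
    return [median(row[j] for row in counts) for j in range(n_samples)]


def median_size_factors(counts):
    """Hypothetical size factors: each sample's median divided by the mean of all medians.

    A sample whose median is 0 gets a size factor of 0, which is the
    situation the cn.mops error message complains about.
    """
    medians = sample_medians(counts)
    mean_median = sum(medians) / len(medians)
    return [m / mean_median for m in medians]


def zero_median_samples(counts, sample_names):
    """Names of samples (columns) whose median read count is zero."""
    return [name for name, m in zip(sample_names, sample_medians(counts)) if m == 0]


# Toy matrix: 5 regions (rows) x 3 samples (columns).
# Every region has a non-zero average, yet sample S3 has zero reads in 3 of 5 regions.
counts = [
    [12, 15, 0],
    [30, 28, 0],
    [8, 10, 4],
    [20, 22, 0],
    [14, 16, 9],
]
names = ["S1", "S2", "S3"]

region_means = [sum(row) / len(row) for row in counts]
print(all(m > 0 for m in region_means))    # True: filtering rows removes nothing
print(zero_median_samples(counts, names))  # ['S3']
print(median_size_factors(counts))         # [1.4, 1.6, 0.0]
```

The first printed line shows why removing zero-average regions does not necessarily help: a sample can have a zero median even when no region is empty across all samples.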

If the error persists, please send me the read count matrix via email.

Regards,

Günter

Thanks for your response. It still failed after I excluded samples with a median read count of 0. However, when I instead excluded samples in which more than 40% of the regions had zero reads, it worked. Some of these samples have very low coverage, so excluding them seems reasonable anyway. You might consider adding a parameter that removes such samples automatically. Thanks again.

Reply by Stephen Piccolo
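The workaround described in this reply (dropping samples in which more than 40% of regions have zero reads) can be sketched the same way. This is a hypothetical helper, not an existing cn.mops function or argument; `drop_sparse_samples` and `max_zero_frac` are illustrative names, and 0.4 is simply the cutoff reported in the reply. In R, the equivalent filter would be applied to the sample columns of the read count object before calling exomecn.mops.

```python
def drop_sparse_samples(counts, sample_names, max_zero_frac=0.4):
    """Remove samples (columns) in which more than max_zero_frac of the regions have zero reads.

    Returns (filtered_counts, kept_names, dropped_names).
    """
    n_regions = len(counts)
    kept_idx, dropped = [], []
    for j, name in enumerate(sample_names):
        # Fraction of regions with zero reads for sample j.
        zero_frac = sum(1 for row in counts if row[j] == 0) / n_regions
        if zero_frac > max_zero_frac:
            dropped.append(name)
        else:
            kept_idx.append(j)
    filtered = [[row[j] for j in kept_idx] for row in counts]
    kept_names = [sample_names[j] for j in kept_idx]
    return filtered, kept_names, dropped


# Toy matrix: 5 regions x 4 samples.
# S3 has 2/5 = 40% zeros (kept, not strictly above the cutoff); S4 has 3/5 = 60% (dropped).
counts = [
    [12, 15, 0, 0],
    [30, 28, 5, 0],
    [8, 10, 0, 3],
    [20, 22, 7, 0],
    [14, 16, 6, 9],
]
filtered, kept, dropped = drop_sparse_samples(counts, ["S1", "S2", "S3", "S4"])
print(kept)     # ['S1', 'S2', 'S3']
print(dropped)  # ['S4']
```

Because a median of zero already implies that more than half of the regions are zero, any cutoff below 0.5 is stricter than the median-zero check, so it also removes borderline low-coverage samples.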

Thanks for the feedback, Stephen! Let me know how the rest of your analysis goes!

Regards,

Günter
