DiffBind spike-in lib.sizes confusion
Weisheng • 0

Hi,

I'm confused by how DiffBind uses spike-ins for normalization. My understanding of the manual is that DiffBind counts the spike-in reads in the background bins and uses those counts as the library sizes for normalization. When I set spikein=FALSE, $lib.sizes and $background$binned$totals are equal, as expected:

db_data_spikeinNorm2 <- dba.normalize(db_data, spikein = FALSE, background = TRUE, library = DBA_LIBSIZE_BACKGROUND, normalize = DBA_NORM_LIB)

db_data_spikeinNorm2$norm$DESeq2$lib.sizes
[1] 7424321 7030471 8640826 7006223

db_data_spikeinNorm2$norm$background$binned$totals
[1] 7424321 7030471 8640826 7006223

However, when I set spikein=TRUE, they are not equal anymore:

db_data_spikeinNorm3 <- dba.normalize(db_data, spikein = TRUE, background = TRUE, library = DBA_LIBSIZE_BACKGROUND, normalize = DBA_NORM_LIB)

db_data_spikeinNorm3$norm$DESeq2$lib.sizes
[1] 7747122 7334460 9112179 7386926

db_data_spikeinNorm3$norm$background$binned$totals
[1] 1970 1923 2638 2262

The $lib.sizes are still large, close to the $lib.sizes from spikein=FALSE but not identical. Why do they no longer equal the background totals? The binned totals make sense, because only a small number of reads map in the spike-in control BAMs. What do the $lib.sizes values represent when spikein=TRUE?

Thanks.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1               pheatmap_1.0.12             ggrepel_0.9.3              
 [4] profileplyr_1.12.0          csaw_1.30.1                 DiffBind_3.6.5             
 [7] SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
[10] matrixStats_0.63.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[13] IRanges_2.30.1              S4Vectors_0.34.0            BiocGenerics_0.42.0        
[16] forcats_1.0.0               stringr_1.5.0               dplyr_1.1.0                
[19] purrr_1.0.1                 readr_2.1.4                 tidyr_1.3.0                
[22] tibble_3.1.8                ggplot2_3.4.0               tidyverse_1.3.2
Rory Stark ★ 5.2k

In the second case (spikein=TRUE), the $lib.sizes are recorded as the sum of the reads in the primary files plus those in the spike-ins. However, these values are not used to compute the normalization factors; the $binned$totals are.

To see this in action, try running:

cor(db_data_spikeinNorm2$norm$DESeq2$lib.sizes,
    db_data_spikeinNorm2$norm$DESeq2$norm.facs)

cor(db_data_spikeinNorm3$norm$background$binned$totals,
    db_data_spikeinNorm3$norm$DESeq2$norm.facs)
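
The same check can be sketched in base R without the DBA objects, using the numbers posted in the question. This is only an illustration: it assumes that library-size normalization factors are each sample's total divided by the mean total, which is a guess at the scaling and not something confirmed in this thread. The variable names are made up for the example.

```r
# Values copied from the spikein = TRUE run in the question.
recorded_lib_sizes <- c(7747122, 7334460, 9112179, 7386926)  # $norm$DESeq2$lib.sizes
binned_totals      <- c(1970, 1923, 2638, 2262)              # $norm$background$binned$totals

# ASSUMPTION (not taken from DiffBind itself): each factor is the
# sample's binned total divided by the mean binned total.
norm_facs <- binned_totals / mean(binned_totals)

# Factors are a pure rescaling of the binned totals, so the
# correlation with the binned totals is perfect.
cor_with_binned <- cor(binned_totals, norm_facs)

# The recorded lib.sizes are not a rescaling of the binned totals,
# so their correlation with the factors is weaker.
cor_with_recorded <- cor(recorded_lib_sizes, norm_facs)

print(c(binned = cor_with_binned, recorded = cor_with_recorded))
```

If the real $norm.facs behave like this, the cor() against $binned$totals comes out at 1 while the one against $lib.sizes does not, which is what the two calls above are meant to show.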

Perfect. Thanks!
