Some confusion about DBA_SCORE_TMM_READS_EFFECTIVE_CPM in DiffBind
@shangguandong1996-21805
China

Hello, Dr Stark. I am confused about the parameter DBA_SCORE_TMM_READS_EFFECTIVE_CPM.

The documentation says:

- DBA_SCORE_TMM_READS_EFFECTIVE -> TMM normalized (using edgeR), using ChIP read counts and Effective Library size
- DBA_SCORE_TMM_READS_EFFECTIVE_CPM -> same as DBA_SCORE_TMM_READS_EFFECTIVE, but reported in counts-per-million.

I know what TMM and CPM mean, but I am confused by DBA_SCORE_TMM_READS_EFFECTIVE_CPM. At first I thought it computed the TMM-normalized values and then converted them into CPM values. But the scores in each sample do not sum to 1M, which confuses me:

```
> colSums(CPM_merge)
 E5_0h_R1  E5_0h_R2  E5_3D_R1  E5_3D_R2  E5_3D_R3 
1076920.7 1050878.6 1154048.9 1100915.8 1023065.8 
    G3_R1     G3_R2     G3_R3   G3E1_R1   G3E1_R2 
1013695.3  984226.9 1116362.4  941578.9  924370.8 
  G3E3_R1   G3E3_R2   G3E7_R1   G3E7_R2 
 955514.5  917167.8  915660.3  873877.5 
```

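My full code is:

```r
library(DiffBind)

# Read the sample sheet and count reads in every peak (minOverlap = 1 keeps
# peaks that occur in at least one sample)
dba_meta <- dba(minOverlap = 1, sampleSheet = sample_info)
dba_count <- dba.count(dba_meta, minOverlap = 1,
                       score = DBA_SCORE_TMM_READS_EFFECTIVE_CPM)

# One peak table per sample; the Score column holds the selected score
peak_CPM_list <- dba_count$peaks
names(peak_CPM_list) <- dba_count$samples$SampleID

# Combine the per-sample scores into a peaks x samples matrix
scores <- lapply(peak_CPM_list, function(x) x$Score)
CPM_merge <- do.call(cbind, scores)
```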

Best wishes,
Guandong Shang

P.S. I have another question about DiffBind (Question: The coordinate system problem about DiffBind output). If you can help me with that too, that would be great :)

Rory Stark
@rory-stark-5741
Cambridge, UK

Basically, in the CPM versions, DiffBind TMM scores assume each library was sequenced to a depth of 1M reads.

Details:

First, edgeR is used to calculate the $lib.size and $norm.factors for each sample. These are multiplied to derive a scaling factor.

Next, the raw read counts are divided by this scaling factor.

Finally, the adjusted read counts are expanded back into useful values by multiplying by a single representative library size. In the CPM case, this is set to a constant, 1E06. In the non-CPM case, this is taken as the mean $lib.size.
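The three steps can be sketched in base R as below. This is a hypothetical illustration, not DiffBind's source code: `tmm_scores()` is a made-up helper, the counts and normalization factors are invented, and the library size is assumed to be the total number of reads counted in the peaks (one reading of "effective library size"). In DiffBind itself, $lib.size and $norm.factors come from edgeR.

```r
# Hypothetical sketch of the TMM score calculation described above (not DiffBind code).
tmm_scores <- function(counts, lib_size, norm_factors, cpm = TRUE) {
  # Step 1: per-sample scaling factor = library size x TMM normalization factor
  scaling <- lib_size * norm_factors
  # Step 2: divide each sample's raw counts (one column per sample) by its scaling factor
  adjusted <- sweep(counts, 2, scaling, "/")
  # Step 3: multiply by one representative library size:
  # the constant 1e6 for CPM, otherwise the mean library size
  representative <- if (cpm) 1e6 else mean(lib_size)
  adjusted * representative
}

# Made-up example data: 3 peaks x 2 samples
counts <- matrix(c(24, 60, 16,
                   15, 40, 25),
                 ncol = 2,
                 dimnames = list(paste0("Peak_", 1:3), c("Sample1", "Sample2")))
lib_size <- colSums(counts)   # assumption: library size = reads in peaks (100 and 80)
norm_factors <- c(1.2, 0.8)   # made-up TMM normalization factors

cpm <- tmm_scores(counts, lib_size, norm_factors, cpm = TRUE)
colSums(cpm)   # equals 1e6 / norm_factors: about 833333 and 1250000
```

Under that assumption each column of the CPM scores sums to 1e6 / norm.factor rather than exactly 1e6, because TMM normalization factors are generally not 1. That could explain why the column sums in the question fall somewhat above and below 1M.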

This is all somewhat arbitrary. These scores are only used for plotting non-analyzed data (heatmaps and PCAs), and these values are useful for that. In no case are DiffBind "scores" used directly in an analysis, only in certain clustering plots.


I am not sure I understand this correctly. Here is an example:

- Raw counts

|        | Sample 1 | Sample 2 |
| ------ | -------- | -------- |
| Peak_1 | 24       | 15       |

- Scaling factors

  - Sample 1: **1.2**
  - Sample 2: **0.8**

- Divided by the scaling factor

|        | Sample 1 | Sample 2 |
| ------ | -------- | -------- |
| Peak_1 | 20       | 18.75    |

- Multiplied by a single representative library size (in the CPM case, 1E06)

|        | Sample 1        | Sample 2           |
| ------ | --------------- | ------------------ |
| Peak_1 | 20 * 1E06 (???) | 18.75 * 1E06 (???) |
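Following the three steps in the answer above, the scaling factor is $lib.size multiplied by $norm.factors, not the normalization factor alone. A minimal sketch of the example with that change, where the library sizes of 10 million reads per sample are hypothetical (only the raw counts and the 1.2 / 0.8 factors come from the example):

```r
# Worked version of the example; lib_size values are hypothetical.
raw <- c(Sample1 = 24, Sample2 = 15)
norm_factors <- c(Sample1 = 1.2, Sample2 = 0.8)
lib_size <- c(Sample1 = 10e6, Sample2 = 10e6)   # hypothetical: 10M reads each

# Scaling factor = lib.size x norm.factors (12e6 and 8e6 here)
scaling <- lib_size * norm_factors
# Divide by the scaling factor, then multiply by the constant 1e6 for CPM
cpm <- raw / scaling * 1e6
cpm   # 2 and 1.875
```

Dividing by the full scaling factor gives a very small per-read fraction, and multiplying by 1e6 brings it back to the scale of counts per million. With a 10M-read library, the result is the step-2 value from the table divided by 10 (the library size in millions).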