Question

Confusion about DBA_SCORE_TMM_READS_EFFECTIVE_CPM in DiffBind

Guandong Shang ▴ 40

@shangguandong1996-21805

Last seen 17 months ago

China

Hello, Dr Stark. I am confused about the parameter DBA_SCORE_TMM_READS_EFFECTIVE_CPM.

The documentation says:

- DBA_SCORE_TMM_READS_EFFECTIVE: TMM normalized (using edgeR), using ChIP read counts and Effective Library size
- DBA_SCORE_TMM_READS_EFFECTIVE_CPM: same as DBA_SCORE_TMM_READS_EFFECTIVE, but reported in counts-per-million.

I know the meaning of TMM and CPM, but I am confused about DBA_SCORE_TMM_READS_EFFECTIVE_CPM. At first I thought it computes the TMM-normalized values and then converts those values to CPM. However, the column sums of the scores are not 1M, which confuses me:

> colSums(CPM_merge)
 E5_0h_R1  E5_0h_R2  E5_3D_R1  E5_3D_R2  E5_3D_R3 
1076920.7 1050878.6 1154048.9 1100915.8 1023065.8 
    G3_R1     G3_R2     G3_R3   G3E1_R1   G3E1_R2 
1013695.3  984226.9 1116362.4  941578.9  924370.8 
  G3E3_R1   G3E3_R2   G3E7_R1   G3E7_R2 
 955514.5  917167.8  915660.3  873877.5

My full code is:

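library(DiffBind)

# sample_info is the sample sheet (not shown in this post)
dba_meta <- dba(minOverlap = 1, sampleSheet = sample_info)
dba_count <- dba.count(dba_meta, minOverlap = 1, score = DBA_SCORE_TMM_READS_EFFECTIVE_CPM)

# Per-sample peak tables, named by sample ID
peak_CPM_list <- dba_count$peaks
names(peak_CPM_list) <- dba_count$samples$SampleID

# Collect the Score column of each sample into one matrix
scores <- lapply(peak_CPM_list, function(x) x$Score)
CPM_merge <- do.call(cbind, scores)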

Best wishes, Guandong Shang

I also have another DiffBind question: "The coordinate system problem about DiffBind output". If you could help with that as well, I would be grateful :)

diffbind CPM TMM • 852 views

updated 4.1 years ago by Rory Stark ★ 5.1k • written 4.1 years ago by Guandong Shang ▴ 40

score 3 · Accepted Answer · 2020-03-19

Rory Stark ★ 5.1k

@rory-stark-5741

Last seen 4 days ago

Cambridge, UK

Basically, in the CPM versions, DiffBind TMM scores assume each library was sequenced to a depth of 1M reads.

Details:

First, edgeR is used to calculate the $lib.size and $norm.factors for each sample. These are multiplied to derive a scaling factor.

Next, the raw read counts are divided by this scaling factor.

Finally, the adjusted read counts are expanded back into useful values by multiplying by a single representative library size. In the CPM case, this is set to a constant, 1E06. In the non-CPM case, this is taken as the mean $lib.size.
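
The three steps above can be sketched in base R. This is only an illustration under stated assumptions, not DiffBind's actual code: the counts and the normalization factors are made up (edgeR would estimate the factors), and the effective library size is assumed to be the number of reads in peaks, i.e. the column sums of the count matrix.

```r
# Hypothetical raw read counts: 3 peaks (rows) x 2 samples (columns).
counts <- matrix(c(24, 60, 36,
                   15, 45, 40),
                 ncol = 2,
                 dimnames = list(paste0("Peak_", 1:3), c("Sample1", "Sample2")))

# Assumption: effective library size = reads in peaks (column sums).
lib_size <- colSums(counts)

# Hypothetical TMM normalization factors (edgeR would estimate these).
norm_factors <- c(Sample1 = 1.25, Sample2 = 0.8)

# Step 1: scaling factor = library size * normalization factor.
scaling <- lib_size * norm_factors

# Step 2: divide each sample's raw counts by its scaling factor.
adjusted <- sweep(counts, 2, scaling, "/")

# Step 3: multiply by a single representative library size.
score_cpm   <- adjusted * 1e6             # CPM variant: constant 1E06
score_reads <- adjusted * mean(lib_size)  # non-CPM variant: mean library size

# Under the assumption above, each column sums to 1e6 / norm_factors,
# not to exactly 1e6: 800000 for Sample1 and 1250000 for Sample2.
colSums(score_cpm)
```

Under these assumptions, a column sums to exactly 1M only when its normalization factor is 1, which would be consistent with column sums that scatter around 1M.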

This is all somewhat arbitrary. These scores are only used for plotting non-analyzed data (heatmaps and PCAs), and these values are useful for that. In no case are DiffBind "scores" used directly in an analysis, only in certain clustering plots.

4.1 years ago by Rory Stark ★ 5.1k

I am not sure I understand correctly. Here is an example:

- Raw count

|        | Sample 1 | Sample 2 |
| ------ | -------- | -------- |
| Peak_1 | 24       | 15       |

- Scaling factor
  - Sample 1: **1.2**
  - Sample 2: **0.8**

- Divided by scaling factor

|        | Sample 1 | Sample 2 |
| ------ | -------- | -------- |
| Peak_1 | 20       | 18.75    |



- Multiplied by a single representative library size (1E06 in the CPM case)

|        | Sample 1        | Sample 2           |
| ------ | --------------- | ------------------ |
| Peak_1 | 20 * 1E06 (???) | 18.75 * 1E06 (???) |

4.1 years ago by Guandong Shang ▴ 40