Question

diffbind counts as same values

0

Entering edit mode

chadnewton • 0

@chadnewton-20911

Last seen 5.1 years ago

I have run ATAC-seq on 4 samples (2 groups with 2 samples each) and I am wanting to differential peak calling between groups but I noticed that when I generate a report of my analysis with dba.report() two of the samples have the exact same count values across all peaks (6.61 for both c1s and c3r). the samples are grouped by condition (slow vs rapid). For this experiment c1s and c4s are in the "slow" condition and c2r and c3r are in the "rapid" condition.

dat.count <- dba.count(dat, minOverlap = 0, score = DBA_SCORE_RPKM, bUseSummarizeOverlaps = FALSE,  filterFun = max, bRemoveDuplicates = FALSE, bScaleControl = FALSE

dat.con <- dba.contrast(dat.count, categories = DBA_CONDITION, minMembers = 2)

dat.an <- dba.analyze(dat.con, method = c(DBA_EDGER), bTagwise = FALSE, bFullLibrarySize = FALSE)

DB <- dba.reportdat.an, method = DBA_EDGER, th = 0.05, bUsePval = F, fold = 0, bNormalized = TRUE,
                    bCalled = TRUE, bCalledDetail = FALSE, bCounts = TRUE, precision = 0,
                    file = "cn1.a.reads", ext = ".csv" )
df = as.data.frame(DB)
head(df)

      seqnames    start      end width strand     Conc Conc_slow Conc_rapid      Fold      p.value
72377     chr22 24372720 24373588   869      * 4.978275  1.762114   5.898487 -4.136373 3.585008e-13
111971     chr8  8085285  8086444  1160      * 5.105623  1.832666   6.028998 -4.196332 1.659267e-12
127355     chrY 15016435 15017735  1301      * 4.340530  1.762114   5.214402 -3.452288 3.263057e-11
127389     chrY 22737178 22738068   891      * 4.318821  1.762114   5.190694 -3.428580 3.795535e-11
44922     chr16 90061131 90062608  1478      * 5.746760  2.463868   6.670675 -4.206807 4.227431e-10
127273     chrY  2802939  2804293  1355      * 3.878970  1.762114   4.702272 -2.940158 8.061298e-10
                FDR Called1 Called2  c1s  c4s    c2r  c3r
72377  4.567694e-08       0       1 6.61 0.17 112.69 6.61
111971 1.057044e-07       1       1 6.61 0.51 123.98 6.61
127355 1.208982e-06       1       2 6.61 0.17  67.64 6.61
127389 1.208982e-06       1       2 6.61 0.17  66.43 6.61
44922  1.077242e-05       1       2 6.61 4.42 197.14 6.61
127273 1.609109e-05       1       2 6.61 0.17  45.45 6.61

Can you tell me why the values for c1s and c3r are all the same? Is this a problem with my input files (bamReads are sorted .bam files and Peaks are narrowPeak files from MACS2)? Or is this a problem with the normalization? I tried using different scores (TMM, etc) but the problem persists.

Any help is appreciated!

diffbind dba.count dba.report • 739 views

ADD COMMENT • link updated 5.2 years ago by Rory Stark ★ 5.2k • written 5.2 years ago by chadnewton • 0

score 0 · Answer 1 · 2019-10-07

I suspect this is a normalization issue, where the normalized read values are all the same. You can check this by setting bNormalized=FALSE in the call to dba.report() to see the "raw" read values.

I also see that in the call dba.count(), you have set bScaleControl=FALSE. As you are subtracting control reads by default in the call to dba.analyze(), it may be that the number of control reads is greater than the number of ChIP reads in certain samples; these will all be set to 1 prior to normalization. You can turn off the subtraction of control reads by setting bSubControl=FALSE. You can also try using the DESeq2 analysis method, which will deploy a different, and less corrective, normali`ation scheme.