Question

DiffBind handling of replicates and consensus

0

Entering edit mode

Jim • 0

@jim-22844

Last seen 4.2 years ago

I am currently working on an analysis of two factors with two replicates each. I have called peaks using three different peaks callers, and would like to first derive a consensus of the peak callers for each factor before generating a consensus across the factors. Is there any way i can achieve this without having to create multiple DBA's?

My current workflow is: - create DBA with ALL samples, replicates and peak calls (2 x 2 x 3 = 12) - generate consensus for each factor - export and save consensus peaks as bed file - create new DBA with ALL samples, but using consensus peaks for appropriate factor - count - analyse - plot data ...

diffbind • 1.1k views

ADD COMMENT • link updated 7 months ago by Rory Stark ★ 5.2k • written 4.2 years ago by Jim • 0

score 0 · Answer 1 · 2020-02-06

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 15 days ago

Cambridge, UK

If you gave all the same bam files for the same samples, you might try the following:

myDBA <- dba(samplesheet=...)
consDBA <- dba.peakset(myDBA,consensus=c(DBA_FACTOR,DBA_REPLICATE))
consDBA <- dba(consDBA,mask=consDBA$masks$Consensus)
consDBA <- dba.count(consDBA)

If that doesn't work, you can set the bam files directly before counting:

consDBA$class[10,1] <- "path_to_first_bam_file.bam"
consDBA$class[10,2] <- "path_to_second_bam_file.bam"
consDBA <- dba.count(consDBA)

ADD COMMENT • link 4.2 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi Rory Your advice is very useful. But now I want to use the consensus bed file generated by WT group and KO group respectively for the next step of analysis. I will footprinting analysis using the rgt-hint. He asked me to generate a bed file with an integer score for the fifth column. I wonder whether diffbind can extract the two consensus bed files. And the fifth column is an integer. now i had run the code below

dbObj_consensus<-dba(DBA = dbObj_consensus, mask = dbObj_consensus$masks$Consensus,minOverlap = 1)
dbObj_consensus

i get the output

2 Samples, 66064 sites in matrix:
  ID Tissue Factor Replicate Intervals
1 KO Spleen     KO   1-2-3-4     63479
2 WT Spleen     WT   1-2-3-4     57950

I want to get KO's consensus bed which contain 63479 lines and WT's consensus bed which contain 57950 lines Thank you

ADD REPLY • link 12 months ago 紫枫 • 0

score 0 · Answer 2 · 2023-07-26

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 15 days ago

Cambridge, UK

Try this:

consensusKO <- dba.peakset(dbObj_consensus, peaks=1, bRetrieve=TRUE)
consensusWT <- dba.peakset(dbObj_consensus, peaks=2, bRetrieve=TRUE)

ADD COMMENT • link 9 months ago Rory Stark ★ 5.2k

0

Entering edit mode

Thanks for your help. I'm sorry, I have another question: I'm using diffbind to analyze my ATACseq data. When I use a old version(R base is 3.6.2, diffbind is 3.5). my code is

m1 <- "KO"
m2 <- "WT"
dbObj <- dba.count(dbObj, bUseSummarizeOverlaps=TRUE, minOverlap = 0.5)
contr <- paste0(m1, "vs", m2)
dbObj <- dba.contrast(dbObj, group1 = dbObj$masks[[m1]], group2 = dbObj$masks[[m2]])
dbObj <- dba.analyze(dbObj, bFullLibrarySize = FALSE)
dba.show(dbObj, bContrasts=TRUE)

i can get 2500 differential binding site

but when i use R4.2.2 and diffbind is 3.8.0 because this version delete the parameter bFullLibrarySize, my code is :

m1 <- "KO"
m2 <- "WT"
bObj <- dba.count(dbObj, bUseSummarizeOverlaps=TRUE,minOverlap = 0.5,bParallel = 8)
dbObj <- dba.normalize(DBA = dbObj, library = "RiP")
contr <- paste0(m1, "vs", m2)
dbObj <- dba.contrast(dbObj, group1 = dbObj$masks[[m1]], group2 = dbObj$masks[[m2]])
dbObj <- dba.analyze(dbObj)
dba.show(dbObj, bContrasts=TRUE)

I just get 552 differential differential binding site.

I have seen you reply about bFullLibrarySize. like your reply 1 and your reply 2. you said that If bFullLibrarySize=FALSE, the number of reads that overlap consensus peaks is used for each sample (basically, the sum of all the counts). So I used dbObj <- dba.normalize(DBA = dbObj, library = "RiP") to make sure that the method of normalize is consistent.

I known you don't recommand using bFullLibrarySize = FALSE. But the reason I use it is that it filters out more significant peaks at FDR<0.05, which helps me not miss any information that might be useful.

My question is how can I reproduce the results obtained from the old version diffbind ? Thank you!

ADD REPLY • link 8 months ago 紫枫 • 0

0

Entering edit mode

You may want to try different values for the normalize parameter:

dbObj <- dba.normalize(DBA = dbObj, library = "RiP", normalize=DBA_NORM_LIB)

vs.

dbObj <- dba.normalize(DBA = dbObj, library = "RiP", normalize=DBA_NORM_RLE)

ADD REPLY • link 7 months ago Rory Stark ★ 5.2k