Question

DiffBind: dealing with replicates

0

Entering edit mode

hkitano • 0

@hkitano-23077

Last seen 4.1 years ago

I have 10 "summit" bed files from macs2 peak-calling, with 5 factors with 2 replicates each.

I added them all one-by-one to a dba object "dbObj" that looks like this:

10 Samples, 3202 sites in matrix (234027 total):
            ID        Factor Replicate Caller Intervals
1            CTR-1           ctr         1    bed     29940
2            CTR-2           ctr         2    bed     27812
3           IFNG-1          ifng         1    bed     20954
4           IFNG-2          ifng         2    bed     19747
5            IL4-1           il4         1    bed     25353
6            IL4-2           il4         2    bed     25543
7         preIL4-1        preil4         1    bed     21657
8         preIL4-2        preil4         2    bed     20200
9  preIL4posIFNG-1 preil4posifng         1    bed     24002
10 preIL4posIFNG-2 preil4posifng         2    bed     22068

Then, I tried to follow the steps in the answer here to deal with the replicants. I called:

dbObj2 <- dba.peakset(dbObj, consensus = -DBA_REPLICATE)
dbObj2 <-  dba(dbObj2, mask=dbObj2$masks$Consensus)

and got this for dbObj2:

 5 Samples, 0 sites in matrix (416 total):
         ID        Factor Replicate Caller Intervals
1           ctr           ctr       1-2    bed       112
2          ifng          ifng       1-2    bed        76
3           il4           il4       1-2    bed        77
4        preil4        preil4       1-2    bed        63
5 preil4posifng preil4posifng       1-2    bed        88

which looks right to me. But if I try this:

ConsensusPeaks <- dba.peakset(dbObj2, bRetrieve=TRUE)

ConsensusPeaks is a null object. I'm trying to be able to use dba.counts() on the 5 factors, is there any other way or am I doing this wrong? Thanks!

diffbind chipseq • 2.3k views

ADD COMMENT • link updated 4.1 years ago by Rory Stark ★ 5.1k • written 4.1 years ago by hkitano • 0

score 1 · Answer 1 · 2020-03-15

1

Entering edit mode

Rory Stark ★ 5.1k

@rory-stark-5741

Last seen 13 days ago

Cambridge, UK

You are getting very little overlap between the replicates (between 63 and 112 locations), and zero overlap between conditions.

Basically, you should NOT be using the summit files, as they only identify the loci of maximum pileup, and the not entire enriched region. You should use the peak files (*_peaks.xls) which include intervals (typically several hundred nt) and you will get identify more overlaps in your consensus.

ADD COMMENT • link 4.1 years ago Rory Stark ★ 5.1k

0

Entering edit mode

Thank you for replying, Rory. When I try to read in a *_peaks.xls file, I get the following error:

> dbObj <- dba.peakset(NULL, 
+                      peaks="BMDM-RNAPIIpS2-CTR-1-ChIPseq-GTACACCT_S9_L003_R1_001_peaks.xls",
+                      peak.caller="bed",
+                      factor = "ctr",
+                      sampID="CTR-1",
+                      replicate=1)

Error in Summary.factor(c(29945L, 16L, 90L, 95L, 575L, 1141L, 1143L, 1146L,  : 
  ‘max’ not meaningful for factors

Any ideas for what this means? Thank you!

Hugo

ADD REPLY • link 4.1 years ago hkitano • 0

0

Entering edit mode

You need to use:

peak.caller="macs"

ADD REPLY • link 4.1 years ago Rory Stark ★ 5.1k

0

Entering edit mode

Thank you! I appreciate your help. Going back to this ticket, I'm having trouble reproducing your results, and keep running into the same Can't count: some peaksets are not associated with a .bam file. error.

I have my dbObj, which has 10 peaksets (2 replicates for 5 factors)

10 Samples, 36221 sites in matrix (51169 total):
            ID        Factor Replicate Caller Intervals
1            CTR-1           ctr         1   macs     29940
2            CTR-2           ctr         2   macs     27812
3           IFNG-1          ifng         1   macs     20954
4           IFNG-2          ifng         2   macs     19747
5            IL4-1           il4         1   macs     25353
6            IL4-2           il4         2   macs     25543
7         preIL4-1        preil4         1   macs     21657
8         preIL4-2        preil4         2   macs     20200
9  preIL4posIFNG-1 preil4posifng         1   macs     24002
10 preIL4posIFNG-2 preil4posifng         2   macs     22068

The problem is when I run the following, I get the can't count error on the last line of code

ExptConsensus <- dba.peakset(dbObj, consensus = -DBA_REPLICATE)
ExptConsensus <-  dba(ExptConsensus, mask=ExptConsensus$masks$Consensus, minOverlap=1)
ConsensusPeaks <- dba.peakset(ExptConsensus, bRetrieve=TRUE)
dbObj <- dba.count(dbObj, peaks=ConsensusPeaks)

I feel like I'm doing everything you suggested...do you see an issue? Thank you!

Hugo

ADD REPLY • link 4.1 years ago hkitano • 0

0

Entering edit mode

If you send me the dbObj, I can look and see what is going on.

-Rory

ADD REPLY • link 4.1 years ago Rory Stark ★ 5.1k

0

Entering edit mode

Here's how I build my dbObj:

dbObj <- dba.peakset(dbObj, 
               peaks="BMDM-RNAPIIpS2-preIL4-1-ChIPseq-TCTAGGAG_S15_L003_R1_001_peaks.xls",
               peak.caller="macs",
               factor = "preil4",
               sampID="preIL4-1",
               replicate=1)
 dbObj <- dba.peakset(dbObj,
               peaks="BMDM-RNAPIIpS2-preIL4-2-ChIPseq-CTCGAACA_S16_L003_R1_001_peaks.xls",
               peak.caller="macs",
               factor = "preil4",
               sampID="preIL4-2",
               replicate=2)

I do this 5 times for 5 factors, and end up with the following dbObj:

> dbObj
10 Samples, 36221 sites in matrix (51169 total):
                ID        Factor Replicate Caller Intervals
1            CTR-1           ctr         1   macs     29940
2            CTR-2           ctr         2   macs     27812
3           IFNG-1          ifng         1   macs     20954
4           IFNG-2          ifng         2   macs     19747
5            IL4-1           il4         1   macs     25353
6            IL4-2           il4         2   macs     25543
7         preIL4-1        preil4         1   macs     21657
8         preIL4-2        preil4         2   macs     20200
9  preIL4posIFNG-1 preil4posifng         1   macs     24002
10 preIL4posIFNG-2 preil4posifng         2   macs     22068

Then, I run the following code, and fail on the last line (Error in pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score, : Can't count: some peaksets are not associated with a .bam file.).

ExptConsensus <- dba.peakset(dbObj, consensus = -DBA_REPLICATE)
ExptConsensus <-  dba(ExptConsensus, mask=ExptConsensus$masks$Consensus, minOverlap=1)
ConsensusPeaks <- dba.peakset(ExptConsensus, bRetrieve=TRUE)
dbObj <- dba.count(dbObj, peaks=ConsensusPeaks)

If I do the following to check for bam files, i get NULLs

> dbObj$samples
NULL
> dbObj$class[10,]
          CTR-1           CTR-2          IFNG-1          IFNG-2           IL4-1           IL4-2        preIL4-1 
             NA              NA              NA              NA              NA              NA              NA 
       preIL4-2 preIL4posIFNG-1 preIL4posIFNG-2 
             NA              NA              NA

I think I'm confused about the process of building DBA objects via peaksets. I have bam files associated with each .xls file from before I called peaks on each replicate. Are including those necessary when I instantiate each peakset for the DBA object?

ADD REPLY • link 4.1 years ago hkitano • 0

0

Entering edit mode

Yes, you must tell DiffBind where the bam files are for each peakset so it can count reads when you invoke dba.count(). You can do this using the bamReads parameter to dba.peakset(), but in general it is much easier to keep track of this using a sample sheet passed to a single call to dba().

ADD REPLY • link 4.1 years ago Rory Stark ★ 5.1k