Question

Cut&Run normalization for Diffbind and DESeq2

2

Entering edit mode

Monica ▴ 20

@monica-24377

Last seen 4.2 years ago

Hi I was wanting to know the best practices for normalizing my Cut&Run data (Genomic localization data similar to ChIP-Seq) after degradation of my protein of interest. I am currently using Diffbind in combination with DESeq2 method of dba.analzye() but diffbind provides so many different options for dba.normalize(), I am not sure what to use. I am between either DBA_NORM_RLE or DBA_NORM_LIB. Can someone please suggest what is best practice and if I am missing anything in my code? I get very different results when using Option 1 vs Option 2 in dba.normalize(). Thank you!


#Importing the the table for diffbind containing all the necessary files, names, and locations

CTCF_narrowpeaks_table <- data.frame(SampleID=c("minusdT-A","minusdT-B","plusdT24h-A","plusdT24h-B"),Condition=c("minusdT","minusdT","plusdT24h","plusdT24h"),Treatment =c("0","0","dT","dT"), Replicate = c("1","2","1","2"),
                                           bamReads=c("/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-1.hg19.F4q10.sorted.bam","/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-2.hg19.F4q10.sorted.bam","/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-7.hg19.F4q10.sorted.bam","/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-8.hg19.F4q10.sorted.bam"),
                                           Peaks=c("/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_0hA.filtered_CTCF_q0.01__peaks.narrowPeak","/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_0hB.filtered_CTCF_q0.01__peaks.narrowPeak","/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_24hA.filtered_CTCF_q0.01__peaks.narrowPeak","/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_24hB.filtered_CTCF_q0.01__peaks.narrowPeak"),PeakCaller="narrow")

#Creating diffbind matrix from the table
CTCF_narrowpeaks <- dba(sampleSheet = CTCF_narrowpeaks_table)

#Counting all reads within the bam files under each peak 
CTCF_narrowpeakscount <- dba.count(CTCF_narrowpeaks, minOverlap = 0 , score = DBA_SCORE_READS)

#This is where I get confused
#normalize my data Option 1: 
CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)
#normalize my data Option 2: 
CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)

#print out the normalization factors 
norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
norm

#contrast the read counts 
CTCF_narrowpeakscontrast <- dba.contrast(CTCF_narrowpeaksnormalized, categories = DBA_CONDITION, minMembers = 2) 

#analyze the data with DESEQ2
CTCF_narrowpeaksanalyzed <- dba.analyze(CTCF_narrowpeakscontrast, method=DBA_DESEQ2)

GenomicLocalization DESeq2 Cut&Run Diffbind Normalization • 2.9k views

ADD COMMENT • link 4.3 years ago Monica ▴ 20

score 1 · Answer 1 · 2020-12-14

If you are getting different results, it is likely that there is a systematic shift in enriched regions in one of the conditions. Unless you have a specific reason to believe this is due to technical issues, this is likely a biological signal you want to retain. In this case, you should NOT use RLE on the binding matrix.

DBA_NORM_LIB is a quick way to perform a minimal normalization. I would try:

normalize <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, 
                           normalize = DBA_NORM_RLE,  background=TRUE)

to run RLE against background bins rather than the binding matrix.

This should probably be the default, but computing the background bins is somewhat compute intensive.

score 0 · Answer 2 · 2020-12-14

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 11 weeks ago

Cambridge, UK

Yes you can set your own vector of normalization factors directly by supplying them as the value for normalize:

CTCFnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, 
                                normalize = my_vector_of_normalization_factors)

It may be worthwhile to first examine the calculated normalization factors by retrieving them:

CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount,
                                            method = DBA_DESEQ2, 
                                            normalize = DBA_NORM_LIB)
norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
norm$lib.sizes
norm$norm.factors

ADD COMMENT • link 4.3 years ago Rory Stark ★ 5.2k

0

Entering edit mode

When I use

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)

it seems the library sizes and normalization factors are different than from

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE, background=TRUE)

which are also different from

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)

I am not sure how to determine which normalization factor is best to use, do you look at the raw data and determine which makes the most sense based off your bam files?

Here are the results I got from the three different methods:

> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2,normalize = DBA_NORM_RLE, background = TRUE, library = DBA_LIBSIZE_FULL)
Generating background bins...
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes
[1]  7708026 11083976  8114455  8965465
> norm$norm.factors
  minusdT-A   minusdT-B plusdT24h-A plusdT24h-B 
  0.7664015   1.2371741   1.0130157   1.0665464 
> 
> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes
[1]  8034512 11685026  8468403  9995836
> norm$norm.factors
[1] 0.8416676 1.2240828 0.8871205 1.0471291
> 
> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes
  minusdT-A   minusdT-B plusdT24h-A plusdT24h-B 
    1720052     2167287      832247     1173892 
> norm$norm.factors
  minusdT-A   minusdT-B plusdT24h-A plusdT24h-B 
  1.3027111   1.5935320   0.5946036   0.8450465

ADD REPLY • link 4.3 years ago Monica ▴ 20