CUT&RUN normalization for DiffBind and DESeq2
Monica (@monica-24377)

Hi, I would like to know the best practices for normalizing my CUT&RUN data (genomic localization data, similar to ChIP-seq) after degradation of my protein of interest. I am currently using DiffBind with the DESeq2 method in dba.analyze(), but DiffBind provides so many different options for dba.normalize() that I am not sure which to use. I am deciding between DBA_NORM_RLE and DBA_NORM_LIB, and I get very different results when using Option 1 vs. Option 2 in dba.normalize() below. Can someone please suggest what is best practice, and let me know if I am missing anything in my code? Thank you!


#Load DiffBind
library(DiffBind)

#Build the sample sheet table containing all the necessary files, names, and locations

CTCF_narrowpeaks_table <- data.frame(
  SampleID   = c("minusdT-A", "minusdT-B", "plusdT24h-A", "plusdT24h-B"),
  Condition  = c("minusdT", "minusdT", "plusdT24h", "plusdT24h"),
  Treatment  = c("0", "0", "dT", "dT"),
  Replicate  = c("1", "2", "1", "2"),
  bamReads   = c("/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-1.hg19.F4q10.sorted.bam",
                 "/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-2.hg19.F4q10.sorted.bam",
                 "/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-7.hg19.F4q10.sorted.bam",
                 "/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-8.hg19.F4q10.sorted.bam"),
  Peaks      = c("/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_0hA.filtered_CTCF_q0.01__peaks.narrowPeak",
                 "/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_0hB.filtered_CTCF_q0.01__peaks.narrowPeak",
                 "/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_24hA.filtered_CTCF_q0.01__peaks.narrowPeak",
                 "/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_24hB.filtered_CTCF_q0.01__peaks.narrowPeak"),
  PeakCaller = "narrow")

#Creating diffbind matrix from the table
CTCF_narrowpeaks <- dba(sampleSheet = CTCF_narrowpeaks_table)

#Count all reads in the BAM files overlapping each peak
CTCF_narrowpeakscount <- dba.count(CTCF_narrowpeaks, minOverlap = 0, score = DBA_SCORE_READS)

#This is where I get confused
#normalize my data Option 1: 
CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)
#normalize my data Option 2: 
CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)

#print out the normalization factors 
norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
norm

#Set up the contrast between conditions
CTCF_narrowpeakscontrast <- dba.contrast(CTCF_narrowpeaksnormalized, categories = DBA_CONDITION, minMembers = 2)

#Analyze the data with DESeq2
CTCF_narrowpeaksanalyzed <- dba.analyze(CTCF_narrowpeakscontrast, method = DBA_DESEQ2)
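
#Afterwards I plan to pull the differentially bound sites with dba.report();
#this is a sketch, and th = 0.05 is just an example FDR threshold
CTCF_narrowpeaksreport <- dba.report(CTCF_narrowpeaksanalyzed, method = DBA_DESEQ2,
                                     contrast = 1, th = 0.05)
CTCF_narrowpeaksreport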
GenomicLocalization • DESeq2 • Cut&Run • Diffbind • Normalization
Rory Stark (@rory-stark-5741)

If you are getting very different results, it is likely that there is a systematic shift in the enriched regions in one of the conditions. Unless you have a specific reason to believe this shift is due to technical issues, it is likely a biological signal you want to retain. In that case, you should NOT use RLE on the binding matrix: RLE assumes most regions are unchanged, so it would normalize the shift away.

DBA_NORM_LIB is a quick way to perform a minimal normalization. I would try:

normalize <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2,
                           normalize = DBA_NORM_RLE, background = TRUE)

to run RLE against background bins rather than the binding matrix.

This should probably be the default, but computing the background bins is somewhat compute-intensive.
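
One way to mitigate the cost is to compute the background bins once and cache the normalized object; a minimal sketch using plain R serialization (the object and file names here are just examples):

# Compute the background bins once and save the normalized DBA object
CTCF_bgnorm <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2,
                             normalize = DBA_NORM_RLE, background = TRUE)
saveRDS(CTCF_bgnorm, "CTCF_bgnorm.rds")

# Reload it in later sessions instead of recomputing the bins
CTCF_bgnorm <- readRDS("CTCF_bgnorm.rds")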


Thank you, Rory Stark! That helped!

Is there a way to manually input the normalization factors, and will they behave like DESeq2 size factors, where the counts are divided by the factor? How do you make dba.analyze() use those normalization factors?

Rory Stark (@rory-stark-5741)

Yes, you can set your own vector of normalization factors directly by supplying it as the value for normalize:

CTCFnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, 
                                normalize = my_vector_of_normalization_factors)

It may be worthwhile to first examine the calculated normalization factors by retrieving them:

CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount,
                                            method = DBA_DESEQ2, 
                                            normalize = DBA_NORM_LIB)
norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
norm$lib.sizes
norm$norm.factors
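
As for getting dba.analyze() to use them: the analysis picks up whatever normalization has been stored in the DBA object by dba.normalize(), so manually supplied factors should carry through automatically. A minimal sketch (the factor values here are purely illustrative):

# Hypothetical factors, one per sample in sample-sheet order
# (minusdT-A, minusdT-B, plusdT24h-A, plusdT24h-B); values are illustrative only
my_vector_of_normalization_factors <- c(0.90, 1.10, 1.00, 1.05)

CTCF_manualnorm <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2,
                                 normalize = my_vector_of_normalization_factors)

# dba.analyze() uses the normalization stored in the object; nothing extra to pass
CTCF_manualcontrast <- dba.contrast(CTCF_manualnorm, categories = DBA_CONDITION, minMembers = 2)
CTCF_manualanalyzed <- dba.analyze(CTCF_manualcontrast, method = DBA_DESEQ2)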

When I use

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)

it seems the library sizes and normalization factors are different from those returned by

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE, background=TRUE)

which are also different from

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)

I am not sure how to determine which normalization factors are best to use. Do you look at the raw data and decide which makes the most sense based on your BAM files?

Here are the results I got from the three different methods:

> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE, background = TRUE, library = DBA_LIBSIZE_FULL)
Generating background bins...
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes
[1]  7708026 11083976  8114455  8965465
> norm$norm.factors
  minusdT-A   minusdT-B plusdT24h-A plusdT24h-B 
  0.7664015   1.2371741   1.0130157   1.0665464 
> 
> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes
[1]  8034512 11685026  8468403  9995836
> norm$norm.factors
[1] 0.8416676 1.2240828 0.8871205 1.0471291
> 
> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes
  minusdT-A   minusdT-B plusdT24h-A plusdT24h-B 
    1720052     2167287      832247     1173892 
> norm$norm.factors
  minusdT-A   minusdT-B plusdT24h-A plusdT24h-B 
  1.3027111   1.5935320   0.5946036   0.8450465
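
One way to see what each option is doing is to compare MA plots before and after normalization; a minimal sketch using dba.plotMA(), assuming the dba.contrast() and dba.analyze() steps from my original code have been run on the normalized object:

# Run the downstream steps on a normalized object...
CTCF_contrast <- dba.contrast(CTCF_narrowpeaksnormalized, categories = DBA_CONDITION, minMembers = 2)
CTCF_analyzed <- dba.analyze(CTCF_contrast, method = DBA_DESEQ2)

# ...then compare MA plots with and without the normalization applied; if the
# bulk of unchanged sites centers on logFC = 0 after normalization, the
# factors are at least behaving sensibly
dba.plotMA(CTCF_analyzed, method = DBA_DESEQ2, bNormalized = FALSE)
dba.plotMA(CTCF_analyzed, method = DBA_DESEQ2, bNormalized = TRUE)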