Cut&Run normalization for Diffbind and DESeq2
2
2
Entering edit mode
Monica ▴ 20
@monica-24377
Last seen 6 months ago

Hi I was wanting to know the best practices for normalizing my Cut&Run data (Genomic localization data similar to ChIP-Seq) after degradation of my protein of interest. I am currently using Diffbind in combination with DESeq2 method of dba.analzye() but diffbind provides so many different options for dba.normalize(), I am not sure what to use. I am between either DBA_NORM_RLE or DBA_NORM_LIB. Can someone please suggest what is best practice and if I am missing anything in my code? I get very different results when using Option 1 vs Option 2 in dba.normalize(). Thank you!


#Importing the the table for diffbind containing all the necessary files, names, and locations

CTCF_narrowpeaks_table <- data.frame(SampleID=c("minusdT-A","minusdT-B","plusdT24h-A","plusdT24h-B"),Condition=c("minusdT","minusdT","plusdT24h","plusdT24h"),Treatment =c("0","0","dT","dT"), Replicate = c("1","2","1","2"),
bamReads=c("/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-1.hg19.F4q10.sorted.bam","/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-2.hg19.F4q10.sorted.bam","/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-7.hg19.F4q10.sorted.bam","/Volumes/BombXtreme/5176_CutRun/5176_bam/5176-MB-8.hg19.F4q10.sorted.bam"),
Peaks=c("/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_0hA.filtered_CTCF_q0.01__peaks.narrowPeak","/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_0hB.filtered_CTCF_q0.01__peaks.narrowPeak","/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_24hA.filtered_CTCF_q0.01__peaks.narrowPeak","/Volumes/BombXtreme/5176_CutRun/5176_macs/5176_macs_filterblacklist_q0.01/5176_24hB.filtered_CTCF_q0.01__peaks.narrowPeak"),PeakCaller="narrow")

#Creating diffbind matrix from the table
CTCF_narrowpeaks <- dba(sampleSheet = CTCF_narrowpeaks_table)

#Counting all reads within the bam files under each peak
CTCF_narrowpeakscount <- dba.count(CTCF_narrowpeaks, minOverlap = 0 , score = DBA_SCORE_READS)

#This is where I get confused
#normalize my data Option 1:
CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)
#normalize my data Option 2:
CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)

#print out the normalization factors
norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
norm

#contrast the read counts
CTCF_narrowpeakscontrast <- dba.contrast(CTCF_narrowpeaksnormalized, categories = DBA_CONDITION, minMembers = 2)

#analyze the data with DESEQ2
CTCF_narrowpeaksanalyzed <- dba.analyze(CTCF_narrowpeakscontrast, method=DBA_DESEQ2)

ADD COMMENT
1
Entering edit mode
Rory Stark ★ 4.0k
@rory-stark-5741
Last seen 6 days ago
CRUK, Cambridge, UK

If you are getting different results, it is likely that there is a systematic shift in enriched regions in one of the conditions. Unless you have a specific reason to believe this is due to technical issues, this is likely a biological signal you want to retain. In this case, you should NOT use RLE on the binding matrix.

DBA_NORM_LIB is a quick way to perform a minimal normalization. I would try:

normalize <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2,
normalize = DBA_NORM_RLE,  background=TRUE)


to run RLE against background bins rather than the binding matrix.

This should probably be the default, but computing the background bins is somewhat compute intensive.

ADD COMMENT
0
Entering edit mode

Thank you Rory Stark! That helped!

Is there a way to manually input the normalization factors and will those factors be similar to DESEQ2 where you divide? How do you make dba.analyze use those normalization factors?

ADD REPLY
0
Entering edit mode
Rory Stark ★ 4.0k
@rory-stark-5741
Last seen 6 days ago
CRUK, Cambridge, UK

Yes you can set your own vector of normalization factors directly by supplying them as the value for normalize:

CTCFnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2,
normalize = my_vector_of_normalization_factors)


It may be worthwhile to first examine the calculated normalization factors by retrieving them:

CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount,
method = DBA_DESEQ2,
normalize = DBA_NORM_LIB)
norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
norm$lib.sizes norm$norm.factors

ADD COMMENT
0
Entering edit mode

When I use

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)


it seems the library sizes and normalization factors are different than from

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE, background=TRUE)


which are also different from

dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)


I am not sure how to determine which normalization factor is best to use, do you look at the raw data and determine which makes the most sense based off your bam files?

Here are the results I got from the three different methods:

> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2,normalize = DBA_NORM_RLE, background = TRUE, library = DBA_LIBSIZE_FULL)
Generating background bins...
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes [1] 7708026 11083976 8114455 8965465 > norm$norm.factors
minusdT-A   minusdT-B plusdT24h-A plusdT24h-B
0.7664015   1.2371741   1.0130157   1.0665464
>
> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_LIB)
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes [1] 8034512 11685026 8468403 9995836 > norm$norm.factors
[1] 0.8416676 1.2240828 0.8871205 1.0471291
>
> CTCF_narrowpeaksnormalized <- dba.normalize(CTCF_narrowpeakscount, method = DBA_DESEQ2, normalize = DBA_NORM_RLE)
> norm <- dba.normalize(CTCF_narrowpeaksnormalized, bRetrieve=TRUE)
> norm$lib.sizes minusdT-A minusdT-B plusdT24h-A plusdT24h-B 1720052 2167287 832247 1173892 > norm$norm.factors
minusdT-A   minusdT-B plusdT24h-A plusdT24h-B
1.3027111   1.5935320   0.5946036   0.8450465

ADD REPLY

Login before adding your answer.

Traffic: 818 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6