Question: DiffBind :Retrieve count value from Diffbind by dab.report
0
19 months ago by
JunLVI40
Japan
JunLVI40 wrote:

I applied "DiffBind" to analyze my dataset like:

H3K27ac_CellA_CellD <- dba(sampleSheet="sampleconfiguration.csv",)
H3K27ac_CellA_CellD_Count <- dba.count(H3K27ac_CellA_CellD, score= DBA_SCORE_TMM_READS_EFFECTIVE, mapQCth=0, bRemoveDuplicates=FALSE)
H3K27ac_CellA_CellD_Diff <- dba.analyze(H3K27ac_CellA_CellD_Count,method=DBA_DESEQ2)
H3K27ac_CellA_CellD_Diff_All <- dba.report(H3K27ac_CellA_CellD_Diff,method=DBA_DESEQ2, th=1)
H3K27ac_All <- data.frame (Conc_WT=H3K27ac_CellA_CellD_Diff_All @elementMetadata$Conc_WT, Conc_GeneXcKO= H3K27ac_CellA_CellD_Diff_All @elementMetadata$Conc_GeneXcKO)

DataFrame with 6523 rows and 6 columns
Conc Conc_GeneXcKO   Conc_WT      Fold   p-value       FDR
<numeric>    <numeric> <numeric> <numeric> <numeric> <numeric>
1         4.77        -0.51      5.75     -6.26  6.51e-11  4.11e-07
2         4.41         0.27      5.37     -5.11  2.55e-08  8.06e-05
...
6523      3.88         4.06      3.67      0.39         1         1

I was wondering how these values are calculated:

Conc Conc_GeneXcKO Conc_WT

since DESEQ2 was utilized to perform the differential analysis, does this means the "regularized log transformation" "DESeq2"  was applied to the raw count value?  Or in my case above, simple log transformation of "TMM_READS_EFFECTIVE"?

Either case, it seems that values were "normalized" one way or another.

when I plotted the "H3K27ac_All" containing the "Conc_GeneXcKO Conc_WT" in my case,

p <- ggplot(H3K27ac_All, aes(Conc_WT, Conc_GeneXcKO))
p + geom_point() +
geom_abline(intercept = 0, slope = 1, linetype=2) +
scale_x_continuous(name="Epifeatures\n in WT Cell ", limits = c(-1, 11)) +
scale_y_continuous(name="Epifeatures \n in KO Cell ",limits = c(-1, 11)) ​

I expected the dots distributed along the line with slope of 1, somehow the results turned out to be different.

Could this be a result of improper data process?

I just noticed some related post DiffBind: Normalization for DESeq2 and DiffBind dba.report() output: Is "concentration" normalized?:

so it seems to me that in my case, the "conc" is log transformed counts  normalized by library size.

or set the lib size as "sizeFactor" in "DESeq2"

"DESeq2::sizeFactors(DESeqDataSeq) <- libsize/min(libsize)"

just make sure, is log transformation here equivalent to "rlog" in "DEseq2"?

diffbind • 354 views
modified 19 months ago by Rory Stark2.7k • written 19 months ago by JunLVI40
Answer: A: DiffBind :Retrieve count value from Diffbind by dab.report
0
19 months ago by
Rory Stark2.7k
CRUK, Cambridge, UK
Rory Stark2.7k wrote:

The transform is a little different than rlog. The read counts are, as you saw above, normalized by the minimum library size. The concentrations are the log2() of the mean of these values.

You can do a "pure" DESeq2 normalization by setting bFulLibrarySize=FALSE in the call to dba.analyze(). This will only use the reads within peaks to calculate the normalisation factors.

I can think of a couple of things you may want to look at to see what is going on. One is to retrieve the read counts in the report by setting bCounts=TRUE in the call to dba.report(). (You can get these all in a data frame by setting DataType=DBA_DATA_FRAME in the call as well). You can retrieve normalized or "raw" read counts by changing the value of the bNormalized parameter -- it can be interesting to compare "raw" and normalized counts. Finally, getting an MA plot would be informative to see if the shift from the diagonal seems to be "real" or not. You can call dba.plotMA() with bNormalized=FALSE to see the raw distribution, and you can also compare the impact of setting bFullLibrarySize=FALSE to get a more complete picture of how your reads are distributed and the effects of normalization.

-Rory