Dear Dr. Stark and community,
I'm using DiffBind to obtain differential expression values for a series of interesting regions, and I have some questions. I ran the following:
TF1_initial_IP = dba(sampleSheet="TF1DiffBind_optimal02_IP.csv", config=data.frame(fragmentSize=130), peakCaller="narrow", bCorPlot=FALSE)
TF1count_IP = dba.count(TF1_initial_IP, minOverlap=2, fragmentSize=130, filter=200, bCorPlot=FALSE)
TF1_IP = dba.contrast(TF1count_IP, categories=c(DBA_TISSUE, DBA_TREATMENT), minMembers=2)
TF1_IP = dba.analyze(TF1_IP, method=DBA_DESEQ2, bReduceObjects=FALSE, bFullLibrarySize=TRUE, bCorPlot=FALSE)
for (i in c(3, 4, 8, 10)) {
  dba.report(TF1_IP, contrast=i, method=c(DBA_DESEQ2), th=1, bCounts=TRUE, bNormalized=TRUE, bCalled=TRUE, DataType=DBA_DATA_FRAME, bCalledDetail=TRUE, file=i, ext="csv", initString="DESEQ2_TF1onIP")
}
I noticed that for many peaks (I attach only one example here) the Called columns differ from the original peak lists when filter=200 is set, whereas simply removing the filter parameter makes the calls match the input peak lists perfectly.
Setting         Start    End      Conc   Conc_WT:Femedia  Conc_WT:Dipymedia  Fold  p-value               FDR                    Called1  Called2  IP-1C     IP-1D     IP-2C    IP-2D    IP-1C  IP-1D  IP-2C  IP-2D
Filter=default  1420588  1420984  12.69  13.55            10.34              3.21  1.5626087851071E-017  9.35233019146187E-017  2        2        13235.69  10695.00  1537.05  1050.26  +      +      +      +
Filter=200      1420588  1420984  12.69  13.55            10.34              3.21  1.5626087851071E-017  9.35233019146187E-017  0        0        13235.69  10695.00  1537.05  1050.26  -      -      -      -
From what I understand, if none of the samples reaches the minimum number of reads required by the filter parameter, the interval is removed, while the Called columns should just indicate how many samples originally had a peak called in the examined region. Am I missing something?
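To make that expectation concrete, here is a minimal base-R sketch of the behaviour I assumed. It is not DiffBind's code: the matrices, the second interval, and the names `counts`, `called`, and `filter_threshold` are invented for illustration, and using the per-interval maximum as the filter criterion is my assumption.

```r
# Toy data: rows are intervals, columns are samples (all values invented).
counts <- matrix(c(13235, 10695, 1537, 1050,
                     150,   120,   90,   60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("peakA", "peakB"),
                                 c("IP-1C", "IP-1D", "IP-2C", "IP-2D")))

# Peak-caller membership: TRUE if the interval overlapped a called peak
# in that sample (hypothetical, all TRUE here).
called <- matrix(TRUE, nrow = 2, ncol = 4, dimnames = dimnames(counts))

# Assumed filter rule: drop an interval only if no sample reaches the threshold.
filter_threshold <- 200
keep <- apply(counts, 1, max) >= filter_threshold

filtered_counts <- counts[keep, , drop = FALSE]
filtered_called <- called[keep, , drop = FALSE]

# Expected: surviving intervals keep their original call information.
rowSums(filtered_called)
```

In this sketch the filter only removes rows, so the call matrix of a surviving interval is unchanged, which is why the all-zero Called columns with filter=200 surprised me.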
Finally, I have a question about the dba.count and dba.analyze functions.
Using default settings, from what I read on this blog, I understand that:
1) Scaled input reads are subtracted from the IP raw counts (scaling is done only if the input is deeper than the corresponding ChIP)
2) Non-integer numbers are rounded
3) Negative numbers are set to 1
The counts are then passed to the selected tool (DESeq2), which calculates normalization factors from the original library sizes but applies them to the final counts. Is this correct?
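The three steps as I understand them can be sketched in base R as follows. This is only my reading of the procedure, not DiffBind's implementation; all read counts and library sizes are invented, and the variable names are hypothetical.

```r
# Invented read counts for three intervals in one ChIP sample and its input.
chip_reads  <- c(500, 40, 12)
input_reads <- c(101, 200, 9)

# Invented library sizes; the input is deeper than the ChIP here.
chip_libsize  <- 1e6
input_libsize <- 4e6

# Step 1: scale the input down only if it is deeper than the ChIP, then subtract.
scale_factor <- if (input_libsize > chip_libsize) chip_libsize / input_libsize else 1
adjusted <- chip_reads - input_reads * scale_factor

# Step 2: round non-integer values.
adjusted <- round(adjusted)

# Step 3: set negative values to 1.
adjusted[adjusted < 0] <- 1

adjusted
```

With these made-up numbers the second interval would go negative after subtraction and is therefore set to 1, which is the case I am least sure about.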
I'm trying to produce .wig files to get some images and check how the tracks are affected by all these operations; is there a way to obtain the normalization factors and the scaling factors used?
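For context, this is roughly how I would apply such factors to the tracks once I have them. It is a base-R sketch under the assumption that the factors are proportional to the full library sizes (library size divided by the smallest library size); I do not know whether that matches what DiffBind and DESeq2 actually use, and all numbers are invented.

```r
# Invented full library sizes for the four IP samples.
lib_sizes <- c("IP-1C" = 20e6, "IP-1D" = 16e6, "IP-2C" = 10e6, "IP-2D" = 8e6)

# Assumed normalization: each factor is relative to the smallest library.
norm_factors <- lib_sizes / min(lib_sizes)

# Hypothetical per-bin coverage for sample IP-1C, as it would appear in a .wig track.
coverage <- c(40, 80, 120, 60)

# Divide the coverage by the sample's factor to put all tracks on the same scale.
scaled_coverage <- unname(coverage / norm_factors["IP-1C"])

norm_factors
scaled_coverage
```

If the real factors can be extracted from the analysis object, I would simply replace `norm_factors` with those values.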
Thank you in advance,
Eva