Question: Diffbind discordant "Called" column in the output when setting filter parameter in dba.count
3.5 years ago by
eva.pinatel10
Italy
eva.pinatel10 wrote:

Dear Dr. Stark and community,

I'm using DiffBind to obtain differential expression values for a series of interesting regions and I have some questions. I ran the following:

# Load the sample sheet and peak sets
TF1_initial_IP = dba(sampleSheet="TF1DiffBind_optimal02_IP.csv", config=data.frame(fragmentSize=130), peakCaller="narrow", bCorPlot=FALSE)
# Count reads in consensus peaks, filtering intervals with low counts
TF1count_IP = dba.count(TF1_initial_IP, minOverlap=2, fragmentSize=130, filter=200, bCorPlot=FALSE)
# Set up contrasts by tissue and treatment
TF1_IP = dba.contrast(TF1count_IP, categories=c(DBA_TISSUE, DBA_TREATMENT), minMembers=2)
# Run the differential analysis with DESeq2
TF1_IP = dba.analyze(TF1_IP, method=DBA_DESEQ2, bReduceObjects=FALSE, bFullLibrarySize=TRUE, bCorPlot=FALSE)
# Write one full report (th=1 keeps all sites) per contrast of interest
for (i in c(3, 4, 8, 10)) {
  dba.report(TF1_IP, contrast=i, method=DBA_DESEQ2, th=1, bCounts=TRUE, bNormalized=TRUE, bCalled=TRUE, DataType=DBA_DATA_FRAME, bCalledDetail=TRUE, file=i, ext="csv", initString="DESEQ2_TF1onIP")
}

I noticed that for many peaks (I attach only one example here) the Called columns differ from the original peak lists, whereas simply removing the filter parameter makes the calls match the input peak lists perfectly.

Filter    Start    End    Conc    Conc_WT:Femedia    Conc_WT:Dipymedia    Fold    p-value    FDR    Called1    Called2    IP-1C (count)    IP-1D (count)    IP-2C (count)    IP-2D (count)    IP-1C (called)    IP-1D (called)    IP-2C (called)    IP-2D (called)
default    1420588    1420984    12.69    13.55    10.34    3.21    1.5626087851071E-017    9.35233019146187E-017    2    2    13235.69    10695.00    1537.05    1050.26    +    +    +    +
200    1420588    1420984    12.69    13.55    10.34    3.21    1.5626087851071E-017    9.35233019146187E-017    0    0    13235.69    10695.00    1537.05    1050.26    -    -    -    -

From what I understand, if none of the samples reaches the minimum number of reads required by the filter parameter, the interval is eliminated, while the Called columns should just indicate how many samples originally had a peak called in the examined region. Am I missing something?
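To make this expectation concrete, here is a minimal base R sketch. It does not call DiffBind; the matrices, the values, and the names `counts`, `called` and `filter_value` are made up for illustration, and the rule "keep an interval if at least one sample reaches the threshold" is only the behaviour assumed in this question.

```r
# Toy read counts: rows are intervals, columns are samples (made-up values).
counts <- matrix(c(13235, 10695, 1537, 1050,
                     150,   120,   90,   60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("peak_A", "peak_B"),
                                 c("IP-1C", "IP-1D", "IP-2C", "IP-2D")))

# Toy peak calls taken from the input peak lists (TRUE = peak called in that sample).
called <- matrix(c(TRUE, TRUE,  TRUE,  TRUE,
                   TRUE, FALSE, FALSE, FALSE),
                 nrow = 2, byrow = TRUE,
                 dimnames = dimnames(counts))

filter_value <- 200

# Assumed rule: drop an interval only if no sample reaches the threshold.
keep <- apply(counts, 1, max) >= filter_value

filtered_counts <- counts[keep, , drop = FALSE]
# Expected behaviour: the calls of surviving intervals are carried over unchanged.
filtered_called <- called[keep, , drop = FALSE]

# Number of samples with a called peak per surviving interval.
rowSums(filtered_called)
```

Under this assumption, filtering changes which intervals survive but not the call information of the intervals that remain, which is why Called values of 0 for a retained peak look unexpected.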

Finally, I have a question about the dba.count and dba.analyze functions.

Using default settings, from what I read on this blog, I gathered that:
1) Scaled input reads are subtracted from the IP raw counts (scaling is done only if the input is deeper than the ChIP it is compared with)
2) Non-integer numbers are rounded
3) Negative numbers are set to 1
Counts are then passed to the selected tool (DESeq2), which calculates normalization factors from the original library sizes but applies them to the final counts. Is this correct?

I'm just trying to produce .wig files to get some images and check how the tracks are affected by all these operations. Is there a way to obtain the normalization factors and the scaling factors used?

Eva

modified 3.5 years ago by Rory Stark2.6k • written 3.5 years ago by eva.pinatel10
3.5 years ago by
Rory Stark2.6k
CRUK, Cambridge, UK
Rory Stark2.6k wrote:

Hi Eva-

First, can you tell me what version you are working with by sending along the output of sessionInfo()? That will help with the issue with the Called statistics, as this code has changed in recent versions. One workaround you may try is to count and filter in separate steps:

> TF1count_IP = dba.count (TF1_initial_IP, minOverlap=2, fragmentSize=130, bCorPlot=FALSE)
> TF1count_IP = dba.count (TF1count_IP, peaks=NULL, filter=200, bCorPlot=FALSE)

Your explanation regarding the counting algorithm is quite good, except point 3), which should read "non-positive numbers are set to 1", as zero values are also set to 1.
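For illustration, the three counting steps as described in this thread can be sketched in base R. This is not DiffBind's code: `adjust_counts` is a hypothetical helper, and the scaling factor used here (ChIP library size divided by input library size) is an assumption, since the thread only says that the input is scaled when it is deeper than the ChIP.

```r
# Hypothetical helper, not part of DiffBind: mimics the steps described in this thread.
adjust_counts <- function(chip, input, chip_lib, input_lib) {
  # Scale the input down only when it is deeper than the ChIP library.
  # The ratio of library sizes is an assumed scaling factor.
  scale <- if (input_lib > chip_lib) chip_lib / input_lib else 1
  # 1) subtract scaled input reads, 2) round to whole numbers
  adjusted <- round(chip - input * scale)
  # 3) non-positive values (zero included) are set to 1
  adjusted[adjusted <= 0] <- 1
  adjusted
}

# Made-up counts for four intervals; the input library is twice as deep as the ChIP.
chip  <- c(500, 40, 10, 3)
input <- c(100, 80, 20, 2)
adjusted <- adjust_counts(chip, input, chip_lib = 1e6, input_lib = 2e6)
adjusted
```

With these toy numbers, the second interval ends up negative after subtraction and the third ends up at exactly zero; both come out as 1, which is the point of the correction to 3).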

Cheers-

Rory