Question: Diffbind with age as blocking factor
5 weeks ago by
melnuesch10

I have ChIP-seq data that I am analysing with DiffBind. As I understand it, the "blocking factor" functionality can be used to account for confounding variables. In the DiffBind manual, the confounding factors used as examples are strings (e.g. cell line names, or whether the cells in a sample were "resistant" or not), but I would like to use numeric variables such as age or post-mortem delay (e.g. 80 years, 24 hours). I also have another string variable, sex (M for male, F for female). When I build my sample sheet and DBA object with the string variable, everything runs smoothly (the column is named Factor, and I refer to it as DBA_FACTOR). To do the same for age, I simply replaced the Fs and Ms in that column with the age of each individual (each sample comes from a different individual). The ages do not differ much, so I did not expect the results to change much, and the PCA plot shows exactly the same clustering as before. Nevertheless, later parts of the script (dba.contrast, dba.plotMA, dba.plotVolcano, dba.analyze) run into problems.

Context: I have conditions A, B, C and D with multiple replicates each (6, 4, 5 and 7 respectively). They are successive stages of a disease, with A the mildest and D the most severe (and I know that the number of differentially bound sites increases along with them).

The errors I get are as follows. For samples A, B and C:

Error in pv.DBAplotVolcano(DBA, contrast = contrast, method = method, :
  object 'sigSites' not found
In addition: Warning message:
No sites above threshold

(but there should be sites above threshold because of my previous knowledge of the data)

For sample D:

results=dba.analyze(results, method=DBA_DESEQ2)
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
converting counts to integer mode
DESeq2 multi-factor analysis
Error in checkForExperimentalReplicates(object, modelMatrix) :
  The design matrix has the same number of samples and coefficients to fit,
  so estimation of dispersion is not possible. Treating samples
  as replicates was deprecated in v1.20 and no longer supported since v1.22.

Question: Does DiffBind have a problem handling numeric rather than string variables, or is something else wrong in my script? Since I used exactly the same script for both and only changed the values in the column, I cannot think of a bug in my code that would be responsible, so I decided to double-check here.

Answer: Diffbind with age as blocking factor
28 days ago by
Rory Stark2.9k
CRUK, Cambridge, UK

For the first question, you should check whether the analysis actually identified differentially bound sites with high confidence. Even if you expect DB sites based on the biology, the statistical analysis may not give you that result (for example, if there is high variance among the samples). You can check the results of the statistical analysis by looking at MA plots (dba.plotMA()) and by getting a report with the FDR threshold set high enough to return every site (dba.report(th=1)).
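
A minimal sketch of those two checks, wrapped in a small helper. The function name inspect_contrast and its arguments dba_obj and contrast_index are hypothetical, and it assumes dba_obj has already been through dba.analyze(); only dba.plotMA() and dba.report() come from DiffBind itself.

```r
# Sketch only: `dba_obj` stands for a DBA object that has already been
# analysed with dba.analyze(); `contrast_index` is a hypothetical argument
# naming which contrast to inspect.
inspect_contrast <- function(dba_obj, contrast_index = 1) {
  # MA plot of the chosen contrast, to see how the fold changes are spread.
  DiffBind::dba.plotMA(dba_obj, contrast = contrast_index)

  # th = 1 keeps every site regardless of FDR, so nothing is filtered out.
  all_sites <- DiffBind::dba.report(dba_obj, contrast = contrast_index, th = 1)

  # Smallest FDR values in the report. If none falls below the threshold
  # in use, "No sites above threshold" is the expected outcome.
  head(sort(all_sites$FDR))
}
```

Calling inspect_contrast(results, 1) on the analysed object would then show whether any site comes close to the FDR cut-off before a volcano plot is attempted.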

I suspect the second issue is not a numeric/string issue, but that you have fewer replicates in each age group. Maybe there are ages for which you only have one sample? You could combine similar ages into groups, but DESeq2 won't run if you have a sample group with no replicates.
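
To illustrate the replicate problem, here is a base R sketch with made-up ages (the sample table, the cut point of 80 and all variable names are assumptions, not taken from the actual data). It assumes each distinct age ends up as its own group, and compares that design with one where similar ages are combined into two bins.

```r
# Hypothetical sample sheet: seven samples, two conditions, made-up ages.
samples <- data.frame(
  condition = factor(c("D", "D", "D", "D", "A", "A", "A")),
  age       = c(78, 80, 81, 83, 78, 82, 85)
)

# Each distinct age treated as its own level (six levels here).
design_by_age <- model.matrix(~ factor(age) + condition, data = samples)

# Similar ages merged into two bins; the cut point of 80 is arbitrary.
samples$age_bin <- cut(samples$age, breaks = c(-Inf, 80, Inf),
                       labels = c("le80", "gt80"))
design_by_bin <- model.matrix(~ age_bin + condition, data = samples)

# Degrees of freedom left for dispersion: samples minus coefficients.
df_by_age <- nrow(design_by_age) - ncol(design_by_age)
df_by_bin <- nrow(design_by_bin) - ncol(design_by_bin)
print(c(by_age = df_by_age, by_bin = df_by_bin))
```

With one level per age, the design has as many coefficients as samples and nothing is left over to estimate dispersion, which is the situation the DESeq2 error describes. Binning the ages frees up degrees of freedom, though where to place the bin boundaries is a judgment call for the data at hand.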

If you like, you can send me the DBA object and your script so I can have a look at what is going on.