DiffBind with age as blocking factor
melnuesch ▴ 10

I have ChIP-seq data that I am analysing with DiffBind. As I understand it, the "blocking factor" functionality can be used to remove the effect of confounding variables. In the DiffBind manual, the confounding factors used as examples are strings (e.g. cell line names, or whether the cells in a sample were "resistant" or not), but I would like to use numeric variables such as age or post-mortem delay (e.g. 80 years, 24 hours). I also have another string variable, sex (M for male, F for female).

When I build my sample sheet and dba object with the string variable, everything runs smoothly (the column is named Factor, and I refer to it as DBA_FACTOR). To do the same for age, I simply replaced the Fs and Ms in that column with the age of each individual (each sample comes from a different individual). The ages do not differ much, so I did not expect the results to change much. The PCA plot shows exactly the same clustering as before. Nevertheless, further parts of the script (dba.contrast, dba.plotMA, dba.plotVolcano, dba.analyze) run into problems.

Context: I have conditions A, B, C and D with multiple replicates each (6, 4, 5 and 7 respectively). They are incremental stages of a disease, with A being the mildest state and D the most severe (and I know that the number of differentially bound sites increases accordingly).

The errors I get are as follows. For samples A, B and C:

Error in pv.DBAplotVolcano(DBA, contrast = contrast, method = method, :
  object 'sigSites' not found
In addition: Warning message:
No sites above threshold

(but from my prior knowledge of the data, there should be sites above threshold)

For sample D:

results=dba.analyze(results, method=DBA_DESEQ2)
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
converting counts to integer mode
DESeq2 multi-factor analysis
Error in checkForExperimentalReplicates(object, modelMatrix) :
  The design matrix has the same number of samples and coefficients to fit, so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.22.

Question: I was wondering whether DiffBind has a problem handling numeric instead of string variables, or whether something else in my script is at fault. Since I used exactly the same script for both and only changed the values in the column, I cannot think of a bug in my code that might be responsible, so I decided to double-check here.

diffbind chip-seq • 424 views
Rory Stark ★ 4.4k
CRUK, Cambridge, UK

For the first question, you should check whether the analysis actually identified differentially bound sites with high confidence. Even if you expect DB sites based on the biology, the statistical analysis may not give the result you expect (for example, if there is high variance among the samples). You can check the results of the statistical analysis by looking at MA plots (dba.plotMA()) and by getting a report with a high FDR threshold (dba.report(th=1)), which includes all sites regardless of significance.
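As a rough sketch of that check: the snippet below uses the pre-analysed tamoxifen example object that ships with DiffBind (loaded via data(tamoxifen_analysis)), not the poster's data, and assumes the contrast of interest is number 1. The variable names all_sites and n_sig are illustrative only.

```r
# Sketch only: uses the tamoxifen example analysis shipped with DiffBind,
# not the poster's data. Substitute your own analysed DBA object.
library(DiffBind)
data(tamoxifen_analysis)  # loads a pre-analysed DBA object named `tamoxifen`

# th = 1 keeps every site, so the full FDR distribution can be inspected.
all_sites <- dba.report(tamoxifen, contrast = 1, th = 1)

# Count how many sites would pass a 0.05 FDR cut-off.
n_sig <- sum(all_sites$FDR <= 0.05)
cat("Sites in report:", length(all_sites), "\n")
cat("Sites with FDR <= 0.05:", n_sig, "\n")

# MA plot for the same contrast; sites passing the threshold are highlighted.
dba.plotMA(tamoxifen, contrast = 1)
```

If n_sig is zero for a contrast, a volcano plot at the default threshold has nothing to highlight, which would be consistent with the "No sites above threshold" warning.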

I suspect the second issue is not a numeric/string issue, but rather that you have fewer replicates in each age group. Maybe there are ages for which you have only one sample? You could combine similar ages into groups, but DESeq2 won't run if you have a sample group with no replicates.
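The replicate problem can be illustrated with a small base-R sketch. The ages below are invented for illustration, and this is not DiffBind's internal code; it only shows how many coefficients a design needs when every distinct age is treated as its own group, compared with ages binned into broader groups (here with cut(), using hypothetical decade bins).

```r
# Hypothetical ages for six samples, one individual per sample.
age <- c(71, 74, 78, 80, 83, 86)

# Treated as categorical, every distinct age becomes its own level,
# so the design needs as many coefficients as there are samples.
design_factor <- model.matrix(~ factor(age))
cat("Per-age design: ", nrow(design_factor), "samples,",
    ncol(design_factor), "coefficients\n")

# Binning similar ages leaves several samples (replicates) per group.
# right = FALSE makes the intervals [70, 80) and [80, 90).
age_group <- cut(age, breaks = c(70, 80, 90), right = FALSE,
                 labels = c("70-79", "80-89"))
design_binned <- model.matrix(~ age_group)
cat("Binned design:  ", nrow(design_binned), "samples,",
    ncol(design_binned), "coefficients\n")

# Replicates per age group after binning.
print(table(age_group))
```

In the per-age design the number of coefficients equals the number of samples, which is the situation the DESeq2 error message describes; after binning, each group has replicates and there are residual degrees of freedom left to estimate dispersion.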

If you like you can send me the DBA object and your script so I can have a look at what is going on.

