Question

Using DiffBind Normalization and Contrasting on Fold Change Values

0

Entering edit mode

ssankar3 • 0

@2e896c5b

Last seen 20 months ago

United States

I'm very new to R, Bioinformatics, and programming in general, so I appreciate the aid.

I've acquired a consensus peakset count matrix from a program for ChipSeq MCAP Data with a minimum overlap of 1 across ~40 samples. These samples can be divided into two groups, and I hope to contrast them to determine differentially methylated regions/genes with DiffBind. The problem lies with the fact that I'm unsure how the data was normalized, if it was at all. Therefore, I would like to do this on my own from the counts that I was given.

The data is formatted in the below format. Note that I've acquired the fold change data as the peaks for each sample are quantified as either a 0 or 1 rather than a range in between. Therefore, I don't think it would be as useful for normalization as the fold change values themselves.

I've tried doing successful calls to dba.peakset for each sample to generate a new DBA object. However, it doesn't seem to work, yielding an error message I'm unclear about. The code I used is below.

> head(BAandWA) #partial head
Chromosome  Start    End Fold.change.BA12_MCAP_Peaks.with.gene.annotations. Fold.change.BA16_MCAP_Peaks.with.gene.annotations.
1       chr1   9993  11257                                          17.308525                                           5.414499
2       chr1  51517  52052                                           0.000000                                          21.969871
3       chr1 121222 121570                                           7.496554                                          14.837665
4       chr1 135042 135329                                           0.000000                                           5.802254
5       chr1 136266 137151                                           8.312252                                           9.293681
6       chr1 180710 181826                                           8.730159                                          11.106704

> countMatrix <- BAandWA[, -c(1:3)]
> newDBA <- NULL
> for(sample in 1:ncol(countMatrix)){
  newDBA <- dba.peakset(newDBA, peaks = BAandWA, sampID = colnames(countMatrix)[sample], counts = countMatrix[, sample])
}

Error in if (treads) { : argument is not interpretable as logical

> newDBA
1 Samples, 71010 sites in matrix:
                                                  ID Caller Intervals    Reads FRiP
1 Fold.change.BA12_MCAP_Peaks.with.gene.annotations. counts     71010 149058.4    1

I'd like to know how to fix this. Then, I'd like to know how to perform a normalization and contrast. Would dba.contrast and dba.analyze work fine?

DiffBind • 813 views

ADD COMMENT • link updated 20 months ago by Rory Stark ★ 5.2k • written 20 months ago by ssankar3 • 0

score 1 · Answer 1 · 2023-03-16

1

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 5 weeks ago

Cambridge, UK

You can't do this kind of analysis using Fold Change data -- you need to be able to get raw (non-normalized) reads. Optimally you would have access to the aligned read files in .bam format, do you have those? Alternatively a proper count matrix with raw read counts.

Regarding the error message, I'm not sure exactly what it is having trouble with (although I'm a bit surprised it took non-integer values for the counts). I can see that you are not using the current version DiffBind_3.8.4 as the error you are seeing is no longer possible in more recent versions. Updating should resolve this.

ADD COMMENT • link 20 months ago Rory Stark ★ 5.2k

0

Entering edit mode

Thank you for the reply. I can get the bam files, but I'd need to export them from the program itself apparently, and it would take a long time considering the size of each file.

However, I've been reading through the documentation, and I see that I can read in a peakset containing the chromosome, start, end, and confidence score in a peak. I do have all of those values for each sample, at least I believe I do. I've used the MACS algorithm, and I've read that the scores are returned as the -log of the pvalue, indicating that a high number means greater confidence. I have columns titled P score and Q score for each sample, so I believe I can use these.

My question is: is that all I need to continue with the analysis? Do I need the raw read counts or the bam files?

ADD REPLY • link 20 months ago ssankar3 • 0

0

Entering edit mode

You need the bam files (or a raw count matrix) to do any kind of quantitative analysis and compute confidence statistics (p-values/FDR values).

Without the reads, you can only do what is referred to as an "occupancy analysis" in the vignette. That is, you can do overlaps (basically Venn diagrams), basic clustering, and not much else. You can find peaks that are identified uniquely in one sample group vs. another group, but you can't assess if these changes are meaningful. More importantly, you can't really assess changes in binding strength in sites that are identified across sample groups; these are often some of the most important sites.

ADD REPLY • link 20 months ago Rory Stark ★ 5.2k