Question: DiffBind BED output with score
0
2.8 years ago by
rbronste60
rbronste60 wrote:

Hi all,

I have used DiffBind to come up with some differentially bound/occupied sites in chromatin accessibility data and for some downstream purposes I wanted to take those lists and export as BED files with their standard scores - for example to use in a PWMtool and other areas. I was wondering what the most straightforward way was to accomplish this. Thank you!

Rob

modified 2.8 years ago by Rory Stark2.9k • written 2.8 years ago by rbronste60
Answer: DiffBind BED output with score
1
2.8 years ago by
Rory Stark2.9k
CRUK, Cambridge, UK
Rory Stark2.9k wrote:

Hi Rob-

I'm not sure what you mean by "standard scores." Do you mean a score generated by the original peak caller [eg -10 log10(pvalue)]? Or a statistic relating to the differential binding itself (ie FDR, fold change, mean concentration, etc)?

If you want to retrieve scores generated by the peak caller, there are a couple of complications. One issue is that the consensus peaks do not map directly to called peaks. The peak intervals may be expanded during the peak merging process, and not all peaks are called for all samples (so there is no score from the peak caller). Another issue is that DiffBind normalizes the original peak scores to a 0..1 range, so the actual original scores are no longer available.

Let me know if you are trying to retrieve the peak caller scores, as I could give you some sample code. The basic approach is to read the original peak calls with their scores into a GRanges object, and then overlap each one with the GRanges object returned by dba.report(). You can then use the original scores from peaks that overlap the significantly differential bound regions.

-Rory

Hi Rory,

Appreciate the quick response. So I wanted to use this tool:http://ccg.vital-it.ch/pwmtools/pwmeval.php which asks for a score which I assumed to be the original peak caller score you mention however if there is a proxy for it within the DiffBind output I guess that would work also. I will gladly try the sample code though if thats my best path to an answer. Thank you.

Rob.

1

This tool uses the score to rank sites. If you are using differentially bound sites from DiffBind, a score from DiffBind would be the best bet. For example:

> data(tamoxifen_analysis)
> report <- dba.report(tamoxifen, th=.05,
DataType=DBA_DATA_FRAME)
> score <- -10*(log10(report$FDR)) > write.table(cbind(report[,1:3],rownames(report),score), "DBsites.bed", quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE) ADD REPLYlink written 2.8 years ago by Rory Stark2.9k Hi Rory, That works great thank you! One final question, if I only wanted to export a BED with one one set of differential peaks lets say in the minus direction above -2 fold or in the plus above 2 fold how would I go about getting independent BED files for that? Thanks again. Rob. ADD REPLYlink written 2.8 years ago by rbronste60 1 > report <- dba.report(tamoxifen, th=.05, DataType=DBA_DATA_FRAME) > scores <- -10*(log10(report$FDR))
> sites  <- cbind(report[,1:3],rownames(report),scores)

> gains  <- report$Fold > 2 > losses <- report$Fold < -2

​> write.table(sites[gains,],
"DBsitesGains.bed", quote=FALSE, sep="\t",
row.names=FALSE, col.names=FALSE)
> write.table(sites[losses,],
"DBsitesLosses.bed", quote=FALSE, sep="\t",
row.names=FALSE, col.names=FALSE)

Thank you very much!

Hi Rory,

Had a follow-up question on this workflow for exporting to a BED file. How can I go about including strand information of the differentially called peaks in the exported BED file? Thank you!

Rob.

As far as standard ChIP-seq data goes, the protocol is not strand-specific -- there is no way to know where on the pulled-down DNA, or which strand, the protein of interest was binding. Peaks form peak callers generally do not have a strand specified, the entire genomic interval (both strands) comprises the peaks. Sometimes a motif analysis can tell you which strand was probably bound but that is beyond what DiffBind does.

-Rory

Hi,

ChIP-seq peaks are inherently not stranded, so there is no meaningful strand information to include.  All you can tell is that there is binding at this location.  There are (unless the protocol has been radically changed) top-strand reads upstream of the actual binding site, and bottom-strand reads downstream, just reflecting that we're reading more or less at random from either end of the fragment that was immunoprecipitated.

Edited to add:  yeah, what Rory said... ;)