DiffBind BED output with score
1
0
Entering edit mode
rbronste ▴ 60
@rbronste-12189
Last seen 4.4 years ago

Hi all,

I have used DiffBind to come up with some differentially bound/occupied sites in chromatin accessibility data and for some downstream purposes I wanted to take those lists and export as BED files with their standard scores - for example to use in a PWMtool and other areas. I was wondering what the most straightforward way was to accomplish this. Thank you!

Rob

diffbind differential binding analysis • 3.5k views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 14 days ago
Cambridge, UK

Hi Rob-

I'm not sure what you mean by "standard scores." Do you mean a score generated by the original peak caller [eg -10 log10(pvalue)]? Or a statistic relating to the differential binding itself (ie FDR, fold change, mean concentration, etc)?

If you want to retrieve scores generated by the peak caller, there are a couple of complications. One issue is that the consensus peaks do not map directly to called peaks. The peak intervals may be expanded during the peak merging process, and not all peaks are called for all samples (so there is no score from the peak caller). Another issue is that DiffBind normalizes the original peak scores to a 0..1 range, so the actual original scores are no longer available.

Let me know if you are trying to retrieve the peak caller scores, as I could give you some sample code. The basic approach is to read the original peak calls with their scores into a GRanges object, and then overlap each one with the GRanges object returned by dba.report(). You can then use the original scores from peaks that overlap the significantly differential bound regions.

-Rory

 

ADD COMMENT
0
Entering edit mode

 

Hi Rory,

Appreciate the quick response. So I wanted to use this tool:http://ccg.vital-it.ch/pwmtools/pwmeval.php which asks for a score which I assumed to be the original peak caller score you mention however if there is a proxy for it within the DiffBind output I guess that would work also. I will gladly try the sample code though if thats my best path to an answer. Thank you.

Rob.

 

ADD REPLY
1
Entering edit mode

This tool uses the score to rank sites. If you are using differentially bound sites from DiffBind, a score from DiffBind would be the best bet. For example:

> data(tamoxifen_analysis)
> report <- dba.report(tamoxifen, th=.05, 
                       DataType=DBA_DATA_FRAME)
> score <- -10*(log10(report$FDR))
> write.table(cbind(report[,1:3],rownames(report),score),
              "DBsites.bed", quote=FALSE, sep="\t",
              row.names=FALSE, col.names=FALSE)
ADD REPLY
0
Entering edit mode

Hi Rory,

That works great thank you! One final question, if I only wanted to export a BED with one one set of differential peaks lets say in the minus direction above -2 fold or in the plus above 2 fold how would I go about getting independent BED files for that? Thanks again.

Rob.

ADD REPLY
1
Entering edit mode
> report <- dba.report(tamoxifen, th=.05,
                       DataType=DBA_DATA_FRAME)
> scores <- -10*(log10(report$FDR))
> sites  <- cbind(report[,1:3],rownames(report),scores)

> gains  <- report$Fold > 2
> losses <- report$Fold < -2

​> write.table(sites[gains,],
              "DBsitesGains.bed", quote=FALSE, sep="\t",
              row.names=FALSE, col.names=FALSE)
> write.table(sites[losses,],
              "DBsitesLosses.bed", quote=FALSE, sep="\t",
              row.names=FALSE, col.names=FALSE)
ADD REPLY
0
Entering edit mode

Thank you very much!

 

ADD REPLY
0
Entering edit mode

Hi Rory,

Had a follow-up question on this workflow for exporting to a BED file. How can I go about including strand information of the differentially called peaks in the exported BED file? Thank you!

Rob.

ADD REPLY
0
Entering edit mode

As far as standard ChIP-seq data goes, the protocol is not strand-specific -- there is no way to know where on the pulled-down DNA, or which strand, the protein of interest was binding. Peaks form peak callers generally do not have a strand specified, the entire genomic interval (both strands) comprises the peaks. Sometimes a motif analysis can tell you which strand was probably bound but that is beyond what DiffBind does.

-Rory

ADD REPLY
0
Entering edit mode

Hi,

ChIP-seq peaks are inherently not stranded, so there is no meaningful strand information to include.  All you can tell is that there is binding at this location.  There are (unless the protocol has been radically changed) top-strand reads upstream of the actual binding site, and bottom-strand reads downstream, just reflecting that we're reading more or less at random from either end of the fragment that was immunoprecipitated.

Edited to add:  yeah, what Rory said... ;)

ADD REPLY

Login before adding your answer.

Traffic: 936 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6