Question

Question about how X and Y chromosomes are handled in DiffBind

surjray • 0

@surjray-7556

United States

Dear all,

We are using DiffBind to examine differential peak enrichment between male and female samples. Our input control comes from the same sample, taken before the immunoprecipitation stage. We have 4 replicates per treatment. We are using the DiffBind commands exactly as given in the vignette and the manual.

It is our understanding that the input control counts are subtracted from the ChIP count data. We are comparing ChIP data from males to ChIP data from females, using MACS2 peaks in DiffBind. In the differential binding results, many of the significant peaks are on the X chromosome, and these peaks have higher counts in females (XX) than in males (XY). Our worry is that the input subtraction is somehow not accounting for the difference in starting X chromosome dose between males and females.

Is there a way to check the input subtraction stage? Could you also please suggest steps to alleviate or explain this issue?

Thanking you.

Surjyendu Ray.

diffbind • 1.5k views

updated 9.8 years ago by Gord Brown • written 9.8 years ago by surjray

score 0 · Answer 1 · 2016-01-29

Hi,

The short answer is that DiffBind is not (yet) copy-number aware, so the input subtraction won't compensate, as you've observed. Consider a region that has ~100 reads in condition XY and ~200 in condition XX (due to 2 copies of the region), with inputs having 10 and 20 reads respectively. After input subtraction (the default), you have 90 and 180 reads, which will still show up as differentially bound. There are alternative scores: setting "score=DBA_SCORE_READS_FOLD" in dba.count will divide by the input instead of subtracting. But when the time comes for differential analysis via edgeR or DESeq, they'll start from the raw counts anyway, so the regions will still show up as differentially bound.
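To make the arithmetic concrete, here is a small Python sketch using the illustrative read counts from the paragraph above. It is plain arithmetic, not DiffBind code; the function names `subtracted` and `fold` are made up for this example and do not correspond to any DiffBind function.

```python
# Illustrative read counts from the example above (not real data).
chip = {"XY": 100, "XX": 200}    # ChIP reads in the region
control = {"XY": 10, "XX": 20}   # input (control) reads in the same region


def subtracted(sample):
    """Hypothetical helper: ChIP reads minus input reads."""
    return chip[sample] - control[sample]


def fold(sample):
    """Hypothetical helper: ChIP reads divided by input reads."""
    return chip[sample] / control[sample]


# Compare XX against XY under each scoring scheme.
for name, score in (("subtract", subtracted), ("fold", fold)):
    ratio = score("XX") / score("XY")
    print(f"{name}: XY={score('XY')}, XX={score('XX')}, XX/XY ratio={ratio}")

# The raw ChIP counts keep the 2x copy-number difference.
raw_ratio = chip["XX"] / chip["XY"]
print(f"raw: XX/XY ratio={raw_ratio}")
```

Subtraction leaves the two-fold difference in place (180 vs 90), while dividing by the input gives the same score for both conditions. The raw counts still differ two-fold, which is why the region is still reported when the analysis starts from them.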

You'll probably have to use a copy-number-aware package, such as ABCD-DNA (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514678/) or csaw (http://www.bioconductor.org/packages/release/bioc/html/csaw.html), which uses edgeR's CNV modeling capability.

We have the best of intentions for adding copy-number awareness to DiffBind, but haven't got around to it yet, sorry.

- Gord