Question

How should I handle genes annotated to differential binding peaks from both conditions?

vanhzh

Dear all,

I have done ChIP-seq and called differential binding peaks between WT and mutant conditions using the DiffBind package. I annotated each of the two lists of condition-specific peaks to the nearest genes and found that 537 genes were annotated to both lists. Here is the Venn diagram:

Venn diagram of genes annotated to mutant and WT differential peaks
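The nearest-gene annotation and the overlap count can be sketched as below. This is a simplified stand-in, not how DiffBind or a dedicated annotation package assigns peaks: it assumes each peak goes to the single gene whose TSS is closest to the peak midpoint on the same chromosome, and the coordinates and gene names are made up. A gene lands in the overlap when it is the nearest gene to at least one WT-gained peak and at least one mutant-gained peak, which is what the 537 genes in the Venn diagram represent.

```python
from bisect import bisect_left


def nearest_genes(peaks, tss):
    """Return the set of genes whose TSS is closest to each peak midpoint.

    peaks: iterable of (chrom, start, end) tuples.
    tss:   dict mapping chrom -> list of (tss_position, gene_name).
    Peaks on chromosomes without any TSS are skipped.
    """
    # Pre-sort TSSs per chromosome so each lookup is a binary search.
    index = {}
    for chrom, sites in tss.items():
        if sites:
            sites = sorted(sites)
            index[chrom] = ([p for p, _ in sites], [g for _, g in sites])

    genes = set()
    for chrom, start, end in peaks:
        if chrom not in index:
            continue
        positions, names = index[chrom]
        mid = (start + end) // 2
        i = bisect_left(positions, mid)
        # The nearest TSS is either just left or just right of the midpoint;
        # on a tie, min() keeps the lower-coordinate one.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(candidates, key=lambda j: abs(positions[j] - mid))
        genes.add(names[best])
    return genes


# Toy coordinates and gene names, for illustration only.
tss = {"chr1": [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")],
       "chr2": [(2000, "geneD")]}
wt_peaks = [("chr1", 800, 1200), ("chr1", 4700, 4900)]
mut_peaks = [("chr1", 5200, 5400), ("chr2", 2500, 2700), ("chr3", 100, 200)]

wt_genes = nearest_genes(wt_peaks, tss)
mut_genes = nearest_genes(mut_peaks, tss)
shared = wt_genes & mut_genes
print(sorted(wt_genes), sorted(mut_genes), sorted(shared))
```

Here geneB is the nearest gene to a WT-gained peak and to a different mutant-gained peak, so it ends up in both lists.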

I need to link those genes with RNA-seq data: I want to compare the distribution of gene expression between the WT target genes and the mutant target genes. My question is what I should do with those 537 genes. Should I discard them from both the WT and mutant gene lists, or keep them in both?
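The two options in the question (keep the shared genes in both lists, or drop them from both) differ only in which genes feed each expression distribution. A minimal sketch, assuming a hypothetical `expr` dictionary that holds one value per gene from the RNA-seq analysis (here taken to be a log2 fold change, mutant vs WT):

```python
def expression_by_group(expr, wt_genes, mut_genes, keep_shared=True):
    """Collect expression values for WT-target and mutant-target genes.

    expr maps gene -> a numeric expression summary (assumed here to be a
    log2 fold change, mutant vs WT). With keep_shared=True, genes annotated
    to both peak lists count in both groups; with keep_shared=False they
    are removed from both.
    """
    shared = wt_genes & mut_genes
    wt = wt_genes if keep_shared else wt_genes - shared
    mut = mut_genes if keep_shared else mut_genes - shared
    # Genes missing from the RNA-seq table (e.g. filtered as lowly expressed) are skipped.
    wt_values = sorted(expr[g] for g in wt if g in expr)
    mut_values = sorted(expr[g] for g in mut if g in expr)
    return wt_values, mut_values


# Toy values, for illustration only.
expr = {"geneA": -1.0, "geneB": 0.5, "geneD": 2.0}
wt_genes, mut_genes = {"geneA", "geneB"}, {"geneB", "geneD"}
print(expression_by_group(expr, wt_genes, mut_genes, keep_shared=True))
print(expression_by_group(expr, wt_genes, mut_genes, keep_shared=False))
```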

Any help is highly appreciated!

Zhenhua Hu

P.S.: Thanks to Steve Lianoglou for the guidance. Hopefully it's much clearer now.

Written 6.3 years ago by vanhzh; last updated 6.2 years ago by Rory Stark.

You will have to better formulate the question you are trying to answer with this data before you can get help on deciding what to do after you have collected it.

Once the question you have in mind is clearer, what you should do will likely become self-evident.

Reply by Steve Lianoglou, 6.3 years ago

Answer · 2019-10-23

We currently don't have reliable methods for mapping between ChIP-seq peaks and the genes they regulate (except possibly when a peak happens to fall inside an annotated promoter). Some of the peaks may be non-functional, and without more data we don't know which of the "nearby" genes an intergenic binding site regulates. Even with matching RNA-seq data, this is difficult to untangle.

Basically, we hypothesize that a set of genes that are "near" differentially bound sites are enriched for genes regulated by the binding factor. That is, they are more likely than a set of background genes to change their transcription levels. You can verify that by comparing the distributions of read counts in those genes in the two sample groups (you can even test if they "significantly" differ). You should include the genes that are in both sets (the 537 overlapping) in both of these distributions; the sets are not meant to be definitive, only to show enrichment for differential regulation.
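The enrichment check described above can be sketched as a gene-set resampling test. This is one possible choice of test, not necessarily the one intended here: it assumes you already have a log2 fold change (mutant vs WT) for every expressed gene from an RNA-seq differential expression analysis, and asks whether genes near differentially bound sites change more, on average, than random gene sets of the same size. The function name and the values are hypothetical and synthetic.

```python
import random
from statistics import mean


def set_vs_background_pvalue(lfc, gene_set, n_perm=10000, seed=0):
    """One-sided resampling test: do genes in gene_set change more than background?

    lfc maps every tested gene -> log2 fold change (mutant vs WT).
    Statistic: mean absolute log2 fold change of the set's genes.
    Null: the same statistic for random gene sets of equal size drawn
    without replacement from all tested genes.
    """
    genes = [g for g in gene_set if g in lfc]
    if not genes:
        raise ValueError("no genes from gene_set have expression values")
    universe = list(lfc)
    observed = mean(abs(lfc[g]) for g in genes)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(genes))
        if mean(abs(lfc[g]) for g in sample) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic example: 200 background genes with small changes, 20 set genes with large ones.
lfc = {f"bg{i}": 0.1 * ((i % 5) - 2) for i in range(200)}
lfc.update({f"t{i}": 2.0 for i in range(20)})
p = set_vs_background_pvalue(lfc, {f"t{i}" for i in range(20)}, n_perm=1000)
print(p)
```

Following the advice above, the 537 shared genes would simply be members of both the WT-target and the mutant-target `gene_set` when the test is run on each list.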

The gene sets can be refined further using differential expression and the direction of the fold change. You can consider the set of genes that are proximal to differential binding sites and are also differentially expressed. You can refine that set further by keeping only the genes that gain expression in the sample with increased binding (i.e., the sign of the fold change is the same).
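The two refinement steps can be sketched as simple filters. The dictionaries, the 0.05 adjusted p-value cutoff, and the function name are illustrative assumptions; in practice the inputs would come from the DiffBind report (peak fold changes) and an RNA-seq differential expression tool. The sketch also assumes each gene is assigned a single peak fold change, so genes in the 537 overlap, which sit near peaks changing in both directions, would need an explicit rule for which peak to use.

```python
def refine_targets(genes, de_results, binding_lfc, padj_cutoff=0.05,
                   require_same_sign=True):
    """Refine a gene set using RNA-seq differential expression results.

    de_results:  gene -> (expression log2FC, adjusted p-value).
    binding_lfc: gene -> log2FC of the differential peak assigned to it
                 (assumes one peak per gene).
    Keeps genes that are differentially expressed (padj < padj_cutoff) and,
    if require_same_sign is True, whose expression and binding fold changes
    have the same sign.
    """
    refined = set()
    for g in genes:
        if g not in de_results or g not in binding_lfc:
            continue
        expr_lfc, padj = de_results[g]
        if padj >= padj_cutoff:
            continue
        if require_same_sign and expr_lfc * binding_lfc[g] <= 0:
            continue
        refined.add(g)
    return refined


# Toy inputs, for illustration only.
de = {"geneA": (-1.2, 0.01), "geneB": (0.8, 0.20), "geneD": (1.5, 0.001)}
binding = {"geneA": -2.0, "geneB": 1.0, "geneD": -1.0}
genes = {"geneA", "geneB", "geneD"}
print(sorted(refine_targets(genes, de, binding, require_same_sign=False)))
print(sorted(refine_targets(genes, de, binding)))
```

With these toy values, geneB drops out because it is not differentially expressed, and geneD drops out in the second step because its expression rises while its binding falls.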