Dear All,
I have chromatin data (from ATAC-Seq and histone marks) and RNA-Seq data for the same samples. Let's say I have 14 independent samples subjected to ATAC-Seq and RNA-Seq, but at different times and at different sequencing centres.
Now I want to compare the chromatin signal with the gene expression data, for example by calculating the correlation between the chromatin signal at a peak and the expression level of nearby genes.
As these data are not directly comparable to each other, and sequencing depth normalisation is not enough, I would like to know how to make the data sets comparable so that the analysis is biologically valid and acceptable for publication. I could not find any methods online for this sort of analysis, though it has been shown in many papers.
Thanks,
Hi! It's not clear to me what you're doing. When you say:
"14 independent samples subjected <...> at different times" are you implying that:
1. The ATAC-Seq was generated at biological days 1 ... 2 ... 10 (for example) while the RNA-seq was at biological days 1 ... 3 ... 5?
2. You sent off 14 ATAC-seq libraries for sequencing, each one independently to the sequencing centre? And the 14 RNA-seq libraries were also independently sent?
3. The ATAC-Seq and RNA-seq were generated from the same cells, at the same biological day (i.e. you split your cell population into two, and then made libraries from the two different protocols). Then you sent off say 5 and then 5 and then another 4 libraries to the same or different sequencing centres?
In the case of (1) and (2) there really is no truly "valid" way of doing the comparison. In the case of (3), you can analyse the RNA-seq and ATAC-seq independently of each other (as appropriate for each of the techniques), making sure that you have a "batch" variable in your design formula which takes into account which samples were run on the same sequencing machine in the same run (this is actually what's important, not which sequencing centre your samples went to). I'm also assuming that you have at least three replicates per assay for each of the biological conditions you're investigating (or two, at the non-ideal-and-really-invalid bare minimum).
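To make the "batch" idea concrete, here is a minimal sketch in plain Python. The sample sheet, its field names and the helper functions are all hypothetical; the `run` field stands in for "same sequencing machine, same run". It labels each sample with a batch taken from the run rather than the centre, and then lists batches that contain only one condition, since those cannot contribute a within-batch comparison between conditions.

```python
from collections import defaultdict

# Hypothetical sample sheet; every name and value here is made up for illustration.
# "run" stands for "same machine, same sequencing run".
samples = [
    {"id": "S1", "condition": "ctrl",    "centre": "A", "run": "run1"},
    {"id": "S2", "condition": "treated", "centre": "A", "run": "run1"},
    {"id": "S3", "condition": "ctrl",    "centre": "A", "run": "run2"},
    {"id": "S4", "condition": "treated", "centre": "A", "run": "run2"},
    {"id": "S5", "condition": "treated", "centre": "B", "run": "run3"},
    {"id": "S6", "condition": "treated", "centre": "B", "run": "run3"},
]

def add_batch(samples):
    """Return copies of the samples with a batch label taken from the run, not the centre."""
    return [dict(s, batch=s["run"]) for s in samples]

def single_condition_batches(samples):
    """Return the batches in which only one condition was sequenced."""
    conditions = defaultdict(set)
    for s in samples:
        conditions[s["batch"]].add(s["condition"])
    return sorted(b for b, conds in conditions.items() if len(conds) < 2)

with_batch = add_batch(samples)
print(single_condition_batches(with_batch))  # ['run3']
```

Note that run1 and run2 went to the same centre but are still separate batches. The resulting `batch` column is what you would then put in the design formula of whichever differential analysis tool you use for each assay.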
Then you combine the output (say, differentially expressed genes with differentially accessible ATAC-seq peaks at the same time points). 14 samples isn't really enough to try things like WGCNA and other more complex, fun, correlation tools.
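As a rough sketch of that combining step, again in plain Python: the tables below are invented, and they assume each peak has already been assigned to a gene (for example the nearest one) and that each assay was analysed separately beforehand. The code keeps only the peak and gene pairs that are significant at the same time point and notes whether accessibility and expression change in the same direction.

```python
# Hypothetical results from two independent, batch-aware analyses.
# RNA-seq: (time point, gene) -> log2 fold change of a differentially expressed gene.
de_genes = {
    ("day1", "GeneA"): 2.1,
    ("day1", "GeneB"): -1.4,
    ("day5", "GeneA"): 1.8,
}
# ATAC-seq: (time point, peak id, assigned gene, log2 fold change) per differential peak.
# The peak-to-gene assignment is assumed to have been done already.
da_peaks = [
    ("day1", "peak_1", "GeneA", 1.2),
    ("day1", "peak_2", "GeneB", 0.9),
    ("day5", "peak_3", "GeneC", -1.1),
]

def combine(de_genes, da_peaks):
    """Pair each differential peak with its gene if that gene is also DE at the same time point."""
    rows = []
    for time, peak, gene, peak_lfc in da_peaks:
        gene_lfc = de_genes.get((time, gene))
        if gene_lfc is None:
            continue  # gene not differentially expressed at this time point
        rows.append({
            "time": time,
            "peak": peak,
            "gene": gene,
            "same_direction": (peak_lfc > 0) == (gene_lfc > 0),
        })
    return rows

combined = combine(de_genes, da_peaks)
for row in combined:
    print(row)
```

Here peak_1 and GeneA both go up at day1, peak_2 and GeneB move in opposite directions, and peak_3 is dropped because GeneC is not differentially expressed at day5.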