Could you tell me whether DESeq2 can analyze m6A-seq data to detect differential m6A peaks? Other papers first used input reads to normalize the m6A-seq reads and then performed Fisher's exact test. It seems that DESeq2 cannot accept normalized reads.
In DESeq2, instead of pre-scaling counts, you would include the assay type as a design variable. Then you are essentially doing a paired analysis with assay type as the condition effect. E.g. ~sample + assay will compare, within each sample, enriched vs input and report that as an LFC. I would look at MA plots as a diagnostic, though, and consider whether you have features that could be used to define controlGenes; we don't have any automated workflows to do this. See ?estimateSizeFactors for a description of controlGenes. This step can be run before DESeq if you want to specify that certain features be used for estimating the size factors.
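To make the controlGenes idea concrete: size factors come from the median-of-ratios method, and restricting to control features means taking that median over the chosen rows only. Below is a minimal Python sketch of the arithmetic, not DESeq2's own code. The function name `size_factors`, the `control_rows` argument, and the toy counts are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts, control_rows=None):
    """Median-of-ratios size factors.

    counts: list of rows (features), each a list of per-sample raw counts.
    control_rows: optional list of row indices to use (hypothetical stand-in
    for the controlGenes idea); if None, all rows are used.
    """
    rows = range(len(counts)) if control_rows is None else control_rows
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for i in rows:
        row = counts[i]
        # A zero count makes the geometric mean zero, so skip such rows.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no usable rows for size factor estimation")
    # One size factor per sample: the median ratio to the geometric mean.
    return [math.exp(median(lr)) for lr in log_ratios]

# Toy data: rows 0-2 are assumed control features where sample 2 has exactly
# twice the depth of sample 1; row 3 is a strongly changing feature.
toy_counts = [[10, 20], [100, 200], [5, 10], [50, 10]]
toy_sf = size_factors(toy_counts, control_rows=[0, 1, 2])
```

With only the control rows used, the strongly changing row 3 has no influence on the size factors, which is the point of choosing control features.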
Hello Michal, thanks for your reply. Maybe what I described was not clear. I have 12 samples:
I want to analyze m6A-seq to detect differential m6A peaks between case and control groups. For m6A-seq, input samples must be used to normalize/control IP samples. So I want to know whether DESeq2 can regress out/control input samples. Thanks again.
You can accommodate designs beyond just enriched vs input by using interaction terms. E.g. with your comment you are asking whether the enriched / input ratio changes across another variable. That's just adding an interaction term. Do you have a local statistician you could work with on choosing a design and interpreting results?
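As a rough illustration of what such an interaction term measures: on the log2 scale it is the enriched-vs-input log ratio in one condition minus the same log ratio in the other condition. The Python sketch below does this with plain group means for a single peak. It is arithmetic only, not the negative binomial model DESeq2 fits, and the function names, the dictionary layout, and the counts are all hypothetical.

```python
import math

def log2_mean(values):
    """log2 of the arithmetic mean of a list of normalized counts."""
    return math.log2(sum(values) / len(values))

def interaction_lfc(counts):
    """Difference of IP-vs-input log2 ratios between case and control.

    counts: dict keyed by (condition, assay) holding normalized counts
    for one peak (hypothetical layout chosen for this example).
    """
    enrich_case = (log2_mean(counts[("case", "IP")])
                   - log2_mean(counts[("case", "input")]))
    enrich_control = (log2_mean(counts[("control", "IP")])
                      - log2_mean(counts[("control", "input")]))
    return enrich_case - enrich_control

# Made-up peak: expression (input) rises 4x in case, IP rises 8x.
peak = {
    ("control", "input"): [100, 100, 100],
    ("control", "IP"): [200, 200, 200],
    ("case", "input"): [400, 400, 400],
    ("case", "IP"): [1600, 1600, 1600],
}
peak_lfc = interaction_lfc(peak)
```

In this toy peak the IP signal rises 8-fold but the input rises 4-fold, so the change in enrichment is only 2-fold (a log2 difference of 1). This is the sense in which the input samples are accounted for without pre-scaling the counts.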
Do you mean a design like ~ assay + condition, where assay is IP/input and condition is case/control?
Sure, that’s one way to code this up. It would be good to discuss an analysis plan with a statistician but we and many groups have used this approach to analyze enrichment or differential enrichment within DESeq2.
I have a question concerning the differential m6A peaks here. DESeq2 uses the read counts for each peak as input. Does dependency between peaks affect the results? For example, different peaks on the same gene might correlate with each other and with the expression of that gene. How does DESeq2 handle this?
Are you saying you will have multiple rows where some of the data is repeated?
I don't think it's a concern that there is natural biological correlation among peaks that are near each other. Often this correlation is accounted for by the design (either batch or condition). In any case, it's correlation _under the null_ that would matter for inflation of the error rate.
There will be correlation among the p-values across peaks, but I'm not very worried about this positive dependency with respect to the FDR methods.
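For reference, the default multiple-testing adjustment in DESeq2 is Benjamini-Hochberg, which is known to control the FDR under positive dependence among the tests. Below is a minimal Python sketch of the adjustment itself; the function name `bh_adjust` and the example p-values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four peaks.
example_padj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value depends only on the ranks of the p-values, so no explicit model of the correlation between neighbouring peaks enters the calculation.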
Thank you very much for the reply. I am saying that each row represents one peak, so nothing is repeated. But multiple peaks (rows) might come from one gene.
So DESeq2 does not require independence between different rows of the input?
Not strict independence, and certainly not under the alternative hypothesis.