edgeR - Correct approach to compare abundances of genomic regions?
Hello all,

This is mostly a conceptual question. I want to test the hypothesis that genomic windows within one organism's genome have higher abundances of mapped reads than the same regions in a different organism. I am wondering whether edgeR, or really any other differential expression software, would be applicable for testing this hypothesis. I understand that mapping reads to genomic regions poses different, and likely harder, challenges than mapping them to the genic regions used for differential gene and transcript expression. This leads me to wonder whether the approaches used by edgeR, such as TMM normalization and empirical Bayes shrinkage of the dispersion, are adequate for genomic data. Thank you for your time.



I imagine that csaw is close to what you want.

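The real challenge in your case is finding which regions in one species are homologous to regions in the other species, and applying appropriate normalization for uninteresting biases in mappability, sequenceability, etc. This is possibly a rare situation where you could use input controls in an interaction model to cancel out those biases: if each species' input libraries share that species' mappability and sequencing biases, the interaction between species and library type (ChIP vs. input) should be free of them.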
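To illustrate why the interaction term cancels these biases, here is a toy sketch in Python with made-up expected counts for a single homologous region and no sampling noise. It assumes that species B's mappability bias scales its ChIP and input libraries by the same factor, and it fits ordinary least squares on log counts rather than edgeR's negative binomial GLM, so treat it as an illustration of the design (the equivalent of an R formula like `~species*type`), not as an analysis pipeline. All variable names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical expected counts for ONE homologous region, with no sampling noise.
bias = 0.5        # species B mappability bias: scales ChIP and input alike (assumption)
true_fold = 3.0   # genuine ChIP enrichment difference of species B vs. species A
base_input = 100.0
base_chip = 400.0

# Each sample: (is_species_B, is_chip, expected count)
samples = [
    (0, 0, base_input),                    # species A, input
    (0, 1, base_chip),                     # species A, ChIP
    (1, 0, base_input * bias),             # species B, input
    (1, 1, base_chip * bias * true_fold),  # species B, ChIP
]

species = np.array([s[0] for s in samples], dtype=float)
chip = np.array([s[1] for s in samples], dtype=float)
counts = np.array([s[2] for s in samples])

# Columns mirror an R formula like ~species*type:
# intercept, species, type, species:type interaction.
design = np.column_stack([np.ones(len(samples)), species, chip, species * chip])

# Four samples, four coefficients: the log-scale least-squares fit is exact.
coef, *_ = np.linalg.lstsq(design, np.log(counts), rcond=None)

print("species effect (fold):", np.exp(coef[1]))      # absorbs the mappability bias
print("interaction effect (fold):", np.exp(coef[3]))  # recovers true_fold; bias cancels
```

The species main effect comes out at the bias (0.5), while the interaction comes out at the true enrichment difference (3), because the bias multiplies both library types and drops out of the interaction. With real data the same idea would be expressed through an edgeR design matrix built from the species and library-type factors, subject to the homology and normalization caveats above.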

edgeR is used all the time for analysing DNA read counts from genomic windows, for example reads from ChIP-seq, ATAC-seq, BS-seq or Hi-C. Genomic data causes no problems; in fact, it is generally simpler than RNA-seq.
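For anyone unfamiliar with window-based counting: the counting step amounts to tallying reads per window for each library, which gives a windows-by-samples count matrix that edgeR can treat just like a genes-by-samples matrix. Below is a minimal Python sketch, assuming each read is reduced to its start coordinate; tools such as csaw also deal with fragment extension, strand and filtering, all omitted here. The window width, spacing and read positions are hypothetical.

```python
import bisect


def count_windows(read_starts, chrom_length, width=100, spacing=50):
    """Count reads whose start coordinate falls in each sliding window.

    Simplification: each read is reduced to its start position; fragment
    extension, strand handling and quality filtering are ignored.
    """
    positions = sorted(read_starts)
    rows = []
    for start in range(0, chrom_length - width + 1, spacing):
        end = start + width  # half-open window [start, end)
        n = bisect.bisect_left(positions, end) - bisect.bisect_left(positions, start)
        rows.append((start, end, n))
    return rows


# Hypothetical read start positions for two libraries on a 300 bp contig.
sample_a = [10, 20, 30, 120, 130, 250]
sample_b = [15, 125, 135, 140, 260, 270, 280]

counts_a = count_windows(sample_a, chrom_length=300)
counts_b = count_windows(sample_b, chrom_length=300)

# Windows-by-samples matrix: (start, end, count in A, count in B).
matrix = [(a[0], a[1], a[2], b[2]) for a, b in zip(counts_a, counts_b)]
for row in matrix:
    print(row)
```

Each row of `matrix` plays the role of a "gene" in an ordinary edgeR analysis; since the windows overlap, a single read can contribute to more than one row.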

If your genomic windows are predefined (for example, promoter regions), then you can use edgeR directly. If you want to merge adjacent windows into larger DE regions while maintaining FDR control, then csaw is designed specifically for that purpose.
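Roughly, the region-level idea works like this. You test each window, cluster nearby windows into regions, combine the window p-values within each region, and then apply Benjamini-Hochberg across regions, so that the FDR is controlled over regions rather than over windows. Here is a Python sketch under stated assumptions. The merging rule (start a new region when the gap exceeds a hypothetical `tol`) and the use of Simes' method to combine p-values are illustrative choices, so check the csaw documentation for the exact procedures it uses. Window coordinates and p-values are made up.

```python
def merge_windows(windows, tol=100):
    """Cluster (start, end) windows, sorted by start, into regions.

    A window joins the current region if it starts no more than `tol` bp
    after the region's end (a hypothetical rule for illustration).
    """
    regions = []
    for idx, (start, end) in enumerate(windows):
        if regions and start - regions[-1]["end"] <= tol:
            regions[-1]["end"] = max(regions[-1]["end"], end)
            regions[-1]["members"].append(idx)
        else:
            regions.append({"start": start, "end": end, "members": [idx]})
    return regions


def simes(pvalues):
    """Simes' combined p-value: min over i of n * p_(i) / i (p sorted ascending)."""
    ps = sorted(pvalues)
    n = len(ps)
    return min(n * p / rank for rank, p in enumerate(ps, start=1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-window results (e.g. from an edgeR test on each window).
windows = [(0, 100), (50, 150), (100, 200), (500, 600), (550, 650), (2000, 2100)]
window_p = [0.001, 0.01, 0.04, 0.2, 0.5, 0.03]

regions = merge_windows(windows, tol=100)
region_p = [simes([window_p[i] for i in r["members"]]) for r in regions]
region_fdr = benjamini_hochberg(region_p)

for r, p, q in zip(regions, region_p, region_fdr):
    print(f"{r['start']}-{r['end']}: windows={r['members']} p={p:.3g} FDR={q:.3g}")
```

With these made-up numbers the six windows collapse into three regions, and the multiple-testing correction is then applied over three region-level tests instead of six window-level ones.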

As already mentioned by Aaron, the more difficult issue is that you seem to be comparing different species with different genomes.


