I have whole-exome sequencing data from insect individuals from two different populations. I want to find out whether any genomic regions have much better coverage in one population than in the other. (For example, I want to see whether the exome capture probes work much better for one population, or whether there are any large deletions.) Would there be any conceptual problem with using edgeR for this purpose? Both cases deal with read abundances that vary between individuals and between groups of samples. In principle, is testing for differential coverage and assessing its statistical significance any different from doing the same thing for differential gene expression?
Sounds like my PhD in a nutshell.
https://bioconductor.org/packages/cydar (postdoc work)
It's generally fine if you select the features correctly. In RNA-seq, this is not a problem because the features (the genes) are defined for us. In genome-wide applications, we can't test every position in the genome, so we have to do some filtering, and the choice of filtering method determines the validity of the results. See:
... and related publications for more details.
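To make the filtering point concrete, here is a minimal sketch in Python (for illustration only; it is not edgeR or any package's API). It counts reads in fixed-width windows and keeps only the windows whose total count across all samples passes a threshold. The filter rule, the window size, the threshold, the sample names and the helper functions `count_windows` and `filter_windows` are all assumptions made up for this example. The key property is that the filter never looks at which population a sample belongs to.

```python
from collections import Counter


def count_windows(read_starts, window_size):
    """Count reads per fixed-width window from 0-based read start positions."""
    return Counter(pos // window_size for pos in read_starts)


def filter_windows(sample_counts, min_total):
    """Keep windows whose count summed over ALL samples is at least min_total.

    Group labels are never consulted, so the filter cannot favour windows
    that happen to differ between the populations.
    """
    totals = Counter()
    for window_counts in sample_counts.values():
        totals.update(window_counts)  # Counter.update adds counts together
    return sorted(w for w, total in totals.items() if total >= min_total)


# Toy read start positions for four hypothetical samples (two per population).
reads = {
    "popA_1": [5, 12, 40, 130, 140, 150],
    "popA_2": [8, 20, 45, 135, 160],
    "popB_1": [3, 15, 30, 210],
    "popB_2": [10, 25, 48],
}

counts = {
    name: count_windows(starts, window_size=100)
    for name, starts in reads.items()
}
kept = filter_windows(counts, min_total=5)

# Window-by-sample count matrix for the retained windows; a table of this
# shape is what would then be passed on to a count-based test.
matrix = {w: [counts[name][w] for name in reads] for w in kept}

for window, row in matrix.items():
    print(window, row)
```

In this toy data the second window has reads only in the population A samples, but it is still retained because the filter uses the overall total rather than a per-group criterion.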