I have whole-exome sequencing data from insect individuals from two different populations. I'd like to see whether any genomic regions have much better coverage in one population than in the other. (For example, I want to see whether the exome capture probes work much better for one population, or whether there are any large deletions.) Would there be any problem, conceptually, with using edgeR for this purpose? Both analyses deal with read abundances that vary between individuals and between groups of samples. In principle, is testing for differential coverage and assessing its statistical significance any different from doing the same thing for differential gene expression?
(Sorry for my delayed response, I apparently didn't have notifications on.) Thank you for this; this will be very helpful! In my exome example, would the list of regions defined by the probe set used for exon capture be sufficient to define the features of interest? If not, could you explain further what you mean by "selecting features"?
Yes, pre-defined regions from the probe set are fine. The real problems begin when you have to define the features from the data (e.g., peak calling in ChIP-seq data), which requires some care to avoid circularity and data dredging. You don't have to worry about this when your features are defined in advance (from a separate source of data), which makes the statistics nice and simple.
I would also imagine there to be a fairly clear demarcation between captured and non-captured regions, so filtering should be fairly straightforward. This is unlike ChIP-seq, where weakly "bound" regions dominate and you need to apply stringent filters to get to the interesting bits. The more stringent the filter, the more apparent any errors in the filtering procedure become - see here for some comments.
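To make the filtering point concrete, here is a minimal sketch in plain Python (not edgeR itself) of an abundance filter that never looks at the population labels, which is what keeps it from biasing the later test. The function names, the prior count of 0.5, the threshold of 15 and the toy counts are all made up for illustration; in edgeR you would work from the real count matrix, and its `aveLogCPM` function computes a more careful version of this average.

```python
import math

def average_log_cpm(counts, prior=0.5):
    """Mean log2 counts-per-million per region, taken over ALL samples.

    counts: dict mapping region name -> list of per-sample read counts.
    The prior count (an assumed value) avoids taking log of zero.
    This is a simplified stand-in, not edgeR's aveLogCPM calculation.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total reads over all regions, per sample.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    averages = {}
    for region, row in counts.items():
        log_cpms = [
            math.log2((c + prior) / (lib + 2 * prior) * 1e6)
            for c, lib in zip(row, lib_sizes)
        ]
        averages[region] = sum(log_cpms) / n_samples
    return averages

def keep_regions(counts, min_avg_log_cpm):
    """Keep regions whose average abundance passes the threshold.

    Group membership is never used, so the filter is independent
    of the differential coverage test applied afterwards.
    """
    averages = average_log_cpm(counts)
    return [region for region, avg in averages.items() if avg >= min_avg_log_cpm]

# Toy data: three samples from population 1, then three from population 2.
toy_counts = {
    "probe_A": [500, 480, 510, 20, 15, 25],       # covered well in one population only
    "probe_B": [400, 420, 390, 410, 405, 395],    # covered well in both
    "probe_C": [0, 1, 0, 2, 0, 1],                # essentially not captured
}

kept = keep_regions(toy_counts, min_avg_log_cpm=15.0)
print(kept)
```

With these toy numbers the uncaptured `probe_C` is dropped, while `probe_A` is retained even though its coverage differs sharply between populations, because the filter is based on overall abundance rather than on the contrast being tested.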