Question

edgeR for looking at differences in coverage?

am3 • 0

I have whole-exome sequencing data from insect individuals from two different populations. I'm interested to see if there are any genome regions that have much better coverage in one population than in the other. (For example, I want to see if the exome capture probes work much better for one population, or if there are any large deletions.) Would there be any conceptual problem with using edgeR for this purpose? Both analyses deal with read abundances that vary between individuals and groups of samples. In principle, is calculating differential coverage and its statistical significance any different from doing the same thing for differential gene expression?

edgeR exome • 1.3k views

updated 6.6 years ago by Aaron Lun ★ 29k • written 6.6 years ago by am3 • 0

score 1 · Answer 1 · 2019-05-03

Aaron Lun ★ 29k

Sounds like my PhD in a nutshell.

https://bioconductor.org/packages/csaw

https://bioconductor.org/packages/diffHic

https://bioconductor.org/packages/cydar (postdoc work)

It's generally fine if you select the features correctly. In RNA-seq, this is not a problem because the features are defined for us. In genome-wide applications, we can't test every position in the genome, so we have to do some filtering - the choice of filtering method determines the validity of the results. See:

https://bioconductor.org/packages/devel/workflows/html/csawUsersGuide.html

https://bioconductor.org/packages/devel/workflows/html/chipseqDB.html

... and related publications for more details.
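
As a rough illustration of why the RNA-seq framing carries over, here is a minimal plain-Python sketch (not edgeR itself) that treats each probe region like a gene: counts per region are scaled by library size and compared between populations. The region names, counts, library sizes and the `prior` offset are all made up for illustration. The sketch only produces a log-fold change; the dispersion modelling and significance testing are what edgeR would add on top.

```python
import math

def log2_cpm(count, lib_size, prior=0.5):
    """log2 counts-per-million; `prior` is an illustrative offset that avoids log(0)."""
    return math.log2((count + prior) / lib_size * 1e6)

def log_fold_change(counts, lib_sizes, groups, ref, other, prior=0.5):
    """Mean log2-CPM in `other` minus mean log2-CPM in `ref` for one region."""
    values = {ref: [], other: []}
    for count, lib_size, group in zip(counts, lib_sizes, groups):
        values[group].append(log2_cpm(count, lib_size, prior))
    mean = lambda xs: sum(xs) / len(xs)
    return mean(values[other]) - mean(values[ref])

# Hypothetical data: two individuals per population, reads counted per probe region.
groups = ["popA", "popA", "popB", "popB"]
lib_sizes = [1_000_000, 2_000_000, 1_000_000, 2_000_000]
region_counts = {
    "probe_1": [100, 200, 100, 200],  # same relative coverage in both populations
    "probe_2": [100, 200, 50, 100],   # roughly half the relative coverage in popB
}

for region, counts in region_counts.items():
    lfc = log_fold_change(counts, lib_sizes, groups, "popA", "popB")
    print(f"{region}: log2 fold change (popB vs popA) = {lfc:.2f}")
```

Dividing by library size matters here for the same reason as in RNA-seq: an individual sequenced twice as deeply has twice the reads in every region without any real difference in capture efficiency.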

6.6 years ago • Aaron Lun ★ 29k

(Sorry for my delayed response, I apparently didn't have notifications on.) Thank you for this; this will be very helpful! In my exome example, would the list of regions defined by the probe set used for exon capture be sufficient to define the features of interest? If not, could you explain further what you mean by "selecting features"?

6.6 years ago • am3 • 0

Yes, pre-defined regions from the probe set are fine. The real problems begin when you have to define the features from the data (e.g., peak calling in ChIP-seq data), which requires some care to avoid circularity and data dredging. You don't have to worry about this when your features are defined in advance (from a separate source of data), which makes the statistics nice and simple.

I would also imagine there to be a fairly clear demarcation between captured and non-captured regions, so filtering should be fairly straightforward. This is unlike ChIP-seq, where weakly "bound" regions dominate and you need to apply stringent filters to get to the interesting bits. The more stringent the filter, the more apparent any errors in the filtering procedure become - see here for some comments.
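
To make the filtering point concrete, here is a small plain-Python sketch (not csaw or edgeR) of a filter that only looks at overall abundance, averaged over all samples without reference to the population labels. The region names, counts, `prior` offset and `min_log_cpm` threshold are hypothetical; the only idea being illustrated is that the filter statistic ignores which group each sample belongs to.

```python
import math

def average_log2_cpm(counts, lib_sizes, prior=0.5):
    """Average log2 counts-per-million across ALL samples, ignoring group labels."""
    logs = [math.log2((c + prior) / n * 1e6) for c, n in zip(counts, lib_sizes)]
    return sum(logs) / len(logs)

def filter_by_abundance(region_counts, lib_sizes, min_log_cpm):
    """Keep regions whose average abundance exceeds the (hypothetical) threshold."""
    return [
        region
        for region, counts in region_counts.items()
        if average_log2_cpm(counts, lib_sizes) > min_log_cpm
    ]

# Hypothetical counts: the first two samples come from one population,
# the last two from the other. The filter never uses that information.
lib_sizes = [1_000_000] * 4
region_counts = {
    "probe_1": [120, 95, 60, 40],   # captured in every individual
    "offtarget_1": [1, 0, 2, 0],    # background reads only
    "probe_2": [0, 0, 80, 100],     # captured in one population only
}

kept = filter_by_abundance(region_counts, lib_sizes, min_log_cpm=1.0)
print(kept)
```

Because the filter never sees the population labels, a region like `probe_2` is retained for testing on the strength of its overall abundance rather than being picked out because it already looks different between groups.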

6.6 years ago • Aaron Lun ★ 29k