Question

Fitting Gamma GLMs for examining differential expression

1

Entering edit mode

jspmccain ▴ 10

@jspmccain-13725

Last seen 5.4 years ago

It's my understanding that negative binomial GLMs are used in different ways in both EdgeR and DeSeq2. Both of these packages assume the input are un-normalized counts. I am working with proteomics data from a mass spectrometer. Specifically, I'm using TMT-Labeling to quantify changes in differential expression. However, the normalization procedure is a bit unintuitive and we followed methods from Plubell et al (2017; http://www.mcponline.org/content/16/5/873.short). Importantly, we do not get integers as our expression values. Also in this normalization procedure, we only compare peptides which were found in all treatments.

So my questions are:

- Is there a way to implement a Gamma GLM in EdgeR or DeSeq2 to better fit these continuous data?

- If not, would fitting a gamma GLM with the regular `glm()` function in R, for each peptide, be a possible approach? (With a much more stringent significance cutoff)

Thanks!

deseq2 edger differential expression • 1.4k views

ADD COMMENT • link updated 6.1 years ago by Ryan C. Thompson ★ 7.9k • written 6.1 years ago by jspmccain ▴ 10

score 4 · Accepted Answer · 2018-03-20

Both edgeR and DESeq2 are built around the negative binomial distribution, so there is no way to shoehorn a different distribution into them. However, if your raw data actually consists of counts, you can probably still use these methods with your custom normalization method. Instead of computing the normalized values directly, compute the appropriate normalization factors and feed them into edgeR or DESeq2 along with the raw counts. (The specifics of how to do this are different for both methods, but should be documented in their respective manuals.)

In addition, edgeR does not require counts to be integers, although it does require that the input values be on a raw count scale. (For example, RSEM estimated counts are acceptable inputs, but FPKM values are not.)

Anyway, you could certainly fit a gamma GLM to each peptide. As I'm sure you're aware, you will lose the primary benefit of edgeR or DESeq2, which is the empirical Bayes moderated estimation of biological variability. If you want to use a similar method to edgeR or DESeq2 that doesn't assume a negative binomial distribution, I would recommend using limma-trend, i.e. a standard limma analysis with trend=TRUE in the call to eBayes. (Don't use voom, since your inputs are already normalized, not raw counts.)

There is one other issue you should be aware of with your data, which is that your filtering criterion is not independent of your test statistic. By requiring a peptide to be present in all samples, you are effectively filtering on the minimum count (or minimum abundance), which is equivalent to using a lower abundance threshold for peptides with large fold changes and/or high variances, and a higher threshold for low-variance, small fold-change peptides. Since your filtering and test statistic are not independent, your FDR values may not be trustworthy. Lun & Smyth 2014 clearly demonstrates the failure of false discovery rate control when non-independent filtering is used (See Tables 1 & 2): https://academic.oup.com/nar/article/42/11/e95/1442937 (This paper demonstrates the problem in a ChIP-seq context, but the issue is not specific to ChIP-seq.)