Question: Fitting Gamma GLMs for examining differential expression
8 months ago by
jspmccain10
jspmccain10 wrote:

It's my understanding that edgeR and DESeq2 both use negative binomial GLMs, though in different ways, and that both packages assume the input is un-normalized counts. I am working with proteomics data from a mass spectrometer. Specifically, I'm using TMT labeling to quantify differential expression. However, the normalization procedure is a bit unintuitive, and we followed the methods from Plubell et al. (2017; http://www.mcponline.org/content/16/5/873.short). Importantly, our expression values are not integers. Also, in this normalization procedure, we only compare peptides that were found in all treatments.

So my questions are:

- Is there a way to implement a gamma GLM in edgeR or DESeq2 to better fit these continuous data?

- If not, would fitting a gamma GLM to each peptide with the regular glm() function in R be a possible approach (with a much more stringent significance cutoff)?

Thanks!

modified 8 months ago by Ryan C. Thompson7.1k • written 8 months ago by jspmccain10
8 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.1k wrote:

Both edgeR and DESeq2 are built around the negative binomial distribution, so there is no way to shoehorn a different distribution into them. However, if your raw data actually consists of counts, you can probably still use these methods with your custom normalization method. Instead of computing the normalized values directly, compute the appropriate normalization factors and feed them into edgeR or DESeq2 along with the raw counts. (The specifics of how to do this differ between the two packages, but should be documented in their respective manuals.)

In addition, edgeR does not require counts to be integers, although it does require that the input values be on a raw count scale. (For example, RSEM estimated counts are acceptable inputs, but FPKM values are not.)

Anyway, you could certainly fit a gamma GLM to each peptide. As I'm sure you're aware, you will lose the primary benefit of edgeR or DESeq2, which is the empirical Bayes moderated estimation of biological variability. If you want to use a similar method to edgeR or DESeq2 that doesn't assume a negative binomial distribution, I would recommend using limma-trend, i.e. a standard limma analysis with trend=TRUE in the call to eBayes. (Don't use voom, since your inputs are already normalized, not raw counts.)
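
As a rough illustration of the two options above, here is a minimal sketch on simulated data. The matrix `intensity`, the two-group design, and the helper `fit_gamma` are all hypothetical stand-ins, not part of the original question. The limma step assumes limma is installed and that a log2 transform of the normalized intensities is appropriate, and it is skipped otherwise.

```r
# Simulated stand-in for a normalized TMT intensity matrix (peptides x samples).
# All names and values here are hypothetical.
set.seed(1)
n_peptides <- 50
group <- factor(rep(c("control", "treated"), each = 4))
intensity <- matrix(
  rgamma(n_peptides * length(group), shape = 20, rate = 20 / 1000),
  nrow = n_peptides,
  dimnames = list(paste0("pep", seq_len(n_peptides)), NULL)
)

# Option 1: one gamma GLM (log link) per peptide with base R's glm().
# Each fit estimates its own dispersion; nothing is shared across peptides.
fit_gamma <- function(y, group) {
  fit <- glm(y ~ group, family = Gamma(link = "log"))
  # Row 2 of the coefficient table is the treated-vs-control effect
  # (on the natural-log scale because of the log link).
  coefs <- summary(fit)$coefficients
  c(log_fc = coefs[2, "Estimate"], p_value = coefs[2, "Pr(>|t|)"])
}
gamma_res <- as.data.frame(t(apply(intensity, 1, fit_gamma, group = group)))
gamma_res$fdr <- p.adjust(gamma_res$p_value, method = "BH")

# Option 2: limma-trend on log2 intensities, which moderates the
# per-peptide variances with an abundance-dependent prior.
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ group)
  fit <- limma::lmFit(log2(intensity), design)
  fit <- limma::eBayes(fit, trend = TRUE)
  limma_res <- limma::topTable(fit, coef = 2, number = Inf)
}
```

With only a few replicates per group, the per-peptide dispersion estimates in option 1 are noisy, which is the practical cost of giving up the moderated variance estimation that option 2 provides.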

There is one other issue you should be aware of with your data, which is that your filtering criterion is not independent of your test statistic. By requiring a peptide to be present in all samples, you are effectively filtering on the minimum count (or minimum abundance), which is equivalent to using a lower abundance threshold for peptides with large fold changes and/or high variances, and a higher threshold for low-variance, small fold-change peptides. Since your filtering and test statistic are not independent, your FDR values may not be trustworthy. Lun & Smyth 2014 clearly demonstrates the failure of false discovery rate control when non-independent filtering is used (See Tables 1 & 2): https://academic.oup.com/nar/article/42/11/e95/1442937 (This paper demonstrates the problem in a ChIP-seq context, but the issue is not specific to ChIP-seq.)

This is super helpful. Thank you for the limma suggestion and for your insight on the filtering criterion. I will be looking into applying limma, and to be honest I'm still trying to understand the empirical Bayes estimation of biological variability.

Regarding the filtering criterion, my understanding is as follows: tandem mass tags (TMT) are used for peptide quantitation in mass spectrometry because of the high variability across runs. I think that if you included peptides that are not found in all samples, the normalization method would not be able to accurately quantify differentially expressed proteins (peptides in the MS). I'm still working through the literature on this, but the normalization method in the link above (http://www.mcponline.org/content/16/5/873.short) seems to be the most appropriate for our experiment. If you have any suggestions on ways forward, please let me know! Thanks so much for your response.
