Question: Protein differential Expression analysis
20 months ago by
cardin.julie0 wrote:

I have had very good results with DESeq2 for my RNA-Seq analysis. As far as I understand, it is a tool that normalises sequencing data so that samples become comparable.

I have a new project involving protein counts.

I have a couple of data sets. For each sample we have:

rows with protein names (instead of genes) and their respective counts.

My goal is again to test for differential expression between treated groups and controls.

I wonder whether I can use DESeq2 to do a differential expression analysis for proteins.

Or is the correction factor that DESeq2 applies to RNA-Seq counts specific to DNA sequencing, and therefore not applicable to proteins?

Is there a tool that does the same thing as DESeq2 but for proteins?

modified 20 months ago by Laurent Gatto1.2k • written 20 months ago by cardin.julie0
20 months ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:

Julie

There is no "in-principle" reason why DESeq2 shouldn't produce useful results also for count data from technologies that are not DNA-sequencing based. There are two issues:

• the normalization (a.k.a. size factors)
• the error model (Gamma-Poisson, GP)

Both of these are quite generic, and whether they are appropriate for your data is a question about the particular dataset, rather than about the technology that produced it.
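
To make the size-factor idea concrete, here is a minimal sketch of median-of-ratios normalization, the approach behind DESeq2's default size factors. It is written in plain Python for illustration only and is not DESeq2's actual code; the function name `size_factors`, the protein labels and the toy counts are all made up for the example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping protein name -> list of counts, one per sample
            (hypothetical input layout, same sample order in every row).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # Proteins with any zero count have a geometric mean of zero, so skip them.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-protein geometric mean.
    return [math.exp(median(r)) for r in log_ratios]

# Toy data: sample 2 was measured "twice as deep" as sample 1.
counts = {
    "P1": [10, 20],
    "P2": [100, 200],
    "P3": [50, 100],
    "P4": [0, 3],  # ignored when estimating the factors
}
sf = size_factors(counts)
normalized = {p: [c / s for c, s in zip(row, sf)] for p, row in counts.items()}
```

Nothing in this calculation refers to sequencing. It only assumes that most proteins do not change between samples, and that assumption is what needs checking on the dataset at hand.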

Regarding the normalization, can you show us MA plots between replicates (and also between different conditions)? Please include the M=0 line in each plot.
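
For reference, the M and A values behind such a plot can be computed as in the sketch below. It assumes two vectors of (ideally already normalized) counts for the same proteins; the pseudocount of 1 and the name `ma_values` are choices made for this example, not part of any package.

```python
import math
from statistics import median

def ma_values(x, y, pseudocount=1.0):
    """Return a list of (A, M) pairs for two samples.

    M is the log2 ratio and A the mean log2 intensity; the pseudocount
    (an assumption of this sketch) avoids taking log2 of zero.
    """
    pairs = []
    for a, b in zip(x, y):
        la = math.log2(a + pseudocount)
        lb = math.log2(b + pseudocount)
        pairs.append(((la + lb) / 2, la - lb))
    return pairs

# Toy counts for two replicates of the same condition.
rep1 = [12, 150, 33, 0, 870]
rep2 = [15, 140, 30, 1, 905]
ma = ma_values(rep1, rep2)

# With adequate normalization, the bulk of M values should sit around 0.
median_m = median(m for _, m in ma)
```

Plotting M against A with any plotting library, plus a horizontal line at M=0, gives the MA plot. A point cloud centred on that line between replicates suggests the size factors are doing their job.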

Regarding the error model, you will want to run model fit diagnostics to see whether the residuals for each protein across replicates and conditions (after fitting the model) are reasonably consistent with the GP, in particular that they look unimodal. There is one argument for peace of mind: if you have enough replicates (or rather, degrees of freedom) that you can actually "see" deviations from the GP assumption (i.e. dozens or more), then you don't really need a parametric method, and you could just as well switch to something non-parametric, without any of DESeq2's shrinkage or distributional assumptions. If you do not have that many, the deviations cannot matter much.
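
As an illustration of what "something non-parametric" could look like when replicates are plentiful, here is a minimal sketch of a two-sided Monte Carlo permutation test for a single protein. This is one possible choice, not a method recommended in this thread; the function name, the number of permutations and the toy values are assumptions, and the inputs are taken to be already normalized.

```python
import random

def permutation_pvalue(treated, control, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_t, n_c = len(treated), len(control)
    observed = abs(sum(treated) / n_t - sum(control) / n_c)
    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the samples
        diff = abs(sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_c)
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)

# Toy normalized values for one protein, 12 replicates per group.
treated = [30, 34, 29, 41, 38, 35, 33, 40, 36, 31, 39, 37]
control = [20, 22, 19, 25, 21, 24, 23, 18, 26, 22, 20, 21]
p = permutation_pvalue(treated, control)
```

This makes no distributional assumption, but it also borrows no information across proteins, which is why it only becomes attractive with many replicates. Applied to every protein, the resulting p-values would still need a multiple-testing correction.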

Kind regards
Wolfgang

In addition to Wolfgang's answer, you could also use edgeR via the msmsTests package: if you have your spectral counting data in a spreadsheet, you can import it as an MSnSet object with readMSnSet2 and run the functions in msmsTests as described in the vignette.