Question: DESeq2 with replicate mean and variance information
19 months ago, blehtine wrote:

I'm trying to find differentially abundant proteins across different environmental conditions (carbon sources, growth rates, stressors) in E. coli. Altogether, I have absolute proteomics data for 2359 proteins in 22 conditions. For each protein in every condition, I have the protein count and the coefficient of variation from three replicates. I want to do pairwise comparisons between conditions to identify proteins that change from one condition to the other. For this task, I am attempting to use DESeq2 in R.

The problem is that the data (Schmidt et al. 2016) give only a summary for each protein in each condition: the abundance estimate based on 3 replicates and the coefficient of variation (standard deviation/mean). The DESeq() function, however, requires the protein counts from all 3 replicates individually. If no replicate information is given, DESeq() compares each condition against the 21 other conditions, which is not what I want, since I am after pairwise comparisons.

I thought of the following workaround: simulate the replicates, for instance by sampling 3 values from a normal distribution with mean = protein count and standard deviation = coefficient of variation * mean. However, I'm wondering if there is a neater way of doing this. If I'm not mistaken, the DESeq() function only uses the triplicates to estimate the variance between them anyway, so I would like to pass the variance information I already have directly to the function.
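
A minimal sketch of the simulation idea described above, in base R. The protein names, abundances and CVs are hypothetical example values, not taken from the data set, and the CV is assumed to be a fraction (if it is reported in percent, divide by 100 first). Note that three random draws will generally not reproduce the reported mean and CV exactly.

```r
# Sketch: simulate 3 replicates per protein from a mean and a CV.
# `abundance` and `cv` are hypothetical example values.
set.seed(1)  # make the random draws reproducible
abundance <- c(protA = 1200, protB = 85, protC = 40000)
cv        <- c(protA = 0.10, protB = 0.25, protC = 0.05)  # fractions, not percent
n_rep     <- 3

# One row per protein, one column per simulated replicate
sim <- t(mapply(function(m, s) rnorm(n_rep, mean = m, sd = s),
                abundance, cv * abundance))
colnames(sim) <- paste0("rep", seq_len(n_rep))

# DESeq2 expects non-negative integer counts, so round and clip at zero
sim_counts <- pmax(round(sim), 0)
storage.mode(sim_counts) <- "integer"
```

Rounding and clipping are only there to satisfy the input requirements; they slightly distort the simulated values, especially for low-abundance proteins with a large CV.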

The code I'm using works, but in case it's useful:

# Load DESeq2 and the data

library(DESeq2)
data = read.csv(file="Ecoli_proteome.csv", header=TRUE, sep=",")

cts = data.matrix(data)
coldata = colnames(cts)

# Remove rows containing NAs, as they disturb the DESeq function

narows = apply(cts, 1, function(x) any(is.na(x)))
ctcClean = cts[ !narows, ]

# Define the sample names

samples = data.frame(row.names=c(coldata), condition=as.factor(coldata))
dds =DESeqDataSetFromMatrix(countData = ctcClean,
                              colData = samples,
                              design = ~ condition)

ddsDE = DESeq(dds)

# I used the contrast argument to find differences between Glucose and Glycerol.
# However, I would like the analysis to take only these two conditions into account from the start.
ddsRES = results(ddsDE, contrast=c("condition", "Glucose", "Glycerol"))
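
One way to restrict the analysis to two conditions from the start is to subset the count matrix and the sample table before building the data set. The sketch below uses base R only; the toy matrix, the sample names and the "Acetate" condition are hypothetical stand-ins for the real data. Whether subsetting is preferable to fitting all conditions and extracting a contrast is a judgment call: with all samples included, the dispersion estimates draw on every condition.

```r
# Hypothetical toy data: 2 proteins, 3 conditions with 3 replicates each
conditions <- rep(c("Glucose", "Glycerol", "Acetate"), each = 3)
counts <- matrix(1:18, nrow = 2,
                 dimnames = list(c("protA", "protB"),
                                 paste0(conditions, "_", 1:3)))
samples <- data.frame(row.names = colnames(counts),
                      condition = factor(conditions))

# Keep only the samples belonging to the two conditions of interest
keep <- samples$condition %in% c("Glucose", "Glycerol")
counts_sub  <- counts[, keep, drop = FALSE]
samples_sub <- droplevels(samples[keep, , drop = FALSE])  # drop the unused level

# These would then be passed on as in the code above, e.g.
# dds <- DESeqDataSetFromMatrix(countData = counts_sub,
#                               colData = samples_sub,
#                               design = ~ condition)
```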

Would anybody have experience or knowledge about how to proceed?

Answer: DESeq2 with replicate mean and variance information
19 months ago, Michael Love (United States) wrote:

I wouldn't first try to shoehorn the data into DESeq2 via a data simulation procedure, but would instead work with a method for proteomics analysis that can incorporate the summarized values and CV directly. I don't know what that method would be, but maybe others can suggest one. There's nothing wrong with your approach, but it would require debugging and testing/benchmarking, whereas there may be a more appropriate off-the-shelf package.
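
As a generic illustration of working from summarized values (not a method recommended in this thread, and not a proteomics-specific package), a Welch t-test can be computed directly from the mean, CV and replicate number of each condition. The function name `welch_from_summary` and its arguments are hypothetical, and the CV is assumed to be a fraction. With only 3 replicates the per-protein variance estimates are noisy, and across thousands of proteins the p-values would still need a multiple-testing correction such as `p.adjust()`.

```r
# Sketch: Welch t-test from summary statistics (mean, CV, n) of two conditions.
# Works on single values or on vectors with one element per protein.
welch_from_summary <- function(mean1, cv1, mean2, cv2, n1 = 3, n2 = 3) {
  sd1 <- cv1 * mean1               # recover standard deviations from the CVs
  sd2 <- cv2 * mean2
  v1 <- sd1^2 / n1                 # squared standard errors of the two means
  v2 <- sd2^2 / n2
  t_stat <- (mean1 - mean2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)    # two-sided p-value
  data.frame(t = t_stat, df = df, p.value = p)
}

# Usage with made-up numbers for a single protein
res <- welch_from_summary(mean1 = 10, cv1 = 0.1, mean2 = 20, cv2 = 0.1)
```

The result matches what `t.test()` would report for raw replicates with the same means and standard deviations, so no simulated data are needed for this particular test.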

Answer: DESeq2 with replicate mean and variance information
19 months ago, blehtine wrote:

Thank you very much! I started off by looking for an off-the-shelf solution, but unfortunately couldn't find a good one that would work for the type of data I have at hand. That's how I ended up trying DESeq2 in the first place. I'm very open to other solutions, and if I find one outside the forum I will definitely post it here as well.
