Question

DESeq2 with replicate mean and variance information

blehtine • 0

@blehtine-15422

I'm trying to find differentially abundant proteins across different environmental conditions (carbon sources, growth rates, stressors) in E. coli. Altogether, I have absolute proteomics data for 2359 proteins in 22 conditions. For each protein and condition, I have the protein count and the coefficient of variation from three replicates. I want to do pairwise comparisons between conditions to identify proteins that change from one condition to the other. For this task, I am attempting to use DESeq2 in R.

The problem is that the data (Schmidt et al. 2016) are available only as an abundance estimate based on 3 replicates, plus the coefficient of variation (standard deviation / mean), for each protein in each condition. The DESeq() function, however, requires the protein counts from all 3 replicates. If no replicate information is given, DESeq() compares each condition to the 21 other conditions, which is not what I want, since I am after pairwise comparisons.

I thought of the following workaround: simulate the replicates, for instance by sampling 3 values from a normal distribution with mean = protein count and standard deviation = coefficient of variation * mean. However, I'm wondering if there is a neater way of doing this. If I'm not mistaken, the DESeq() function uses the triplicates to estimate the variance between them anyway, so I would like to pass the information I already have to the function directly.
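The simulation idea described above could be sketched roughly as follows. This is a minimal illustration, not a validated procedure: the function name `simulate_replicates`, the toy abundances, and the assumption that the CV is stored as a fraction (0.25 rather than 25%) are all hypothetical. Negative draws are clamped to zero and values are rounded, since DESeq2 expects non-negative integer counts.

```r
set.seed(42)  # simulated replicates depend on the seed

# Hypothetical helper: draw n_rep values per protein from
# Normal(mean, sd = cv * mean), assuming cv is a fraction.
simulate_replicates <- function(mean_abundance, cv, n_rep = 3) {
  stopifnot(length(mean_abundance) == length(cv))
  n <- length(mean_abundance)
  # mean and sd are recycled, so each column holds one draw per protein
  draws <- rnorm(n * n_rep, mean = mean_abundance, sd = cv * mean_abundance)
  sim <- matrix(draws, nrow = n, ncol = n_rep)
  sim <- round(pmax(sim, 0))       # no negative counts, whole numbers only
  storage.mode(sim) <- "integer"
  colnames(sim) <- paste0("rep", seq_len(n_rep))
  sim
}

# Toy input: three made-up proteins in a single condition
abund <- c(protA = 12000, protB = 350, protC = 45)
cv    <- c(0.10, 0.25, 0.40)
sim   <- simulate_replicates(abund, cv)
rownames(sim) <- names(abund)
print(sim)
```

Three simulated draws will not reproduce the reported mean and CV exactly, so any downstream result varies with the random seed; repeating the simulation several times would show how stable the conclusions are.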

The code I'm using works, but in case it's useful:

# Load DESeq2 and the data

library(DESeq2)

data = read.csv(file="Ecoli_proteome.csv", header=TRUE, sep=",")

cts = data.matrix(data)
coldata = colnames(cts)

# Remove rows containing NAs, which DESeq() cannot handle

narows = apply(cts, 1, function(x) any(is.na(x)))
ctcClean = cts[ !narows, ]

# Define the sample names (one column per condition, no replicates)

samples = data.frame(row.names=c(coldata), condition=as.factor(coldata))
dds = DESeqDataSetFromMatrix(countData = ctcClean,
                              colData = samples,
                              design = ~ condition)

ddsDE = DESeq(dds)

# I used the contrast argument to find differences between Glucose and Glycerol.
# However, I would like the analysis to take only these two conditions into account from the start.
ddsRES = results(ddsDE, contrast=c("condition", "Glucose", "Glycerol"))
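One way to restrict the analysis to two conditions from the start is to subset the columns of the DESeqDataSet before calling DESeq(). The sketch below assumes replicate-level counts are available; the toy counts, the third condition name "Acetate", and the object names are made up for illustration.

```r
library(DESeq2)
set.seed(1)

# Toy data: 200 proteins, 3 conditions x 3 replicates (simulated, not real)
conditions <- rep(c("Glucose", "Glycerol", "Acetate"), each = 3)
mu <- runif(200, min = 50, max = 5000)   # per-protein mean, recycled per column
counts <- matrix(rnbinom(200 * 9, mu = mu, size = 10), nrow = 200,
                 dimnames = list(paste0("protein", 1:200),
                                 paste0(conditions, "_rep", 1:3)))
colData <- data.frame(row.names = colnames(counts),
                      condition = factor(conditions))

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = colData,
                              design = ~ condition)

# Keep only the two conditions of interest and drop the unused factor level
dds_pair <- dds[, dds$condition %in% c("Glucose", "Glycerol")]
dds_pair$condition <- droplevels(dds_pair$condition)

dds_pair <- DESeq(dds_pair)
res <- results(dds_pair, contrast = c("condition", "Glucose", "Glycerol"))
head(res)
```

Fitting all conditions together and extracting a contrast with results() is also a pairwise comparison; subsetting mainly changes which samples contribute to the dispersion estimates.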

Would anybody have experience or knowledge about how to proceed?

deseq2 deseq differential analysis proteomics • 1.5k views

score 1 · Answer 1 · 2018-04-04

I wouldn't start by shoehorning the data into DESeq2 via a data simulation procedure. Instead, I would look for a method that can work directly with the summarized values and CV for proteomics analysis. I don't know what that method is, but maybe others can suggest one. There's nothing wrong with your approach, but it would require debugging and testing/benchmarking, whereas there may be a more appropriate off-the-shelf package.
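As one illustration of working from summaries alone (not a package recommendation, and not necessarily what would perform best on this data), Welch's t-test can be computed from each condition's mean, CV, and replicate number without the individual replicates. The function name `welch_from_summary` is hypothetical, the CV is assumed to be a fraction, and with n = 3 per group the test has limited power and relies on approximate normality.

```r
# Hypothetical helper: Welch's t-test from per-condition mean and CV.
# Vectorized over proteins; cv is assumed to be sd / mean as a fraction.
welch_from_summary <- function(mean1, cv1, mean2, cv2, n1 = 3, n2 = 3) {
  v1 <- (cv1 * mean1)^2 / n1   # squared standard error, condition 1
  v2 <- (cv2 * mean2)^2 / n2   # squared standard error, condition 2
  t_stat <- (mean1 - mean2) / sqrt(v1 + v2)
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  data.frame(t = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df))
}

# Sanity check against t.test() on made-up replicate values
x <- c(9, 10, 11)
y <- c(14, 15, 19)
out <- welch_from_summary(mean(x), sd(x) / mean(x), mean(y), sd(y) / mean(y))
ref <- t.test(x, y)            # Welch is the default (var.equal = FALSE)
print(out)
print(ref$p.value)
```

When this is applied across all proteins, the resulting p-values would still need a multiple-testing correction, for example `p.adjust(out$p.value, method = "BH")`.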

score 0 · Answer 2 · 2018-04-04

Thank you very much! I started off by looking for an off-the-shelf solution, but unfortunately couldn't find a good one that would work for the type of data I have at hand. That's how I ended up trying DESeq2 in the first place. I'm very open to other solutions, and will definitely post here if I find one outside the forum.