Deseq2 with replicate mean and variance information
2
0
Entering edit mode
blehtine • 0
@blehtine-15422
Last seen 3.5 years ago

I'm trying to find differentially abundant proteins in different environmental (carbon sources, growth rates, stressors) conditions in E. coli. Altogether, I have absolute proteomics data for 2359 proteins (in 22 conditions). For each protein I have the protein count, and the coefficient of variance from three replicates in every condition. I want to  do pairwise comparisons between conditions to identify proteins that change from one condition to the other. For the task, I am attempting to use Deseq2 on R.

The problem is that I have the data (Schmidt et al 2016) as the abundance estimate based on 3 replicates, and the coefficient of variance (standard deviation/mean) for each protein in each condition. The DESeq() function, however, requires the protein counts in all 3 replicates. The problem with DESeq() function is that if no information of replicates is given, it compares each condition to the 21 other conditions which is what I don't want as I want pairwise comparisons.

I thought of the following to circumvent the problem: Simulate the replicate information, for instance by sampling 3 replicates from normal distribution where mean = protein count and std = coefficient of variance * mean. However, I'm wondering if there is a neater way of doing this. If I'm not mistaken, the DESeq() function uses the triplicate information to calculate the variance between the triplicates anyway, so I would like to convey the information I already have to the function.

The code I'm using works, but in case it's useful:

#Load the data

data = read.csv(file="Ecoli_proteome.csv", header=TRUE, sep=",")

cts = data.matrix(data)
coldata = colnames(cts)

# Remove NA's as it disturbs the DESeq function

narows = apply(cts, 1, function(x) any(is.na(x)))
ctcClean = cts[ !narows, ]

# Define the sample names

samples = data.frame(row.names=c(coldata), condition=as.factor(coldata))
dds =DESeqDataSetFromMatrix(countData = ctcClean,
                              colData = samples,
                              design = ~ condition)

ddsDE = DESeq(dds)

# I used contrast attribute to find differences between Glucose and Glycerol, however, I would like the analysis to originally only take into account these two conditions.
ddsRES = results(ddsDE, contrast=c("condition", "Glucose", "Glycerol")

Would anybody have experience or knowledge about how to proceed?

deseq2 deseq differential analysis proteomics • 634 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

I wouldn't first try to shoehorn the data for DESeq2 via a data simulation procedure, but instead work with a method that can incorporate the summarized values and CV for proteomics analysis. I don't know what this is but maybe others can suggest. There's nothing wrong with your approach, but it should involve debugging and testing/benchmarking, whereas there may be a more appropriate off-the-shelf package.

ADD COMMENT
0
Entering edit mode
blehtine • 0
@blehtine-15422
Last seen 3.5 years ago

Thank you very much! I started off by looking for an off-the-shelf solution but unfortunately couldn't find a good one that would work for the type of data I have at hand. That's how I ended up trying with DESeq2 in the first place. I'm very open for other solutions, and will definitely post it here if I find it outside the forum as well

ADD COMMENT

Login before adding your answer.

Traffic: 296 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6