I'm trying to find differentially abundant proteins in different environmental (carbon sources, growth rates, stressors) conditions in E. coli. Altogether, I have absolute proteomics data for 2359 proteins (in 22 conditions). For each protein I have the protein count, and the coefficient of variance from three replicates in every condition. I want to do pairwise comparisons between conditions to identify proteins that change from one condition to the other. For the task, I am attempting to use Deseq2 on R.
The problem is that I have the data (Schmidt et al 2016) as the abundance estimate based on 3 replicates, and the coefficient of variance (standard deviation/mean) for each protein in each condition. The DESeq() function, however, requires the protein counts in all 3 replicates. The problem with DESeq() function is that if no information of replicates is given, it compares each condition to the 21 other conditions which is what I don't want as I want pairwise comparisons.
I thought of the following to circumvent the problem: Simulate the replicate information, for instance by sampling 3 replicates from normal distribution where mean = protein count and std = coefficient of variance * mean. However, I'm wondering if there is a neater way of doing this. If I'm not mistaken, the DESeq() function uses the triplicate information to calculate the variance between the triplicates anyway, so I would like to convey the information I already have to the function.
The code I'm using works, but in case it's useful:
#Load the data data = read.csv(file="Ecoli_proteome.csv", header=TRUE, sep=",") cts = data.matrix(data) coldata = colnames(cts) # Remove NA's as it disturbs the DESeq function narows = apply(cts, 1, function(x) any(is.na(x))) ctcClean = cts[ !narows, ] # Define the sample names samples = data.frame(row.names=c(coldata), condition=as.factor(coldata)) dds =DESeqDataSetFromMatrix(countData = ctcClean, colData = samples, design = ~ condition) ddsDE = DESeq(dds) # I used contrast attribute to find differences between Glucose and Glycerol, however, I would like the analysis to originally only take into account these two conditions. ddsRES = results(ddsDE, contrast=c("condition", "Glucose", "Glycerol")
Would anybody have experience or knowledge about how to proceed?