Question: DESeq2 baseMean values for each sample
2
5.0 years ago by
aburkha6920
United States
aburkha6920 wrote:

Is it possible to extract the baseMean data from each replicated sample using DESeq2? In DESeq, the output was arranged in a format of baseMeanA, baseMeanB, etc. that correlated with each sample. In DESeq2 so far I can only get a results output that has the baseMean calculated across all of the samples. I have replicated time points in a time course and would like the baseMean data for each time point as well as the overall baseMean.

Thank you.

deseq2 • 7.0k views
modified 5.0 years ago • written 5.0 years ago by aburkha6920
Answer: DESeq2 baseMean values for each sample
6
5.0 years ago by
Michael Love26k
United States
Michael Love26k wrote:

We wrote the results table in DESeq2 to be more general, as sometimes users have dozens of conditions, or no replicated conditions but a crossed design, or numeric covariates, etc.

You can easily construct a table with the base means of each group using some custom code. For example, if the variable is 'condition':

baseMeanPerLvl <- sapply( levels(dds$condition), function(lvl) rowMeans( counts(dds,normalized=TRUE)[,dds$condition == lvl] ) )

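To anyone who visits this many years later: I found this one-liner stopped with an error halfway through my conditions list. Adding drop=FALSE fixes it, because rowMeans needs a matrix with at least two dimensions, and without drop=FALSE a level that has only one sample is subset to a plain vector. This is standard R subsetting behavior rather than something that changed in DESeq2.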

baseMeanPerLvl <- sapply( levels(dds$condition), function(lvl) rowMeans( counts(dds,normalized=TRUE)[,dds$condition == lvl, drop=FALSE] ) )
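For anyone who wants to see the difference without a DESeq2 object at hand, here is a minimal sketch in base R. The matrix norm_counts and the factor condition are made-up stand-ins for counts(dds, normalized=TRUE) and dds$condition; they are assumptions for illustration only, not data from this thread.

```r
# Toy stand-in for counts(dds, normalized=TRUE): 3 genes x 5 samples.
norm_counts <- matrix(
  c(10, 12, 20, 22, 30,
     0,  2,  4,  6,  8,
     5,  5,  5,  5,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("s", 1:5))
)

# Toy stand-in for dds$condition; level "C" has only one sample.
condition <- factor(c("A", "A", "B", "B", "C"))

# Without drop=FALSE, the single-sample level "C" is subset to a vector,
# and rowMeans() stops with an error. tryCatch() keeps the message instead.
without_drop <- tryCatch(
  sapply(levels(condition), function(lvl)
    rowMeans(norm_counts[, condition == lvl])),
  error = function(e) conditionMessage(e)
)

# With drop=FALSE every subset stays a matrix, so all levels are handled.
baseMeanPerLvl <- sapply(levels(condition), function(lvl)
  rowMeans(norm_counts[, condition == lvl, drop = FALSE]))

print(without_drop)
print(baseMeanPerLvl)
```

With a real dataset you would swap the two stand-ins back to counts(dds, normalized=TRUE) and dds$condition, as in the one-liner above.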


Is it similarly possible to extract other columns from the DESeq2 results table, such as log fold change for each replicated sample?

No, the LFC is not calculated by DESeq2 per sample.

Answer: DESeq2 baseMean values for each sample
0
5.0 years ago by
aburkha6920
United States
aburkha6920 wrote:

Thank you for the very prompt and helpful response. The code above successfully gave me a table with the baseMeans for each time point.

I would also like to get the baseMeans for each time point within each plant line. My data is 2 plant lines with multiple replicates per time point (6 time points total). In all, I would like a table with the baseMeans for all 12 different options with each mean being for a distinct time point and plant line. I am using the "time series experiment" online tutorial to scaffold my data entry. I tried to adjust the above program to fit my needs but was unable to do so; sorry I am extremely new with R.

Thanks

It sounds like you just need to define a new column which combines the two:

dds$combined = factor(paste0(dds$time, "-", dds$plantline))

then repeat the above with combined instead of condition.

modified 4.6 years ago • written 5.0 years ago by Michael Love26k

I had a similar question and I ran the code on my data (3 samples = x, y, z; 3 time points = day0, day1, day2; so 9 combinations in total). Unfortunately, when I look at the baseMean data, all of the output is NaN.

               day0 - x  day0 - y  day0 - z  day1 - x  day1 - y  day1 - z  day2 - x  day2 - y  day2 - z
0610005C13Rik  NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
0610007N19Rik  NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN

Can you briefly explain what the function is doing:

baseMeanPerLvl <- sapply( levels(dds$condition), function(lvl) rowMeans( counts(dds,normalized=TRUE)[,dds$condition == lvl] ) )

Thank you very much,
Linda

written 4.6 years ago by lmolla10

This line of code says: for each level of a factor (here, dds$condition), take the row means of the normalized counts of the samples for this level, then return the output as a matrix. It requires that you have previously run either DESeq() or estimateSizeFactors() on the dds.
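As a rough illustration of the combined-factor approach, here is a small base R sketch. The vectors time and plantline and the matrix norm_counts are invented stand-ins for dds$time, dds$plantline and counts(dds, normalized=TRUE). The last part shows one way an all-NaN table can arise: if the labels being looped over never match the factor used for subsetting, every subset has zero columns and rowMeans returns NaN. That is only a possible explanation; the thread does not confirm what happened in the case above.

```r
# Toy stand-ins for dds$time and dds$plantline:
# 2 time points x 2 plant lines, 2 replicates each (8 samples).
time      <- rep(c("t0", "t1"), each = 4)
plantline <- rep(rep(c("wt", "mut"), each = 2), times = 2)
combined  <- factor(paste0(time, "-", plantline))

# Toy stand-in for counts(dds, normalized=TRUE): 2 genes x 8 samples.
norm_counts <- matrix(
  c(1:8, seq(10, 80, by = 10)),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("s", 1:8))
)

# One column of means per time/plant-line combination.
baseMeanPerLvl <- sapply(levels(combined), function(lvl)
  rowMeans(norm_counts[, combined == lvl, drop = FALSE]))

# Labels built with a different separator never equal any level of 'combined',
# so each subset has zero columns and every mean comes out as NaN.
mismatched <- paste0(time, " - ", plantline)
all_nan <- sapply(unique(mismatched), function(lvl)
  rowMeans(norm_counts[, combined == lvl, drop = FALSE]))

print(baseMeanPerLvl)
print(all_nan)
```

A quick check in a real analysis is table(dds$combined): each level you loop over should show at least one sample.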

Answer: DESeq2 baseMean values for each sample
0
5.0 years ago by
aburkha6920
United States
aburkha6920 wrote:

Thank you very much for your help.