I am using DESeq2 (R version 3.5.2, DESeq2_1.22.2) on RNA-seq read counts to find differentially expressed (DE) genes between two conditions, each with a few samples. I used these commands:
dds <- DESeqDataSetFromMatrix(countData = rc_data, colData = colData, design = ~ TimePoint + PhenotypeTag)

Then:

dds_tmp = dds
unnorm_cnt_colSums = colSums(counts(dds_tmp))
unnorm_cnt_colSums_mean_std = c(mean(unnorm_cnt_colSums), sd(unnorm_cnt_colSums))
dds_tmp <- estimateSizeFactors(dds_tmp)
cnt_norm <- counts(dds_tmp, normalized=TRUE)
norm_cnt_colSums = colSums(cnt_norm)
norm_cnt_colSums_mean_std = c(mean(norm_cnt_colSums), sd(norm_cnt_colSums))
result <- list(unnorm_cnt_colSums = unnorm_cnt_colSums,
               unnorm_cnt_colSums_mean_std = unnorm_cnt_colSums_mean_std,
               dds_tmp = dds_tmp,
               norm_cnt_colSums = norm_cnt_colSums,
               norm_cnt_colSums_mean_std = norm_cnt_colSums_mean_std)
Above, besides standard DESeq2 commands, I computed the total reads (colSums) for each sample, and also mean and sd of total reads across the samples, before and after normalization. I was surprised to see that even after normalization, colSums varied a lot across the samples.
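For reference, my understanding is that estimateSizeFactors uses a median-of-ratios approach, which as far as I can tell does not by itself force the column sums to be equal. Below is a minimal base-R sketch of that calculation on a made-up toy matrix. The numbers are illustrative and not my data, the variable names are my own, and this is only my reading of the method, not the package's code.

```r
# Toy counts (made-up numbers, NOT the data from this post): 4 genes x 3 samples.
# Genes 1-3 differ between samples only by a constant factor (1 : 2 : 1.5);
# gene 4 is a highly expressed gene that behaves differently.
counts <- matrix(
  c( 100, 200,  150,
      50, 100,   75,
      10,  20,   15,
    1000, 500, 4000),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Median-of-ratios size factors (assumed to mirror estimateSizeFactors):
# take the per-gene geometric mean across samples, then for each sample the
# median of count / geometric mean, skipping genes with a zero count.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(col) {
  keep <- is.finite(log_geo_means) & col > 0
  exp(median(log(col[keep]) - log_geo_means[keep]))
})

# Divide each sample (column) by its size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(colSums(counts))       # raw library sizes
print(colSums(norm_counts))  # normalized column sums
```

In this toy case genes 1-3 come out identical across samples after normalization, yet the normalized column sums still differ because gene 4 dominates the totals.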
$unnorm_cnt_colSums
E01T2_ind25_S17_001          E23T2_ind3         E39T2_ind25          E39T3_ind1              G13T2B
           17957634            13496855            15947932            11386670            18747079
             G21T2B
           13435778

$unnorm_cnt_colSums_mean_std
15161991 2873733

$norm_cnt_colSums
E01T2_ind25_S17_001          E23T2_ind3         E39T2_ind25          E39T3_ind1              G13T2B
           13237059            17014225            13498446            16324754            15379562
             G21T2B
           13529330

$norm_cnt_colSums_mean_std
14830563 1631526
Has anyone else observed this much variability in column sums after DESeq2 normalization, or am I making an error somewhere?
Thanks a lot.