Hi guys,
I am using DESeq2 to normalize samples collected from different tissues, and I want to plot gene expression across tissues using the normalized counts. After normalization, the library sizes of one group of samples, all from the same tissue, increased several-fold, far above the median of the remaining samples. I haven't seen this before. When I checked these samples before normalization, it was clear that a large percentage of genes (15-20%) are not expressed, compared with other tissues. Still, I would not expect a several-fold overcorrection. Is this normal, and what would be the best way to deal with it?
I'd appreciate any suggestions. Best, Ali
Hi Michael, thank you for your response. I examined one sample from each group and plotted them as you suggested. In the outlier tissue group, the scaled (normalized) counts were about 5-fold higher than the original raw counts. In my case, normalization seems to worsen and bias the counts for this outlier tissue group relative to the other tissue groups. Best, Ali
Are you saying that scaling moves the points farther from the y=x line?
Correct, scaling for this specific tissue group (the outlier group) moves the points farther from the y=x line.
If sequencing depth is confounded with the biological grouping and varies by, e.g., 5-10 fold across groups, you may need to be more hands-on with normalization by specifying controlGenes in estimateSizeFactors(). In the DESeq2 paper, we refer to this situation as a pathological case for relying on in silico normalization alone. You may also need to filter out genes that have counts in only a small number of samples, as these can give spurious DE when the lowly sequenced biological group has counts below the limit of detection.
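For anyone following along, the two suggestions above might look roughly like the sketch below. It runs on simulated data from makeExampleDESeqDataSet(), not the poster's data. The control set (ctrl, here just the first 200 rows) and the filter thresholds (a count of at least 10 in at least 3 samples) are placeholder assumptions; in a real analysis the control genes should be ones you have reason to believe are stable across tissues.

```r
library(DESeq2)

# Simulated data for illustration only: 1000 genes, 12 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical control set: simply the first 200 rows here.
# Replace with indices of genes expected to be stable across tissues.
ctrl <- seq_len(200)

# Estimate size factors from the control genes only, instead of all genes.
dds <- estimateSizeFactors(dds, controlGenes = ctrl)
sizeFactors(dds)

# Drop genes that have counts in only a few samples.
# The thresholds (>= 10 counts in >= 3 samples) are assumptions; tune them.
keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep, ]

# Normalized counts for plotting expression across tissues.
norm_counts <- counts(dds, normalized = TRUE)
```

Size factors are estimated before filtering here so that the row indices in ctrl still point at the intended genes. If you filter first, recompute the indices against the filtered object.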