DESeq2 normalization causing overcorrection of library sizes for a few samples
@asrour-24400

Hi guys,

I am using DESeq2 to normalize samples collected from different tissues, and I am interested in plotting the expression of genes across tissues using the normalized counts. For one group of samples, all from the same tissue, the library sizes increased several-fold after normalization, far above the median of the rest of the samples. I haven't seen this before. When I checked these samples before normalization, it was clear that a large percentage of genes (15-20%) are not expressed, compared with other tissues. Even so, I would not expect a several-fold overcorrection. Is this normal, and what would be the best way to deal with it?

I appreciate any suggestions. Best, Ali

@mikelove

One way to examine this is:

Take a sample from this group, say X, and a sample not from this group, say Y, and then plot:

1) raw counts for X vs Y
2) scaled counts for X vs Y (counts(dds, normalized=TRUE))

In both plots you can add abline(0,1) and set log="xy". The scaling should be assisting in bringing the values towards the y=x line.
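
The two plots described above could be drawn roughly as follows. This is only a sketch: the dds object here is simulated with makeExampleDESeqDataSet() (with a made-up 5-fold depth difference between the two halves of the samples) so that the code runs on its own, and the sample names "sample1" and "sample4" are placeholders for your own X and Y. Genes with a zero count in either sample are dropped because they cannot be shown on a log scale.

```r
library(DESeq2)

# Simulated stand-in for the real data; replace with your own DESeqDataSet.
# The sizeFactors argument mimics a 5-fold depth difference between groups.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6,
                               sizeFactors = c(1, 1, 1, 5, 5, 5))
dds <- estimateSizeFactors(dds)

x <- "sample1"  # placeholder: a sample from the group in question (X)
y <- "sample4"  # placeholder: a sample from another group (Y)

raw  <- counts(dds)
norm <- counts(dds, normalized = TRUE)

# Zero counts cannot be plotted on log axes, so keep genes > 0 in both samples
keep <- raw[, x] > 0 & raw[, y] > 0

par(mfrow = c(1, 2))
plot(raw[keep, x], raw[keep, y], log = "xy",
     xlab = x, ylab = y, main = "raw counts")
abline(0, 1, col = "red")   # y = x line on the log-log scale

plot(norm[keep, x], norm[keep, y], log = "xy",
     xlab = x, ylab = y, main = "scaled counts")
abline(0, 1, col = "red")
```

If the scaling is doing its job, the cloud of points in the right-hand panel should sit closer to the red line than in the left-hand panel.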


Hi Michael, thank you for your response. I examined one sample from each group and plotted them as you suggested. For the sample from the outlier tissue group, the scaled counts increased about 5-fold compared with its raw counts. In my situation, normalization seemed to worsen and bias the counts for this outlier tissue group relative to the rest of the tissue groups. Best, Ali


Are you saying scaling makes the points farther from the y=x line?


Correct, scaling for this specific tissue group (the outlier group) is making the points farther from the y=x line.


If the sequencing depth is confounded with the biological grouping, and varies by e.g. 5-10 fold across groups, you may need to be more hands-on with normalization by specifying controlGenes in estimateSizeFactors(). We refer to this situation in the DESeq2 paper as a pathologic case for relying on in silico normalization alone. You may also need to filter out genes that only have counts in a small number of samples, as these can give spurious DE when the lowly sequenced biological group has counts below the limit of detection.
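
A minimal sketch of both suggestions, again on simulated data so that it runs on its own. The thresholds (at least 10 counts in at least 3 samples) and the choice of control genes (simply the first 100 rows) are assumptions made purely for illustration; in practice the control set would be genes you have reason to believe are stable across tissues, such as housekeeping or spike-in genes.

```r
library(DESeq2)

# Simulated stand-in for the real data; replace with your own DESeqDataSet
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Drop genes that have counts in only a few samples.
# Both thresholds are illustrative and should be adapted to the design.
min_count   <- 10
min_samples <- 3
keep <- rowSums(counts(dds) >= min_count) >= min_samples
dds  <- dds[keep, ]

# Placeholder control set: row indices of genes assumed to be stable.
# controlGenes accepts a numeric or logical index vector.
ctrl <- seq_len(min(100, nrow(dds)))

# Size factors are now estimated from the control genes only
dds <- estimateSizeFactors(dds, controlGenes = ctrl)
sizeFactors(dds)
```

Filtering before estimating size factors is a choice made here for simplicity; the resulting size factors can then be checked with the X vs Y plots suggested earlier in the thread.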