edgeR/DESeq2 normalization for differential expression in RNA-seq blood samples
aec ▴ 90
@aec-9409

Dear all,

Both the DESeq2 and edgeR normalization methods account for differences in library size and RNA composition between samples, but can they also handle a large difference in hemoglobin content between human blood samples? In my experiment, a single globin gene consumes between 2% and 50% of the sequencing effort, depending on the sample.

Thanks,

hemoglobin edgeR deseq2 normalization rnaseq
Aaron Lun ★ 29k
@alun
The city by the bay

Methods like TMM normalization (in calcNormFactors()) or median-based normalization (in DESeq) were designed for exactly the scenario you describe. If one sample has more hemoglobin mRNAs, the coverage of all other genes will be suppressed when the total amount of sequencing resource is fixed. This is what is known as "composition bias", and is the whole motivation for computing scaling factors (normalization factors in edgeR, size factors in DESeq - note that these are not the same thing!) that are not simply derived from the library size of each sample.
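To make the composition-bias argument concrete, here is a minimal sketch in plain Python. It is not edgeR or DESeq2 code: it reimplements the median-of-ratios idea by hand, and the toy data, the `simulate` helper, and the function name `median_of_ratios_size_factors` are all hypothetical. It assumes two samples sequenced to the same depth, with identical relative abundances for every non-globin gene, and a globin gene taking 2% of reads in one sample and 50% in the other.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Size factors from a list of samples, each a list of per-gene counts.

    For every gene with no zero counts, take the log ratio of each sample's
    count to the gene's geometric mean across samples; a sample's size factor
    is the exponentiated median of its log ratios.
    """
    n_samples = len(counts)
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in zip(*counts):
        if min(gene_counts) <= 0:
            continue  # geometric mean would be zero, so the ratio is undefined
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for s, c in enumerate(gene_counts):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def simulate(globin_fraction, depth=1_000_000, n_genes=200):
    """Hypothetical sample: one globin gene (index 0) plus n_genes other
    genes whose relative abundances are the same in every sample."""
    globin = round(depth * globin_fraction)
    weights = range(1, n_genes + 1)
    total = sum(weights)
    others = [round((depth - globin) * w / total) for w in weights]
    return [globin] + others


low = simulate(0.02)   # globin takes 2% of the reads
high = simulate(0.50)  # globin takes 50% of the reads
size_factors = median_of_ratios_size_factors([low, high])

gene = 100  # an arbitrary non-globin gene
# Scaling by total library size only: the gene appears strongly down-regulated.
fc_library_size = (high[gene] / sum(high)) / (low[gene] / sum(low))
# Scaling by the median-of-ratios size factors: the fold change is close to 1.
fc_size_factor = (high[gene] / size_factors[1]) / (low[gene] / size_factors[0])
print(fc_library_size, fc_size_factor)
```

With library-size scaling alone, every non-globin gene in this toy example shows a spurious fold change of about 0.51 (that is, 0.50/0.98), even though nothing changed biologically. The median-based factors remove it because the single globin gene cannot move the median of the ratios.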

In practice, the success of normalization depends on the presence of sufficiently large counts for all the non-hemoglobin genes. All of these methods operate on ratios, and once the counts get too small, the ratios become unstable or undefined, requiring some ad hoc workarounds to avoid nonsensical scaling factor estimates. You can check whether this is a problem in your data by creating MA plots for each sample (e.g., with plotSmear() in edgeR); if you see lots of discrete lines or patterns on the left, your counts are probably too low.
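The "discrete lines" symptom can be reproduced numerically. The following is a rough Python sketch, not a substitute for plotSmear (which is an R function and is not called here); the helper names `ma_values` and `distinct_m` and the toy counts are hypothetical. It computes M (log-ratio) and A (average log-abundance) values for a pair of samples and counts how many distinct M values appear.

```python
import math


def ma_values(x, y, lib_x, lib_y):
    """Return (A, M) pairs for two samples, using log2 counts per million.

    Genes with a zero count in either sample are skipped because the
    log ratio is undefined there.
    """
    points = []
    for cx, cy in zip(x, y):
        if cx == 0 or cy == 0:
            continue
        lx = math.log2(cx / lib_x * 1e6)
        ly = math.log2(cy / lib_y * 1e6)
        points.append(((lx + ly) / 2, ly - lx))
    return points


def distinct_m(points):
    """Number of distinct M values, rounded to absorb floating-point noise."""
    return len({round(m, 6) for _, m in points})


# Low counts: every pair drawn from {1, 2, 3}, plus one gene with a zero.
low_x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 0]
low_y = [1, 2, 3, 1, 2, 3, 1, 2, 3, 5]
# Higher counts with slightly different ratios per gene.
high_x = [1000 + i for i in range(9)]
high_y = [1100 + 3 * i for i in range(9)]

# Library sizes are set to the same assumed value for both samples.
low_points = ma_values(low_x, low_y, 1e6, 1e6)
high_points = ma_values(high_x, high_y, 1e6, 1e6)
print(distinct_m(low_points), distinct_m(high_points))
```

At low counts only a handful of ratios are possible, so the points collapse onto a few horizontal bands at the left of the MA plot; at higher counts each gene gets its own M value and the cloud looks continuous.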

The other consideration is that, if the proportion of hemoglobin is highly variable, you will want to set robust=TRUE in downstream edgeR functions for dispersion estimation. This ensures that the increased variance of the hemoglobin genes will not inflate the apparent variability of the variances during empirical Bayes shrinkage. One could also filter out the hemoglobins entirely from the analysis, though this may not be sufficient; such high variability in hemoglobins can be a symptom of an underlying source of variability that affects other genes.

@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

Aaron's answer adequately covers the theoretical reasons that the normalizations used in edgeR and DESeq2 are appropriate for data with variations in globin content, so I will just add that empirically, I have actually used edgeR on such a data set. Specifically, we were testing a custom globin blocking protocol, so by design there were large differences in globin content between the globin-blocked samples and the non-globin-blocked control samples. The normalization performed exactly as desired. You can see the resulting MA plot here, showing all the globin genes with large negative fold changes and all other genes centered around zero, indicating proper normalization:

https://darwinawardwinner.github.io/resume/examples/Salomon/globin/figure4%20-%20maplot-colored.pdf

@fischer-philipp-18490
Austria

Yes, I think so: normalization with DESeq2 and edgeR does take library composition into account (meaning the composition between samples). In DESeq2, normalization is done via a per-sample scaling factor (the size factor). For each gene, the count in a sample is divided by that gene's geometric mean across all samples, and the size factor is the median of these ratios over the genes in that sample. The geometric mean is less sensitive to outliers than the arithmetic mean. Furthermore, taking the median puts the emphasis on housekeeping / moderately expressed genes rather than on a few extreme ones.

I am not sure what your biological background is, but couldn't one just remove the hemoglobin transcripts beforehand, as with ribodepletion? I do not know if there are any papers on this issue, but if you already have some data, you could check the effect yourself.


Reply: The RNA-seq samples have already been sequenced (without globin depletion).
