Question

Contradictory results in smallRNA differential expression analysis using DESeq2

0

Entering edit mode

jucosie • 0

@7cb53f02

Last seen 11 months ago

Spain

Hi everyone,

I have some questions regarding the normalizations applied by DESeq2 and some contradictory results I have obtained, I hope you can help me. Our experimental design is as follows: we have libraries of small RNAs from 10 healthy individuals and 20 infected patients (10 with mild symptoms and 10 with severe symptoms), which are sampled at 2 different time points. We want to see how the RNA families vary between the different severity groups and time points and to obtain possible biomarkers. The strategy followed for the analysis is as follows:

We generate a matrix of counts containing each of the sequences present in the 50 samples (rows) and the absolute no. of counts for each sample (column), using in-house scripts. We also obtain the same matrix with the normalized counts in RPMs.
To be able to work in R with this matrix, and since we are only interested in potential endogenous biomarkers, we filter the matrix keeping only what maps against human, and eliminating rows with very low counts.
This matrix is the input we give to DESEqDataSetFromMatrix().

We then perform the various contrasts we are interested in. For example:

acute_vs_mild_agesex <- DESeqDataSetFromMatrix(countData = counts_table_human_sexage,
                                 colData = sampletable_2,
                                 design = ~ Sex + Age_group + Group) 

acute_vs_mild_mild_agesex$Group <- relevel(acute_vs_mild_mild_agesex$Group, ref ="T1_mild")
acute_vs_mild_mild_agesex$Age_group <- factor(acute_vs_mild_mild_agesex$Age_group, c(1,2,3,4))

acute_vs_mild_agesex2 <- DESeq(acute_vs_mild_agesex)

res_T1acute_vs_mild_agesex <- results(acute_vs_mild_agesex2, contrast =c("Group", "T1_acute", "T1_mild"), lfcThreshold = 0.585)
shrunk_T1acute_vs_mild_agesex <- lfcShrink(acute_vs_mild_agesex2, coef="Group_T1_acute_vs_T1_mild", res=res_T1acute_vs_mild_agesex)

With this contrast, what I intend to do is to obtain the DE sequences between acute and mild patients (in time 1), controlling for the variables sex and age group. It should be noted that my groups of patients are not balanced in the age group category: one of the groups presents mostly older patients. Here my first doubt arises: is it correct to include the age group variable in the design even though it is not balanced?

But my main doubt appears when reviewing the results of the contrast. DESEq2 returns 1113 sequences as significant DE (padj < 0.05). But in some cases, when I go to the counts tables (both the one from DESeq2 and the ones I get originally) and check the normalized counts of X sequence of interest, I don't see the fold change that DESeq2 tells me. For example:

Result:

seq baseMean log2FoldChange lfcSE stat pvalue padj

seqX 2527.6030 3.348650 0.8625103 3.204194 1.354413e-03 0.0240597523

Normalized DESeq2 counts of the sequence X

Acute T1: 2.476341e+03 1.378931e+03 1657.99238 2.440383e+03 2.010505e+03 1.016864e+03 2.047027e+03 1.581439e+03 1.468407e+03 2006.66562

Mild T1: 9.940807e+02 7.19070704e+02 1.26202034e+03 2.559517e+03 6.664843e+02 9.851480e+02 720.272552 522.275026 2471.92615 1.765345e+04

Counts in the original matrix (normalized in RPMs).

Acute: 605.636 137.624 101.907 136.753 782.274 316.689 287.133 150.494 370.804 59.239

Mild: 322.246 172.691 359.175 783.238 251.813 381.638 35.494 109.578 62.841 666.775

Is this behavior to be expected, the fact that I cannot observe with the naked eye in the normalized counts a differential expression of the sequences? Can it be due to the normalizations and internal transformations that DESeq2 performs on the counts or is it a false positive?

I hope I have explained myself, otherwise I will provide all the extra info needed.

Thank you very much!

SmallRNA DESeq2 • 1.1k views

ADD COMMENT • link 3.3 years ago jucosie • 0

score 2 · Accepted Answer · 2022-08-02

1) Yes you can put in a controlling covariate even if it has correlation with the condition of interest. This is actually one of the most important cases to include controlling covariates.

2) Regarding the DE genes, I recommend to look at plotCounts for these. Additionally, DESeq2 is looking for changes after controlling for differences due to other factors, which you can't see when you scan the counts. If you want something approximating the condition effect on the log normalized counts controlling for other variables, you could perform vst then remove the variance due to sex and age (see vignette FAQ on how to do this with limma removeBatchEffect), and then boxplot the VST data across group.