Significantly regulated genes produced by only a few affected samples
1
0
Entering edit mode
serpalma.v ▴ 60
@serpalmav-8912
Last seen 12 months ago
Germany

Hello
I used DESeq2 to perform differential expression between three groups (condition1, condition2 and condition3). Samples also come from three different countries.

PCA (using rlog transformed data) showed a similar pattern of clustering using country and condition (please see link) So I controlled for country as well (~ country + condition).

The MA plot showed two significant genes extremely affected by the condition, but upon inspection of the counts with plotCounts(), I see that there are only a few samples driving the difference. For one gene is even just one sample (please see link).

I checked other DEGs not as extreme and the plot of counts seemed much more reasonable.

Somehow outliers are being allowed to affect the coefficient estimation, but I am not sure why is that. This is how I ran the computations:

des <- model.matrix(~ 0+CountryCode + condition, data = colData(dds))
dds2 <- DESeq(dds, full = des, betaPrior = FALSE) # dds is a deseq data set with design ~1

dds2_cond3_vs_cond1 <- results(dds2, name = "conditionCond3", alpha = 0.05)

dds2_cond2_vs_cond1 <- results(dds2, name = "conditionCond2", alpha = 0.05)

dds2_cond2_vs_cond3 <-results(dds2, alpha = 0.05, contrast = list("conditionCond2", "conditionCond3"))

I would like ask for guidance on how to deal with those extremely regulated DEGs, and also if the data has been modeled properly, considering the highly overlapping effects of country and and condition. I want to make sure I am not in a "garbage in garbage out" type situation.

deseq2 • 1.3k views
0
Entering edit mode
@mikelove
Last seen 14 hours ago
United States

Can you use lfcShrink plus an LFC filter to deal with the genes where a few outliers are not being filtered and are driving the DE signal? You can use

subset(res, abs(log2FoldChange) > 0.5)

After running lfcShrink()

0
Entering edit mode

Trying to run lfcShrink for the coefficient cond2 vs cond1, using lfcShrink(dds2, coef=4) (first 3 coefficients are for the countries) gives "Error: subscript contains invalid names". I tried changing the coef variable to the name of the coefficient, and also using the contrast argument with both, the number or the name of the coeficient, and the same error occurs.

1
Entering edit mode

I'm sorry, I gave bad advice. You will have to use ~countrycode+condition in your DESeq() call, because we cannot perform LFC shrinkage without an intercept term. It's strange because when I run this on a small example using the current release, I get an error that mentions the need for an intercept. What version of DESeq2 are you using?

0
Entering edit mode

Hi Michael,

the version is DESeq2_1.18.1

I did the changes you mentioned and LFC shrinkage worked.

Thanks again!

0
Entering edit mode

I plotted the shrinked fold changes against the mean and the extreme gene is now shown significant but almost not-regulated (please see figure in link).

Sorry but I did not get the part of subsetting after lfc shrinking by abs(log2FoldChange) > 0.5. Did you mean to remove the subset having larger log2fc  than 0.5 after shrinking, and keeping the remaining results?

0
Entering edit mode

LFC shrinkage doesn’t by default change the p-values (although it is optional to switch to a posterior probability based analysis using lfcShrink). I’m suggesting you to remove small LFC using subset (for v1.18). You can also use a padj filter with subset to make the final gene set, by adding “& padj < 0.05”