Question: Handling outliers in DESeq2
22 months ago by
lirongrossmann40 wrote:

Hello everyone,

I am using DESeq2 to test for differential gene expression between two groups (14 samples each) with the following code:

library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Response)

dds <- dds[ rowSums(counts(dds)) > 10, ]

dds <- DESeq(dds)

resGA <- results(dds, lfcThreshold = 0.5, contrast = c("Response", "High", "Low"), altHypothesis = "greaterAbs")

I got 152 genes with an adjusted p-value < 0.1 and ranked them by log2 fold change. I then applied the variance stabilizing transformation (VST) to the expression data and noticed that, for some of the top-ranked genes (with a very high log fold change, > 8), most samples had similar transformed values across both groups, while only a few samples (2-3) within one group had very large values compared to the rest. That might explain the fold change, but it looks to me more like an outlier effect than a true difference between the groups.

Is there a way to filter out those outliers and get a more "uniform" result, i.e., to find genes that are consistently more highly expressed in one group than in the other?
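
To make the pattern described above concrete, here is a minimal base R sketch of one possible screen. It is not a DESeq2 feature, and it only looks at genes that are higher in one named group. The helper name `count_above_other_group`, the toy matrix, and the cutoff are all hypothetical. In a real analysis the matrix would presumably come from the transformed data (for example `assay(vsd)`).

```r
# Hypothetical helper (not part of DESeq2): for each gene (row), count the
# samples in `group_a` whose value exceeds the maximum seen in all other samples.
count_above_other_group <- function(mat, groups, group_a) {
  in_a <- groups == group_a
  apply(mat, 1, function(x) sum(x[in_a] > max(x[!in_a])))
}

# Toy transformed values (made up): 2 genes x 8 samples, 4 samples per group.
mat <- rbind(
  outlier_gene    = c(5, 5, 5, 14, 5, 5, 5, 5),
  consistent_gene = c(9, 10, 9, 10, 5, 5, 6, 5)
)
groups <- rep(c("High", "Low"), each = 4)

n_above <- count_above_other_group(mat, groups, "High")

# Keep genes where at least half of the "High" samples lie above every
# "Low" sample. The one-half cutoff is arbitrary and chosen for illustration.
keep <- n_above >= sum(groups == "High") / 2
```

In this toy data the gene driven by a single extreme sample is dropped, while the gene that is elevated in every "High" sample is kept. A screen like this is a descriptive check on top of the test results, not a replacement for them.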

Thanks

deseq2 outliers logfoldchange
Answer: Handling outliers in DESeq2
22 months ago by
Michael Love (United States) wrote:

Are you using the latest version of DESeq2? If so, you need to use this function to meaningfully sort on LFC:

resLFC <- lfcShrink(dds, contrast=c("Response","High","Low"), res=resGA)

This will replace the MLE log2 fold change with a shrunken estimate that is much better for ranking, but it won't change the p-values or adjusted p-values.
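
As a rough illustration of the ranking step, the sketch below filters on the adjusted p-value and then sorts by absolute log2 fold change. It uses a made-up data frame, `res_toy`, as a stand-in so that it runs without DESeq2. The column names `log2FoldChange` and `padj` match those in a DESeq2 results table, so the same indexing should carry over to an object such as `resLFC`.

```r
# Toy stand-in for a shrunken results table (all values are made up).
res_toy <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(0.8, -2.5, 1.9, NA),
  padj = c(0.04, 0.01, 0.20, 0.03),
  stringsAsFactors = FALSE
)

# Keep genes passing the adjusted p-value cutoff; which() drops NA p-values.
sig <- res_toy[which(res_toy$padj < 0.1), ]

# Rank by absolute log2 fold change, largest first; rows with NA go last.
ranked <- sig[order(abs(sig$log2FoldChange), decreasing = TRUE, na.last = TRUE), ]
```

Sorting on the absolute value matches the `altHypothesis="greaterAbs"` test in the question, where large changes in either direction are of interest.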