Question

Handling outliers in DESeq2


lirongrossmann

Hello everyone,

I am using DESeq2 to perform a differential gene expression analysis between 2 groups (14 samples each) with the following code:

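library(DESeq2)

# Raw counts: genes in rows (first column = gene IDs), samples in columns
ep <- read.table("counts.txt", header = TRUE, row.names = 1)
# Sample annotation; its rows must be in the same order as the columns of ep
cp <- read.csv("Annotation.csv")

dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Response)
# Drop genes with 10 or fewer reads in total
dds <- dds[rowSums(counts(dds)) > 10, ]
dds <- DESeq(dds)

# Test |log2 fold change| > 0.5 for High vs Low
resGA <- results(dds, lfcThreshold = 0.5, contrast = c("Response", "High", "Low"),
                 altHypothesis = "greaterAbs")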

I got 152 genes with an adjusted p-value < 0.1 and ranked them by log2 fold change. I then applied the variance stabilizing transformation (vsd) to the expression data and noticed that for some of the top-ranked genes (those with a very high log2 fold change, > 8), most samples had similar transformed values in both groups, while a few samples (2-3) within one group had much larger values than the rest. That might explain the fold change, but it looks to me more like an outlier effect than a true difference between the groups.
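The pattern described above (a large fold change carried by 2-3 samples) can be checked gene by gene by looking at the per-sample values directly. Below is a minimal sketch, assuming DESeq2 is installed. Since counts.txt and Annotation.csv are not available here, it simulates data with makeExampleDESeqDataSet() and artificially inflates one gene (gene 1) in three High samples; that spike is purely illustrative and not taken from the question's data.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data: 500 genes, 28 samples (14 per group)
dds <- makeExampleDESeqDataSet(n = 500, m = 28)
dds$Response <- factor(rep(c("Low", "High"), each = 14), levels = c("Low", "High"))
design(dds) <- ~ Response

# Illustrative outlier pattern: gene 1 is very high in only 3 of the 14 High samples
cnt <- counts(dds)
cnt[1, 15:17] <- as.integer(10 * max(cnt[1, ]) + 1000)
counts(dds) <- cnt
dds <- estimateSizeFactors(dds)

# Variance-stabilized values for gene 1, split by group (Low first, then High)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
by_group <- split(assay(vsd)[1, ], dds$Response)
lapply(by_group, round, 2)

# Normalized counts for the same gene, highest first
d <- plotCounts(dds, gene = 1, intgroup = "Response", returnData = TRUE)
head(d[order(d$count, decreasing = TRUE), ])
```

With the real data, the same two inspections can be run on the dds returned by DESeq(), indexing the genes of interest by name. Note that DESeq2's built-in outlier handling (Cook's distance, with count replacement when a group has 7 or more replicates) is aimed at single extreme samples, so a pattern carried by 2-3 samples may pass through it.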

Is there a way to filter out such outliers and get a more "uniform" result, i.e., to find genes that are consistently more highly expressed in one group than in the other?

Thanks

Answer · 2018-01-20

Are you using the latest version of DESeq2? If so, you need to use this function to meaningfully sort on LFC:

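# type="normal" matches the 2018 default; newer DESeq2 releases default to
# type="apeglm", which takes coef= rather than contrast=
resLFC <- lfcShrink(dds, contrast=c("Response","High","Low"), res=resGA, type="normal")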

This will replace the MLE log2 fold change with a shrunken estimate that is much better for ranking, but it won't change the p-values or adjusted p-values.
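A sketch of this suggestion in context, plus one possible way to address the "consistently higher in one group" part of the question. It assumes DESeq2 is installed and uses simulated data from makeExampleDESeqDataSet() (with betaSD = 1 so some genes truly differ) instead of the real files. The consistency score is a hypothetical heuristic introduced here for illustration; it is not part of DESeq2 and not something the answer proposes, and any cutoff on it would be a judgment call.

```r
library(DESeq2)

set.seed(2)
# Simulated stand-in for the real data, with some true group differences
dds <- makeExampleDESeqDataSet(n = 500, m = 28, betaSD = 1)
dds$Response <- factor(rep(c("Low", "High"), each = 14), levels = c("Low", "High"))
design(dds) <- ~ Response
dds <- DESeq(dds, quiet = TRUE)

resGA <- results(dds, lfcThreshold = 0.5, contrast = c("Response", "High", "Low"),
                 altHypothesis = "greaterAbs")
# Shrunken LFCs replace the MLE ones; p-values and padj come from resGA
resLFC <- lfcShrink(dds, contrast = c("Response", "High", "Low"), res = resGA,
                    type = "normal")

# Significant genes ranked by absolute shrunken log2 fold change
sig <- subset(resLFC, padj < 0.1)
sig <- sig[order(abs(sig$log2FoldChange), decreasing = TRUE), ]

# Hypothetical consistency score (not a DESeq2 feature): the fraction of samples
# in the "up" group whose normalized count exceeds the median of the other group
nc <- counts(dds, normalized = TRUE)
is_high <- dds$Response == "High"
consistency <- vapply(seq_len(nrow(sig)), function(i) {
  x <- nc[rownames(sig)[i], ]
  up <- if (sig$log2FoldChange[i] > 0) is_high else !is_high
  mean(x[up] > median(x[!up]))
}, numeric(1))

ranked <- data.frame(log2FoldChange = sig$log2FoldChange, padj = sig$padj,
                     consistency = consistency, row.names = rownames(sig))
head(ranked)
```

A gene like the ones described in the question, where only 2-3 of 14 samples are high, would land somewhere in the middle of this score (the extreme samples plus roughly half of the others), whereas a gene that is shifted across the whole group would score close to 1.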