Handling outliers in DESeq2
@lirongrossmann-13938

Hello everyone,

I am using DESeq2 to test for differential gene expression between two groups (14 samples each) with the following code:

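library(DESeq2)

# Raw counts (genes x samples) and the sample annotation table
ep <- read.table("counts.txt", header = TRUE, row.names = 1)
cp <- read.csv("Annotation.csv")

dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Response)

# Drop genes with very low total counts
dds <- dds[rowSums(counts(dds)) > 10, ]

dds <- DESeq(dds)

# Test for an absolute log fold change above 0.5, High vs. Low
resGA <- results(dds, lfcThreshold = 0.5, contrast = c("Response", "High", "Low"), altHypothesis = "greaterAbs")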

I got 152 genes with an adjusted p-value < 0.1 and ranked them by log fold change. I then applied the variance stabilizing transformation (VST) to the expression data and noticed that, for some of the top-ranked genes (log fold change > 8), most samples had similar transformed values across both groups, while a few samples (2-3) within one group had much larger values than the rest. That could explain the fold change, but it looks to me more like an outlier effect than a true difference between the groups.
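The pattern described above can be made concrete with a small diagnostic. The following is only a sketch on a made-up toy matrix standing in for the transformed values (for example, what `assay(vsd)` would return); the gene names, sample names, numbers, and the two summary statistics are illustrative assumptions, not a DESeq2 feature. For each gene it counts how many High samples exceed every Low sample and compares a mean-based with a median-based group difference.

```r
# Toy stand-in for a transformed expression matrix (genes x samples).
# All values are invented for illustration.
group <- factor(rep(c("High", "Low"), each = 5))
mat <- rbind(
  consistent = c(9, 9.5, 10, 9.2, 9.8,  5, 5.5, 5.2, 4.9, 5.1),
  outlierish = c(5, 5.1, 14, 15, 4.9,   5, 5.2, 5.1, 4.8, 5.0)
)
colnames(mat) <- paste0("s", seq_len(ncol(mat)))

is_high <- group == "High"

# Per gene: number of High samples that exceed every Low sample.
n_above <- apply(mat, 1, function(x) sum(x[is_high] > max(x[!is_high])))

# Mean-based vs. median-based group difference. A large gap between the
# two suggests that a few samples are driving the mean.
mean_diff   <- rowMeans(mat[, is_high]) - rowMeans(mat[, !is_high])
median_diff <- apply(mat[, is_high], 1, median) - apply(mat[, !is_high], 1, median)

summary_tab <- data.frame(n_above, mean_diff, median_diff)
print(summary_tab)
```

A gene where only two or three samples sit above the other group, and where the median difference is close to zero while the mean difference is large, matches the behaviour described here. Any cutoff applied to these numbers would be an ad hoc choice.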

Is there a way to filter out these outliers and get a more "uniform" result, i.e. to find genes that are consistently more highly expressed in one group than in the other?

Thanks

@mikelove

Are you using the latest version of DESeq2? If so, you need to use this function to meaningfully sort on LFC:

resLFC <- lfcShrink(dds, contrast=c("Response","High","Low"), res=resGA)

This will replace the MLE log2 fold change with a shrunken estimate that is much better for ranking, but it won't change the p-values or adjusted p-values.
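As a rough end-to-end sketch of ranking by the shrunken estimate, the example below uses simulated data from `makeExampleDESeqDataSet()` instead of the poster's files, so the `condition` factor with levels `A` and `B` stands in for `Response` with `High` and `Low`. One assumption to flag: more recent DESeq2 releases default to `type = "apeglm"`, which works with `coef` and not with `contrast`, so the sketch passes `type = "normal"` explicitly; `type = "ashr"` is another option that accepts a contrast if the ashr package is installed.

```r
library(DESeq2)

# Simulated counts: 500 genes, 12 samples, two conditions (A and B).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12, betaSD = 1)
dds <- DESeq(dds)

# Same kind of thresholded test as in the question.
res <- results(dds, contrast = c("condition", "B", "A"),
               lfcThreshold = 0.5, altHypothesis = "greaterAbs")

# Replace the MLE log2 fold changes in the results table with shrunken estimates.
resLFC <- lfcShrink(dds, contrast = c("condition", "B", "A"),
                    res = res, type = "normal")

# Keep genes with adjusted p-value < 0.1, then rank by absolute shrunken LFC.
sig_res <- resLFC[which(resLFC$padj < 0.1), ]
ranked  <- sig_res[order(abs(sig_res$log2FoldChange), decreasing = TRUE), ]
head(ranked)
```

Genes whose large MLE fold change rests on a few high counts tend to be pulled toward zero by the shrinkage, so they should drop in this ranking compared with sorting on the unshrunken values.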
