Question

Incorrect DESeq2 Output

0

Entering edit mode

umajs • 0

@umajs-21424

Last seen 4.7 years ago

Hi all,

I am wondering if I have missed any essential components in my script for differential gene expression analysis using DESeq2. I first filtered my data by adjusted p value (0.05) and ranked those by fold change.

The following is an example of my output:

Gene x
baseMean 17.26376
log2FC -21.8639
lfcSE 3.90958
stat -5.59239
pvalue 2.24E-08
padj 4.03E-06
Sample 1 (group x) 0
Sample 2 (group x) 0
Sample 3 (group x) 0
Sample 4 (group y) 0
Sample 5 (group y) 0
Sample 6 (group y) 103.5826

I don't understand how group y compared to group x, for this gene, is calculated to have a statistically significant padj value, when all but one have a normalised read count value > 0? Should I be filtering something else? Have I missed something?

Here is the R script:

##Load DESeq2 R package
library(DESeq2)

##Load in raw read count file 
Gene_Counts <- read_excel("N:/path/to/file")

##Set working directory
setwd("N:/path/to/directory")

##Identify the counts within the file
head(Gene_Counts)
counts <- Gene_Counts[, 2:7]

##Determine the meta-data and row names
meta_data <- data.frame(condition = c("x", "x", "x", "y", "y", "y"), sample = colnames(counts)

##Perform the differential gene expression analysis
dds <- DESeqDataSetFromMatrix(countData = counts, colData = meta_data, design = ~condition)

dds <- DESeq(dds)
res <- results(dds)

##Exporting
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
head(resdata)

write.csv(resdata, file = "Diff_Expr_Results.csv")

Thanks!

deseq2 • 663 views

ADD COMMENT • link updated 5.5 years ago by Michael Love 43k • written 5.5 years ago by umajs • 0

score 0 · Answer 1 · 2019-07-22

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 26 minutes ago

United States

These can get a small p-value if the average dispersion of the dataset is small. It's hard to know if it's real or not based on 6 counts, so information sharing across genes is used. If you prefer to filter these genes out from the beginning, you can use:

keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]

ADD COMMENT • link 5.5 years ago Michael Love 43k