Question: Incorrect DESeq2 Output
0
11 weeks ago by
umajs0
umajs0 wrote:

Hi all,

I am wondering if I have missed any essential components in my script for differential gene expression analysis using DESeq2. I first filtered my data by adjusted p value (0.05) and ranked those by fold change.

The following is an example of my output:

1. Gene x
2. baseMean 17.26376
3. log2FC -21.8639
4. lfcSE 3.90958
5. stat -5.59239
6. pvalue 2.24E-08
8. Sample 1 (group x) 0
9. Sample 2 (group x) 0
10. Sample 3 (group x) 0
11. Sample 4 (group y) 0
12. Sample 5 (group y) 0
13. Sample 6 (group y) 103.5826

I don't understand how group y compared to group x, for this gene, is calculated to have a statistically significant padj value, when all but one have a normalised read count value > 0? Should I be filtering something else? Have I missed something?

Here is the R script:

##Load DESeq2 R package
library(DESeq2)

##Set working directory
setwd("N:/path/to/directory")

##Identify the counts within the file
counts <- Gene_Counts[, 2:7]

##Determine the meta-data and row names
meta_data <- data.frame(condition = c("x", "x", "x", "y", "y", "y"), sample = colnames(counts)

##Perform the differential gene expression analysis
dds <- DESeqDataSetFromMatrix(countData = counts, colData = meta_data, design = ~condition)

dds <- DESeq(dds)
res <- results(dds)

##Exporting
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)

write.csv(resdata, file = "Diff_Expr_Results.csv")


Thanks!

deseq2 • 116 views
modified 11 weeks ago by Michael Love25k • written 11 weeks ago by umajs0
0
11 weeks ago by
Michael Love25k
United States
Michael Love25k wrote:

These can get a small p-value if the average dispersion of the dataset is small. It's hard to know if it's real or not based on 6 counts, so information sharing across genes is used. If you prefer to filter these genes out from the beginning, you can use:

keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]