Question: Incorrect DESeq2 Output
11 weeks ago by
umajs0 wrote:

Hi all,

I am wondering if I have missed any essential components in my script for differential gene expression analysis using DESeq2. I first filtered my results by adjusted p-value (< 0.05) and then ranked them by fold change.

The following is an example of my output:

  Gene                  x
  baseMean              17.26376
  log2FC                -21.8639
  lfcSE                 3.90958
  stat                  -5.59239
  pvalue                2.24E-08
  padj                  4.03E-06
  Sample 1 (group x)    0
  Sample 2 (group x)    0
  Sample 3 (group x)    0
  Sample 4 (group y)    0
  Sample 5 (group y)    0
  Sample 6 (group y)    103.5826

I don't understand how this gene can have a statistically significant padj for group y versus group x when all but one sample have a normalised read count of 0. Should I be filtering on something else? Have I missed something?

Here is the R script:

##Load DESeq2 R package (readxl provides read_excel)
library(DESeq2)
library(readxl)

##Load in raw read count file 
Gene_Counts <- read_excel("N:/path/to/file")

##Set working directory

##Identify the counts within the file
counts <- Gene_Counts[, 2:7]

##Determine the meta-data and row names
meta_data <- data.frame(condition = c("x", "x", "x", "y", "y", "y"), sample = colnames(counts))

##Perform the differential gene expression analysis
dds <- DESeqDataSetFromMatrix(countData = counts, colData = meta_data, design = ~condition)

dds <- DESeq(dds)
res <- results(dds)

##Merge the results table with the normalised counts
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)

write.csv(resdata, file = "Diff_Expr_Results.csv")


Answer: Incorrect DESeq2 Output
11 weeks ago by
Michael Love (United States) wrote:

Genes like this one can get a small p-value if the average dispersion of the dataset is small. It's hard to know whether the difference is real based on 6 counts, so information sharing across genes is used when estimating the dispersion. If you prefer to filter these genes out from the beginning, you can use:

keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]
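
To see what that filter does to a gene like the one in the question, here is a minimal base-R sketch. It assumes a made-up raw-count matrix (`toy_counts`, with hypothetical genes `gene_x`, `gene_a`, `gene_b`) standing in for `counts(dds)`; the values are invented for illustration and are not taken from the original data.

```r
# Toy raw-count matrix standing in for counts(dds); all values are made up.
toy_counts <- matrix(
  c(0, 0, 0, 0, 0, 104,       # gene_x: one non-zero sample, like the gene in the question
    25, 31, 28, 90, 85, 102,  # gene_a: expressed in every sample
    12, 0, 15, 11, 0, 3),     # gene_b: >= 10 reads in exactly three samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_x", "gene_a", "gene_b"),
                  paste0("Sample", 1:6))
)

# For each gene, count how many samples have at least 10 reads
n_samples_ok <- rowSums(toy_counts >= 10)

# Keep genes where at least 3 samples pass (3 is the group size in this design)
keep <- n_samples_ok >= 3
filtered <- toy_counts[keep, , drop = FALSE]

print(n_samples_ok)
print(rownames(filtered))
```

In this toy example `gene_x` passes the threshold in only one sample, so it is dropped before testing, while the other two genes are kept. The thresholds (10 reads, 3 samples) are the ones from the answer above; they may need adjusting for other designs.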
