Question: Incorrect DESeq2 Output
0
gravatar for umajs
28 days ago by
umajs0
umajs0 wrote:

Hi all,

I am wondering if I have missed any essential components in my script for differential gene expression analysis using DESeq2. I first filtered my data by adjusted p value (0.05) and ranked those by fold change.

The following is an example of my output:

  1. Gene x
  2. baseMean 17.26376
  3. log2FC -21.8639
  4. lfcSE 3.90958
  5. stat -5.59239
  6. pvalue 2.24E-08
  7. padj 4.03E-06
  8. Sample 1 (group x) 0
  9. Sample 2 (group x) 0
  10. Sample 3 (group x) 0
  11. Sample 4 (group y) 0
  12. Sample 5 (group y) 0
  13. Sample 6 (group y) 103.5826

I don't understand how group y compared to group x, for this gene, is calculated to have a statistically significant padj value, when all but one have a normalised read count value > 0? Should I be filtering something else? Have I missed something?

Here is the R script:

##Load DESeq2 R package
library(DESeq2)

##Load in raw read count file 
Gene_Counts <- read_excel("N:/path/to/file")

##Set working directory
setwd("N:/path/to/directory")

##Identify the counts within the file
head(Gene_Counts)
counts <- Gene_Counts[, 2:7]

##Determine the meta-data and row names
meta_data <- data.frame(condition = c("x", "x", "x", "y", "y", "y"), sample = colnames(counts)

##Perform the differential gene expression analysis
dds <- DESeqDataSetFromMatrix(countData = counts, colData = meta_data, design = ~condition)

dds <- DESeq(dds)
res <- results(dds)

##Exporting
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
head(resdata)

write.csv(resdata, file = "Diff_Expr_Results.csv")

Thanks!

deseq2 • 78 views
ADD COMMENTlink modified 28 days ago by Michael Love24k • written 28 days ago by umajs0
Answer: Incorrect DESeq2 Output
0
gravatar for Michael Love
28 days ago by
Michael Love24k
United States
Michael Love24k wrote:

These can get a small p-value if the average dispersion of the dataset is small. It's hard to know if it's real or not based on 6 counts, so information sharing across genes is used. If you prefer to filter these genes out from the beginning, you can use:

keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]
ADD COMMENTlink written 28 days ago by Michael Love24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 305 users visited in the last hour