Hi all,
I am wondering if I have missed any essential components in my script for differential gene expression analysis using DESeq2. I first filtered my data by adjusted p value (0.05) and ranked those by fold change.
The following is an example of my output:
- Gene x
- baseMean 17.26376
- log2FC -21.8639
- lfcSE 3.90958
- stat -5.59239
- pvalue 2.24E-08
- padj 4.03E-06
- Sample 1 (group x) 0
- Sample 2 (group x) 0
- Sample 3 (group x) 0
- Sample 4 (group y) 0
- Sample 5 (group y) 0
- Sample 6 (group y) 103.5826
I don't understand how group y compared to group x, for this gene, is calculated to have a statistically significant padj value, when all but one have a normalised read count value > 0? Should I be filtering something else? Have I missed something?
Here is the R script:
##Load DESeq2 R package
library(DESeq2)
##Load in raw read count file
Gene_Counts <- read_excel("N:/path/to/file")
##Set working directory
setwd("N:/path/to/directory")
##Identify the counts within the file
head(Gene_Counts)
counts <- Gene_Counts[, 2:7]
##Determine the meta-data and row names
meta_data <- data.frame(condition = c("x", "x", "x", "y", "y", "y"), sample = colnames(counts)
##Perform the differential gene expression analysis
dds <- DESeqDataSetFromMatrix(countData = counts, colData = meta_data, design = ~condition)
dds <- DESeq(dds)
res <- results(dds)
##Exporting
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
head(resdata)
write.csv(resdata, file = "Diff_Expr_Results.csv")
Thanks!