Hi, I am using DESeq2 (v1.34) to analyze RNA seq data in r(4.13), but I run into two problems. (1) DESeq2 renames all the genes (2) lack of downregulated genes.
My data look like this (top and bottom 5 entries):
"normal1" "normal2" "normal3" "tumor1" "tumor2" "tumor3"
"A1BG" 102 184 89 165 221 174
"A1BG-AS1" 95 218 77 197 247 140
"A1CF" 61 45 33 303 872 536
"A2M" 15275 25295 15593 53219 34578 167224
"A2M-AS1" 822 744 43 3112 1216 272
.....
"ZYG11A" 5 17 15 8 19 35
"ZYG11B" 730 1159 1131 767 1295 2770
"ZYX" 3138 3873 2307 3365 6314 14647
"ZZEF1" 2709 4887 3456 1879 4853 4383
"ZZZ3" 709 1243 1440 845 1145 2097
The seq of commands that I am following is:
#read in the data for gene expression and disease conditions
coldata <- read.csv("../Pat2/sample_id_table", sep=",", row.names=1)
coldata$Condition <- factor(coldata$Condition)
row.names(cts) = cts$gene_id
cts <- cts[, 2:7]
cts <- as.matrix(cts)
all(rownames(coldata) %in% colnames(cts))
all(rownames(coldata) == colnames(cts))
dds <- DESeqDataSetFromMatrix(cts, coldata, design = ~Condition)
dds <- DESeq(dds)
#filtering
keep <- rowSums(counts(dds) >= 10)
dds <- dds[keep, ]
dds$Condition <- factor(dds$Condition, levels = c("Normal","Tumor"))
res <- results(dds, name="Condition_Tumor_vs_Normal", alpha=0.05, lfcThreshold=0.58)
resOrdered <- res[order(res$pvalue), ]
> summary(res)
out of 25099 with nonzero total read count
adjusted p-value < 0.05
LFC > 0.58 (up) : 2726, 11%
LFC < -0.58 (down) : 0, 0% ????? [even at 0, there are no downregulated genes]
outliers [1] : 0, 0%
low counts [2] : 19692, 78%
(mean count < 159)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
When I looked into resOrdered; it looks like this:
> resOrdered
log2 fold change (MLE): Condition Tumor vs Normal
Wald test p-value: Condition Tumor vs Normal
DataFrame with 25099 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
A1CF 270.968 3.24205 0.511295 5.20649 1.92451e-07 9.34933e-07
A1CF 270.968 3.24205 0.511295 5.20649 1.92451e-07 9.34933e-07
A1CF 270.968 3.24205 0.511295 5.20649 1.92451e-07 9.34933e-07
A1CF 270.968 3.24205 0.511295 5.20649 1.92451e-07 9.34933e-07
A1CF 270.968 3.24205 0.511295 5.20649 1.92451e-07 9.34933e-07
... ... ... ... ... ... ...
A2ML1 5.00733 0.0147204 1.32641 0 1 NA
A2ML1 5.00733 0.0147204 1.32641 0 1 NA
A2ML1 5.00733 0.0147204 1.32641 0 1 NA
A2ML1 5.00733 0.0147204 1.32641 0 1 NA
A2ML1 5.00733 0.0147204 1.32641 0 1 NA
All genes are named in this range: A1CF --> A2ML1.
I am unable to figure out a source of why DESeq2 renames all the genes, plus there are no downregulated genes in the results? I can share my data if necessary to replicate this.
After correcting this line, I got the output somewhat reasonable with both up and downregulated as well as no-more-renaming of genes. I am sort of surprised that a misplaced brace has this much effect. Now the output looks like this with proper gene names:
Thanks a lot, I am updating this just to make sure that if someone else has a similar problem, he should try this one.