DESeq2 is renaming gene names and lack of downregulated genes
1
0
Entering edit mode
manwar • 0
@52a937a0
Last seen 3 days ago
South Korea

Hi, I am using DESeq2 (v1.34) to analyze RNA seq data in r(4.13), but I run into two problems. (1) DESeq2 renames all the genes (2) lack of downregulated genes.

My data look like this (top and bottom 5 entries):

"normal1" "normal2" "normal3" "tumor1" "tumor2" "tumor3"
"A1BG" 102 184 89 165 221 174
"A1BG-AS1" 95 218 77 197 247 140
"A1CF" 61 45 33 303 872 536
"A2M" 15275 25295 15593 53219 34578 167224
"A2M-AS1" 822 744 43 3112 1216 272
.....
"ZYG11A" 5 17 15 8 19 35
"ZYG11B" 730 1159 1131 767 1295 2770
"ZYX" 3138 3873 2307 3365 6314 14647
"ZZEF1" 2709 4887 3456 1879 4853 4383
"ZZZ3" 709 1243 1440 845 1145 2097


The seq of commands that I am following is:

#read in the data for gene expression and disease conditions

coldata$Condition <- factor(coldata$Condition)
row.names(cts) = cts$gene_id cts <- cts[, 2:7] cts <- as.matrix(cts) all(rownames(coldata) %in% colnames(cts)) all(rownames(coldata) == colnames(cts)) dds <- DESeqDataSetFromMatrix(cts, coldata, design = ~Condition) dds <- DESeq(dds) #filtering keep <- rowSums(counts(dds) >= 10) dds <- dds[keep, ] dds$Condition <- factor(dds$Condition, levels = c("Normal","Tumor")) res <- results(dds, name="Condition_Tumor_vs_Normal", alpha=0.05, lfcThreshold=0.58) resOrdered <- res[order(res$pvalue), ]

> summary(res)
out of 25099 with nonzero total read count
LFC > 0.58 (up)    : 2726, 11%
LFC < -0.58 (down) : 0, 0%  ????? [even at 0, there are no downregulated genes]
outliers [1]       : 0, 0%
low counts [2]     : 19692, 78%
(mean count < 159)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


When I looked into resOrdered; it looks like this:

> resOrdered
log2 fold change (MLE): Condition Tumor vs Normal
Wald test p-value: Condition Tumor vs Normal
DataFrame with 25099 rows and 6 columns
baseMean log2FoldChange     lfcSE      stat      pvalue        padj
<numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
A1CF    270.968        3.24205  0.511295   5.20649 1.92451e-07 9.34933e-07
A1CF    270.968        3.24205  0.511295   5.20649 1.92451e-07 9.34933e-07
A1CF    270.968        3.24205  0.511295   5.20649 1.92451e-07 9.34933e-07
A1CF    270.968        3.24205  0.511295   5.20649 1.92451e-07 9.34933e-07
A1CF    270.968        3.24205  0.511295   5.20649 1.92451e-07 9.34933e-07
...         ...            ...       ...       ...         ...         ...
A2ML1   5.00733      0.0147204   1.32641         0           1          NA
A2ML1   5.00733      0.0147204   1.32641         0           1          NA
A2ML1   5.00733      0.0147204   1.32641         0           1          NA
A2ML1   5.00733      0.0147204   1.32641         0           1          NA
A2ML1   5.00733      0.0147204   1.32641         0           1          NA


All genes are named in this range: A1CF --> A2ML1.

I am unable to figure out a source of why DESeq2 renames all the genes, plus there are no downregulated genes in the results? I can share my data if necessary to replicate this.

DESeq DESeq2 • 130 views
3
Entering edit mode
Basti ▴ 390
@7d45153c
Last seen 1 day ago
France

Your line keep <- rowSums(counts(dds) >= 10) has no sense, it should be keep <- rowSums(counts(dds)) >= 10

0
Entering edit mode

After correcting this line, I got the output somewhat reasonable with both up and downregulated as well as no-more-renaming of genes. I am sort of surprised that a misplaced brace has this much effect. Now the output looks like this with proper gene names:

out of 28205 with nonzero total read count
LFC > 0.58 (up)    : 197, 0.7%
LFC < -0.58 (down) : 169, 0.6%
outliers [1]       : 300, 1.1%
low counts [2]     : 4922, 17%
(mean count < 7)


Thanks a lot, I am updating this just to make sure that if someone else has a similar problem, he should try this one.