Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed while using DESeq2
@f2014eb5

Hello, I'm working with the DESeq2 package. When I run my code, I get this error, along with an additional warning message:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘0.0011’, ‘0.0012’, ‘0.0013’, ‘0.0014’, ‘0.0015’, ‘0.0016’, ‘0.0017’, ‘0.0018’, ‘0.0019’, ‘0.002’, ‘0.0021’, ‘0.0022’, ‘0.0023’, ‘0.0024’, ‘0.0025’, ‘0.0026’, ‘0.0027’, ‘0.0028’, ‘0.0029’, ‘0.003’, ‘0.0031’, ‘0.0032’, ‘0.0033’, ‘0.0034’, ‘0.0035’, ‘0.0036’, ‘0.0037’, ‘0.0038’, ‘0.0039’, ‘0.004’, ‘0.0041’, ‘0.0042’, ‘0.0043’, ‘0.0044’, ‘0.0045’, ‘0.0046’, ‘0.0047’, ‘0.0048’, ‘0.0049’, ‘0.005’, ‘0.0051’, ‘0.0052’, ‘0.0053’, ‘0.0054’, ‘0.0055’, ‘0.0056’, ‘0.0057’, ‘0.0058’, ‘0.0059’, ‘0.006’, ‘0.0061’, ‘0.0062’, ‘0.0063’, ‘0.0064’, ‘0.0065’, ‘0.0066’, ‘0.0067’, ‘0.0068’, ‘0.0069’, ‘0.007’, ‘0.0071’, ‘0.0072’, ‘0.0073’, ‘0.0074’, ‘0.0075’, ‘0.0076’, ‘0.0077’, ‘0.0078’,  [... truncated]

My data contain gene names as row names and FPKM counts as columns. Here is my script:

library("DESeq2")
library("ggplot2")

data1 <- read.delim("TCGA-D3-A2JF_alive_male.txt", header = T, sep = "\t")
rownames(data1) = make.names(data1$gene_name, unique = T)
data1 <- subset(data1[2])
data2 <- read.delim("TCGA-D3-A3ML_dead_male.txt", header = T, sep = "\t")
rownames(data2) = make.names(data2$gene_name, unique = T)
data2 <- subset(data2[2])
data3 <- read.delim("TCGA-D9-A148_alive_male.txt", header = T, sep = "\t")
rownames(data3) = make.names(data3$gene_name, unique = T)
data3 <- subset(data3[2])
data4 <- read.delim("TCGA-EE-A180_dead_male.txt", header= T, sep = "\t")
rownames(data4) = make.names(data4$gene_name, unique = T)
data4 <- subset(data4[2])
data5 <- read.delim("TCGA-ER-A3ES_dead_male.txt", header = T, sep = "\t")
rownames(data5) = make.names(data5$gene_name, unique = T)
data5 <- subset(data5[2])
data6 <- read.delim("TCGA-FR-A3YO_alive_female.txt", header = T, sep = "\t")
rownames(data6) = make.names(data6$gene_name, unique = T)
data6 <- subset(data6[2])
data7 <- read.delim("TCGA-FS-A1ZZ_dead_female.txt", header = T, sep = "\t")
rownames(data7) = make.names(data7$gene_name, unique = T)
data7 <- subset(data7[2])
data8 <- read.delim("TCGA-FS-A4F0_alive_female.txt", header = T, sep = "\t")
rownames(data8) = make.names(data8$gene_name, unique = T)
data8 <- subset(data8[2])
data9 <- read.delim("TCGA-FW-A3R5_alive_male.txt", header = T, sep = "\t")
rownames(data9) = make.names(data9$gene_name, unique = T)
data9 <- subset(data9[2])
data10 <- read.delim("TCGA-GN-A9SD_dead_female.txt", header = T, sep = "\t")
rownames(data10) = make.names(data10$gene_name, unique = T)
data10 <- subset(data10[2])
data11 <- read.delim("TCGA-HR-A2OH_dead_female.txt", header = T, sep = "\t")
rownames(data11) = make.names(data11$gene_name, unique = T)
data11 <- subset(data11[2])
data12 <- read.delim("TCGA-RP-A6K9_alive_female.txt", header = T, sep = "\t")
rownames(data12) = make.names(data12$gene_name, unique = T)
data12 <- subset(data12[2])
countData <- cbind(data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12, deparse.level = 1)
metaData <- read.delim("metadata.txt", header = T, sep = "\t")

dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~status, tidy = T)

My initial data contained duplicate row names, so I tried to remove them by using make.names. After that, there were no duplicates in the row names when I checked with:

anyDuplicated(rownames(countData))

Thank you in advance.

swbarnes2 ★ 1.3k
@swbarnes2-14086

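The error message suggests that your FPKM values are being used as row names. The likely cause is tidy = T in the DESeqDataSetFromMatrix call, which tells DESeq2 to treat the first column of countData as the row names. In any case, you can't get valid results from DESeq2 if you give it RPKM/FPKM values instead of raw counts.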
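To see why the gene names can be unique while the error still appears, here is a minimal base-R sketch. It assumes that the tidy option amounts to moving the first column of countData into the row names; the toy data frame and the name tidy_like are made up for illustration and are not taken from the original files.

```r
# Toy stand-in for the cbind() result: rows are genes, columns are samples.
# The values are made-up FPKM-like numbers, not real TCGA data.
countData <- data.frame(
  sample1 = c(0, 0, 0.0011, 2.5),
  sample2 = c(0, 1.2, 0.0011, 3.1),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# The gene names themselves are unique, as reported in the question.
anyDuplicated(rownames(countData))  # 0

# Assumed equivalent of the tidy preprocessing: first column becomes row names.
first_col <- as.character(countData[, 1])
anyDuplicated(first_col)  # non-zero, because the value "0" repeats

# Setting those values as row names fails; capture the error message.
res <- tryCatch(
  {
    tidy_like <- countData[, -1, drop = FALSE]
    rownames(tidy_like) <- first_col
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

If this is indeed the cause, dropping tidy = T so that the existing row names are kept should make this particular error go away, although the FPKM input problem remains.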


That said, TCGA raw counts, as expected by DESeq2, can conveniently be obtained with packages like TCGAbiolinks or recount. That is superior to FPKM, which the DESeq2 model is not compatible with. I also suggest learning to automate your code: yours is laborious and error-prone, because you have to type a lot of things manually rather than looping over the files.
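As a rough illustration of looping over the files, here is a sketch in base R. The helper read_sample is hypothetical, and it assumes each file has a gene_name column with the values in the second column, as in the script in the question. The two demo files are made up so that the example runs on its own.

```r
# Hypothetical helper: read one per-sample table and return a one-column
# data frame of values, with unique gene names as row names.
read_sample <- function(path) {
  tab <- read.delim(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  out <- tab[, 2, drop = FALSE]
  rownames(out) <- make.names(tab$gene_name, unique = TRUE)
  # Name the column after the file so samples stay identifiable.
  colnames(out) <- sub("\\.txt$", "", basename(path))
  out
}

# Self-contained demo: write two tiny made-up files to a temporary directory.
demo_dir <- tempfile("samples_")
dir.create(demo_dir)
for (s in c("sampleA", "sampleB")) {
  write.table(
    data.frame(gene_name = c("TP53", "BRCA1", "TP53"), value = c(10, 0, 3)),
    file = file.path(demo_dir, paste0(s, ".txt")),
    sep = "\t", quote = FALSE, row.names = FALSE
  )
}

# One lapply() call replaces the twelve copy-pasted read/rename blocks.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
countData <- do.call(cbind, lapply(files, read_sample))
print(countData)
```

With real data, files would point at the directory holding the per-sample tables, and the resulting countData should contain raw integer counts before it is passed to DESeqDataSetFromMatrix.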
