Hello, I'm working with the DESeq2 package. When I run my code, I get the following error and warning message:
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘0.0011’, ‘0.0012’, ‘0.0013’, ‘0.0014’, ‘0.0015’, ‘0.0016’, ‘0.0017’, ‘0.0018’, ‘0.0019’, ‘0.002’, ‘0.0021’, ‘0.0022’, ‘0.0023’, ‘0.0024’, ‘0.0025’, ‘0.0026’, ‘0.0027’, ‘0.0028’, ‘0.0029’, ‘0.003’, ‘0.0031’, ‘0.0032’, ‘0.0033’, ‘0.0034’, ‘0.0035’, ‘0.0036’, ‘0.0037’, ‘0.0038’, ‘0.0039’, ‘0.004’, ‘0.0041’, ‘0.0042’, ‘0.0043’, ‘0.0044’, ‘0.0045’, ‘0.0046’, ‘0.0047’, ‘0.0048’, ‘0.0049’, ‘0.005’, ‘0.0051’, ‘0.0052’, ‘0.0053’, ‘0.0054’, ‘0.0055’, ‘0.0056’, ‘0.0057’, ‘0.0058’, ‘0.0059’, ‘0.006’, ‘0.0061’, ‘0.0062’, ‘0.0063’, ‘0.0064’, ‘0.0065’, ‘0.0066’, ‘0.0067’, ‘0.0068’, ‘0.0069’, ‘0.007’, ‘0.0071’, ‘0.0072’, ‘0.0073’, ‘0.0074’, ‘0.0075’, ‘0.0076’, ‘0.0077’, ‘0.0078’, [... truncated]
My data contain gene names as row names and FPKM counts as columns. Here is my script:
library("DESeq2")
library("ggplot2")
data1 <- read.delim("TCGA-D3-A2JF_alive_male.txt", header = T, sep = "\t")
rownames(data1) = make.names(data1$gene_name, unique = T)
data1 <- subset(data1[2])
data2 <- read.delim("TCGA-D3-A3ML_dead_male.txt", header = T, sep = "\t")
rownames(data2) = make.names(data2$gene_name, unique = T)
data2 <- subset(data2[2])
data3 <- read.delim("TCGA-D9-A148_alive_male.txt", header = T, sep = "\t")
rownames(data3) = make.names(data3$gene_name, unique = T)
data3 <- subset(data3[2])
data4 <- read.delim("TCGA-EE-A180_dead_male.txt", header= T, sep = "\t")
rownames(data4) = make.names(data4$gene_name, unique = T)
data4 <- subset(data4[2])
data5 <- read.delim("TCGA-ER-A3ES_dead_male.txt", header = T, sep = "\t")
rownames(data5) = make.names(data5$gene_name, unique = T)
data5 <- subset(data5[2])
data6 <- read.delim("TCGA-FR-A3YO_alive_female.txt", header = T, sep = "\t")
rownames(data6) = make.names(data6$gene_name, unique = T)
data6 <- subset(data6[2])
data7 <- read.delim("TCGA-FS-A1ZZ_dead_female.txt", header = T, sep = "\t")
rownames(data7) = make.names(data7$gene_name, unique = T)
data7 <- subset(data7[2])
data8 <- read.delim("TCGA-FS-A4F0_alive_female.txt", header = T, sep = "\t")
rownames(data8) = make.names(data8$gene_name, unique = T)
data8 <- subset(data8[2])
data9 <- read.delim("TCGA-FW-A3R5_alive_male.txt", header = T, sep = "\t")
rownames(data9) = make.names(data9$gene_name, unique = T)
data9 <- subset(data9[2])
data10 <- read.delim("TCGA-GN-A9SD_dead_female.txt", header = T, sep = "\t")
rownames(data10) = make.names(data10$gene_name, unique = T)
data10 <- subset(data10[2])
data11 <- read.delim("TCGA-HR-A2OH_dead_female.txt", header = T, sep = "\t")
rownames(data11) = make.names(data11$gene_name, unique = T)
data11 <- subset(data11[2])
data12 <- read.delim("TCGA-RP-A6K9_alive_female.txt", header = T, sep = "\t")
rownames(data12) = make.names(data12$gene_name, unique = T)
data12 <- subset(data12[2])
countData <- cbind(data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12, deparse.level = 1)
metaData <- read.delim("metadata.txt", header = T, sep = "\t")
dds <- DESeqDataSetFromMatrix(countData = countData,
colData = metaData,
design = ~status, tidy = T)
My initial data contained duplicate row names, so I tried to remove them using make.names. After that, there were no duplicates in the row names when I checked with:
anyDuplicated(rownames(countData))
Thank you in advance.
That said, TCGA raw counts, which DESeq2 expects, can conveniently be obtained with packages like TCGAbiolinks or recount. Raw counts are preferable to FPKM, which the DESeq2 model is not compatible with. I also suggest learning to automate your code: yours is laborious and error-prone because you have to type a lot by hand rather than looping over the files.
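As a rough illustration of looping over the files, here is a minimal sketch. The helper names `read_sample` and `build_count_table` are hypothetical, and the sketch assumes every file is tab-separated with a `gene_name` column first and the expression value in the second column, as the script in the question implies. The demo files are made up so that the example runs on its own. The DESeq2 call is shown only as a comment and leaves out `tidy = TRUE`: that argument tells `DESeqDataSetFromMatrix` to take the first column of `countData` as the row names, which would be consistent with the warning listing numeric values rather than gene names.

```r
# Hypothetical helper: read one per-sample table and return its values as a
# named vector. Assumes column 1 is gene_name and column 2 holds the values.
read_sample <- function(path) {
  tab <- read.delim(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  values <- tab[[2]]
  names(values) <- make.names(tab$gene_name, unique = TRUE)
  values
}

# Hypothetical helper: combine all samples into one matrix (genes x samples).
build_count_table <- function(files) {
  columns <- lapply(files, read_sample)
  gene_ids <- names(columns[[1]])
  # Stop if any file lists the genes in a different order than the first one.
  same_genes <- vapply(columns,
                       function(x) identical(names(x), gene_ids),
                       logical(1))
  stopifnot(all(same_genes))
  counts <- do.call(cbind, columns)
  rownames(counts) <- gene_ids
  # Use the file name without its extension as the sample ID.
  colnames(counts) <- sub("\\.txt$", "", basename(files))
  counts
}

# Made-up demo input: two tiny files with one duplicated gene name.
demo_dir <- tempfile("samples_")
dir.create(demo_dir)
genes <- c("TP53", "BRCA1", "BRCA1")
for (id in c("sampleA", "sampleB")) {
  write.table(data.frame(gene_name = genes, value = c(10, 0, 3)),
              file.path(demo_dir, paste0(id, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
countData <- build_count_table(files)
print(countData)

# With real raw counts (not FPKM) and a metadata table whose rows match
# colnames(countData), the call could look like this. `tidy` stays at its
# default because countData already carries the gene IDs as row names.
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = countData,
#                                       colData = metaData,
#                                       design = ~ status)
```

Because the sample IDs are taken from the file names, adding a thirteenth sample means dropping one more file into the folder rather than copying three more lines of code.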