Question: edgeR: non-unique values when setting 'row.names' - but the row names are unique!!!
3.8 years ago by
ccheung0
European Union
ccheung0 wrote:

Hi,

I'm having problems with the DGEList function in edgeR. Here are the commands that I entered:

library(edgeR)
raw.data <- read.table(file = "Documents/.../myfile.csv", header=TRUE, sep=",")
Data <- raw.data[, 2:45]
rownames( Data ) <- raw.data[ , 1 ]
colnames(Data) <- paste (c("ML1,ML32,ML4,ML29,etc"), sep="")
groups <- c(rep("1",11), rep("2",33))
DGE1 <- DGEList(counts = Data , group = groups )

At this point, it keeps on giving me this error message:

Error in row.names<-.data.frame(*tmp*, value = c("ML1,ML32,ML4,ML29,etc",  :
duplicate 'row.names' are not allowed
non-unique values when setting 'row.names':

But I know for sure that my row names are unique!  Any advice would be appreciated. Thanx.

carol

modified 3.8 years ago • written 3.8 years ago by ccheung0
3.8 years ago by
United States
James W. MacDonald wrote:

The hint here is that the row.names in the error are actually the column names for your data matrix! One of the things that happens when you run DGEList() is that a 'samples' data frame is constructed, and the row.names of that samples data.frame are the column names of your data.

If you have duplicate column names (and you do), then this will result in an error. You shouldn't have duplicate column names anyway (you are calling two samples by the same name), so fix that and the error will go away.

3.8 years ago by
ccheung0
European Union
ccheung0 wrote:

Hi,

Thanx James for your answer! However, I double-checked and I am pretty sure that both the column and the row names are unique. Just to be sure, I even ran the command

rownames(df) = make.names(nams, unique=TRUE)

but to no avail.....

Any other ideas? Thanx.

carol

I am not sure how you can be 'pretty sure' that the column names are unique. Either they are or they are not. Something like

any(duplicated(colnames(Data)))

will tell you for sure. And note that I am talking about the column names, not row names, so ensuring that the row names are unique is not helpful.
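
As a rough sketch of that check, the base R lines below list which names repeat and whether any are missing. The vector of sample names is made up for illustration and stands in for the real colnames(Data).

```r
# Hypothetical sample names standing in for colnames(Data);
# "ML1" is repeated on purpose.
sample_names <- c("ML1", "ML32", "ML4", "ML29", "ML1")

# TRUE if at least one name occurs more than once
has_dups <- any(duplicated(sample_names))

# The names that occur more than once
dup_names <- unique(sample_names[duplicated(sample_names)])

# Missing (NA) names are worth checking as well
has_na <- anyNA(sample_names)

print(has_dups)
print(dup_names)
print(has_na)
```

Running the same three checks on colnames(Data) should show exactly which sample names, if any, need fixing.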

But I am still sure that you DO have duplicated column names, and I can replicate exactly the error you get by trying to create a DGEList with duplicated column names:

> mat <- matrix(rnorm(1e5), ncol = 10)
> colnames(mat) <- paste0("ML", c(1:9,1))
> colnames(mat)
[1] "ML1" "ML2" "ML3" "ML4" "ML5" "ML6" "ML7" "ML8" "ML9" "ML1"
> dglst <- DGEList(mat)
Error in row.names<-.data.frame(*tmp*, value = c("ML1", "ML2", "ML3",  :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘ML1’

Also, I am assuming that the code you show

colnames(Data) <- paste (c("ML1,ML32,ML4,ML29,etc"), sep="")

isn't really what you have done, because that won't work unless you have just a single column. In other words,

> paste (c("ML1,ML32,ML4,ML29,etc"), sep="")
[1] "ML1,ML32,ML4,ML29,etc"

is a character vector of length one, so you cannot set the column names of a 44-column matrix with that command.
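
One possible explanation, offered as a sketch rather than a diagnosis: if Data is a data frame (as it would be after read.table) rather than a matrix, assigning a length-one value to colnames() does not fail. Base R pads the remaining names with NA, and repeated NA values count as duplicates. The toy data frame and all names below are hypothetical.

```r
# Toy 3-column data frame standing in for the 44-column Data (hypothetical).
toy_bad <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# One comma-separated string is a single name, not three names.
colnames(toy_bad) <- paste(c("ML1,ML32,ML4"), sep = "")
print(colnames(toy_bad))

# The padded NA names are treated as duplicates of each other.
print(anyDuplicated(colnames(toy_bad)) > 0)

# Using those names as row names of a per-sample data frame fails,
# which mirrors what happens to the 'samples' table described above.
samples <- data.frame(group = c("1", "1", "2"))
msg <- suppressWarnings(tryCatch(
  { row.names(samples) <- colnames(toy_bad); "no error" },
  error = function(e) conditionMessage(e)
))
print(msg)

# Supplying one string per column gives usable, unique names.
toy_ok <- data.frame(a = 1:3, b = 4:6, c = 7:9)
colnames(toy_ok) <- c("ML1", "ML32", "ML4")
print(anyDuplicated(colnames(toy_ok)) == 0)
```

If this is what happened, passing the names as separate quoted strings, one per column, should be enough for DGEList() to build its samples table.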

written 3.8 years ago by James W. MacDonald
3.8 years ago by
ccheung0
European Union
ccheung0 wrote:

Hi,

Haha, OK, I'm absolutely positive that the column names are not duplicated. With regard to your last comment, that is in fact what I had entered... Perhaps that is the problem. I will try another command, following the edgeR vignette, and see if that fixes it. Thanx!

carol
