Hi everyone, I'm quite new to coding and to processing data with the DESeq package, but it seems I will need to use it for my project. I have a dataset with 14 columns of expression data, where the names of the genes (row names) are in the first column (there are about 1400 genes).
The problem is that when I try to use DESeqDataSetFromMatrix I get an error. If I set up row names in my data frame, the error is: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'ncol': non-numeric variable(s) in data frame: X"
This is because I have the names of the genes that are obviously not numeric. (e.g., KANK1, HTT...etc.)
If I don't set up the row names, I get an error that says:
"Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘0’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’"
So basically, it tries to use the first column of expression values as row names, and of course there are repetitions because the expression values can be the same for some genes...
(I also needed to round the data; otherwise, I got an error about integers)
dds <- DESeqDataSetFromMatrix(countData = round(countData),
                              colData = metaData,
                              design = ~dex, tidy = TRUE)
Can you recommend something about how to solve my problem? I really appreciate any help you can provide. Thank you, Dorina
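For readers hitting the same pair of errors, here is a minimal sketch of one way the call could be set up. It rests on two assumptions that the thread does not confirm: that the "non-numeric variable(s)" error comes from calling round() on a data frame that still contains the character gene column, and that tidy = TRUE makes DESeqDataSetFromMatrix take the first column of countData as row names. The toy values, the extra sample names (C12 to C14) and the control/treated labels are invented for illustration; only X, C11, metaData and dex come from the post.

```r
# Hypothetical toy data standing in for the real table: gene names in
# column X, expression values in the remaining columns (values invented).
countData <- data.frame(
  X   = c("KANK1", "HTT", "DHX16", "SEC16A"),
  C11 = c(16.2, 16.4, 0, 23.425332),
  C12 = c(17.8, 5.1, 0.2, 28.3),
  C13 = c(20.1, 3.9, 1.4, 25.0),
  C14 = c(19.6, 4.2, 0.7, 26.9)
)

# round() on the whole data frame fails because column X is character.
round_error <- tryCatch(round(countData), error = function(e) e)
inherits(round_error, "error")

# Move the gene names into row names and keep only the numeric columns.
counts <- as.matrix(countData[, -1])
rownames(counts) <- countData$X

# Round and store as integers, since DESeq2 expects integer counts.
counts <- round(counts)
storage.mode(counts) <- "integer"

# Hypothetical sample table; its row names must match colnames(counts).
metaData <- data.frame(
  dex = factor(c("control", "control", "treated", "treated")),
  row.names = colnames(counts)
)

# Gene names are already row names, so tidy is left at FALSE.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = metaData,
                                        design = ~ dex,
                                        tidy = FALSE)
}
```

Rounding only makes the values integer; whether values such as 23.425332 are suitable as raw counts for DESeq2 is a separate question that this sketch does not address.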
Hi, thanks for your answer.
I already have unique rownames, as the rownames are the genes and they are unique (there are no replicated data, as I've already cleared those).
The problem is that the function does not accept the genes as rownames and tries to make rownames out of the first column of expression data. At this point there are indeed duplicates, since some of the expression values are the same for two different genes (you can see in the error that it tries to use numbers as rownames): "Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘0’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’"
Unfortunately, I cannot show the full data, but the output for str(countData) is
(here there are other 13 samples with expression values)
Here the first column (X) holds the genes (KANK1, HTT, etc.), all unique, and there are 14 columns with the expression values in different samples, such as C11.
If I keep the gene names as just another column instead of rownames, the problem is that the names are not numbers, and I get this error: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'ncol': non-numeric variable(s) in data frame: X"
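The duplicate 'row.names' error described above can be reproduced in base R without DESeq2. The sketch below assumes (the thread does not confirm this) that with tidy = TRUE the first column of countData is used as row names; it simply imitates that step on invented values.

```r
# Toy data frame whose gene names are ALREADY row names, so the first
# column holds rounded expression values (values invented).
df <- data.frame(
  C11 = c(16, 16, 0, 23),
  C12 = c(18, 5, 0, 28),
  row.names = c("KANK1", "HTT", "DHX16", "SEC16A")
)

# Imitate "use the first column as row names".
first_col <- as.character(df[, 1])
has_duplicates <- anyDuplicated(first_col) > 0

# Assigning duplicated values as row names of a data frame is an error.
result <- tryCatch(
  suppressWarnings({
    rownames(df) <- first_col
    "ok"
  }),
  error = function(e) conditionMessage(e)
)
result
```

If this reading is right, the numbers listed in the warning (‘0’, ‘16’, ‘17’, ...) would be rounded expression values from the first sample column rather than gene names.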
Hi, did you set the rownames of your matrix countData?
and then remove the name column to have a homogeneous matrix:
Yes, can you please show the output of these commands by l.troxler? It helps to show the commands that you are running.
Yes, I did this. This is how I got the errors.
So the code, to be exact, was this:
In this case, I got the error: "Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘0’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’"
If I run it with the genes column like this:
I get this error: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'ncol': non-numeric variable(s) in data frame: X"
Then, you do still have duplicate gene names. What is the output of:
, and:
Can you verify that you have / don't have genes named as '0', '16', etc?
FALSE 1408
I do not have gene names as numbers. All of them are made up of letters. This is why I thought the function is trying to use the first col with expression values in it (as when I put round(countData) those numbers are integers, but when simply countData they are 23.425332 and such numbers)
I do have some, where the gene names are: "DHX16", "SEC16A", so it contains those numbers but I don't have any, where the entire cell content is only a number.
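The commands asked for above did not survive in this copy of the thread. As a rough sketch only, and not necessarily what was originally posted, checks of this kind can be written in base R as follows; gene_names is a hypothetical stand-in for the real gene name vector.

```r
# Hypothetical subset of the gene names mentioned in the thread.
gene_names <- c("KANK1", "HTT", "DHX16", "SEC16A")

# Count duplicated names; only FALSE appears when all names are unique.
table(duplicated(gene_names))

# Is any name made up of digits only (such as '0' or '16')?
purely_numeric <- grepl("^[0-9]+$", gene_names)
any(purely_numeric)

# Names that merely contain digits, such as DHX16, do not match.
gene_names[grepl("16", gene_names)]
```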
Hmm, that is weird. Directly after you read in the count data, what is the output of str()?
...or, this may work, please try:
Then, rownames should be set automatically when the file is read.
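The suggested command is missing from this copy of the thread. One common way to get rownames set at read time, offered here as an assumption about what was meant, is the row.names argument of read.csv. The inline CSV text below is invented so that the sketch runs without a file.

```r
# Invented CSV content: gene names in the first column, then samples.
csv_text <- "X,C11,C12
KANK1,16.2,17.8
HTT,16.4,5.1
DHX16,0,0.2"

# row.names = 1 tells read.csv to use the first column as row names,
# so the resulting data frame contains only numeric columns.
countData <- read.csv(text = csv_text, row.names = 1)
str(countData)

# With the gene column gone, the data frame converts to a numeric matrix.
counts <- round(as.matrix(countData))
```

With real data, the text argument would be replaced by the path to the count file.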
'data.frame': 32463 obs. of 9 variables:
$ Gene : chr "ENSG00000164054.15" "ENSG00000257527.1" "ENSG00000140830.8" "ENSG00000180613.10" ...
$ AS_patient_1: num 6.61 0 4.4 0 0 0.43 0.17 0.16 1.09 1.2 ...
$ AS_patient_2: num 8.49 0.09 4.27 0 1.35 0.15 0 0.13 2.45 2.7 ...
$ AS_patient_3: num 10.77 0 3.68 0 0.42 ...
$ AS_patient_4: num 11.82 0 3.43 0 0.42 ...
$ AS_patient_5: num 9.83 0 2.46 0 1.91 0.49 0 0.26 1.33 3.06 ...
$ Health_1 : num 16.55 0 6 0 1.46 ...
$ Health_2 : num 8.63 0 5.39 0 0.54 0.22 0 0.19 2.78 3.38 ...
$ Health_3 : num 10.96 0 3.61 0.04 2.33 ...
This is the output of