Question

DESeq problems

0

dorina.jamniczky • 0

@user-25166

Last seen 3.8 years ago

Hungary

Hi everyone, I'm quite new to coding and to processing data with the DESeq2 package, but it seems I will need it for my project. I have a dataset with 14 columns of expression data, plus a first column holding the gene names that I want as row names (there are about 1,400 genes).

The problem is that I get an error when I try to use DESeqDataSetFromMatrix. If I set the row names on my data frame, the error is: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'ncol': non-numeric variable(s) in data frame: X"

This is because I have the names of the genes that are obviously not numeric. (e.g., KANK1, HTT...etc.)

If I don't set the row names, I get an error that says: "Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘0’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’"

So basically, it tries to use the first column of expression values as row names, and of course there are repeats because some genes can have the same expression value.

(I also needed to round the data; otherwise, I got an error about integers)

dds <- DESeqDataSetFromMatrix(countData=round(countData), 
                              colData=metaData, 
                              design=~dex, tidy = TRUE)

Can you recommend something about how to solve my problem? I really appreciate any help you can provide. Thank you, Dorina

help DESeq2 DESeqDataSetFromMatrix rownames • 5.8k views

ADD COMMENT • link updated 2.7 years ago by Sanatan • 0 • written 3.9 years ago by dorina.jamniczky • 0

score 0 · Answer 1 · 2021-04-15

0

Kevin Blighe ★ 4.0k

@kevin

Last seen 7 weeks ago

Republic of Ireland

Hi, for clarity, please show the output of:

str(countData)

DESeq2 will only accept a data-frame / matrix of integer values; so, there cannot be any non-integer values in this.

If you require unique rownames, then try something like the make.unique() function, and/or go back a few steps and try to determine why your data contain rows with duplicate gene names.

Kevin
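As a minimal sketch of the duplicate check and the make.unique() suggestion, assuming a made-up toy data frame (the values and the repeated gene name are invented for illustration, not taken from the poster's file):

```r
# Toy data; the duplicated "HTT" row is invented for illustration only.
toy <- data.frame(
  gene = c("KANK1", "HTT", "HTT"),
  S1   = c(21, 20, 19),
  S2   = c(5, 7, 7),
  stringsAsFactors = FALSE
)

# Which gene identifiers occur more than once?
dup_genes <- toy$gene[duplicated(toy$gene)]
print(dup_genes)

# make.unique() appends .1, .2, ... to repeated names, so they can be
# used as row names without a "duplicate 'row.names'" error.
unique_names <- make.unique(toy$gene)
rownames(toy) <- unique_names
print(rownames(toy))
```

Renaming duplicates only hides them, so it is probably worth finding out why a gene appears twice before relying on make.unique().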

ADD COMMENT • link 3.9 years ago Kevin Blighe ★ 4.0k

0

Hi, thanks for your answer.

I already have unique rownames: the rownames are the genes, and they are unique (there are no duplicated entries, as I've already removed those).

The problem is that the function does not accept the genes as rownames and instead tries to make rownames out of the first column of expression data. Since some expression values are the same for two different genes, there are indeed duplicates at that point (you can see in the error that it tries to use numbers as rownames): "Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘0’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’"

Unfortunately, I cannot show the full data, but the output for str(countData) is

'data.frame':  1408 obs. of  15 variables:
 $ X  : chr  "KANK1" "HTT" "HEATR1" "CAVIN1" ...
 $ C11: num  21 20.4 19.4 21.8 20.8 ...

(followed by the 13 other samples with expression values)

The first column (X) holds the genes (KANK1, HTT, etc.), all unique, and the other 14 columns hold the expression values for the different samples, such as C11.

If I keep the gene names as just another column, the problem is that the names are not numbers, and I get this error: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'ncol': non-numeric variable(s) in data frame: X"

ADD REPLY • link 3.8 years ago dorina.jamniczky • 0

1

Hi, did you set the rownames of your matrix countData?

row.names(countData)<-countData[,1]

and then remove the name column to get a homogeneous matrix:

countData<-countData[,-1]
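A minimal end-to-end sketch of these two steps on toy data follows. The sample names other than C11, the values, and the `metaData` table are invented for illustration; `dex` is the design variable used in the thread. Because the gene column is removed here, the sketch leaves `tidy` at its default of FALSE. According to the DESeq2 documentation, `tidy = TRUE` means the first column of `countData` holds the row names, so it should not be combined with this approach.

```r
# Toy stand-in for countData; gene names follow the thread, values are invented.
countData <- data.frame(
  X   = c("KANK1", "HTT", "HEATR1"),
  C11 = c(21, 20.4, 19.4),
  C12 = c(16, 16, 28),
  T11 = c(30, 12, 19.9),
  T12 = c(25, 18, 22),
  stringsAsFactors = FALSE
)

row.names(countData) <- countData[, 1]   # gene names become row names
countData <- countData[, -1]             # drop the character column

# Round (as in the original post) and store as an integer matrix;
# the gene names are kept as row names of the matrix.
countMatrix <- round(as.matrix(countData))
storage.mode(countMatrix) <- "integer"

# Hypothetical sample table with one row per column of countMatrix.
metaData <- data.frame(
  dex = factor(c("control", "control", "treated", "treated")),
  row.names = colnames(countMatrix)
)

# Only runs if DESeq2 is installed; tidy is left at its default (FALSE).
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = countMatrix,
                                        colData = metaData,
                                        design = ~dex)
}
```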

ADD REPLY • link 3.8 years ago l.troxler ▴ 10

0

Yes, can you please show the output of these commands from l.troxler? It helps to show the commands that you are running.

ADD REPLY • link 3.8 years ago Kevin Blighe ★ 4.0k

0

Yes, I did this. This is how I got the errors.

To be exact, the code was this:

countData <- read.csv('Mydata_mostly_exp.csv', header = TRUE, sep = ",")
no_rownames_counts <- countData[,2:15]
rownames(countData) <- countData[,1]
metaData <- read.csv('Design_ins.csv', header = TRUE, sep = ",")
metaData
dds <- DESeqDataSetFromMatrix(countData=round(no_rownames_counts),
                              colData=metaData,
                              design=~dex, tidy = TRUE)

In this case, I got the error: "Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘0’, ‘16’, ‘17’, ‘18’, ‘19’, ‘20’, ‘21’, ‘22’, ‘23’, ‘24’, ‘25’, ‘26’, ‘27’, ‘28’ "

If I run it with the gene column included, like this:

dds <- DESeqDataSetFromMatrix(countData=round(countData), 
                          colData=metaData, 
                          design=~dex, tidy = TRUE)

I get this error: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'ncol': non-numeric variable(s) in data frame: X"
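Both errors can be reproduced in base R without DESeq2, which may help locate the cause. The sketch below uses an invented toy data frame and assumes, per the DESeq2 documentation, that `tidy = TRUE` tells DESeqDataSetFromMatrix to take the first column of `countData` as the row names.

```r
# Toy data; gene names follow the thread, values are invented.
toy <- data.frame(
  X   = c("KANK1", "HTT", "HEATR1"),
  C11 = c(16, 16, 28),
  C12 = c(20, 21, 22),
  stringsAsFactors = FALSE
)

# Case 1: round() on a data frame that still contains the character
# column X fails before DESeq2 ever sees the data.
err_round <- tryCatch(round(toy), error = function(e) conditionMessage(e))

# Case 2: the gene column is removed, and the first remaining column is
# then used as row names (what tidy = TRUE requests). C11 holds 16 twice,
# so the row names are not unique.
no_names <- toy[, -1]
err_rownames <- tryCatch(
  suppressWarnings({
    rownames(no_names) <- as.character(no_names[, 1])
    "no error"
  }),
  error = function(e) conditionMessage(e)
)

print(err_round)
print(err_rownames)
```

If this is indeed the cause, two ways out seem plausible: remove the gene column, set the row names, and drop `tidy = TRUE`; or keep the gene column first with `tidy = TRUE` and round only the numeric columns rather than the whole data frame.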

ADD REPLY • link 3.8 years ago dorina.jamniczky • 0

0

Then, you do still have duplicate gene names. What is the output of:

table(duplicated(countData[,1]))

, and:

head(sort(countData[,1]))

Can you verify that you have / don't have genes named as '0', '16', etc?

ADD REPLY • link 3.8 years ago Kevin Blighe ★ 4.0k

0

> table(duplicated(countData[,1]))

FALSE 1408

> head(sort(countData[,1]))
[1] "AAGAB" "AAK1" "AATF" "ABCA2" "ABCC1" "ABCC4"

I do not have gene names that are numbers. All of them are made up of letters. This is why I thought the function was trying to use the first column of expression values (with round(countData) those numbers are integers, whereas in plain countData they are values such as 23.425332).

Some gene names do contain those numbers, e.g. "DHX16" and "SEC16A", but there are none where the entire cell content is only a number.

ADD REPLY • link 3.8 years ago dorina.jamniczky • 0

0

Hmm, that is weird. Directly after you read in the count data, what is the output of str()?

countData <- read.csv('Mydata_mostly_exp.csv', header = TRUE, sep = ",")
str(countData)

...or, this may work, please try:

countData <- read.csv('Mydata_mostly_exp.csv', header = TRUE,
  stringsAsFactors = FALSE, sep = ',', row.names = 1)

Then, rownames should be set automatically when the file is read.
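To illustrate what `row.names = 1` does, here is a small sketch that reads an inline CSV instead of the real file. The CSV contents are invented; only the column layout (gene names first, then samples) is assumed to match 'Mydata_mostly_exp.csv'.

```r
# Inline CSV standing in for 'Mydata_mostly_exp.csv'; contents are invented.
csv_text <- "X,C11,C12
KANK1,21,20.4
HTT,19.4,21.8
HEATR1,20.8,16"

countData <- read.csv(text = csv_text, header = TRUE,
                      stringsAsFactors = FALSE, sep = ",", row.names = 1)

str(countData)        # only the numeric sample columns remain
rownames(countData)   # gene names taken from the first CSV column
```

With the gene names already in the row names, there is no character column left to trip up round(), and no first column that needs to be treated as row names later.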

ADD REPLY • link 3.8 years ago Kevin Blighe ★ 4.0k

0

'data.frame': 32463 obs. of 9 variables:
 $ Gene        : chr "ENSG00000164054.15" "ENSG00000257527.1" "ENSG00000140830.8" "ENSG00000180613.10" ...
 $ AS_patient_1: num 6.61 0 4.4 0 0 0.43 0.17 0.16 1.09 1.2 ...
 $ AS_patient_2: num 8.49 0.09 4.27 0 1.35 0.15 0 0.13 2.45 2.7 ...
 $ AS_patient_3: num 10.77 0 3.68 0 0.42 ...
 $ AS_patient_4: num 11.82 0 3.43 0 0.42 ...
 $ AS_patient_5: num 9.83 0 2.46 0 1.91 0.49 0 0.26 1.33 3.06 ...
 $ Health_1    : num 16.55 0 6 0 1.46 ...
 $ Health_2    : num 8.63 0 5.39 0 0.54 0.22 0 0.19 2.78 3.38 ...
 $ Health_3    : num 10.96 0 3.61 0.04 2.33 ...

This is the output of

countData <- read.csv('Mydata_mostly_exp.csv', header = TRUE, sep = ",")
str(countData)
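The str() output above shows fractional values such as 6.61 and 0.43, while an earlier reply notes that DESeq2 only accepts integer values. As a rough sketch, the helper below (the name `fractional_columns` is hypothetical) lists the numeric columns that contain non-whole numbers. The toy data frame copies the first three values of two columns from the output above and adds an invented whole-number column for contrast.

```r
# Hypothetical helper: names of numeric columns holding non-whole numbers.
fractional_columns <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  has_fraction <- vapply(num,
                         function(v) any(v %% 1 != 0, na.rm = TRUE),
                         logical(1))
  names(num)[has_fraction]
}

# First three values per column copied from the str() output above;
# `Whole_example` is an invented column added for contrast.
toy <- data.frame(
  Gene          = c("ENSG00000164054.15", "ENSG00000257527.1",
                    "ENSG00000140830.8"),
  AS_patient_1  = c(6.61, 0, 4.4),
  AS_patient_2  = c(8.49, 0.09, 4.27),
  Whole_example = c(7, 0, 4),
  stringsAsFactors = FALSE
)

frac_cols <- fractional_columns(toy)
print(frac_cols)
```

Fractional values like these may indicate normalized expression values rather than raw counts. The DESeq2 documentation asks for un-normalized counts, so rounding such values would probably not make them suitable input.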

ADD REPLY • link 2.7 years ago Sanatan • 0