duplicated row names when creating DESeqDataSetFromMatrix
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Germany

Hi,

I am working with a workflow for CRISPR/Cas9 screening and am trying to analyze the data with `DESeq2`.

My count table looks like this:

> head(countdata)
                         CTRL CTRL TREAT TREAT
BAX_GAAACATGTCAGCTGCCACT   87  267   511   353
BAX_GAACTCACCCCTGAAGCAAA  340  474   772  1063
BAX_GAAGCGCATCGGGGACGAAC   88  117   365   461
BAX_GACAGGGGCCCTTTTGCTTC  731  690    99   450
BAX_GACCGGGTCCAGGGCCAGCT  374  649   150   230
BAX_GACCTTGAGCACCAGTTTGC  425  634   258   203
...
random_GGGCGGACGCACCGACCAAA  159  155    21     4
random_GGGGAACGGACGCCGAACGG  302  320   156   120
random_GGGGACGCGAGGCACGCGAC  233  134     0     0
random_GGGGACGCGGGCCCGCACAA  306  334   251   549
random_GGGGCGGCAACGAAAACGCG    7   42     0     0
random_GGGGGAACGAAACACGAGCG  296  260    40    39

When I try to read it into a dds object, I get the following error:

> dds <- DESeq2::DESeqDataSetFromMatrix(countData = countdata,
+     colData = coldata, design = ~condition)
Error in `rownames<-`(`*tmp*`, value = colnames(countData)) :
  duplicate rownames not allowed

But when I check the row names for duplicates, I can't find any.

> anyDuplicated(rownames(countdata))
[1] 0

> table(table(rownames(countdata)))
    1
12402 

What am I missing here? 

How can I find out why this error is occurring?

thanks

Assa

 

The countdata object is a data.frame:

> str(countdata)
'data.frame': 12402 obs. of  4 variables:
$ CTRL : num  87 340 88 731 374 425 753 279 151 249 ...
$ CTRL : num  267 474 117 690 649 634 957 375 145 374 ...
$ TREAT: num  511 772 365 99 150 ...
$ TREAT: num  353 1063 461 450 230 ...

 

> R.version
               _                          
platform       x86_64-pc-linux-gnu        
arch           x86_64                     
os             linux-gnu                  
system         x86_64, linux-gnu          
status                                    
major          3                          
minor          4.0                        
year           2017                       
month          04                         
day            21                         
svn rev        72570                      
language       R                          
version.string R version 3.4.0 (2017-04-21)
nickname       You Stupid Darkness  

@james-w-macdonald-5106
United States

Here is the error:

Error in `rownames<-`(`*tmp*`, value = colnames(countData)) 

Do you now see what the problem is? Hint: colnames(countData)
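To make the hint concrete, here is a minimal sketch of the check in base R. It assumes a toy three-row stand-in for the count table from the question (the placeholder column names `a` to `d` are only there to build the data.frame before the duplicated names are assigned). The point is simply to run the same duplicate test on the column names instead of the row names.

```r
# Toy stand-in for the count table in the question (first three guides only).
# Assumption: the real table has the same duplicated column names.
countdata <- data.frame(
  a = c(87, 340, 88),
  b = c(267, 474, 117),
  c = c(511, 772, 365),
  d = c(353, 1063, 461),
  row.names = c("BAX_GAAACATGTCAGCTGCCACT",
                "BAX_GAACTCACCCCTGAAGCAAA",
                "BAX_GAAGCGCATCGGGGACGAAC")
)
# Assigning names directly keeps the duplicates, as in the original table.
names(countdata) <- c("CTRL", "CTRL", "TREAT", "TREAT")

# The row names are fine ...
row_dup <- anyDuplicated(rownames(countdata))

# ... but the column names are not: anyDuplicated() returns the index
# of the first repeated entry, or 0 if there is none.
col_dup <- anyDuplicated(colnames(countdata))

# List the column names that occur more than once.
dup_names <- unique(colnames(countdata)[duplicated(colnames(countdata))])

print(row_dup)
print(col_dup)
print(dup_names)
```

A non-zero result for the column names is what triggers the error, since those names are used as row names further down the line.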

Do the column names need to differ?

 


The idea behind the SummarizedExperiment class is similar to that of ExpressionSet: a complex object that behaves like a simpler one, in order to streamline analyses and shield the end user from having to know too much about the messy underlying reality of what they are doing.

This is unfortunately more of an ideal than an actuality, and you really do have to know something about SummarizedExperiments if you expect to use them confidently and expertly in an analysis. There is no substitute for knowing what you are doing, and there is no better way to learn that than by reading all the documentation that comes with, e.g., the SummarizedExperiment package. So you should do that, because it's far more efficient than asking questions on this site.

But to answer your question: one of the slots of a SummarizedExperiment is the colData slot, which contains information about the columns of your data. The colData itself is a DataFrame and, like a data.frame, it must have unique row names. Those row names come from the column names of your count data, and if they are not unique you get the error that you see.
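One possible way to get past the error, sketched under some assumptions: the toy three-row table below stands in for the real count data, the sample names are made unique with base R's `make.unique()` (any other unique naming scheme would do just as well), and the `coldata` layout with a single `condition` column is a guess at what the original object looked like. The DESeq2 call is guarded so that the rest of the snippet runs even if the package is not installed.

```r
# Toy stand-in for the count table in the question (first three guides only).
countdata <- data.frame(
  a = c(87, 340, 88),
  b = c(267, 474, 117),
  c = c(511, 772, 365),
  d = c(353, 1063, 461),
  row.names = c("BAX_GAAACATGTCAGCTGCCACT",
                "BAX_GAACTCACCCCTGAAGCAAA",
                "BAX_GAAGCGCATCGGGGACGAAC")
)
names(countdata) <- c("CTRL", "CTRL", "TREAT", "TREAT")

# Keep the original labels as the condition, then make the sample names unique.
condition <- colnames(countdata)
colnames(countdata) <- make.unique(condition, sep = "_")

# Assumed layout of coldata: one row per sample, row names matching
# the (now unique) column names of the count table, in the same order.
coldata <- data.frame(
  condition = factor(condition),
  row.names = colnames(countdata)
)

# DESeq2 expects integer counts; convert the data.frame to an integer matrix.
counts <- as.matrix(countdata)
storage.mode(counts) <- "integer"

# Only attempt the DESeq2 step if the package is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts,
    colData   = coldata,
    design    = ~condition
  )
}
```

Keeping the condition in `coldata` rather than in the column names means the sample names only have to identify the samples, so nothing is lost by making them unique.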
