Question

DESeqDataSetFromMatrix giving error: "argument must be coercible to non-negative integer"

ErickF ▴ 40

@erickf-11032

Last seen 7.9 years ago

Hi, I'm trying to run DESeq2. As a test I'm using RNA-seq data from 8 samples. My countdata, coldata, and rowdata objects look (to me) formatted as they should be: the dimensions/lengths match, the count data is correct, etc. But when I run DESeqDataSetFromMatrix() I get this error:

> ddsFull <- DESeqDataSetFromMatrix(countData = countdata,
+    colData = coldata, rowData = rowdata, design = ~ type + sex)
Error in seq_len(length(idx) - 1) : 
  argument must be coercible to non-negative integer
In addition: Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  58 duplicate rownames were renamed by adding numbers

Here is the (detailed) step-by-step. First I generate my SE object (works without problems):

> ex3 <- summarizeOverlaps(features=grl, reads=bamLst, ignore.strand=TRUE, singleEnd=TRUE)
> class(ex3)
[1] "RangedSummarizedExperiment"
attr(,"package")
[1] "SummarizedExperiment"

Then I create the countdata, coldata, rowdata objects (without problems):

> countdata <- assay(ex3)
> coldata <- colData(ex3)
> rowdata <- rowRanges(ex3)
> class(coldata)
[1] "DataFrame"
attr(,"package")
[1] "S4Vectors"
> class(rowdata)
[1] "GRangesList"
attr(,"package")
[1] "GenomicRanges"
> class(countdata)
[1] "matrix"
> length(rowdata)
[1] 24943
> dim(coldata)
[1] 8   6
> dim(countdata)
[1] 24943   8

> head(countdata)
         OM_003  OM_005  OM_014  OM_023
A1BG        259      69     116      69
NAT2          6      11       0       0
ADA        1785     396     964     441
CDH2        119      52      35      45 ...

> head(rowdata)
GRangesList object of length 6:
$A1BG 
GRanges object with 15 ranges and 2 metadata columns:
       seqnames               ranges strand |   exon_id   exon_name
          <Rle>            <IRanges>  <Rle> | <integer> <character>
   [1]    chr19 [58346806, 58347029]      - |    264625        <NA>
   [2]    chr19 [58347353, 58347640]      - |    264626        <NA> ...

> head(coldata)
DataFrame with 6 rows and 6 columns
            type      sex   status    height   weight     tech
           <factor> <factor> <factor> <numeric> <numeric> <factor>
OM_003       AA        F     yes      15.9     36.67        2
OM_005       AA        M     no       10.5     83.35        1
OM_014       BB        F     yes      14.3     31.22        7 ...

And then the error:

> ddsFull <- DESeqDataSetFromMatrix(countData = countdata,
+    colData = coldata, rowData = rowdata, design = ~ type + sex)
Error in seq_len(length(idx) - 1) : 
  argument must be coercible to non-negative integer
In addition: Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  58 duplicate rownames were renamed by adding numbers

The traceback():

7: eval(expr, envir, enclos)
6: eval(quote(list(...)), env)
5: eval(quote(list(...)), env)
4: standardGeneric("paste")
3: paste(rnms[idx[-1]], c(seq_len(length(idx) - 1)), sep = ".")
2: DESeqDataSet(se, design = design, ignoreRank)
1: DESeqDataSetFromMatrix(countData = countdata, colData = coldata, 
       rowData = rowdata, design = ~type + sex

Any thoughts? The count data is correct (zeros and positive integers, no negative counts), colData is correctly formatted, and rowData seems correct as well. I am not sure what paste(rnms[idx[-1]], c(seq_len(length(idx) - 1)), sep = ".") means, but it seems like that may be where the error is generated.

DESeqDataSet DESeqDataSetFromMatrix rnaseq deseq2 • 8.2k views

ADD COMMENT • link 7.9 years ago ErickF ▴ 40

Thomas Carroll ▴ 420

@thomas-carroll-7019

Last seen 18 months ago

United States/New York/The Rockefeller …

Hi,

As a thought, the manual (SummarizedExperiment 1.2.2) specifies that rowData accepts a DataFrame and rowRanges accepts a GRangesList (which is what you supply). If you try 'rowRanges = rowdata' instead of 'rowData = rowdata', do you see the same error?

tom

ADD COMMENT • link 7.9 years ago Thomas Carroll ▴ 420

Hi Tom -- thanks. I tried rowRanges=rowdata but got the same error. Then I tried defining rowdata with rowdata <- rowData(ex3) (instead of rowRanges), and after that I omitted rowdata altogether, and I still got the same error...

The only difference I can find between my data and the parathyroidGenesSE example data is that the class for my colData says:

[1] "DataFrame"
attr(,"package")
[1] "S4Vectors"

Whereas class(colData(parathyroidGenesSE)) says:

[1] "DataFrame"
attr(,"package")
[1] "IRanges"

I have no idea if this is the root of the problem...

ADD REPLY • link 7.9 years ago ErickF ▴ 40

score 2 · Accepted Answer · 2016-07-02

ErickF ▴ 40

@erickf-11032

Last seen 7.9 years ago

Figured it out!

The problem: I had replaced the original rownames in ex3 (the SE object), changing them from Entrez ID numbers (which make zero sense to me) to gene symbols (which at least make some sense). I didn't think changing the rownames would matter so long as they matched rowdata, but it seems that was the big problem.
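One plausible way renamed rows could trigger this exact error, as a base-R sketch. The loop below is reconstructed from the paste() line in the traceback; it is not DESeq2's actual source, and rename_dups and the toy names are hypothetical. The assumption is that some IDs had no gene symbol and ended up as NA row names: which(rnms == NA) is then empty, so length(idx) - 1 is -1 and seq_len() fails.

```r
# Hypothetical reconstruction of the de-duplication step seen in the traceback.
rename_dups <- function(rnms) {
  dups <- unique(rnms[duplicated(rnms)])
  for (rn in dups) {
    idx <- which(rnms == rn)   # empty when rn is NA, because NA == NA is NA
    rnms[idx[-1]] <- paste(rnms[idx[-1]], seq_len(length(idx) - 1), sep = ".")
  }
  rnms
}

# Ordinary duplicates are renamed by appending numbers.
ok <- rename_dups(c("A1BG", "ADA", "ADA", "ADA"))
print(ok)

# Duplicated NA names make length(idx) - 1 negative, so seq_len() errors.
bad <- tryCatch(rename_dups(c("A1BG", NA, NA)), error = function(e) e)
print(inherits(bad, "error"))
```

If this is what happened, checking anyNA(rownames(ex3)) and anyDuplicated(rownames(ex3)) after renaming would show it before DESeqDataSetFromMatrix is called.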

Solution: I re-generated the SE object (took a while!) without replacing the rownames, and now DESeqDataSetFromMatrix works without a problem. I was able to run the DESeq2 pipeline through to results. I can then add the gene symbols/names to the results data frame.
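Adding the symbols afterwards can be sketched in base R as follows. Everything here is illustrative: res_df stands in for a results table converted to a data frame, id2symbol is a hypothetical lookup vector, and the numbers are made up. The point is that the original IDs stay as row names and the symbols go into an ordinary column, where missing or duplicated symbols do no harm.

```r
# Toy stand-in for a results data frame; values are made up.
res_df <- data.frame(
  log2FoldChange = c(1.2, -0.4, 0.8),
  padj = c(0.01, 0.60, 0.04),
  row.names = c("1", "10", "99999")
)

# Hypothetical ID-to-symbol lookup; "99999" deliberately has no entry.
id2symbol <- c("1" = "A1BG", "10" = "NAT2", "100" = "ADA")

# Add symbols as a column; unmatched IDs become NA and row names are untouched.
res_df$symbol <- unname(id2symbol[match(rownames(res_df), names(id2symbol))])
print(res_df)
```

In a real analysis the lookup would come from an annotation resource rather than a hand-written vector.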

ADD COMMENT • link 7.9 years ago ErickF ▴ 40

Gene names are not always syntactically valid R names. You can use make.names to convert row names into syntactically valid names, but then you have altered gene names in your downstream analyses.

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.
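A small base-R sketch of what make.names does to a few symbols (the input vector is just an illustration). Symbols that are already valid pass through unchanged, a hyphen is replaced by a dot, and with unique = TRUE repeated names get a numeric suffix.

```r
symbols <- c("A1BG", "HLA-DRB1", "ADA", "ADA")

# unique = TRUE also de-duplicates by appending .1, .2, ...
fixed <- make.names(symbols, unique = TRUE)
print(fixed)

# Which symbols were changed by the conversion?
changed <- symbols[fixed != symbols]
print(changed)
```

Keeping the original symbols in a separate column makes it possible to map the altered names back later.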

ADD REPLY • link 7.3 years ago tapa741 • 0