Question

DESeq error: Error in validityMethod(object) : nrow(design) == ncol(object) is not TRUE


maniermk • 0


I'm trying to run DESeq on a dds (created from a gse using information from tximeta) in which technical replicates from the same library (run on two different lanes) have been collapsed, so instead of 20 samples, I now have 10. When I run DESeq on the collapsed dds (ddsColl), I get an error:

`Error in validityMethod(object) : nrow(design) == ncol(object) is not TRUE`

I wonder if it's because the colData that I used for my design matrix to create dds has 20 rows, and my ddsColl has only 10 columns. I originally had 20 samples that were collapsed to 10, and my colData rows represented samples. I'm not sure how to fix this besides rerunning DESeqDataSet with a new colData, but the order of operations seems wrong, in that I would want to run DESeqDataSet before I collapse replicates. My design formula doesn't include 'lane' information, so I don't think that would be an issue in running DESeq. It only includes condition and a term for condition:replicates.nested, to account for biological replicates nested within condition (`~ SPERM + SPERM:LINE.NESTED`). My colData after collapsing is as follows:

`DataFrame with 10 rows and 8 columns
                             FASTQ_NAMES     names    SPERM     LINE      REP     LANE LINE.NESTED runsCollapsed
                                <factor> <integer> <factor> <factor> <factor> <factor>    <factor>   <character>
 14120X1_170412_D00294_0311_ACAJ3TANXX_6         1        H      H08        A        1           1           1,2
 14120X2_170412_D00294_0311_ACAJ3TANXX_6         3        H      H08        B        2           1           3,4
 14120X3_170412_D00294_0311_ACAJ3TANXX_6         5        H      H08        C        3           1           5,6
 14120X4_170412_D00294_0311_ACAJ3TANXX_6         7        H      H20        A        4           2           7,8
 14120X5_170412_D00294_0311_ACAJ3TANXX_6         9        H      H20        B        5           2          9,10
 14120X6_170412_D00294_0311_ACAJ3TANXX_6        11        H      H20        C        6           2         11,12
 14120X7_170412_D00294_0311_ACAJ3TANXX_6        13        L      L08        A        7           1         13,14
 14120X8_170412_D00294_0311_ACAJ3TANXX_6        15        L      L08        B        8           1         15,16
14120X11_170412_D00294_0311_ACAJ3TANXX_6        17        L      L17        B        9           2         17,18
14120X12_170412_D00294_0311_ACAJ3TANXX_6        19        L      L17        C       10           2         19,20`

Is there another step I'm missing between collapsing replicates and DESeq? Also, I'd love confirmation that my design formula is appropriate given I'm only interested in genes that are differentially expressed between H and L SPERM, after accounting for biological replicates LINE nested within SPERM and REP nested within LINE.

deseq2 DESeq nrow == ncol is not TRUE • 2.0k views

ADD COMMENT • link updated 4.2 years ago by Kevin Blighe ★ 4.0k • written 4.2 years ago by maniermk • 0

score 0 · Answer 1 · 2020-05-28


Kevin Blighe ★ 4.0k


Hi maniermk,

It is as you stated, and means that you will have to ensure that your colData is aligned to your input raw counts data. Basically, the rows of the colData should be equal (in both order and length) to the columns of the input raw counts data.
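As a rough illustration of that alignment requirement, here is a minimal base-R sketch. The toy objects `cts` and `coldata` are hypothetical and not taken from the question; the point is only that the sample table is reordered to match the count matrix columns before any design matrix is built from it.

```r
# Hypothetical toy data: 3 genes x 4 samples (not the poster's data)
cts <- matrix(1:12, nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Sample table with the same sample names, but in a different order
coldata <- data.frame(condition = factor(c("A", "A", "B", "B")),
                      row.names = paste0("s", c(2, 1, 4, 3)))

# Rows of coldata must match columns of cts in both order and length
aligned <- identical(rownames(coldata), colnames(cts))

# Reorder the sample table by the column names of the count matrix
coldata <- coldata[colnames(cts), , drop = FALSE]
aligned_after <- identical(rownames(coldata), colnames(cts))

# A design matrix built from the aligned table has one row per sample,
# which is the condition the validity check in the error message tests
design_mat <- model.matrix(~ condition, data = coldata)
stopifnot(nrow(design_mat) == ncol(cts))
```

If the two sets of names differ in length or content rather than just order, the reordering step would introduce `NA` rows, so checking `setequal(rownames(coldata), colnames(cts))` first may be worthwhile.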

If you originally had replicates, then I would not have collapsed them. Or, if you must, I would 'collapse' / summarise them after transformation via vst() or rlog().

Kevin

ADD COMMENT • link 4.2 years ago Kevin Blighe ★ 4.0k


Thanks Kevin. So if I understand you correctly, you're suggesting that I collapse replicates (if I absolutely have to, which I think is recommended) AFTER running a vst() or rlog() transformation on the dds. I will try this, but I am still not sure how to satisfy nrow(design) == ncol(object) when running DESeq() on the collapsed dds (ddsColl) with the original coldata (20 rows instead of 10). If I remove the even rows from my coldata file (making coldata2 with 10 rows instead of 20), make a new design matrix based on that (using model.matrix()), and try to remake the dds with the new design matrix (DESeqDataSet()), the number of rows of the new coldata no longer matches the number of columns of the gse. I need more explicit steps as to how to move forward with DESeq() after collapsing the technical replicates.

ADD REPLY • link 4.2 years ago maniermk • 0


Hi again, perhaps we need to distinguish between biological and technical replicates:

biological replicates --> should not be collapsed (but can be, for whatever reason, downstream of DESeq2)
technical replicates --> can be collapsed, if desired, before performing differential expression

If you choose to collapse your technical replicates, then you still start with the metadata in its complete form with all samples and replicates included. DESeq2 will, internally, handle the change in the colData. Take a look at the following example:

Create example dataset

dds <- makeExampleDESeqDataSet(m=12)
dds$sample <- factor(sample(paste0("sample",rep(1:9, c(2,1,1,2,1,1,2,1,1)))))
dds$run <- paste0("run",1:12)
colData(dds)
DataFrame with 12 rows and 3 columns
         condition   sample         run
          <factor> <factor> <character>
sample1          A  sample9        run1
sample2          A  sample7        run2
sample3          A  sample3        run3
sample4          A  sample4        run4
sample5          A  sample7        run5
...            ...      ...         ...
sample8          B  sample1        run8
sample9          B  sample5        run9
sample10         B  sample2       run10
sample11         B  sample8       run11
sample12         B  sample4       run12

Now collapse the replicates:

ddsColl <- collapseReplicates(dds, dds$sample, dds$run)

colData(ddsColl)
DataFrame with 9 rows and 4 columns
        condition   sample         run runsCollapsed
         <factor> <factor> <character>   <character>
sample1         A  sample1        run6     run6,run8
sample2         B  sample2       run10         run10
sample3         A  sample3        run3          run3
sample4         A  sample4        run4    run4,run12
sample5         B  sample5        run9          run9
sample6         B  sample6        run7          run7
sample7         A  sample7        run2     run2,run5
sample8         B  sample8       run11         run11
sample9         A  sample9        run1          run1

Then proceed with the ddsColl object.

ADD REPLY • link 4.2 years ago Kevin Blighe ★ 4.0k


I am proceeding with the ddsColl object, and this is where I am getting the error

Error in validityMethod(object) : nrow(design) == ncol(object) is not TRUE

If the colData is changed automatically, I'm not sure where this error is coming from then. I run the following:

ddsColl <- collapseReplicates(dds, dds$LANE, dds$names)
colData(ddsColl)
colnames(ddsColl)

and get

DataFrame with 10 rows and 8 columns
                                FASTQ_NAMES     names    SPERM     LINE      REP     LANE LINE.NESTED runsCollapsed
                                   <factor> <integer> <factor> <factor> <factor> <factor>    <factor>   <character>
1   14120X1_170412_D00294_0311_ACAJ3TANXX_6         1        H      H08        A        1           1           1,2
2   14120X2_170412_D00294_0311_ACAJ3TANXX_6         3        H      H08        B        2           1           3,4
3   14120X3_170412_D00294_0311_ACAJ3TANXX_6         5        H      H08        C        3           1           5,6
4   14120X4_170412_D00294_0311_ACAJ3TANXX_6         7        H      H20        A        4           2           7,8
5   14120X5_170412_D00294_0311_ACAJ3TANXX_6         9        H      H20        B        5           2          9,10
6   14120X6_170412_D00294_0311_ACAJ3TANXX_6        11        H      H20        C        6           2         11,12
7   14120X7_170412_D00294_0311_ACAJ3TANXX_6        13        L      L08        A        7           1         13,14
8   14120X8_170412_D00294_0311_ACAJ3TANXX_6        15        L      L08        B        8           1         15,16
9  14120X11_170412_D00294_0311_ACAJ3TANXX_6        17        L      L17        B        9           2         17,18
10 14120X12_170412_D00294_0311_ACAJ3TANXX_6        19        L      L17        C       10           2         19,20

I double check the collapseReplicates:

matchFirstLevel <- dds$LANE == levels(dds$LANE)[1]
stopifnot(all(rowSums(counts(dds[,matchFirstLevel])) == counts(ddsColl[,1])))

and it runs fine.

I filter out genes with fewer than 10 total counts:

keep <- rowSums(counts(ddsColl)) >= 10
ddsColl <- ddsColl[keep,]
ddsColl

which gives

class: DESeqDataSet
dim: 13267 10
metadata(7): tximetaInfo quantInfo ... txdbInfo version
assays(1003): counts abundance ... infRep999 infRep1000
rownames(13267): FBgn0000008 FBgn0000014 ... FBgn0286933 FBgn0286940
rowData names(7): gene_id gene_name ... symbol REFSEQ
colnames(10): 1 2 ... 9 10
colData names(8): FASTQ_NAMES names ... LINE.NESTED runsCollapsed

Then proceed with DESeq:

ddsColl <- DESeq(ddsColl)

and get

Error in validityMethod(object) : nrow(design) == ncol(object) is not TRUE

Thoughts? I'm stumped. (sorry for the weird formatting, can't get markdown to work properly for me today)

ADD REPLY • link 4.2 years ago maniermk • 0


Just checking back to see if I can get some more help with this. Thanks.

ADD REPLY • link 4.1 years ago maniermk • 0



I see above that you are specifying a matrix as the design.

I'd recommend using ~1 as a design for the object before collapsing. Then after you're done collapsing, and you need to make a matrix for the design, you can create it using colData, and provide it to the full argument of DESeq().
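A minimal sketch of that order of operations, using the simulated data from makeExampleDESeqDataSet() rather than the poster's object. The `sample` and `run` columns and the `~ condition` formula are assumptions made up for illustration; with the real data the matrix would be built from the collapsed colData with the nested formula instead.

```r
library(DESeq2)

set.seed(1)
# Simulated data: 12 runs, two runs per library (illustrative assumption)
dds <- makeExampleDESeqDataSet(m = 12)
dds$sample <- factor(rep(paste0("sample", 1:6), each = 2))
dds$run <- paste0("run", 1:12)

# Use a placeholder formula design before collapsing, so that no
# 12-row design matrix is stored inside the object
design(dds) <- ~ 1

# Sum the technical replicates: 12 columns become 6
ddsColl <- collapseReplicates(dds, groupby = dds$sample, run = dds$run)

# Build the design matrix from the collapsed colData, so that its
# number of rows equals ncol(ddsColl)
mm <- model.matrix(~ condition, data = as.data.frame(colData(ddsColl)))
stopifnot(nrow(mm) == ncol(ddsColl))

# Supply the matrix through the 'full' argument of DESeq()
ddsColl <- DESeq(ddsColl, full = mm)

# Coefficient names are derived from the matrix column names
coefNames <- resultsNames(ddsColl)
res <- results(ddsColl, name = coefNames[2])
```

With a nested formula, some columns of the matrix can be all zero when groups are unbalanced, and those columns would need to be dropped before the matrix is passed to DESeq().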

ADD REPLY • link 4.1 years ago Michael Love 42k