Search
Question: What causes this DESeq2 error at "fitting model and testing"?
0
gravatar for ap3637
8 weeks ago by
ap36370
ap36370 wrote:

I am using DESeq2 on a pretty large dataset.  I have mock and infection conditions for several mouse lines at 3 time points.  My goal is to contrast each line at each timepoint with it's corresponding mock at the same timepoint.  Previously I split the data into smaller subsets by line, but now I'm trying to use the entire data set at once to get the benefits of more data.  

 

I have the following DESeq object:

class: DESeqDataSet
dim: 50686 116
metadata(1): version
assays(1): counts
rownames(50686): ENSMUSG00000099985 ENSMUSG00000030105 ... ENSMUSG00000053774 ENSMUSG00000030058
rowData names(0):
colnames(116): RIX007 RIX008 ... RIX233 RIX234
colData names(20): ID Tissue ... Replicate sizeFactor

 

My design is:

~ Infection_Line_Time

 

My plan was to make a composite field, infection_line_time, and then relevel and rerun DESeq(dds) for each condition.  For example, to get the results for mock_day1_lineA, I do the following.

dds$Infection_Line_Time <- relevel(dds$Infection_Line_Time, "mock_day1_16012X15119")
dds <- DESeq(dds)

Also the Infection_Line_Time looks like this:
> dds$Infection_Line_Time
  [1] mock_day1_16012X15119  mock_day1_16012X15119  mock_day3_16012X15119  mock_day3_16012X15119  mock_day5_16012X15119  mock_day5_16012X15119 
  [7] ebola_day1_16012X15119 ebola_day1_16012X15119 ebola_day3_16012X15119 ebola_day3_16012X15119 ebola_day5_16012X15119 ebola_day5_16012X15119
 [13] mock_day1_16513X16188  mock_day1_16513X16188  mock_day3_16513X16188  mock_day3_16513X16188  mock_day5_16513X16188  mock_day5_16513X16188 
 [19] mock_day5_16513X16188  ebola_day1_16513X16188 ebola_day3_16513X16188 ebola_day5_16513X16188 ebola_day3_16513X16188 ebola_day5_16513X16188
 [25] ebola_day5_16513X16188 ebola_day5_16513X16188 mock_day1_8034X8048    mock_day1_8034X8048    mock_day1_8034X8048    mock_day3_8034X8048   
 [31] mock_day3_8034X8048    mock_day3_8034X8048    mock_day5_8034X8048    mock_day5_8034X8048    mock_day5_8034X8048    ebola_day1_8034X8048  
 [37] ebola_day1_8034X8048   ebola_day1_8034X8048   ebola_day3_8034X8048   ebola_day3_8034X8048   ebola_day3_8034X8048   ebola_day5_8034X8048  
 [43] ebola_day5_8034X8048   ebola_day5_8034X8048   ebola_day1_16072X3393  ebola_day1_16072X3393  ebola_day1_16072X3393  ebola_day3_16072X3393 
 [49] ebola_day3_16072X3393  ebola_day3_16072X3393  ebola_day5_16072X3393  ebola_day5_16072X3393  ebola_day5_16072X3393  ebola_day5_16072X3393 
 [55] mock_day1_16072X3393   mock_day1_16072X3393   mock_day1_16072X3393   mock_day3_16072X3393   mock_day3_16072X3393   mock_day3_16072X3393  
 [61] mock_day5_16072X3393   mock_day5_16072X3393   mock_day1_CC041        mock_day3_CC041        mock_day3_CC041        mock_day5_CC041       
 [67] mock_day5_CC041        mock_day5_CC041        mock_day5_CC041        ebola_day1_CC041       ebola_day1_CC041       ebola_day1_CC041      
 [73] ebola_day3_CC041       ebola_day3_CC041       ebola_day3_CC041       mock_day1_CC041        mock_day1_CC041        mock_day1_CC041       
 [79] mock_day3_CC041        mock_day3_CC041        mock_day3_CC041        mock_day5_CC041        mock_day5_CC041        mock_day5_CC041       
 [85] ebola_day1_CC041       ebola_day1_CC041       ebola_day1_CC041       ebola_day1_CC041       ebola_day1_CC041       ebola_day3_CC041      
 [91] ebola_day3_CC041       ebola_day3_CC041       ebola_day3_CC041       ebola_day3_CC041       ebola_day5_CC041       ebola_day5_CC041      
 [97] ebola_day5_CC041       ebola_day5_CC041       ebola_day5_CC041       mock_day1_CC011        mock_day3_CC011        mock_day3_CC011       
[103] mock_day5_CC011        mock_day5_CC011        ebola_day1_CC011       ebola_day1_CC011       ebola_day3_CC011       ebola_day3_CC011      
[109] ebola_day5_CC011       ebola_day5_CC011       ebola_day1_CC011       ebola_day3_CC011       ebola_day5_CC011       ebola_day1_CC011      
[115] ebola_day3_CC011       ebola_day5_CC011      
36 Levels: mock_day1_16012X15119 ebola_day1_16012X15119 ebola_day1_16072X3393 ebola_day1_16513X16188 ebola_day1_8034X8048 ... mock_day5_CC041

 

When I attempt to run the command, 

dds <- DESeq(dds)

it looks normal until the "fitting model and testing" phase, then throws the following error:

error: matrix multiplication: incompatible matrix dimensions: 116x36 and 0x1
Error in fitBeta(ySEXP = ySEXP, xSEXP = xSEXP, nfSEXP = nfSEXP, alpha_hatSEXP = alpha_hatSEXP,  : 
  matrix multiplication: incompatible matrix dimensions: 116x36 and 0x1

 

I've never seen this before and I can't find anyone else who has encountered this problem online.  I'm actively working on resolving it, but could use some help if anyone has any input or advice.  

 

My theories are as of now that it's a memory issue, or that when I'm comparing different lines to mock_day1_lineA, that it isn't happy for some reason.  Like, for instance, ebola_day3_lineC vs mock_day1_lineA will be one possibility in the resultsNames(dds).  It's a comparison that makes no sense and I won't use, but it's still being calculated for every permutation.  I'm still not sure how to handle it.  Is there a better design or a way to do this without resorting to cutting my data into separate data sets?

 

Thanks in advance for any insight you can provide!   

ADD COMMENTlink modified 7 weeks ago • written 8 weeks ago by ap36370
0
gravatar for Michael Love
8 weeks ago by
Michael Love15k
United States
Michael Love15k wrote:

Are you using the latest version (1.16)?

ADD COMMENTlink written 8 weeks ago by Michael Love15k

Yes, using version 1.16.

ADD REPLYlink written 7 weeks ago by ap36370

I can't reproduce this error so I'm not sure what to say, e.g. 36 levels, 3 replicates each works fine with simulated data, this takes 30 seconds to fit per 500 genes:

dds <- makeExampleDESeqDataSet(n=500,m=108)
dds$condition <- factor(rep(1:36,each=3))
dds <- DESeq(dds)

Can you print your sessionInfo() so I can see if maybe there are some package incompatibilities? You can also run biocValid() from the BiocInstaller package to see if you are up to date on all packages.

By the way, you shouldn't be releveling and rerunning DESeq() for comparing groups, you only need to use results() with contrast specifying the two groups to compare.

e.g. for the above example, you only run DESeq() once, then:

results(dds, contrast=c("condition","36","35"))
ADD REPLYlink written 7 weeks ago by Michael Love15k

Thanks for your reply.  The output you requested is too many characters to post here directly, so here is a link to that information.

https://justpaste.it/1c00t

 

Thanks very much for your help, the point about the releveling is also very useful.  I'm going to try to update some packages that are older (though they seem unrelated at first glance).  If it doesn't work after that I'll try running it on a high memory machine.  Currently the rld(dds) step is taking several hours because of the data size.  Thanks again!

ADD REPLYlink written 7 weeks ago by ap36370

if that's rlog(), I highly recommend using vst() instead. It's much faster for large datasets, and this is what I tend to use now for speed.

ADD REPLYlink written 7 weeks ago by Michael Love15k

Update:

It seems like it's getting past this phase using VST and updated R packages.  At "estimating dispersions" now, so hopefully everything will go through. I did receive the following warning, but maybe it's okay.  

27 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
-- replacing outliers and refitting for 23973 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

 

 

ADD REPLYlink written 7 weeks ago by ap36370

This is ok, it's just that the increases to the likelihood are accumulating slowly. Typically the beta isn't changing much but still producing small changes to likelihood.

ADD REPLYlink written 7 weeks ago by Michael Love15k

Ok great, that's good to know.  So assuming the "estimating dispersions" goes through, for future reference for anyone who encounters this problem, try VST instead of rlog, and update your R packages.  

My guess is that the rlog might have failed somehow in the larger data set, if I had to guess what the fix was I would say varianceStabilizingTransformation(dds).  

 

Thanks for your help Michael!

ADD REPLYlink written 7 weeks ago by ap36370
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 115 users visited in the last hour