Running DESeqDataSet with 4 factors?
Lillyhchen ▴ 10


I'm currently working on a project with two tumor cell lines, each from a different patient, and each cell line has a control group and a treated group. I'm trying to use DESeq2 to look at the changes in gene expression within each cell line between untreated samples and samples treated with Treatment A, but I keep getting a warning message when I run the DESeq pipeline.

I'm running DESeq2 off of a SummarizedExperiment:

se <- summarizeOverlaps(features=exons, reads=list,
                        fragments=TRUE, BPPARAM = SerialParam())


For the design I used: design=~Cell + Treat + Cell:Treat

This is the error message I get when I run the DESeq pipeline:

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- standard model matrices are used for factors with two levels and an interaction,
   where the main effects are for the reference level of other factors.
   see the 'Interactions' section of the vignette for more details: vignette('DESeq2')

Is there another way to run the differential expression analysis to look at the change in expression level between control and treated for each cell line? I'm not sure why I keep getting this error. I've tried running DESeq2 with one cell line at a time (2 groups instead of 4), and I got very different results (only 5 genes instead of 81).
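
For context, the two lines printed after "fitting model and testing" are an informational note, not a failure: they describe how the coefficients of an interaction design are laid out. A minimal base-R sketch of that layout, using a hypothetical sample table (the level names "control" and "treated" are assumptions for illustration, not the real colData):

```r
# Hypothetical 2 x 2 sample table with the same structure as in the question
samples <- data.frame(
  Cell  = factor(rep(c("A", "B"), each = 4)),
  Treat = factor(rep(c("control", "treated"), times = 4))
)

# Same formula as the design in the question
mm <- model.matrix(~ Cell + Treat + Cell:Treat, data = samples)

# One column per coefficient in the model
print(colnames(mm))
```

With this layout, the Treat coefficient is the treatment effect in the reference cell line (A) only, and the treatment effect in cell line B is the sum of the Treat coefficient and the interaction coefficient. That is what "the main effects are for the reference level of other factors" refers to.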



summarizedexperiment deseq2 deseqdataset multiple factor design differential gene expression • 3.0k views

To compare control and treated for each cell line, it will be easier for you to not use an interaction term, but instead follow the steps in this previous answer:

A: Factorial Design with DESeq2; contrast problem.



Thanks for answering. Does it matter whether the factors are combined in the colData of the SummarizedExperiment, or should that be done after the dds pipeline is run?

I combined the 2 factors as a column in the se:

colData(se)$Cell_Treat<-factor(paste0(colData$Cell, "-", colData$Treat))
> colData(se)
DataFrame with 12 rows and 4 columns
                            sample     Cell     Treat Cell_Treat
                          <factor> <factor> <integer>   <factor>
1   accepted_hits.CSC.Contc.HF2303        A         1          -
2    accepted_hits.CSC.TMZa.HF2303        A         2          -
3   accepted_hits.SDC.Contb.HF2303        A         1          -
4   accepted_hits.SDC.Contc.HF2303        A         1          -
5    accepted_hits.SDC.TMZb.HF2303        A         2          -
...                            ...      ...       ...        ...
8    accepted_hits.CSC.TMZb.HF2927        B         2          -
9    accepted_hits.CSC.TMZc.HF2927        B         2          -
10  accepted_hits.SDC.Contc.HF2927        B         1          -
11   accepted_hits.SDC.TMZa.HF2927        B         2          -
12   accepted_hits.SDC.TMZc.HF2927        B         2          -

As for the design of the DESeqDataSet, I used design=~Cell_Treat, but it gave an error saying that the design has a single variable, with all samples having the same value, and suggested I use a design of ~1.

I read the previous answer and tried to combine the variables in the dds, but I'm confused about how to set up the design for the DESeqDataSet. I tried design=~Cell + Treat, and with that I was able to run DESeq, but when I tried to make a contrast of the groups I got an error that 'x' must be an array of at least two dimensions:

> dds<-DESeqDataSet(se, design=~Cell + Treat)
> dds$group<-factor(paste0(dds$Cell, dds$Treat))
> design(dds)<-~group

> dds<-DESeq(dds)

 > results(dds, contrast=c("group", "A 1", "A 2"))
Error in rowSums(cts.sub == 0) : 
  'x' must be an array of at least two dimensions

You can create this new column in the SummarizedExperiment before you make the dds, or you can make this column in the dds, but before you run DESeq().
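
The second option (adding the column to the dds before DESeq() is run) might look like the sketch below. It assumes DESeq2 is installed and uses simulated counts from makeExampleDESeqDataSet() as a stand-in for the real data. The Cell and Treat values are hypothetical annotations that mirror the thread, where Treat 1 is control and Treat 2 is treated.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data: 500 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical annotation: 2 cell lines x 2 treatments, 3 samples each
dds$Cell  <- factor(rep(c("A", "B"), each = 6))
dds$Treat <- factor(rep(c("1", "2"), times = 6))

# Combine the two factors into one grouping variable BEFORE calling DESeq()
dds$group <- factor(paste0(dds$Cell, dds$Treat))
design(dds) <- ~ group

dds <- DESeq(dds)

# Treated (2) vs control (1) within each cell line
res_A <- results(dds, contrast = c("group", "A2", "A1"))
res_B <- results(dds, contrast = c("group", "B2", "B1"))
```

The contrast names must match the levels of dds$group exactly, so it is worth printing levels(dds$group) before calling results().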

You just have a small coding error in the first chunk of code. You can find small errors like this just by eye. Look at the column you defined in colData(se): it is just "-" repeated for all samples.

You can look at individual columns like so:

se$Cell_Treat

(or equivalently, colData(se)$Cell_Treat).

The problem is that colData$Cell is not the right way to specify the Cell column (same with Treat). You can either do se$Cell or colData(se)$Cell.
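
A small base-R sketch of why the column came out as "-" everywhere. It assumes that `colData` in the workspace was some object without Cell or Treat elements, so that `$` returned NULL; the `cd` and `other` objects are hypothetical stand-ins, not the real SummarizedExperiment.

```r
# Stand-in for colData(se): a plain data.frame with the thread's two columns
cd <- data.frame(
  Cell  = rep(c("A", "B"), each = 2),
  Treat = rep(1:2, times = 2)
)

# Hypothetical object with no Cell or Treat element: `$` returns NULL
other <- list()

# NULL contributes nothing, so only the separator survives
bad <- factor(paste0(other$Cell, "-", other$Treat))

# Taking the columns from the right object gives one label per sample
good <- factor(paste0(cd$Cell, "-", cd$Treat))

print(as.character(bad))
print(levels(good))
```

A factor with a single level across all samples is also what triggers the complaint that the design has a single variable with all samples having the same value.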

In the second chunk of code, I'm not sure what's the problem, because I don't know what version of DESeq2 you are using:
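
One way to report it, sketched here with base R's sessionInfo() plus a guarded packageVersion() call (the guard is only there so the snippet also runs on a machine where DESeq2 is not installed):

```r
# R version string, available in any R session
info <- sessionInfo()
print(info$R.version$version.string)

# Version of DESeq2, if the package is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  print(packageVersion("DESeq2"))
}
```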


But I'd guess the problem is that the second and third elements, "A 1" and "A 2", need to be levels of dds$group:

levels(dds$group)
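
A quick base-R check of this, using made-up vectors in place of dds$Cell and dds$Treat: paste0() joins its arguments without a separator, so the levels contain no space.

```r
# Made-up stand-ins for dds$Cell and dds$Treat
Cell  <- c("A", "A", "B", "B")
Treat <- c(1L, 2L, 1L, 2L)

# Same construction as in the question
group <- factor(paste0(Cell, Treat))
print(levels(group))

# The names used in the failing contrast are not among the levels
wanted <- c("A 1", "A 2")
print(wanted %in% levels(group))
```

Under this construction the contrast would be written as c("group", "A1", "A2"), with no space in the level names.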


Thanks! That helped a lot. I hadn't noticed that I didn't specify the column from se. I changed the code to colData(se)$Cell, and the rest of DESeq ran without any errors!

I contrasted A1 vs A2 and B1 vs B2, filtered the results on padj and log2FoldChange, and found 1330 differentially expressed genes in both comparisons, which I think makes sense because the drug must only affect certain genes when a cell is treated.
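
A filter of this kind could be sketched as follows. The table is toy data standing in for as.data.frame(results(...)), and the cutoffs (padj < 0.05, absolute log2FoldChange > 1) are assumptions for illustration, since the thresholds actually used are not stated.

```r
# Toy stand-in for a DESeq2 results table; the values are illustrative only
res <- data.frame(
  gene           = paste0("gene", 1:5),
  log2FoldChange = c(2.1, -0.3, -1.8, 0.9, 1.2),
  padj           = c(0.001, 0.20, 0.03, 0.04, NA)
)

# Assumed cutoffs, not taken from the thread
padj_cutoff <- 0.05
lfc_cutoff  <- 1

# Drop NA padj explicitly, then apply both thresholds
sig <- subset(
  res,
  !is.na(padj) & padj < padj_cutoff & abs(log2FoldChange) > lfc_cutoff
)
print(sig$gene)
```

The explicit NA check matters because DESeq2 can report padj as NA for some genes, for example those removed by independent filtering.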


