Question: Running DESeqDataSet with 4 factors?
Lillyhchen10 (United States) wrote, 3.6 years ago:

Hello,

I'm currently working on a project with two tumor cell lines, each from a different patient, and each cell line has a control group and a treated group. Using DESeq2, I'm trying to look at the changes in gene expression between untreated cells and cells treated with Treatment A within each tumor cell line. However, I'm having trouble: I keep getting a warning message when running the DESeq pipeline.

I'm running DESeq2 on a SummarizedExperiment:

mode="Union",
singleEnd=FALSE,
ignore.strand=FALSE,
fragments=TRUE, BPPARAM = SerialParam())

colData(se)=DataFrame(sample)

For the design I used: design=~Cell + Treat + Cell:Treat

This is the error message I get when I run the DESeq pipeline:

>dds<-DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- standard model matrices are used for factors with two levels and an interaction,
where the main effects are for the reference level of other factors.
see the 'Interactions' section of the vignette for more details: vignette('DESeq2')
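Those lines are informational messages that DESeq() prints after the model has been fit; they are not an error. The hedged sketch below follows the pattern of the 'Interactions' section of the DESeq2 vignette and shows how the treated-vs-control effect for each cell line can be extracted from the ~Cell + Treat + Cell:Treat design. It runs on simulated counts from makeExampleDESeqDataSet(). The levels A/B and 1/2 are assumptions chosen to mirror this thread. Treat is deliberately made a factor here: the poster's colData (shown later in the thread) stores it as an integer, and DESeq2 treats an integer column as a numeric covariate rather than as groups.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 1000 genes x 12 samples, for illustration only
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical annotation mirroring the thread: 2 cell lines x 2 treatments,
# 3 replicates each. Treat is stored as a factor, not an integer.
dds$Cell  <- factor(rep(c("A", "B"), each = 6))
dds$Treat <- factor(rep(rep(c("1", "2"), each = 3), times = 2))
design(dds) <- ~ Cell + Treat + Cell:Treat

dds <- DESeq(dds, quiet = TRUE)
resultsNames(dds)  # coefficient names available for results()

# Treated vs control in the reference cell line (A): the Treat main effect
res_A <- results(dds, name = "Treat_2_vs_1")

# Treated vs control in cell line B: main effect plus the interaction term
res_B <- results(dds, contrast = list(c("Treat_2_vs_1", "CellB.Treat2")))
```

The coefficient names depend on the factor levels, so checking resultsNames(dds) before writing contrasts against real data is a good habit.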

Is there another way to run the differential expression analysis so that I can look at the change in expression between control and treated samples for each cell line? I'm not sure why I keep getting this error. I've tried running DESeq2 on one cell line at a time (2 groups instead of 4) and got very different results (only 5 genes instead of 81).

Thanks!

Lilly

modified 3.6 years ago • written 3.6 years ago by Lillyhchen10
Answer: Running DESeqDataSet with 4 factors?
Michael Love (United States) wrote, 3.6 years ago:

To compare control and treated for each cell line, it will be easier for you to not use an interaction term, but instead follow the steps in this previous answer:

A: Factorial Design with DESeq2; contrast problem.
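A minimal sketch of that approach, on simulated data from makeExampleDESeqDataSet(): combine the two factors into a single grouping factor, use it as the only design term, and contrast treated vs control within each cell line. The column names Cell, Treat and group, and the levels A/B and 1/2, are assumptions chosen to mirror this thread.

```r
library(DESeq2)

set.seed(1)
# Simulated counts, for illustration only
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical annotation: 2 cell lines x 2 treatments, 3 replicates each
dds$Cell  <- factor(rep(c("A", "B"), each = 6))
dds$Treat <- factor(rep(rep(c("1", "2"), each = 3), times = 2))

# One level per Cell/Treat combination: "A1", "A2", "B1", "B2"
dds$group <- factor(paste0(dds$Cell, dds$Treat))
design(dds) <- ~ group
dds <- DESeq(dds, quiet = TRUE)

# Treated (2) vs control (1) within each cell line; the names used here
# must match levels(dds$group) exactly
res_A <- results(dds, contrast = c("group", "A2", "A1"))
res_B <- results(dds, contrast = c("group", "B2", "B1"))
```

With this design, each comparison is a plain two-level contrast, so no interaction coefficients need to be combined.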

Hi,

Thanks for answering. Does it matter whether the factors are combined in the colData of the SummarizedExperiment, or should that be done after the dds pipeline is run?

I combined the 2 factors as a column in the se:

colData(se)$Cell_Treat<-factor(paste0(colData$Cell, "-", colData$Treat))
> colData(se)
DataFrame with 12 rows and 4 columns
                           sample     Cell     Treat Cell_Treat
                         <factor> <factor> <integer>   <factor>
1  accepted_hits.CSC.Contc.HF2303        A         1          -
2   accepted_hits.CSC.TMZa.HF2303        A         2          -
3  accepted_hits.SDC.Contb.HF2303        A         1          -
4  accepted_hits.SDC.Contc.HF2303        A         1          -
5   accepted_hits.SDC.TMZb.HF2303        A         2          -
...                           ...      ...       ...        ...
8   accepted_hits.CSC.TMZb.HF2927        B         2          -
9   accepted_hits.CSC.TMZc.HF2927        B         2          -
10 accepted_hits.SDC.Contc.HF2927        B         1          -
11  accepted_hits.SDC.TMZa.HF2927        B         2          -
12  accepted_hits.SDC.TMZc.HF2927        B         2          -

As for the design of the DESeqDataSet, I used design=~Cell_Treat, but it gave an error saying that the design has a single variable with all samples having the same value, and suggested I use a design of ~1. I read the previous answer and tried to combine the variables in the dds; however, I'm confused about how to set up the design for DESeqDataSet. I tried design=~Cell + Treat, and with that I was able to run DESeq, but when I tried to make a contrast of the groups I got an error that 'x' must be an array of at least two dimensions:

> dds<-DESeqDataSet(se, design=~Cell + Treat)
> dds$group<-factor(paste0(dds$Cell, dds$Treat))
> design(dds)<-~group

> dds<-DESeq(dds)

> results(dds, contrast=c("group", "A 1", "A 2"))
Error in rowSums(cts.sub == 0) :
'x' must be an array of at least two dimensions

You can create this new column in the SummarizedExperiment before you make the dds, or you can make this column in the dds, but before you run DESeq().

You just have a small coding error in the first chunk of code. You can find small errors like this just by eye. Look at the column you defined in colData(se): it is just "-" repeated for all samples.
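One way a column of "-" can arise: if the Cell and Treat columns are looked up on an object that does not have them, `$` returns NULL, and paste0() treats zero-length arguments as empty strings, so the result is the single string "-", which is then recycled across all samples. The sketch below uses a small simulated SummarizedExperiment; the object se and its Cell/Treat columns are illustrative assumptions, not the poster's data.

```r
library(SummarizedExperiment)

set.seed(1)
# Small simulated object: 5 genes x 12 samples with hypothetical Cell/Treat columns
counts <- matrix(rpois(5 * 12, lambda = 10), nrow = 5)
se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(Cell  = rep(c("A", "B"), each = 6),
                      Treat = rep(rep(1:2, each = 3), times = 2))
)

# Columns missing from the indexed object come back as NULL, and
# paste0() quietly treats them as empty strings:
paste0(NULL, "-", NULL)   # a single "-"

# Indexing the object's own colData (se$Cell is the same as colData(se)$Cell)
se$Cell_Treat <- factor(paste0(se$Cell, "-", se$Treat))
se$Cell_Treat             # inspect the new column sample by sample
levels(se$Cell_Treat)
```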

You can look at individual columns like so:

se$Cell_Treat (or equivalently, colData(se)$Cell_Treat).

The problem is that colData$Cell is not the right way to specify the Cell column (the same goes for Treat). You can use either se$Cell or colData(se)$Cell.

In the second chunk of code, I'm not sure what the problem is, because I don't know which version of DESeq2 you are using (packageVersion("DESeq2") will tell you). But I'd guess the problem is that the second and third elements, "A 1" and "A 2", need to be levels of dds$group:

levels(dds$group)

(Reply written 3.6 years ago by Michael Love.)

Thanks! That helped a lot. I didn't notice that I hadn't specified the column from se; I changed the code to colData(se)$Cell, and the rest of the DESeq pipeline ran without any errors!

I contrasted A1 vs A2 and B1 vs B2, filtered the results on padj and log2FoldChange, and found 1330 differentially expressed genes in both comparisons. I think this makes sense, because the drug must only affect certain genes when a cell is treated.
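A hedged sketch of those two contrasts and the filtering step, on simulated data from makeExampleDESeqDataSet(). The helper sig_genes() and its cutoffs (padj < 0.05, |log2FoldChange| > 1) are hypothetical choices, not the thresholds used in the post. The post also doesn't say whether "in both comparisons" means the intersection or the union of the two gene lists, so both are computed.

```r
library(DESeq2)

set.seed(1)
# Simulated counts with a hypothetical 4-level group, 3 replicates each
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$group <- factor(rep(c("A1", "A2", "B1", "B2"), each = 3))
design(dds) <- ~ group
dds <- DESeq(dds, quiet = TRUE)

# Guard against typos such as "A 1": contrast names must be levels of dds$group
stopifnot(all(c("A1", "A2", "B1", "B2") %in% levels(dds$group)))

res_A <- results(dds, contrast = c("group", "A2", "A1"))
res_B <- results(dds, contrast = c("group", "B2", "B1"))

# Hypothetical helper: genes passing an adjusted p-value and fold-change cutoff
sig_genes <- function(res, alpha = 0.05, lfc = 1) {
  keep <- !is.na(res$padj) & res$padj < alpha & abs(res$log2FoldChange) > lfc
  rownames(res)[keep]
}

sig_A  <- sig_genes(res_A)
sig_B  <- sig_genes(res_B)
both   <- intersect(sig_A, sig_B)  # called in both comparisons
either <- union(sig_A, sig_B)      # called in at least one comparison
```

As an alternative to filtering on log2FoldChange after the fact, results() accepts an lfcThreshold argument that tests against the fold-change threshold directly; which approach fits depends on the question being asked.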