Hi, I am analyzing RNA-seq data with four factors: Condition (WT and MT), CellLine (2 cell lines per condition), Replicate (3 replicates per cell line) and Library (paired pull-down and control for each replicate), for a total of 2*2*3*2 = 24 samples (see the table below). I would like to test whether the pull-down/control ratio (within pairs) differs between the two conditions, WT and MT, while controlling for cell line. The issue is that cell lines are nested in conditions and replicates are nested in cell lines. How should I write the design formula for DESeq2?

Based on the DESeq2 vignette, I recoded cell line as a variable with two levels, "1" and "2" (CellLineSuppressed, the last column of the table), and tried two designs:

(1) Condition + Condition:CellLineSuppressed + Condition:CellLineSuppressed:Replicate + Condition:Library
(2) Condition + Condition:CellLineSuppressed + Condition:Replicate + Condition:Library

Please let me know your advice! Also, note that I didn't include a standalone Library term in the design. Is that okay?
Sample | Condition | CellLine | Replicate | Library | CellLineSuppressed |
Sample-1 | WT | WTa | 1 | Input | 1 |
Sample-2 | WT | WTa | 2 | Input | 1 |
Sample-3 | WT | WTa | 3 | Input | 1 |
Sample-4 | WT | WTa | 1 | Pulldown | 1 |
Sample-5 | WT | WTa | 2 | Pulldown | 1 |
Sample-6 | WT | WTa | 3 | Pulldown | 1 |
Sample-7 | WT | WTb | 1 | Input | 2 |
Sample-8 | WT | WTb | 2 | Input | 2 |
Sample-9 | WT | WTb | 3 | Input | 2 |
Sample-10 | WT | WTb | 1 | Pulldown | 2 |
Sample-11 | WT | WTb | 2 | Pulldown | 2 |
Sample-12 | WT | WTb | 3 | Pulldown | 2 |
Sample-13 | MT | MTc | 1 | Input | 1 |
Sample-14 | MT | MTc | 2 | Input | 1 |
Sample-15 | MT | MTc | 3 | Input | 1 |
Sample-16 | MT | MTc | 1 | Pulldown | 1 |
Sample-17 | MT | MTc | 2 | Pulldown | 1 |
Sample-18 | MT | MTc | 3 | Pulldown | 1 |
Sample-19 | MT | MTd | 1 | Input | 2 |
Sample-20 | MT | MTd | 2 | Input | 2 |
Sample-21 | MT | MTd | 3 | Input | 2 |
Sample-22 | MT | MTd | 1 | Pulldown | 2 |
Sample-23 | MT | MTd | 2 | Pulldown | 2 |
Sample-24 | MT | MTd | 3 | Pulldown | 2 |
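DESeq2 stops with an error when the model matrix for a design is not full rank, so one quick sanity check before fitting is to build the matrix with base R and compare its rank to its number of columns. The following is a minimal sketch, not the poster's actual code: it rebuilds the table above with expand.grid() and assumes that every column, including Replicate, is stored as a factor (a numeric Replicate would give a different matrix). Passing this check only means the coefficients are estimable; it does not by itself show that a design answers the question being asked.

```r
# Rebuild the 24-row sample table from the post (assumption: all columns are factors).
sampleSheet <- expand.grid(
  Replicate = factor(1:3),
  Library   = factor(c("Input", "Pulldown")),
  CellLine  = factor(c("WTa", "WTb", "MTc", "MTd"),
                     levels = c("WTa", "WTb", "MTc", "MTd"))
)
sampleSheet$Condition <- factor(substr(as.character(sampleSheet$CellLine), 1, 2),
                                levels = c("WT", "MT"))
# Recode cell line to "1"/"2" within each condition, as in the last table column.
sampleSheet$CellLineSuppressed <- factor(
  ifelse(sampleSheet$CellLine %in% c("WTa", "MTc"), 1, 2)
)
rownames(sampleSheet) <- paste0("Sample-", seq_len(nrow(sampleSheet)))

# The two designs proposed in the question.
designs <- list(
  design1 = ~ Condition + Condition:CellLineSuppressed +
    Condition:CellLineSuppressed:Replicate + Condition:Library,
  design2 = ~ Condition + Condition:CellLineSuppressed +
    Condition:Replicate + Condition:Library
)

# For each design, report the number of columns and the rank of the model matrix.
rank_check <- sapply(designs, function(f) {
  mm <- model.matrix(f, sampleSheet)
  c(columns = ncol(mm), rank = qr(mm)$rank)
})
print(rank_check)
```

If rank is smaller than columns for a design, some columns are linear combinations of others and would have to be removed or the design simplified.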
Hi Michael, thanks for your answer. I ran the following scripts but got either an error or a warning, depending on the design formula I used when creating the DESeq2 object. Can you check?
Script 1 (generates an error):
deseq2.obj <- DESeqDataSetFromMatrix(countData=fCount, colData=sampleSheet ,design= ~ Library)
mm <- model.matrix(~Line + Line:Replicate + Condition:Library, sampleSheet)
mm <- mm[,-grep("LibraryInput",colnames(mm))]
deseq2.obj <- DESeq(deseq2.obj[apply(counts(deseq2.obj),1,min)>=3,], modelMatrixType = mm)
<Error>
Error in fitBeta(ySEXP = ySEXP, xSEXP = xSEXP, nfSEXP = nfSEXP, alpha_hatSEXP = alpha_hatSEXP, :
join_cols() / join_vert(): number of columns must be the same
In addition: Warning messages:
1: In if (modelMatrixType == "expanded" & !betaPrior) { :
the condition has length > 1 and only the first element will be used
2: In if (renameCols) { :
the condition has length > 1 and only the first element will be used
3: In if (modelMatrixType == "expanded") { :
the condition has length > 1 and only the first element will be used
4: In if (modelMatrixType == "expanded") { :
the condition has length > 1 and only the first element will be used
5: In if (modelMatrixType == "standard") { :
the condition has length > 1 and only the first element will be used
Script 2 (generates a warning):
deseq2.obj <- DESeqDataSetFromMatrix(countData=fCount, colData=sampleSheet ,design= ~ Condition + Condition: Library)
mm <- model.matrix(~Line + Line:Replicate + Condition:Library, sampleSheet)
mm <- mm[,-grep("LibraryInput",colnames(mm))]
deseq2.obj <- DESeq(deseq2.obj[apply(counts(deseq2.obj),1,min)>=3,], modelMatrixType = mm)
<Warning>
Warning message:
In if (modelMatrixType == "expanded" & !betaPrior) { :
the condition has length > 1 and only the first element will be used
You should pass the matrix to the “full” argument of DESeq(), not to “modelMatrixType”. See ?DESeq
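As a rough sketch of that suggestion, the custom matrix goes to the full argument of DESeq(). Everything below apart from the model formula and the DESeq(..., full = mm) call is an assumption made for illustration: the sample sheet is rebuilt from the table in the question (with Line standing in for the CellLine column and all columns as factors), and the counts are simulated, since the real fCount matrix is not shown. The DESeq2 part only runs if the package is installed.

```r
# Illustrative sample sheet (assumption: "Line" is the CellLine column, all factors).
sampleSheet <- expand.grid(
  Replicate = factor(1:3),
  Library   = factor(c("Input", "Pulldown")),
  Line      = factor(c("WTa", "WTb", "MTc", "MTd"))
)
sampleSheet$Condition <- factor(substr(as.character(sampleSheet$Line), 1, 2),
                                levels = c("WT", "MT"))
rownames(sampleSheet) <- paste0("Sample-", seq_len(nrow(sampleSheet)))

# Build the custom model matrix and drop the Input columns of Condition:Library.
# grepl() is used instead of -grep() so that "no match" cannot drop every column.
mm <- model.matrix(~ Line + Line:Replicate + Condition:Library, sampleSheet)
mm <- mm[, !grepl("LibraryInput", colnames(mm)), drop = FALSE]
stopifnot(qr(mm)$rank == ncol(mm))  # the matrix passed to DESeq() must be full rank

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # Simulated counts, purely hypothetical stand-ins for the real data.
  set.seed(1)
  n_genes <- 200
  mu   <- 2^runif(n_genes, 4, 10)
  disp <- 0.1 + 4 / mu
  counts <- matrix(rnbinom(n_genes * nrow(sampleSheet), mu = mu, size = 1 / disp),
                   nrow = n_genes)
  storage.mode(counts) <- "integer"
  dimnames(counts) <- list(paste0("gene", seq_len(n_genes)), rownames(sampleSheet))

  # The formula given here is a placeholder; the fit uses the matrix passed to `full`.
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = sampleSheet,
                                        design = ~ Library)
  dds <- DESeq2::DESeq(dds, full = mm)

  # Pull-down effect in MT versus WT, using whatever names DESeq2 reports.
  pd <- grep("LibraryPulldown", DESeq2::resultsNames(dds), value = TRUE)
  res <- DESeq2::results(dds, contrast = list(grep("ConditionMT", pd, value = TRUE),
                                              grep("ConditionWT", pd, value = TRUE)))
  print(head(res))
}
```

The direction of the contrast (MT over WT) is a choice made for this sketch; swapping the two list elements reverses it.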
Thanks for your answer. By the way, what if the effect of condition depends on the cell line? There are four cell lines in total, two from condition WT and two from condition MT, so the cell lines are nested within condition. That is why I considered the two designs in my question. Could you recommend a design?
But you can’t possibly estimate it because you have different cell lines for WT and MT. You’ve labeled them A-D.
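The nesting behind this point can be checked with base R. This is a small sketch that assumes the sample table from the question, rebuilt here with expand.grid(): each cell line appears in only one condition, so once CellLine is in the model, Condition adds a column but no new information.

```r
# Sample table from the question (assumption: rebuilt here, all columns as factors).
sampleSheet <- expand.grid(
  Replicate = factor(1:3),
  Library   = factor(c("Input", "Pulldown")),
  CellLine  = factor(c("WTa", "WTb", "MTc", "MTd"),
                     levels = c("WTa", "WTb", "MTc", "MTd"))
)
sampleSheet$Condition <- factor(substr(as.character(sampleSheet$CellLine), 1, 2),
                                levels = c("WT", "MT"))

# Cross-tabulation: every cell line has samples in exactly one condition.
print(table(sampleSheet$Condition, sampleSheet$CellLine))

# Rank with cell line alone versus condition plus cell line.
rank_line <- qr(model.matrix(~ CellLine, sampleSheet))$rank
mm_both   <- model.matrix(~ Condition + CellLine, sampleSheet)
rank_both <- qr(mm_both)$rank

cat("rank(~ CellLine):             ", rank_line, "\n")
cat("rank(~ Condition + CellLine): ", rank_both,
    "with", ncol(mm_both), "columns\n")
```

Because the two ranks are equal while the second matrix has an extra column, a condition effect cannot be separated from cell-line differences with these data, and the same holds for a condition effect that varies by cell line.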
Thanks for the follow-up! Yes, you are correct that they are different cell lines. However, that is why I used "CellLineSuppressed" in my two designs: (1) Condition + Condition:CellLineSuppressed + Condition:CellLineSuppressed:Replicate + Condition:Library (2) Condition + Condition:CellLineSuppressed + Condition:Replicate + Condition:Library. Does either look okay?
I’ve explained which effects can be estimated, the design to use for them, and which effects can’t be estimated. For further questions on why those can’t be estimated here, I’d suggest discussing with a statistician.