Hi all!
I am analyzing RNA-seq data with DESeq2, and I have two confounding variables. One is the batch number and the other is whether cells were washed or not.
The data consist of samples from two different cell lines that are un/treated with dox and un/modified with the Luc/pax5/pax5-ita gene. These samples come from two batches: the first contains only one cell line (NALM6), and the second contains two (NALM6 and MHHCALL2). This is an abridged version of the metadata table:
cohort | cell_line | mod | treatment |
---|---|---|---|
2 | MHHCALL2 | Luc | ctr |
2 | MHHCALL2 | Luc | dox |
2 | MHHCALL2 | P5 | ctr |
2 | MHHCALL2 | P5 | dox |
2 | MHHCALL2 | P5X | ctr |
2 | MHHCALL2 | P5X | dox |
2 | NALM6 | Luc | ctr |
2 | NALM6 | Luc | dox |
2 | NALM6 | P5 | ctr |
2 | NALM6 | P5 | dox |
2 | NALM6 | P5X | ctr |
2 | NALM6 | P5X | dox |
1 | NALM6 | Luc | ctr |
1 | NALM6 | P5 | ctr |
1 | NALM6 | P5X | ctr |
1 | NALM6 | Luc | dox |
1 | NALM6 | P5 | dox |
1 | NALM6 | P5X | dox |
To study the effect of different combinations of modifications and treatments in each cell line, I modified the table as follows:
cohort | cell_line | cell_line_cohort | mod | treatment | sample_group_simple |
---|---|---|---|---|---|
2 | MHHCALL2 | MHHCALL2_2 | Luc | ctr | Luc_ctr |
2 | MHHCALL2 | MHHCALL2_2 | Luc | dox | Luc_dox |
2 | MHHCALL2 | MHHCALL2_2 | P5 | ctr | P5_ctr |
2 | MHHCALL2 | MHHCALL2_2 | P5 | dox | P5_dox |
2 | MHHCALL2 | MHHCALL2_2 | P5X | ctr | P5X_ctr |
2 | MHHCALL2 | MHHCALL2_2 | P5X | dox | P5X_dox |
2 | NALM6 | NALM6_2 | Luc | ctr | Luc_ctr |
2 | NALM6 | NALM6_2 | Luc | dox | Luc_dox |
2 | NALM6 | NALM6_2 | P5 | ctr | P5_ctr |
2 | NALM6 | NALM6_2 | P5 | dox | P5_dox |
2 | NALM6 | NALM6_2 | P5X | ctr | P5X_ctr |
2 | NALM6 | NALM6_2 | P5X | dox | P5X_dox |
1 | NALM6 | NALM6_1 | Luc | ctr | Luc_ctr |
1 | NALM6 | NALM6_1 | P5 | ctr | P5_ctr |
1 | NALM6 | NALM6_1 | P5X | ctr | P5X_ctr |
1 | NALM6 | NALM6_1 | Luc | dox | Luc_dox |
1 | NALM6 | NALM6_1 | P5 | dox | P5_dox |
1 | NALM6 | NALM6_1 | P5X | dox | P5X_dox |
As you can see, I combined the mod and treatment columns into one column called sample_group_simple, and the cohort and cell_line columns into a cell_line_cohort column. Finally, I used the following design for the analysis:
~ cell_line_cohort + cell_line_cohort:sample_group_simple
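To make this design concrete, here is a base-R sketch (no DESeq2 needed) that rebuilds the abridged table as a hypothetical `coldata` data frame, creates the two combined factors with `paste()`, and inspects the model matrix the formula produces. The name `coldata` and the `paste(..., sep = "_")` construction are assumptions chosen to reproduce the labels in the table above.

```r
# Hypothetical reconstruction of the abridged metadata table (one row per sample).
coldata <- data.frame(
  cohort    = c(rep(2, 12), rep(1, 6)),
  cell_line = c(rep("MHHCALL2", 6), rep("NALM6", 12)),
  mod       = c(rep(c("Luc", "P5", "P5X"), each = 2, times = 2),
                rep(c("Luc", "P5", "P5X"), times = 2)),
  treatment = c(rep(c("ctr", "dox"), times = 6),
                rep(c("ctr", "dox"), each = 3))
)

# Combine cell_line + cohort and mod + treatment into single factors.
coldata$cell_line_cohort <- factor(paste(coldata$cell_line, coldata$cohort, sep = "_"))
coldata$sample_group_simple <- factor(paste(coldata$mod, coldata$treatment, sep = "_"))

# Model matrix for the design: sample_group_simple is nested in cell_line_cohort.
mm <- model.matrix(~ cell_line_cohort + cell_line_cohort:sample_group_simple,
                   data = coldata)

ncol(mm)     # 1 intercept + 2 cell_line_cohort + 3 x 5 interaction columns = 18
qr(mm)$rank  # 18, i.e. the design is full rank

# Interaction columns: one "group vs Luc_ctr" effect per cell_line_cohort level.
head(grep(":", colnames(mm), value = TRUE, fixed = TRUE))
```

With one row per group, as in the abridged table, the 18 columns exactly match the 18 samples, so the fit would be saturated; with replicates it is not. Because Luc_ctr is the reference level of sample_group_simple, a column such as `cell_line_cohortNALM6_2:sample_group_simpleP5_dox` is the P5_dox vs Luc_ctr difference within NALM6_2. DESeq2 lists the corresponding coefficients in `resultsNames(dds)`, with `:` written as `.`.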
Unfortunately, the already complex situation became more complicated when we found out that only a subset of samples had been washed with PBS. This variable is confounded with the batch variable: all samples of the first batch were washed, whereas in the second batch only the MHHCALL2 samples were. As a result, PBS is fully determined by cell_line_cohort, and any attempt to account for it in the design leads to a model matrix that is not full rank. The final table looks like this:
PBS | cohort | cell_line | cell_line_cohort | mod | treatment | sample_group_simple |
---|---|---|---|---|---|---|
wash | 2 | MHHCALL2 | MHHCALL2_2 | Luc | ctr | Luc_ctr |
wash | 2 | MHHCALL2 | MHHCALL2_2 | Luc | dox | Luc_dox |
wash | 2 | MHHCALL2 | MHHCALL2_2 | P5 | ctr | P5_ctr |
wash | 2 | MHHCALL2 | MHHCALL2_2 | P5 | dox | P5_dox |
wash | 2 | MHHCALL2 | MHHCALL2_2 | P5X | ctr | P5X_ctr |
wash | 2 | MHHCALL2 | MHHCALL2_2 | P5X | dox | P5X_dox |
no_wash | 2 | NALM6 | NALM6_2 | Luc | ctr | Luc_ctr |
no_wash | 2 | NALM6 | NALM6_2 | Luc | dox | Luc_dox |
no_wash | 2 | NALM6 | NALM6_2 | P5 | ctr | P5_ctr |
no_wash | 2 | NALM6 | NALM6_2 | P5 | dox | P5_dox |
no_wash | 2 | NALM6 | NALM6_2 | P5X | ctr | P5X_ctr |
no_wash | 2 | NALM6 | NALM6_2 | P5X | dox | P5X_dox |
wash | 1 | NALM6 | NALM6_1 | Luc | ctr | Luc_ctr |
wash | 1 | NALM6 | NALM6_1 | P5 | ctr | P5_ctr |
wash | 1 | NALM6 | NALM6_1 | P5X | ctr | P5X_ctr |
wash | 1 | NALM6 | NALM6_1 | Luc | dox | Luc_dox |
wash | 1 | NALM6 | NALM6_1 | P5 | dox | P5_dox |
wash | 1 | NALM6 | NALM6_1 | P5X | dox | P5X_dox |
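The aliasing of PBS with cell_line_cohort can be checked directly on this table. The base-R sketch below (again using a hypothetical `coldata` data frame that holds only the columns it needs) cross-tabulates PBS against cell_line_cohort and then compares the number of columns of the model matrix for `~ PBS + cell_line_cohort` with its rank:

```r
# Hypothetical reconstruction of the columns of the final table that matter here.
coldata <- data.frame(
  PBS       = c(rep("wash", 6), rep("no_wash", 6), rep("wash", 6)),
  cohort    = c(rep(2, 12), rep(1, 6)),
  cell_line = c(rep("MHHCALL2", 6), rep("NALM6", 12))
)
coldata$PBS <- factor(coldata$PBS)  # reference level: no_wash
coldata$cell_line_cohort <- factor(paste(coldata$cell_line, coldata$cohort, sep = "_"))

# Each cell_line_cohort level occurs with exactly one PBS level.
tab <- table(coldata$PBS, coldata$cell_line_cohort)
print(tab)

# PBS next to cell_line_cohort adds a column that is a linear combination of the others.
mm <- model.matrix(~ PBS + cell_line_cohort, data = coldata)
ncol(mm)     # 4 columns
qr(mm)$rank  # only 3 are linearly independent

# The redundant column: PBSwash = intercept - NALM6_2 indicator.
all(mm[, "PBSwash"] == mm[, "(Intercept)"] - mm[, "cell_line_cohortNALM6_2"])  # TRUE
```

The same redundant PBS column carries over into any larger design that also contains an intercept and cell_line_cohort, including the design above with PBS added.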
My question is how can I test for the effect of different combinations of treatment and modifications on different cell lines, while accounting for the two confounders, namely PBS and cell_line_cohort?
Thanks in advance, Mohamed
Thanks for the prompt reply, Michael!
This means that I should do the following:
1) Combine cell_line, cohort, and PBS into one variable:
cell_line_cohort_PBS <- factor(paste(cell_line, cohort, PBS, sep = "_"))
2) Use the new variable without otherwise modifying the initial design formula:
~ cell_line_cohort_PBS + cell_line_cohort_PBS:sample_group_simple
Am I right?

Sorry, I missed the fact that the confounding is with a condition of interest, not another nuisance variable. In that case, you can't really control for the nuisance variables.
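A quick check of why combining cell_line, cohort, and PBS into one variable does not help with this table, sketched in base R with a hypothetical `coldata` data frame: because PBS never varies within a cell_line_cohort level, pasting it on only relabels the three existing groups. The combined-variable design is therefore the same model as the original one and still cannot separate a PBS effect from the cell_line_cohort effect.

```r
# Hypothetical reconstruction of the relevant columns of the final table.
coldata <- data.frame(
  PBS       = c(rep("wash", 6), rep("no_wash", 6), rep("wash", 6)),
  cohort    = c(rep(2, 12), rep(1, 6)),
  cell_line = c(rep("MHHCALL2", 6), rep("NALM6", 12))
)
coldata$cell_line_cohort <- factor(paste(coldata$cell_line, coldata$cohort, sep = "_"))
coldata$cell_line_cohort_PBS <- factor(
  paste(coldata$cell_line, coldata$cohort, coldata$PBS, sep = "_")
)

nlevels(coldata$cell_line_cohort_PBS)  # 3, the same number as cell_line_cohort

# One non-zero cell per row and per column: a pure one-to-one relabelling.
tab <- table(coldata$cell_line_cohort, coldata$cell_line_cohort_PBS)
print(tab)
```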