Question

DESeq2 accounting for two confounders with interaction in the design matrix

mohamedrefaat.1.1992 • 0

@mohamedrefaat11992-22243

Austria

Hi all!

I am analyzing RNA-seq data with DESeq2, and I have two confounding variables. One is the batch number and the other is whether cells were washed or not.

The data comprise samples from two cell lines that are either untreated or treated with dox and are modified with the Luc, pax5, or pax5-ita gene. The samples come from two batches: the first contains only one cell line (NALM6) and the second contains both (NALM6 and MHHCALL2). This is an abridged version of the metadata table:

cohort	cell_line	mod	treatment
2	MHHCALL2	Luc	ctr
2	MHHCALL2	Luc	dox
2	MHHCALL2	P5	ctr
2	MHHCALL2	P5	dox
2	MHHCALL2	P5X	ctr
2	MHHCALL2	P5X	dox
2	NALM6	Luc	ctr
2	NALM6	Luc	dox
2	NALM6	P5	ctr
2	NALM6	P5	dox
2	NALM6	P5X	ctr
2	NALM6	P5X	dox
1	NALM6	Luc	ctr
1	NALM6	P5	ctr
1	NALM6	P5X	ctr
1	NALM6	Luc	dox
1	NALM6	P5	dox
1	NALM6	P5X	dox

To study the effect of different combinations of modifications and treatments in the cell lines, I modified the table as follows

cohort	cell_line	cell_line_cohort	mod	treatment	sample_group_simple
2	MHHCALL2	MHHCALL2_2	Luc	ctr	Luc_ctr
2	MHHCALL2	MHHCALL2_2	Luc	dox	Luc_dox
2	MHHCALL2	MHHCALL2_2	P5	ctr	P5_ctr
2	MHHCALL2	MHHCALL2_2	P5	dox	P5_dox
2	MHHCALL2	MHHCALL2_2	P5X	ctr	P5X_ctr
2	MHHCALL2	MHHCALL2_2	P5X	dox	P5X_dox
2	NALM6	NALM6_2	Luc	ctr	Luc_ctr
2	NALM6	NALM6_2	Luc	dox	Luc_dox
2	NALM6	NALM6_2	P5	ctr	P5_ctr
2	NALM6	NALM6_2	P5	dox	P5_dox
2	NALM6	NALM6_2	P5X	ctr	P5X_ctr
2	NALM6	NALM6_2	P5X	dox	P5X_dox
1	NALM6	NALM6_1	Luc	ctr	Luc_ctr
1	NALM6	NALM6_1	P5	ctr	P5_ctr
1	NALM6	NALM6_1	P5X	ctr	P5X_ctr
1	NALM6	NALM6_1	Luc	dox	Luc_dox
1	NALM6	NALM6_1	P5	dox	P5_dox
1	NALM6	NALM6_1	P5X	dox	P5X_dox

As you can see, I combined the mod and treatment columns into a single column, sample_group_simple, and the cohort and cell_line columns into cell_line_cohort. I then used the following design for the analysis:

~ cell_line_cohort + cell_line_cohort:sample_group_simple
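For readers who want to inspect this design, here is a minimal base-R sketch that rebuilds the abridged table and expands the formula with model.matrix. The data frame name meta and the use of paste to build the combined columns are assumptions for illustration, not necessarily how the columns were actually created.

```r
# Rebuild the abridged metadata table from the post (one row per combination).
meta <- data.frame(
  cohort    = rep(c("2", "2", "1"), each = 6),
  cell_line = rep(c("MHHCALL2", "NALM6", "NALM6"), each = 6),
  mod       = c(rep(rep(c("Luc", "P5", "P5X"), each = 2), 2),
                rep(c("Luc", "P5", "P5X"), 2)),
  treatment = c(rep(c("ctr", "dox"), 6),
                rep(c("ctr", "dox"), each = 3))
)

# Combined columns (assumed construction: paste the source columns together).
meta$cell_line_cohort    <- factor(paste(meta$cell_line, meta$cohort, sep = "_"))
meta$sample_group_simple <- factor(paste(meta$mod, meta$treatment, sep = "_"))

# Expand the design formula into a model matrix and check its rank.
design <- ~ cell_line_cohort + cell_line_cohort:sample_group_simple
mm <- model.matrix(design, data = meta)
cat("rows:", nrow(mm), " columns:", ncol(mm), " rank:", qr(mm)$rank, "\n")
```

On the abridged table, which lists a single row per combination, this matrix has as many columns as rows; the full experiment presumably includes replicates for each combination.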

Unfortunately, this already complex situation became more complicated when we found out that only a subset of the samples had been washed with PBS. This variable is confounded with the batch variable: all samples of the first batch were washed, whereas in the second batch only the MHHCALL2 samples were. As a result, any attempt to account for it in the design leads to a model matrix that is not full rank. The final table looks like this:

PBS	cohort	cell_line	cell_line_cohort	mod	treatment	sample_group_simple
wash	2	MHHCALL2	MHHCALL2_2	Luc	ctr	Luc_ctr
wash	2	MHHCALL2	MHHCALL2_2	Luc	dox	Luc_dox
wash	2	MHHCALL2	MHHCALL2_2	P5	ctr	P5_ctr
wash	2	MHHCALL2	MHHCALL2_2	P5	dox	P5_dox
wash	2	MHHCALL2	MHHCALL2_2	P5X	ctr	P5X_ctr
wash	2	MHHCALL2	MHHCALL2_2	P5X	dox	P5X_dox
no_wash	2	NALM6	NALM6_2	Luc	ctr	Luc_ctr
no_wash	2	NALM6	NALM6_2	Luc	dox	Luc_dox
no_wash	2	NALM6	NALM6_2	P5	ctr	P5_ctr
no_wash	2	NALM6	NALM6_2	P5	dox	P5_dox
no_wash	2	NALM6	NALM6_2	P5X	ctr	P5X_ctr
no_wash	2	NALM6	NALM6_2	P5X	dox	P5X_dox
wash	1	NALM6	NALM6_1	Luc	ctr	Luc_ctr
wash	1	NALM6	NALM6_1	P5	ctr	P5_ctr
wash	1	NALM6	NALM6_1	P5X	ctr	P5X_ctr
wash	1	NALM6	NALM6_1	Luc	dox	Luc_dox
wash	1	NALM6	NALM6_1	P5	dox	P5_dox
wash	1	NALM6	NALM6_1	P5X	dox	P5X_dox
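The rank problem can be reproduced with base R alone. The sketch below keeps only the two nuisance columns from the table above (the name meta is an assumption for illustration) and shows that PBS is fully determined by cell_line_cohort, so a design containing both has more columns than its rank.

```r
# Nuisance columns only, in the same block order as the table above.
meta <- data.frame(
  cell_line_cohort = factor(rep(c("MHHCALL2_2", "NALM6_2", "NALM6_1"), each = 6)),
  PBS              = factor(rep(c("wash", "no_wash", "wash"), each = 6))
)

# Each cell_line_cohort level falls in exactly one PBS level.
print(table(meta$PBS, meta$cell_line_cohort))

# A design with both variables has one redundant column.
mm <- model.matrix(~ PBS + cell_line_cohort, data = meta)
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", rank_mm, "\n")
```

Because the PBS column is a linear combination of the intercept and the cell_line_cohort columns, adding PBS to a design that already contains cell_line_cohort cannot yield a full-rank matrix.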

My question is how can I test for the effect of different combinations of treatment and modifications on different cell lines, while accounting for the two confounders, namely PBS and cell_line_cohort?

Thanks in advance, Mohamed

deseq2 design matrix multi-confounders linear models rna-seq • 1.3k views

ADD COMMENT • link updated 5.7 years ago by Michael Love 43k • written 5.7 years ago by mohamedrefaat.1.1992 • 0

score 0 · Answer 1 · 2020-03-04

Michael Love 43k

@mikelove

United States

If a nuisance variable is confounded with another nuisance variable, you can just combine the two:

nuisance <- factor(paste(nuisance1, nuisance2))

And then just use this one variable.
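As an illustration only, here is that suggestion applied to the nuisance columns of the table in the question, assuming nuisance1 and nuisance2 stand for cell_line_cohort and PBS (the data frame name meta is also an assumption):

```r
# Nuisance columns from the question's final table.
meta <- data.frame(
  cell_line_cohort = factor(rep(c("MHHCALL2_2", "NALM6_2", "NALM6_1"), each = 6)),
  PBS              = factor(rep(c("wash", "no_wash", "wash"), each = 6))
)

# Combine the two confounded nuisance variables into a single factor.
meta$nuisance <- factor(paste(meta$cell_line_cohort, meta$PBS))
print(levels(meta$nuisance))

# A design on the combined factor alone is full rank.
mm <- model.matrix(~ nuisance, data = meta)
cat("columns:", ncol(mm), " rank:", qr(mm)$rank, "\n")
```

On this table the combined factor has the same number of levels as cell_line_cohort, since every cell_line_cohort level maps to a single PBS value.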

ADD COMMENT • link 5.7 years ago Michael Love 43k

Thanks for the prompt reply, Michael!

This means that I should do the following:

1) Combine cell_line, cohort, and PBS into one variable:

cell_line_cohort_PBS <- factor(paste(cell_line, cohort, PBS))

2) Use the new variable in place of cell_line_cohort, without otherwise modifying the initial design formula:

~ cell_line_cohort_PBS + cell_line_cohort_PBS:sample_group_simple

Am I right?

ADD REPLY • link 5.7 years ago mohamedrefaat.1.1992 • 0

Sorry, I missed the fact that the confounding is with a condition of interest, not with another nuisance variable. In that case, you can't really control for the nuisance variables.

ADD REPLY • link 5.7 years ago Michael Love 43k