Question

edgeR - Design matrix for comparisons of all and subsets of samples

0

Geo • 0

@0192d047

Last seen 9 weeks ago

Greece

Hello everyone!

I have a large dataset of patients with systemic lupus erythematosus (SLE) and healthy controls. I would like to run several differential expression (DE) comparisons with edgeR, but I have some doubts about my design matrix.

Some more detail to clarify the situation: the SLE patients in my dataset are split into two categories, patients with lupus nephritis (LN) and non-LN patients. The LN patients are further split into active LN and inactive LN. So the structure looks like this:

Condition	Active
LN	Yes
LN	No
non_LN	NA
Healthy	NA

The comparisons I want to perform are:

LN vs Healthy

LN vs non-LN

Active LN vs Inactive LN

So my question is: what should my design formula and contrasts be?

Proposed design formula: ~ Condition + Active + 0

The model matrix would look like:

ConditionLN	Conditionnon_LN	ConditionHealthy	ActiveYes	ActiveNo
1	0	0	1	0
1	0	0	1	0
1	0	0	0	1
0	1	0	NA	NA
0	1	0	NA	NA
0	0	1	NA	NA
0	0	1	NA	NA

Contrasts:

LN vs Healthy: c(1,0,-1,1,1)

LN vs non_LN: c(1,-1,0,1,1)

Active vs Inactive: c(0,0,0,1,-1)

Apologies for the bad post quality. It is my first post here and I could not figure out how to better lay it out.

Thank you very much in advance for all your help.

edgeR • 423 views

ADD COMMENT • link updated 3 months ago by James W. MacDonald 65k • written 3 months ago by Geo • 0

score 2 · Answer 1 · 2024-01-12

2

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 day ago

United States

The easy way is to follow section 3.3.1 in the edgeR User's Guide, and make a combination of your two factors and then use explicit contrasts.
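
A minimal sketch of that approach, for illustration only. The `targets` table, its seven rows, and the contrast names are hypothetical, chosen to mirror the structure described in the question. `makeContrasts` comes from limma, which edgeR depends on, so the sketch only calls it when limma is installed.

```r
# Hypothetical sample annotation mirroring the structure in the question
targets <- data.frame(
  Condition = c("LN", "LN", "LN", "non_LN", "non_LN", "Healthy", "Healthy"),
  Active    = c("Yes", "Yes", "No", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Collapse the two columns into one grouping variable, so no NA enters the design
group <- with(targets, ifelse(Condition == "LN",
                              ifelse(Active == "Yes", "LN_active", "LN_inactive"),
                              Condition))

# Fix the level order explicitly so the design columns are predictable
group <- factor(group, levels = c("LN_active", "LN_inactive", "non_LN", "Healthy"))

# One column per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Explicit contrasts; "LN" is taken as the mean of the two LN groups
if (requireNamespace("limma", quietly = TRUE)) {
  my_contrasts <- limma::makeContrasts(
    LN_vs_Healthy      = (LN_active + LN_inactive) / 2 - Healthy,
    LN_vs_nonLN        = (LN_active + LN_inactive) / 2 - non_LN,
    Active_vs_Inactive = LN_active - LN_inactive,
    levels = design
  )
  print(my_contrasts)
}

# Assumed downstream use, given a fit from edgeR's glmQLFit(y, design):
# res <- glmQLFTest(fit, contrast = my_contrasts[, "LN_vs_Healthy"])
```

Each column of `my_contrasts` sums to zero, and naming the groups in the expressions avoids having to track column positions by hand.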

ADD COMMENT • link 3 months ago James W. MacDonald 65k

0

Thank you very much for your answer! I hadn't noticed the more complex examples mentioned in the guide. I will look more into them.

ADD REPLY • link 3 months ago Geo • 0

score 0 · Answer 2 · 2024-01-29

0

Geo • 0

Hi again! So, update on the analysis.

I ended up combining the columns into a single one and using the levels: LN_active, LN_inactive, non_LN and HC.

To compare LN vs HC, I used the contrast (1,1,0,-1). I got only negative logFC values, ranging from -0.03 to -24 (which is outrageous for a logFC value). Did I misunderstand something?

Thank you very much for all your help!

ADD COMMENT • link 3 months ago Geo • 0

2

Please don't ask another question using the ADD ANSWER button. It's not an answer if it's a question.

A contrast is a set of weights that sum to zero. Yours sums to 1, so it is not a contrast. Instead you are asking whether the sum of two coefficients is larger than another coefficient, which doesn't make sense. Ideally you would use makeContrasts, which automates that sort of thing, and you would also specify that you want the mean of the two LN groups rather than their sum.

That said, the results you describe don't make sense, particularly for the contrast you say produced them. Without any code I can't say for sure, but I would bet that the contrast you actually ran was something with one 1 and two -1, which would result in all negative logFC values.
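
As a small base R illustration of the sum-to-zero point (the level order below is the one listed in the post, and the vector names are made up for this sketch): a hand-typed numeric contrast is matched to the design columns by position, so it is also worth comparing it against `colnames(design)`. Whether column order played any role in the reported results cannot be told without the code.

```r
# Level order as listed in the post (an assumption about the actual design)
lvls <- c("LN_active", "LN_inactive", "non_LN", "HC")

# The vector from the post: the weights sum to 1, so it is not a contrast
attempted <- setNames(c(1, 1, 0, -1), lvls)
sum(attempted)

# Mean of the two LN groups minus HC: the weights sum to 0
averaged <- setNames(c(0.5, 0.5, 0, -1), lvls)
sum(averaged)

# factor() sorts levels alphabetically unless 'levels' is given, so the
# design columns may not be in the order in which the groups were listed
levels(factor(lvls))
```

Naming the groups in `makeContrasts` sidesteps the ordering question entirely.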

ADD REPLY • link 3 months ago James W. MacDonald 65k