Paired samples, two time points, two groups and adjust for a continuous variable in edgeR
2
0
Entering edit mode
Sindre ▴ 70
@sindre-6693
Last seen 8.0 years ago

I want to find correlations between change in log2(gene expression) and change in log2(VO2max) from time A1 -> B1, adjusted for group and including the fact that the samples are paired. I get this error:

 

"Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
 Design matrix not of full rank.  The following coefficients not estimable:
 ID79"

 

Is this due to some linear dependencies? How get I get the results I want?

 

VO2max <- as.numeric(pData.myoglu[ 1:52, 6]) # (Random numbers as I can't post the original here)

Time <- factor(pData.myoglu[ 1:52, 4], levels=c("A1", "B1"))

Status <- factor(pData.myoglu[ 1:52, 2], levels=c("control", "IGT"))

ID <- factor(pData.myoglu[ 1:52, 1])

design <- model.matrix(~ 0 + log2(VO2max) + Status + Time + ID)

 

Here is the design (not enough space to paste):

http://imgur.com/4H9TUS2

edger limma • 2.2k views
ADD COMMENT
1
Entering edit mode

It would be more concise and helpful to show us the three factors rather than the design matrix:

data.frame(Status,Time,ID)
ADD REPLY
0
Entering edit mode

Here is data.frame(Status,Time,ID):

 

Status Time ID
IGT A1 2
IGT A1 4
IGT A1 17
IGT A1 21
IGT A1 53
IGT A1 55
IGT A1 61
IGT A1 62
IGT A1 67
IGT A1 73
IGT A1 76
IGT A1 77
IGT A1 79
control A1 1
control A1 8
control A1 13
control A1 70
control A1 71
control A1 72
control A1 75
control A1 81
control A1 82
control A1 83
control A1 86
control A1 87
control A1 88
IGT B1 2
IGT B1 4
IGT B1 17
IGT B1 21
IGT B1 53
IGT B1 55
IGT B1 61
IGT B1 62
IGT B1 67
IGT B1 73
IGT B1 76
IGT B1 77
IGT B1 79
control B1 1
control B1 8
control B1 13
control B1 70
control B1 71
control B1 72
control B1 75
control B1 81
control B1 82
control B1 83
control B1 86
control B1 87
control B1 88
ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 14 hours ago
The city by the bay

Yes, it is because of linear dependencies. There's no way to tell what exactly the problem is, because you don't show what the experimental design actually is. I'd suspect that it is something to do with the three factors, i.e., time or status or ID, and how they correspond to each other. As a hypothetical example, if all samples at a certain time also had the same status, then there'd be no point putting both in the design matrix.

ADD COMMENT
0
Entering edit mode

I included the three factors!

ADD REPLY
1
Entering edit mode
@gordon-smyth
Last seen 8 hours ago
WEHI, Melbourne, Australia

You have what is called a multi-level design in the limma and edgeR User Guides. You haven't told us what 'ID" means, but let's assume that ID corresponds to patient, for the sake of argument. The study has one within-patient factor (Time) and one between-patient factor (Status). You can't just add everything together in a linear model because ID is confounded with Status. Have a read of the User Guides.

All my comments about multi-level designs remain true whether or not the continuous variable log2(VO2max) is included in the model. But let me add that it is unusual for a continuous variable to be at all sensible in the paired design. You are basically trying to fit too many things at once to the same data. I suspect that you would to do better to remove either or VO2max or ID from the model.

Whenever you add a factor or covariate to the linear model, it means that all other coefficients in the model are adjusted for that factor. When you add ID to the model, it means that all the other coefficients can only be estimated from the within-patient comparisons. Since the same patient always has the same status, it becomes impossible to test any difference between IGT vs controls.

ADD COMMENT
0
Entering edit mode

Ok, thank you. I want to get the genes correlating with VO2max. Specifically, what change in gene expression correlate with the change in VO2max from time 1 to 2.

You are correct about the ID. I will re-read the user Guides once more.

Thank you!

ADD REPLY
1
Entering edit mode

If you want the overall correlation between VO2max and expression, then just use

design <- model.matrix(~log2(VO2max))

If you want to measure only the correlation within the status groups (this usually will be smaller) then:

design <- model.matrix(~Status+log2(VO2max))

If you want to measure the correlation separately for the two groups then:

design <- model.matrix(~Status+Status:log2(VO2max))

If you want to check that the VO2 and expression change in the same direction between the two times for individual patients, then:

design <- model.matrix(~ID+log2(VO2max))

Finally, if you want to correlate VO2 and expression for individual patients, and check whether this is different for the two groups of patients, then:

design <- model.matrix(~ID+Status:log2(VO2max))
ADD REPLY
0
Entering edit mode

Thank you very much! This is indeed what I was looking for. 

ADD REPLY
0
Entering edit mode

Hello,

Just hoping to ask a clarification question- your fourth model matrix above checks whether VO2max and expression change in the same direction- does it also check for genes whose expression changes in the opposite direction? I.e., if expression of a gene decreases between the two time points while VO2max goes up, will that gene show up as significant, just with a negative fold change/estimate?

Thank you, and thanks for the original question and answer!

ADD REPLY

Login before adding your answer.

Traffic: 525 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6