Question: Paired samples, two time points, two groups and adjust for a continuous variable in edgeR
0
3.5 years ago by
Sindre70
Sindre70 wrote:

I want to find correlations between change in log2(gene expression) and change in log2(VO2max) from time A1 -> B1, adjusted for group and including the fact that the samples are paired. I get this error:

"Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :
Design matrix not of full rank.  The following coefficients not estimable:
ID79"

Is this due to some linear dependencies? How get I get the results I want?

VO2max <- as.numeric(pData.myoglu[ 1:52, 6]) # (Random numbers as I can't post the original here)

Time <- factor(pData.myoglu[ 1:52, 4], levels=c("A1", "B1"))

Status <- factor(pData.myoglu[ 1:52, 2], levels=c("control", "IGT"))

ID <- factor(pData.myoglu[ 1:52, 1])

design <- model.matrix(~ 0 + log2(VO2max) + Status + Time + ID)

Here is the design (not enough space to paste):

http://imgur.com/4H9TUS2

limma edger • 788 views
modified 3.5 years ago by Gordon Smyth38k • written 3.5 years ago by Sindre70
1

It would be more concise and helpful to show us the three factors rather than the design matrix:

data.frame(Status,Time,ID)

Here is data.frame(Status,Time,ID):

 Status Time ID IGT A1 2 IGT A1 4 IGT A1 17 IGT A1 21 IGT A1 53 IGT A1 55 IGT A1 61 IGT A1 62 IGT A1 67 IGT A1 73 IGT A1 76 IGT A1 77 IGT A1 79 control A1 1 control A1 8 control A1 13 control A1 70 control A1 71 control A1 72 control A1 75 control A1 81 control A1 82 control A1 83 control A1 86 control A1 87 control A1 88 IGT B1 2 IGT B1 4 IGT B1 17 IGT B1 21 IGT B1 53 IGT B1 55 IGT B1 61 IGT B1 62 IGT B1 67 IGT B1 73 IGT B1 76 IGT B1 77 IGT B1 79 control B1 1 control B1 8 control B1 13 control B1 70 control B1 71 control B1 72 control B1 75 control B1 81 control B1 82 control B1 83 control B1 86 control B1 87 control B1 88
Answer: Paired samples, two time points, two groups and adjust for a continuous variable
2
3.5 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Yes, it is because of linear dependencies. There's no way to tell what exactly the problem is, because you don't show what the experimental design actually is. I'd suspect that it is something to do with the three factors, i.e., time or status or ID, and how they correspond to each other. As a hypothetical example, if all samples at a certain time also had the same status, then there'd be no point putting both in the design matrix.

I included the three factors!

Answer: Paired samples, two time points, two groups and adjust for a continuous variable
1
3.5 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

You have what is called a multi-level design in the limma and edgeR User Guides. You haven't told us what 'ID" means, but let's assume that ID corresponds to patient, for the sake of argument. The study has one within-patient factor (Time) and one between-patient factor (Status). You can't just add everything together in a linear model because ID is confounded with Status. Have a read of the User Guides.

All my comments about multi-level designs remain true whether or not the continuous variable log2(VO2max) is included in the model. But let me add that it is unusual for a continuous variable to be at all sensible in the paired design. You are basically trying to fit too many things at once to the same data. I suspect that you would to do better to remove either or VO2max or ID from the model.

Whenever you add a factor or covariate to the linear model, it means that all other coefficients in the model are adjusted for that factor. When you add ID to the model, it means that all the other coefficients can only be estimated from the within-patient comparisons. Since the same patient always has the same status, it becomes impossible to test any difference between IGT vs controls.

Ok, thank you. I want to get the genes correlating with VO2max. Specifically, what change in gene expression correlate with the change in VO2max from time 1 to 2.

You are correct about the ID. I will re-read the user Guides once more.

Thank you!

1

If you want the overall correlation between VO2max and expression, then just use

design <- model.matrix(~log2(VO2max))

If you want to measure only the correlation within the status groups (this usually will be smaller) then:

design <- model.matrix(~Status+log2(VO2max))

If you want to measure the correlation separately for the two groups then:

design <- model.matrix(~Status+Status:log2(VO2max))

If you want to check that the VO2 and expression change in the same direction between the two times for individual patients, then:

design <- model.matrix(~ID+log2(VO2max))

Finally, if you want to correlate VO2 and expression for individual patients, and check whether this is different for the two groups of patients, then:

design <- model.matrix(~ID+Status:log2(VO2max))

Thank you very much! This is indeed what I was looking for.

Hello,

Just hoping to ask a clarification question- your fourth model matrix above checks whether VO2max and expression change in the same direction- does it also check for genes whose expression changes in the opposite direction? I.e., if expression of a gene decreases between the two time points while VO2max goes up, will that gene show up as significant, just with a negative fold change/estimate?

Thank you, and thanks for the original question and answer!