SVA for repeated measures design
Entering edit mode
Last seen 2.6 years ago

Hi all,

I have RNA-seq data (5 subjects are measured on 4 time points) and would like to do a SVA first to be able to include potential confounders into the statistical model (Deseq2 pipeline).

I am having troubles how to define my null and full model in the SVA:

Full model ~ TIME + SUBJECT.ID

Null model ~ SUBJECT.ID OR Null model ~ 1

Should the subjects ID be treated as a factor of interest or as a confounding factor?

Thanks in advance!



SVA deseq2 repeated measures • 652 views
Entering edit mode
Last seen 16 hours ago
United States

You should probably use just an intercept for your null model. In general, if you have repeated measures (which I assume you do, given the subject ID), AND given that you have complete repeated measures (where you have measurements from each subject at each time), then the subject-specific changes are orthogonal to the measure of interest, and blocking on subject is the way to go. It also makes it easier to interpret your coefficients.

Put a different way, sva is intended to generate surrogate variables for unobserved variability. The subjects are by definition observed, so if you wanted to use the sva package to do something with them, you could consider them to be batch effects and use ComBat (note that I am not advocating this, but just noting that sva is for things you don't observe and ComBat is for things you know about.)

Entering edit mode

Thanks James.

Yes, I have repeated measures. Subject.ID is not my factor of interest, but of course I want to include it in the model. For one subject, I have two missing time points though.

I do not want ComBat to correct for Subject.ID, but rather want SVAseq to find confounding factors (other than Time and Subject.ID). The design model I intend to use in deseq is: ~Subject + SV1 + ... + Time.

I will use the null model ~1 in SVAseq, as suggested.

Entering edit mode
Robert Castelo ★ 2.9k
Last seen 19 days ago
Barcelona/Universitat Pompeu Fabra


I would say the answer is to include SUBJECT.ID in the null model because, as argued by Jeff Leek, author of SVA, in this thread about a similar design case, SUBJECT.ID will be used in the ultimate linear model you intend to fit to test for the effect of your variable of interest.



Entering edit mode

I agree, that was what I was thinking at first.

But subject.ID is not just a covariate, it is a random factor. So it is still not clear to me whether to include it in the null model or not. Not including it in the null model, makes it kind of a variable of interest itself.

Entering edit mode

If SUBJECT.ID is a random factor, then you should not put it into the design matrix and use duplicateCorrelation() and the arguments correlation and block in the call to lmFit(); see section on Multi-level experiments from the limma User's Guide. If you don't need surrogate variables, then you can just follow that documentation.

The complication comes when you want to combine it with surrogate variables estimated with SVA. You can try to have a full model with TIME only and the null with the intercept. Then, estimate surrogate variables, paste them into the design matrix and proceed with the duplicateCorrelation() blocking on SUBJECT.ID. However, it may happen that SVA has already estimated part of the SUBJECT.ID variablity and this may lead to problems with duplicateCorrelation(); see this thread about that possibility. So, I'd suggest to include SUBJECT.ID in the full and null models that you give to SVA (next to TIME), just to ensure that the SUBJECT.ID variability is not picked up by SVA. Then, place TIME and the surrogate variables in a new design matrix, i.e., without SUBJECT.ID, and proceed with duplicateCorrelation() blocking on SUBJECT.ID.


Login before adding your answer.

Traffic: 407 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6