Question: SVA for repeated measures design
sara.blocquiaux wrote (6 weeks ago):

Hi all,

I have RNA-seq data (5 subjects measured at 4 time points) and would like to run SVA first, so that I can include potential confounders in the statistical model (DESeq2 pipeline).

I am having trouble defining the null and full models for SVA:

Full model ~ TIME + SUBJECT.ID

Null model ~ SUBJECT.ID OR Null model ~ 1

Should the subject ID be treated as a factor of interest or as a confounding factor?

Thanks in advance!

Best,

Sara

Answer: SVA for repeated measures design
James W. MacDonald wrote (5 weeks ago):

You should probably use just an intercept for your null model. In general, if you have repeated measures (which I assume you do, given the subject ID), AND given that you have complete repeated measures (where you have measurements from each subject at each time), then the subject-specific changes are orthogonal to the measure of interest, and blocking on subject is the way to go. It also makes it easier to interpret your coefficients.

Put a different way, sva is intended to generate surrogate variables for unobserved variability. The subjects are by definition observed, so if you wanted to use the sva package to do something with them, you could consider them to be batch effects and use ComBat (note that I am not advocating this, but just noting that sva is for things you don't observe and ComBat is for things you know about.)

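The orthogonality point in the answer above can be checked with a small base-R sketch. The sample table below is hypothetical (5 subjects, 4 time points, made-up labels). It builds the full model and the two candidate null models from the question, then compares the centred Subject and Time columns for a complete design and for one with two missing time points, as in the poster's data.

```r
# Hypothetical sample table: 5 subjects x 4 time points (labels are made up)
pheno <- expand.grid(Time    = factor(paste0("T", 1:4)),
                     Subject = factor(paste0("S", 1:5)))

# Full model and the two candidate null models from the question
mod            <- model.matrix(~ Subject + Time, data = pheno)
mod0_intercept <- model.matrix(~ 1, data = pheno)
mod0_subject   <- model.matrix(~ Subject, data = pheno)

# Largest absolute cross-product between centred Subject and Time columns;
# a value of (numerically) zero means the two sets of columns are orthogonal
max_cross <- function(mm) {
  subj_cols <- scale(mm[, grep("^Subject", colnames(mm)), drop = FALSE],
                     scale = FALSE)
  time_cols <- scale(mm[, grep("^Time", colnames(mm)), drop = FALSE],
                     scale = FALSE)
  max(abs(crossprod(subj_cols, time_cols)))
}

complete_cross <- max_cross(mod)

# Drop two time points for one subject to mimic an incomplete design
pheno_missing <- pheno[-c(1, 2), ]
mod_missing   <- model.matrix(~ Subject + Time, data = pheno_missing)
missing_cross <- max_cross(mod_missing)

print(dim(mod))
print(c(complete = complete_cross, incomplete = missing_cross))
```

In the complete design the cross-product is zero up to rounding error. With two samples removed it is no longer zero, so the orthogonality argument holds only approximately for an incomplete design.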

Thanks James.

Yes, I have repeated measures. Subject.ID is not my factor of interest, but of course I want to include it in the model. For one subject, I have two missing time points though.

I do not want ComBat to correct for Subject.ID; rather, I want svaseq to find confounding factors other than Time and Subject.ID. The design formula I intend to use in DESeq2 is: ~ Subject + SV1 + ... + Time.

I will use the null model ~1 in svaseq, as suggested.

(reply by sara.blocquiaux, 5 weeks ago)
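A minimal sketch of the plan described in this reply, assuming the DESeq2 and sva Bioconductor packages are installed. The count matrix is simulated noise, the sample labels are hypothetical, and n.sv = 2 is an arbitrary choice made only for illustration. With real data the number of surrogate variables would have to be chosen or estimated, and samples with missing time points would simply be absent from the table.

```r
library(DESeq2)
library(sva)

set.seed(1)

# Hypothetical sample table and simulated counts (illustration only)
pheno <- expand.grid(Time    = factor(paste0("T", 1:4)),
                     Subject = factor(paste0("S", 1:5)))
rownames(pheno) <- paste0("sample", seq_len(nrow(pheno)))
counts <- matrix(rnbinom(1000 * nrow(pheno), mu = 100, size = 5),
                 nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), rownames(pheno)))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = pheno,
                              design = ~ Subject + Time)

# svaseq works on normalized counts; drop very low-count genes first
dds  <- estimateSizeFactors(dds)
norm <- counts(dds, normalized = TRUE)
norm <- norm[rowMeans(norm) > 1, ]

# Full model with Subject and Time, intercept-only null model
mod  <- model.matrix(~ Subject + Time, data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)
svseq <- svaseq(norm, mod, mod0, n.sv = 2)  # n.sv = 2 is an assumption

# Add the surrogate variables to the sample table and to the design
dds$SV1 <- svseq$sv[, 1]
dds$SV2 <- svseq$sv[, 2]
design(dds) <- ~ Subject + SV1 + SV2 + Time

dds <- DESeq(dds)
```

Note that with an intercept-only null model, svaseq protects both Subject and Time as variables of interest when it estimates the surrogate variables.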
Answer: SVA for repeated measures design
Robert Castelo wrote (5 weeks ago):

Hi,

I would say the answer is to include SUBJECT.ID in the null model because, as argued by Jeff Leek, author of SVA, in this thread about a similar design case, SUBJECT.ID will be used in the ultimate linear model you intend to fit to test for the effect of your variable of interest.

cheers,

robert.


I agree, that was what I was thinking at first.

But Subject.ID is not just a covariate; it is a random factor. So it is still not clear to me whether to include it in the null model or not. Leaving it out of the null model makes it a kind of variable of interest itself.

(reply by sara.blocquiaux, 5 weeks ago)

If SUBJECT.ID is a random factor, then you should not put it into the design matrix. Instead, use duplicateCorrelation() and pass the correlation and block arguments in the call to lmFit(); see the section on multi-level experiments in the limma User's Guide. If you don't need surrogate variables, you can just follow that documentation.

The complication comes when you want to combine this with surrogate variables estimated with SVA. You can try a full model with TIME only and a null model with just the intercept. Then estimate the surrogate variables, add them to the design matrix, and proceed with duplicateCorrelation(), blocking on SUBJECT.ID. However, SVA may already have captured part of the SUBJECT.ID variability, and this may lead to problems with duplicateCorrelation(); see this thread about that possibility. So I'd suggest including SUBJECT.ID in both the full and null models that you give to SVA (alongside TIME in the full model), just to ensure that the SUBJECT.ID variability is not picked up by SVA. Then place TIME and the surrogate variables in a new design matrix, i.e., one without SUBJECT.ID, and proceed with duplicateCorrelation(), blocking on SUBJECT.ID.

(reply by Robert Castelo, 5 weeks ago)
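The workflow suggested in this reply might look roughly like the sketch below. It assumes the limma, sva and statmod packages are installed (duplicateCorrelation() relies on statmod). The counts are simulated noise, the labels are hypothetical, and n.sv = 1 and the use of voom() on the raw counts are choices made for illustration, not something the thread prescribes.

```r
library(limma)
library(sva)

set.seed(1)

# Hypothetical sample table and simulated counts (illustration only)
pheno <- expand.grid(Time    = factor(paste0("T", 1:4)),
                     Subject = factor(paste0("S", 1:5)))
counts <- matrix(rnbinom(1000 * nrow(pheno), mu = 100, size = 5),
                 nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), NULL))

# Give SVA the subject effect in BOTH models so that the surrogate
# variables do not absorb subject-to-subject variability
mod  <- model.matrix(~ Subject + Time, data = pheno)
mod0 <- model.matrix(~ Subject, data = pheno)
sv <- as.matrix(svaseq(counts, mod, mod0, n.sv = 1)$sv)  # n.sv is an assumption
colnames(sv) <- paste0("SV", seq_len(ncol(sv)))

# New design WITHOUT Subject: Time plus the surrogate variables
design <- cbind(model.matrix(~ Time, data = pheno), sv)

# Subject enters as a random effect through the within-block correlation
v      <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = pheno$Subject)
fit    <- lmFit(v, design, block = pheno$Subject,
                correlation = corfit$consensus.correlation)
fit    <- eBayes(fit)

# F-test across all Time coefficients
time_coefs <- grep("^Time", colnames(design))
res <- topTable(fit, coef = time_coefs, number = 10)
```

Because the simulated counts contain no real subject effect, the estimated consensus correlation here carries no meaning; the sketch only shows how the pieces connect.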