Question: Finding the correct covariate for RNAseq experiment
23 months ago by
mat.lesche70 wrote:


I'm running an experiment with two conditions: wild type (WT) and a knockout (KO) of a gene. Each sample comes from a single mouse, and the samples were isolated on different dates. Here is an overview:

Sample  GT  Sex     Date
ss11    KO  male    may
ss12    KO  male    may
ss13    WT  male    may
ss14    WT  male    may
ss15    KO  female  june
ss16    WT  female  june
ss17    KO  female  june
ss18    WT  male    june

I ran several PCAs because initially the samples did not cluster (initial PCA).

Next, I checked Sex, Date, and Sex + Date as covariates and ran a PCA for each. I applied removeBatchEffect from limma to the transformed counts from DESeq2, and the PCAs looked like this:

Correction for Sex, Correction for Date, Correction for Sex and Date

As one can see, if I correct for Sex, PC2 shows the difference in my condition of interest, but correcting for Date or Date + Sex moves it to PC1.
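The idea behind this kind of correction can be sketched in base R on simulated data. This is only an illustration, not the poster's data or the limma implementation: the matrix `mat`, the effect sizes, and the gene counts are all made up, and the date term is regressed out with `lm.fit` while the genotype term is kept in the model so that it is protected.

```r
set.seed(1)

# Sample annotation in the same layout as the table above
gt   <- factor(c("KO", "KO", "WT", "WT", "KO", "WT", "KO", "WT"))
date <- factor(c("may", "may", "may", "may", "june", "june", "june", "june"))

# Simulated log-scale expression (genes x samples); all values are hypothetical
n_genes <- 200
mat <- matrix(rnorm(n_genes * 8, sd = 0.2), nrow = n_genes)
mat[1:100, ]   <- mat[1:100, ]   + rep(3 * (date == "june"), each = 100)  # strong date effect
mat[101:200, ] <- mat[101:200, ] + rep(1 * (gt == "WT"), each = 100)      # weaker genotype effect

# Terms to keep (genotype) and the batch term to remove (date, sum-to-zero coding)
design  <- model.matrix(~ gt)
batch_x <- model.matrix(~ date, contrasts.arg = list(date = "contr.sum"))[, -1, drop = FALSE]

# Fit both terms per gene, then subtract only the fitted date component
fit  <- lm.fit(cbind(design, batch_x), t(mat))
beta <- fit$coefficients[ncol(design) + 1, , drop = FALSE]  # 1 x genes
corrected <- mat - t(batch_x %*% beta)

# PCA on samples before and after the correction
pc_before <- prcomp(t(mat))$x
pc_after  <- prcomp(t(corrected))$x

# Which annotation does PC1 track in each case?
cor_before_date <- abs(cor(pc_before[, 1], as.numeric(date)))
cor_after_gt    <- abs(cor(pc_after[, 1], as.numeric(gt)))
```

In this toy setting PC1 follows the date before correction and the genotype afterwards, which is the same kind of shift from PC2 to PC1 described above. The corrected matrix is suitable for visualisation only; for testing, the covariate belongs in the design formula.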

I also ran sva with the following model:

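# full model with the variable of interest; null model with intercept only
mod  <- model.matrix(~ GT, colData(ddsrun))
mod0 <- model.matrix(~ 1, colData(ddsrun))
# estimate two surrogate variables
svseq <- svaseq(dat, mod, mod0, n.sv = 2)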

and the result is

  1 2
ss11 -0.30120343 0.16577893
ss12 -0.31295115 -0.05975287
ss13 -0.24305134 -0.05975287
ss14 -0.43343732 0.20823567
ss15 0.54045732 -0.28756674
ss16 0.29909472 0.23194841
ss17 0.42691938 0.50661027
ss18 0.02417181 -0.72876957

As a side question: the first column clearly shows the effect of the isolation date, but the second column doesn't correlate with Sex or Date, so I would not use it. Is there a good way to interpret it? Right now, I would only compare the surrogate variables with already known and likely covariates and not use them directly. Otherwise I would introduce a batch term whose meaning I don't know.
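One way to make the comparison with known covariates concrete is to regress each surrogate variable on each covariate and look at the R². The sketch below uses the surrogate variable values printed above and the sample table from the question; the object names (`sv`, `meta`, `assoc`) are chosen here for illustration.

```r
# Surrogate variables as printed above (rows = samples)
sv <- matrix(
  c(-0.30120343,  0.16577893,
    -0.31295115, -0.05975287,
    -0.24305134, -0.05975287,
    -0.43343732,  0.20823567,
     0.54045732, -0.28756674,
     0.29909472,  0.23194841,
     0.42691938,  0.50661027,
     0.02417181, -0.72876957),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("ss", 11:18), c("SV1", "SV2"))
)

# Known covariates from the sample table
meta <- data.frame(
  Sex  = c("male", "male", "male", "male", "female", "female", "female", "male"),
  Date = c("may", "may", "may", "may", "june", "june", "june", "june")
)

# R-squared of each surrogate variable regressed on each known covariate
assoc <- sapply(meta, function(f) {
  apply(sv, 2, function(s) summary(lm(s ~ f))$r.squared)
})
print(round(assoc, 2))
```

A high R² (as for SV1 against Date) suggests the surrogate variable is capturing that covariate; a low R² against every known covariate (as for SV2) means it reflects something not recorded in the sample table, which could be technical or biological.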

From this, I conclude that sex doesn't have a big impact on the data, but the date does. That means I will use ~ Date + GT as the design formula for DESeq. Additionally, I made an MDS plot of the Euclidean distances, and it also suggests that the date has a higher impact on the data than sex.

Now I was wondering whether these steps are in the correct order and whether my conclusion is correct. I have another experiment with the same set-up, but there the GT effect is smaller and is overlaid by Date and/or Sex.



modified 23 months ago by Michael Love24k • written 23 months ago by mat.lesche70

Is there a question?

written 23 months ago by James W. MacDonald50k

Hi James, now there is. Sorry, but I hit Submit too early. There used to be a preview button. I realised too late that Submit creates the thread.

written 23 months ago by mat.lesche70
Answer: Finding the correct covariate for RNAseq experiment
23 months ago by
Michael Love24k
United States
Michael Love24k wrote:


I usually recommend that people put into the model those terms that they think might affect gene expression, even if only for some genes, as long as the experimental design allows it. You have nearly perfect confounding of date and sex, so you can pick the one that has the bigger effect in the PCA plots and put that one in. You can't do much more than that because of the confounding. So I'd also use the design you have suggested, ~date + genotype.
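The confounding can be seen directly by cross-tabulating the two covariates from the sample table in the question. The following is a minimal base R sketch; the data frame name `samples` is chosen here for illustration.

```r
# Sample table from the question
samples <- data.frame(
  Sample = paste0("ss", 11:18),
  GT     = c("KO", "KO", "WT", "WT", "KO", "WT", "KO", "WT"),
  Sex    = c("male", "male", "male", "male", "female", "female", "female", "male"),
  Date   = c("may", "may", "may", "may", "june", "june", "june", "june")
)

# Cross-tabulate Sex against Date: empty or near-empty cells indicate confounding
tab <- table(Sex = samples$Sex, Date = samples$Date)
print(tab)

# Genotype is balanced within each date, so Date and GT can be separated
print(table(GT = samples$GT, Date = samples$Date))

# A design with both Sex and Date is still full rank, but only because of one sample
mm_full   <- model.matrix(~ Sex + Date + GT, data = samples)
full_rank <- qr(mm_full)$rank == ncol(mm_full)
```

All May samples are male and three of the four June samples are female, so a single male June sample (ss18) is the only thing separating the Sex and Date effects. A model with both terms can be fitted, but the two coefficients would rest almost entirely on that one sample.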

written 23 months ago by Michael Love24k

Thanks Michael.

written 23 months ago by mat.lesche70