Question

Finding the correct covariate for RNAseq experiment

mat.lesche ▴ 90

@matlesche-6835

Germany

Hey,

I'm running an experiment with two conditions: one is the wildtype and the other is a knockout of a gene. Each sample comes from a single mouse, and the samples were isolated on different days. Here is an overview:

Sample	GT	Sex	Date
ss11	KO	male	may
ss12	KO	male	may
ss13	WT	male	may
ss14	WT	male	may
ss15	KO	female	june
ss16	WT	female	june
ss17	KO	female	june
ss18	WT	male	june

I ran several PCAs because initially the samples did not cluster (plot not available).

Next, I checked Sex, Date, and Sex + Date as covariates and ran a PCA for each. I applied removeBatchEffect from limma to the transformed counts from DESeq2, and the PCAs looked like this:

[PCA plots after correcting for Sex, for Date, and for Sex + Date; images not available]

As one can see, if I correct for Sex, PC2 shows the difference in my condition of interest, but correcting for Date or Date + Sex moves this difference to PC1.
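To make the correct-then-PCA step concrete, here is a minimal base-R sketch of the idea behind removeBatchEffect: fit a linear model with the genotype design plus the batch term, then subtract only the fitted batch component. The expression matrix is simulated (it is not the data from this post), the names `expr`, `coldata` and `expr_corrected` are made up for illustration, and limma's own implementation differs in details such as the contrast coding of the batch factor.

```r
# Simulated log-scale expression, 200 genes x 8 samples (NOT the poster's data)
set.seed(1)
coldata <- data.frame(
  GT   = factor(c("KO", "KO", "WT", "WT", "KO", "WT", "KO", "WT"),
                levels = c("WT", "KO")),
  Date = factor(rep(c("may", "june"), each = 4), levels = c("may", "june"))
)
expr <- matrix(rnorm(200 * 8), nrow = 200)
# Add a strong Date effect and a weaker genotype effect per gene
expr <- expr + outer(rnorm(200, sd = 3), as.numeric(coldata$Date == "june"))
expr <- expr + outer(rnorm(200, sd = 1), as.numeric(coldata$GT == "KO"))

# Fit design + batch jointly, then remove only the batch part
design <- model.matrix(~ GT, coldata)                      # effect to protect
batch  <- model.matrix(~ Date, coldata)[, -1, drop = FALSE] # effect to remove
fit    <- lm.fit(cbind(design, batch), t(expr))
beta   <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
expr_corrected <- expr - t(batch %*% beta)

# PCA on samples (rows of the transposed matrix)
pca <- prcomp(t(expr_corrected))
print(round(pca$x[, 1:2], 2))
```

With the real data one would call removeBatchEffect on the transformed counts instead. Such corrected values are meant for visualisation; for testing, the covariate goes into the design formula.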

I also ran sva with the following model:

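library(sva)
# Full model with the variable of interest; null model with intercept only
mod  <- model.matrix(~ GT, colData(ddsrun))
mod0 <- model.matrix(~ 1, colData(ddsrun))
# dat is the count matrix prepared earlier (not shown); estimate 2 surrogate variables
svseq <- svaseq(dat, mod, mod0, n.sv = 2)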

and the result is

	1	2
ss11	-0.30120343	0.16577893
ss12	-0.31295115	-0.05975287
ss13	-0.24305134	-0.05975287
ss14	-0.43343732	0.20823567
ss15	0.54045732	-0.28756674
ss16	0.29909472	0.23194841
ss17	0.42691938	0.50661027
ss18	0.02417181	-0.72876957

As a side question: the first column clearly shows the effect of the isolation date, but the second column doesn't correlate with Sex or Date, so I would not use it. Is there a good way to interpret it? Right now, I would only compare a surrogate variable against already known, likely covariates and otherwise not use it. Otherwise I would introduce a batch term without knowing what it means.
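One way to make the comparison between surrogate variables and known covariates less of an eyeball exercise is to compute how much of each surrogate variable's variance a covariate explains. The sketch below uses the values posted above together with the sample table; the helper `r2` and the column names `SV1` and `SV2` are made up for illustration. Because Date and Sex are nearly confounded in this design, a surrogate variable that tracks one will tend to track the other as well.

```r
# Surrogate variables as posted, joined with the sample annotation
sv <- data.frame(
  Sample = paste0("ss", 11:18),
  Sex    = c("male", "male", "male", "male",
             "female", "female", "female", "male"),
  Date   = c("may", "may", "may", "may", "june", "june", "june", "june"),
  SV1 = c(-0.30120343, -0.31295115, -0.24305134, -0.43343732,
           0.54045732,  0.29909472,  0.42691938,  0.02417181),
  SV2 = c( 0.16577893, -0.05975287, -0.05975287,  0.20823567,
          -0.28756674,  0.23194841,  0.50661027, -0.72876957)
)

# Fraction of a surrogate variable's variance explained by one covariate
r2 <- function(y, covariate) summary(lm(y ~ covariate))$r.squared

# Rows: surrogate variables; columns: known covariates
assoc <- sapply(c("Sex", "Date"), function(cv)
  sapply(c("SV1", "SV2"), function(s) r2(sv[[s]], sv[[cv]])))
print(round(assoc, 2))
```

A value near 1 suggests the surrogate variable is largely a proxy for that covariate; a value near 0 for every known covariate means its source is not captured by the annotation at hand.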

From this, I would conclude that Sex doesn't have a big impact on the data, but Date does. That means I will use ~ Date + GT as the design formula for DESeq2. Additionally, I made an MDS plot of the Euclidean distances, and it also suggests that Date has a higher impact on the data than Sex.
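As a quick sanity check on the proposed design, the model matrix for ~ Date + GT can be built in base R from the sample table. This is only a sketch: the data frame `coldata` is re-typed here by hand, and the factor levels (WT and may as reference) are an assumption.

```r
# Sample annotation re-typed from the table above; reference levels are assumed
coldata <- data.frame(
  row.names = paste0("ss", 11:18),
  GT   = factor(c("KO", "KO", "WT", "WT", "KO", "WT", "KO", "WT"),
                levels = c("WT", "KO")),
  Date = factor(rep(c("may", "june"), each = 4), levels = c("may", "june"))
)

# One column per coefficient: intercept, june vs may, KO vs WT
design <- model.matrix(~ Date + GT, coldata)
print(design)

# Full column rank means every coefficient is estimable
print(qr(design)$rank == ncol(design))

# In DESeq2 the same formula would be set on the existing object, e.g.
#   design(ddsrun) <- ~ Date + GT
```

Putting GT last is convenient because results() in DESeq2 reports the last variable of the design by default.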

Now I was wondering whether these steps are in the correct order and whether my conclusion is correct. I have another experiment with the same set-up, but there the GT effect is smaller and masked by Date and/or Sex.

Thanks

Mathias

deseq2 removebatcheffect() sva differential gene expression • 1.9k views

updated 6.7 years ago by Michael Love 41k • written 6.7 years ago by mat.lesche ▴ 90

Is there a question?

6.7 years ago James W. MacDonald 65k

Hi James, now there is. Sorry, I submitted too early. There used to be a preview button, and I only realised that Submit creates the thread when it was already too late.

6.7 years ago mat.lesche ▴ 90

score 1 · Answer 1 · 2017-08-08

Michael Love 41k

@mikelove

United States

hi,

I usually recommend that people put into the model those terms that they think might affect gene expression, even if only for some genes, as long as the experimental design allows it. You have nearly perfect confounding of date and sex, so you can pick the one that has the bigger effect in the PCA plots and put that one in. You can't do much more than that because of the confounding. So I'd also use the design you have suggested, ~date + genotype.
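The near-confounding is easy to see by cross-tabulating the two covariates from the sample table in the question. The following is a small base-R sketch; the data frame `coldata` is re-typed by hand for illustration.

```r
# Sex and Date re-typed from the sample table in the question
coldata <- data.frame(
  row.names = paste0("ss", 11:18),
  Sex  = c("male", "male", "male", "male",
           "female", "female", "female", "male"),
  Date = c("may", "may", "may", "may", "june", "june", "june", "june")
)

# Only one sample (ss18, male in june) breaks the Sex/Date pattern
tab <- table(Sex = coldata$Sex, Date = coldata$Date)
print(tab)

# With all samples, Sex and Date can still be separated in a model
rank_all <- qr(model.matrix(~ Sex + Date, coldata))$rank

# Without ss18 the two columns coincide and the model loses rank
rank_without_ss18 <- qr(model.matrix(~ Sex + Date, coldata[-8, ]))$rank
print(c(all = rank_all, without_ss18 = rank_without_ss18))
```

A design with both terms is technically estimable, but the separation of Sex from Date rests on a single mouse, which is why choosing one of the two is the practical option here.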