Question

Splitting a dataset and control sample batch

mat.lesche ▴ 110

The experimental design is as follows. There are two groups (NR and RE), each in triplicate. The triplicates were grown on a dish. Samples were collected at timepoint 0 (T0). Afterwards, they were either treated (RAP) or not (Co), and samples were collected again at timepoint 6 (T6). Below are the condition table and PCA plots (with and without sample correction) for an overview.

Donor     Cond     Time   C
17-0117   NR_Co    T0     T0_NR_Co
17-0077   NR_Co    T0     T0_NR_Co
15-0019   NR_Co    T0     T0_NR_Co
14-0162   RE_Co    T0     T0_RE_Co
16-0384   RE_Co    T0     T0_RE_Co
16-0343   RE_Co    T0     T0_RE_Co
17-0117   NR_Co    T6     T6_NR_Co
17-0077   NR_Co    T6     T6_NR_Co
15-0019   NR_Co    T6     T6_NR_Co
14-0162   RE_Co    T6     T6_RE_Co
16-0384   RE_Co    T6     T6_RE_Co
16-0343   RE_Co    T6     T6_RE_Co
17-0117   NR_RAP   T6     T6_NR_RAP
17-0077   NR_RAP   T6     T6_NR_RAP
15-0019   NR_RAP   T6     T6_NR_RAP
14-0162   RE_RAP   T6     T6_RE_RAP
16-0384   RE_RAP   T6     T6_RE_RAP
16-0343   RE_RAP   T6     T6_RE_RAP

https://ibb.co/eeqc2U

https://ibb.co/d0xDbp

https://ibb.co/bRCDbp

I decided to combine Cond and Time into a single grouping variable, because that seems the best way to answer the questions below:

dds$C <- factor(paste(dds$Time, dds$Cond, sep = "_"))

a) Are there no differences between T6_NR_Co and T6_NR_RAP?

b) Are there no differences between T6_RE_Co and T6_RE_RAP?

c) Are there no differences between T6_NR_Co and T6_RE_Co?

d) Are there no differences between T6_NR_RAP and T6_RE_RAP?

The problem is that I can’t control for the samples themselves, because the design ~Id+C fails with “Error in checkFullRank(modelMatrix) :”.

I can only use ~ C, which I think is not appropriate.

Therefore I was wondering whether it would be best to split the dataset into RE samples and NR samples. This would make it possible to answer a) and b), and I would use the whole dataset for c) and d).

Just to confirm: if I want to look at the effect of the treatment on the two groups, I would need to use the interaction design Type + Treatment + Type:Treatment?
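As a rough illustration of what that interaction design looks like, here is a minimal base-R sketch restricted to the twelve T6 samples. The columns Type and Treatment are assumptions: they do not exist in the condition table above and would have to be split out of Cond. The design deliberately leaves out Donor.

```r
# Hypothetical sample sheet for the 12 T6 samples; Type and Treatment are
# assumed columns obtained by splitting Cond (e.g. "NR_RAP" -> NR, RAP).
t6 <- data.frame(
  Type      = factor(rep(rep(c("NR", "RE"), each = 3), times = 2)),
  Treatment = factor(rep(c("Co", "RAP"), each = 6))
)

# Main effects plus the interaction term
design <- model.matrix(~ Type + Treatment + Type:Treatment, data = t6)

# Inspect the coefficients the model would estimate
colnames(design)
```

With the default reference levels (NR and Co), the last coefficient, TypeRE:TreatmentRAP, is the one that asks whether the RAP effect differs between RE and NR.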

And my last question is for the following

If I do a comparison of

e) T6_NR_Co vs T0_NR_Co

f) T6_NR_RAP vs T0_NR_Co

I get about 1,500 DEGs for e) and about 2,000 for f). Comparing the two lists, 50% of the genes are DE in both e) and f), 15% only in e), and the rest only in f). But a comparison of T6_NR_RAP vs T6_NR_Co gives 0 DEGs, which would mean there is no difference between RAP and Co at T6, even though e) and f) both show DEGs. I should also mention that for T6_NR_RAP vs T6_NR_Co the p-value histogram rises towards 1 and the padj values are all identical and close to 1.

What would be the best design and contrast to ask for differences that come only from RAP over time? Could I use only the 35% of genes that the overlap of e) and f) assigns to f) alone, even though these are not DEGs for T6_NR_RAP vs T6_NR_Co?

Thanks

Mathias

deseq2 interactions batch effect correction grouping variable • 1.3k views

ADD COMMENT • link updated 5.8 years ago by Michael Love 42k • written 5.8 years ago by mat.lesche ▴ 110

Can you explain what you mean by "control for the samples itself"? You mean controlling for donor as listed above?

ADD REPLY • link 5.8 years ago Michael Love 42k

Yes. Sorry, I meant Donor and not samples. The comparison T6_RE_Co vs T6_NR_RAP should need the design Donor + C because the same Donors are in both Treatments.

ADD REPLY • link 5.8 years ago mat.lesche ▴ 110

score 0 · Answer 1 · 2018-10-03

hi Mathias,

To help see the structure, I recoded the levels of these factors:

   Donor Group   Cond Time
1      1     1  NR_Co   T0
2      2     1  NR_Co   T0
3      3     1  NR_Co   T0
4      4     2  RE_Co   T0
5      5     2  RE_Co   T0
6      6     2  RE_Co   T0
7      1     3  NR_Co   T6
8      2     3  NR_Co   T6
9      3     3  NR_Co   T6
10     4     4  RE_Co   T6
11     5     4  RE_Co   T6
12     6     4  RE_Co   T6
13     1     5 NR_RAP   T6
14     2     5 NR_RAP   T6
15     3     5 NR_RAP   T6
16     4     6 RE_RAP   T6
17     5     6 RE_RAP   T6
18     6     6 RE_RAP   T6

Now you can more easily see why you can't have donor and group in the design together: they are linearly dependent. For example, Group 2 + 4 + 6 = Donor 4 + 5 + 6, since both sums pick out exactly the RE samples.
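The dependency can be checked directly in base R. This is only a sketch using the recoded Donor and Group labels from the table above; the variable names are mine. It compares the rank of the model matrix with its number of columns and verifies one such dependency: the indicator columns of the RE groups (2, 4, 6) sum to the same vector as those of the RE donors (4, 5, 6).

```r
# Recoded factors, in the same row order as the table above
donor <- factor(c(1:6, 1:6, 1:6))
group <- factor(rep(1:6, each = 3))

# Design with both donor and group, as in ~ Donor + C
full <- model.matrix(~ donor + group)

# Fewer independent columns than coefficients means "not full rank"
n_coef <- ncol(full)
rank   <- qr(full)$rank
rank < n_coef

# One explicit dependency: RE group columns vs RE donor columns
re_groups <- rowSums(full[, c("group2", "group4", "group6")])
re_donors <- rowSums(full[, c("donor4", "donor5", "donor6")])
all(re_groups == re_donors)
```

This is the condition that checkFullRank() complains about.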

There are some comparisons you can make, with fixed effects, for example comparing group 3 to group 1 while controlling for Donor, or additionally, comparing the group 3 vs 1 effect and the group 4 vs 2 effect.
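One way to see that such within-donor comparisons are estimable is to restrict the design to the NR donors, who appear in groups 1, 3 and 5. The sketch below makes that subsetting assumption (it is not the only possible approach) and uses the recoded labels, showing that the resulting matrix is full rank.

```r
# Recoded factors, in the same row order as the table above
donor <- factor(c(1:6, 1:6, 1:6))
group <- factor(rep(1:6, each = 3))

# Keep only the NR samples (groups 1, 3, 5 = donors 1, 2, 3)
keep <- group %in% c("1", "3", "5")
nr <- droplevels(data.frame(donor = donor[keep], group = group[keep]))

# Donor as a blocking factor, group as the effect of interest
design_nr <- model.matrix(~ donor + group, data = nr)

colnames(design_nr)
qr(design_nr)$rank == ncol(design_nr)
```

Here the group3 coefficient is the group 3 vs group 1 effect within donor, and group5 is the group 5 vs group 1 effect.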

However, some of your desired comparisons are not possible with fixed effects, while controlling for donor, e.g. comparing group 2 to group 1.

I'd recommend you use limma-voom with duplicateCorrelation, which lets you mark which samples belong to which donor (as a blocking factor) and analyze the entire dataset for all your desired contrasts.
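A minimal sketch of that workflow follows. The count matrix is simulated purely so the example runs, and the group labels G1 to G6 follow the recoding above (G3 = T6_NR_Co, G4 = T6_RE_Co, G5 = T6_NR_RAP, G6 = T6_RE_RAP). The contrast names are made up for illustration. In a real analysis you would start from your own counts and normalize them first.

```r
library(limma)

set.seed(1)
# Simulated counts: 200 hypothetical genes x 18 samples (NOT real data)
donor  <- factor(c(1:6, 1:6, 1:6))
group  <- factor(paste0("G", rep(1:6, each = 3)))
counts <- matrix(rnbinom(200 * 18, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:18)))

# One coefficient per group, no intercept, so contrasts are easy to write
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# voom transformation, then estimate the within-donor correlation
v      <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = donor)

# Fit with donor as a block (treated as a random effect)
fit <- lmFit(v, design, block = donor,
             correlation = corfit$consensus.correlation)

# Contrasts corresponding to questions a) to d)
cm <- makeContrasts(
  RAP_vs_Co_NR = G5 - G3,
  RAP_vs_Co_RE = G6 - G4,
  RE_vs_NR_Co  = G4 - G3,
  RE_vs_NR_RAP = G6 - G5,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))
topTable(fit2, coef = "RAP_vs_Co_NR", number = 5)
```

Because donor enters as a block rather than as fixed-effect columns, the design stays full rank even for between-donor contrasts such as RE vs NR.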