Question

RNA-seq analysis edgeR

1

Entering edit mode

es874 ▴ 20

@es874-11802

Last seen 7.4 years ago

I have an experiment with 30 samples that I aligned using STAR and created a counts file using featureCounts. I'm struggling to properly set up the design matrix as cursory analysis shows a clear donor effect in the MDS plot.

Sample	Donor	Group
1	1	Baseline
2	1	IL2
3	1	IL2ZA
4	1	IL2OKT3
5	1	IL2OKT3ZA
6	2	Baseline
7	2	IL2
8	2	IL2ZA
9	2	IL2OKT3
10	2	IL2OKT3ZA
11	3	Baseline
12	3	IL2
13	3	IL2ZA
14	3	IL2OKT3
15	3	IL2OKT3ZA
16	4	Baseline
17	4	IL2
18	4	IL2ZA
19	4	IL2OKT3
20	4	IL2OKT3ZA
21	5	Baseline
22	5	IL2
23	5	IL2ZA
24	5	IL2OKT3
25	5	IL2OKT3ZA
26	6	Baseline
27	6	IL2
28	6	IL2ZA
29	6	IL2OKT3
30	6	IL2OKT3ZA

I've set up the following using edgeR, but I'm not sure that this design is properly accounting for the donor effect I see in the MDS plot:

names(read.counts) <- c("Donor1_T0", "Donor1_IL2", "Donor1_IL2ZA", "Donor1_IL2OKT3", "Donor1_IL2OKT3ZA", "Donor2_T0", "Donor2_IL2", "Donor2_IL2ZA", "Donor2_IL2OKT3", "Donor2_IL2OKT3ZA", "Donor3_T0", "Donor3_IL2", "Donor3_IL2ZA", "Donor3_IL2OKT3", "Donor3_IL2OKT3ZA", "Donor4_T0", "Donor4_IL2", "Donor4_IL2ZA", "Donor4_IL2OKT3", "Donor4_IL2OKT3ZA", "Donor5_T0", "Donor5_IL2", "Donor5_IL2ZA", "Donor5_IL2OKT3", "Donor5_IL2OKT3ZA", "Donor6_T0", "Donor6_IL2", "Donor6_IL2ZA", "Donor6_IL2OKT3", "Donor6_IL2OKT3ZA") 

sample.type <- factor(c ("T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA", "T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA"))

sample.type <- relevel(sample.type, "T0")
edgeR.DGElist <- DGEList(counts = read.counts , group = sample.type)
keep <- rowSums( cpm(edgeR.DGElist) > 1) >= 6
edgeR.DGElist <- edgeR.DGElist[keep ,]
edgeR.DGElist$samples$lib.size <- colSums(edgeR.DGElist$counts)
edgeR.DGElist <- calcNormFactors(edgeR.DGElist , method = "TMM")
design <- model.matrix(~0+group, data=edgeR.DGElist$samples)
colnames(design) <- levels(edgeR.DGElist$samples$group)

Does this look correct?

rna-seq edger • 1.4k views

ADD COMMENT • link updated 7.4 years ago by Ryan C. Thompson ★ 7.9k • written 7.4 years ago by es874 ▴ 20

score 2 · Answer 1 · 2016-11-15

2

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

If you want to account for Donor as a batch effect, all you need to do is add it to the design formula in the second to last line of your code. Use ~0+Group+Donor as your design. You will get the same columns in your design matrix representing the groups, plus some additional columns representing the differences between donors.

Note that when you do this, the last line of your code will no longer work as intended. Instead, I recommend you remove the "group" prefix from the column names by doing this:

colnames(design) <- sub("^Group", "", colnames(design))

ADD COMMENT • link 7.4 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

If I create,

donor <- factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6))

And modify the design matrix,

design <- model.matrix(~0+group+donor, data=edgeR.DGElist$samples)

colnames(design) <- sub("^Group", "", colnames(design))

I don't see donor1 as a group in the matrix. Am I missing something?

groupT0 groupIL2 groupIL2OKT3 groupIL2OKT3ZA groupIL2ZA donor2 donor3 donor4 donor5
1        1        0            0              0          0      0      0      0      0
2        0        1            0              0          0      0      0      0      0
3        0        0            0              0          1      0      0      0      0
4        0        0            1              0          0      0      0      0      0
5        0        0            0              1          0      0      0      0      0
6        1        0            0              0          0      1      0      0      0
7        0        1            0              0          0      1      0      0      0
8        0        0            0              0          1      1      0      0      0
9        0        0            1              0          0      1      0      0      0
10       0        0            0              1          0      1      0      0      0
11       1        0            0              0          0      0      1      0      0
12       0        1            0              0          0      0      1      0      0
13       0        0            0              0          1      0      1      0      0
14       0        0            1              0          0      0      1      0      0
15       0        0            0              1          0      0      1      0      0
16       1        0            0              0          0      0      0      1      0
17       0        1            0              0          0      0      0      1      0
18       0        0            0              0          1      0      0      1      0
19       0        0            1              0          0      0      0      1      0
20       0        0            0              1          0      0      0      1      0
21       1        0            0              0          0      0      0      0      1
22       0        1            0              0          0      0      0      0      1
23       0        0            0              0          1      0      0      0      1
24       0        0            1              0          0      0      0      0      1
25       0        0            0              1          0      0      0      0      1
26       1        0            0              0          0      0      0      0      0
27       0        1            0              0          0      0      0      0      0
28       0        0            0              0          1      0      0      0      0
29       0        0            1              0          0      0      0      0      0
30       0        0            0              1          0      0      0      0      0
   donor6
1       0
2       0
3       0
4       0
5       0
6       0
7       0
8       0
9       0
10      0
11      0
12      0
13      0
14      0
15      0
16      0
17      0
18      0
19      0
20      0
21      0
22      0
23      0
24      0
25      0
26      1
27      1
28      1
29      1
30      1
attr(,"assign")
 [1] 1 1 1 1 1 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"

attr(,"contrasts")$donor
[1] "contr.treatment"

ADD REPLY • link 7.4 years ago es874 ▴ 20

0

Entering edit mode

You have it correct. Donor 1 is implicitly included in the design because the donor coefficients represent the difference between Donor 1 and the other donors.

ADD REPLY • link 7.4 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

OK, thanks.

After I estimate dispersions and test for DE:

edgeR.DGElist <- estimateDisp(edgeR.DGElist, design, robust=TRUE)

fit <- glmQLFit(edgeR.DGElist, design, robust=TRUE)

my.contrasts <- makeContrasts(IL2vsT0=groupIL2-groupT0, IL2OKT3vsT0=groupIL2OKT3-groupT0, IL2OKT3ZAvsT0=groupIL2OKT3ZA-groupT0, IL2ZAvsT0=groupIL2ZA-groupT0, IL2OKT3vsIL2=groupIL2OKT3-groupIL2, IL2ZAvsIL2=groupIL2ZA-groupIL2, IL2OKT3ZAvsIL2OKT3=groupIL2OKT3ZA-groupIL2OKT3, IL2OKT3ZAvsIL2ZA=groupIL2OKT3ZA-groupIL2ZA, levels=design)

res.IL2vsT0 <- glmQLFTest(fit, contrast=my.contrasts[,"IL2vsT0"]) #IL2 minus baseline

topTags(res.IL2vsT0)

Do I need to also include a contrast argument between donors?

ADD REPLY • link 7.4 years ago es874 ▴ 20

0

Entering edit mode

That's a question to ask yourself. Do you care about differential expression between donors? If so, then yes, you should should assess differential expression between donors.

ADD REPLY • link 7.4 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Right, I don't, I just want to make sure the donor effect I was seeing in the MDS plot is accounted for in the model:

design <- model.matrix(~0+group+donor, data=edgeR.DGElist$samples)

If I remove the 0+ from the design, will all other columns be relative to the first column ("groupT0")? This is the baseline control, and my thought is that this is the best design since I only want to compare treatment groups and remove the baseline from all samples.

ADD REPLY • link 7.4 years ago es874 ▴ 20

1

Entering edit mode

Yes, if you don't include the "0+", that is what you will get: an intercept column representing the expression level in the baseline group, and then a column for each other group representing the difference between that group and the baseline. This is an equally valid approach, and should yield identical p-values to the contrast-based approach that you described above.

ADD REPLY • link 7.4 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

OK. Last question and I promise to quit taking up your time :)

In the (~group + donor) model, the columns are [1] 0 1 1 1 1 2 2 2 2 2. If I want to compare treatments only, then for example, comparing IL2OKT3 vs IL2 looks like:

fit <- glmQLFit(edgeR.DGElist, contrast=c(0,1,-1,0,0,0,0,0,0,0))

Correct?

ADD REPLY • link 7.4 years ago es874 ▴ 20

0

Entering edit mode

Without seeing the actual columns of the design matrix, I can't tell you if that contrast is correct. Just use the makeContrasts function to create your contrasts, and you don't have to worry about it.

ADD REPLY • link 7.4 years ago Ryan C. Thompson ★ 7.9k