RNA-seq analysis edgeR
1
1
Entering edit mode
es874 ▴ 10
@es874-11802
Last seen 6.0 years ago

I have an experiment with 30 samples that I aligned using STAR and created a counts file using featureCounts. I'm struggling to properly set up the design matrix as cursory analysis shows a clear donor effect in the MDS plot.

 Sample Donor Group 1 1 Baseline 2 1 IL2 3 1 IL2ZA 4 1 IL2OKT3 5 1 IL2OKT3ZA 6 2 Baseline 7 2 IL2 8 2 IL2ZA 9 2 IL2OKT3 10 2 IL2OKT3ZA 11 3 Baseline 12 3 IL2 13 3 IL2ZA 14 3 IL2OKT3 15 3 IL2OKT3ZA 16 4 Baseline 17 4 IL2 18 4 IL2ZA 19 4 IL2OKT3 20 4 IL2OKT3ZA 21 5 Baseline 22 5 IL2 23 5 IL2ZA 24 5 IL2OKT3 25 5 IL2OKT3ZA 26 6 Baseline 27 6 IL2 28 6 IL2ZA 29 6 IL2OKT3 30 6 IL2OKT3ZA

I've set up the following using edgeR, but I'm not sure that this design is properly accounting for the donor effect I see in the MDS plot:

names(read.counts) <- c("Donor1_T0", "Donor1_IL2", "Donor1_IL2ZA", "Donor1_IL2OKT3", "Donor1_IL2OKT3ZA", "Donor2_T0", "Donor2_IL2", "Donor2_IL2ZA", "Donor2_IL2OKT3", "Donor2_IL2OKT3ZA", "Donor3_T0", "Donor3_IL2", "Donor3_IL2ZA", "Donor3_IL2OKT3", "Donor3_IL2OKT3ZA", "Donor4_T0", "Donor4_IL2", "Donor4_IL2ZA", "Donor4_IL2OKT3", "Donor4_IL2OKT3ZA", "Donor5_T0", "Donor5_IL2", "Donor5_IL2ZA", "Donor5_IL2OKT3", "Donor5_IL2OKT3ZA", "Donor6_T0", "Donor6_IL2", "Donor6_IL2ZA", "Donor6_IL2OKT3", "Donor6_IL2OKT3ZA")

sample.type <- factor(c ("T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA","T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA", "T0", "IL2", "IL2ZA", "IL2OKT3", "IL2OKT3ZA"))

sample.type <- relevel(sample.type, "T0")
edgeR.DGElist <- DGEList(counts = read.counts , group = sample.type)
keep <- rowSums( cpm(edgeR.DGElist) > 1) >= 6
edgeR.DGElist <- edgeR.DGElist[keep ,]
edgeR.DGElist$samples$lib.size <- colSums(edgeR.DGElist$counts) edgeR.DGElist <- calcNormFactors(edgeR.DGElist , method = "TMM") design <- model.matrix(~0+group, data=edgeR.DGElist$samples)
colnames(design) <- levels(edgeR.DGElist$samples$group)

Does this look correct?

rna-seq edger • 890 views
2
Entering edit mode
@ryan-c-thompson-5618
Last seen 2.3 years ago
Scripps Research, La Jolla, CA

If you want to account for Donor as a batch effect, all you need to do is add it to the design formula in the second to last line of your code. Use ~0+Group+Donor as your design. You will get the same columns in your design matrix representing the groups, plus some additional columns representing the differences between donors.

Note that when you do this, the last line of your code will no longer work as intended. Instead, I recommend you remove the "group" prefix from the column names by doing this:

colnames(design) <- sub("^Group", "", colnames(design))
0
Entering edit mode

If I create,

donor <- factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6))

And modify the design matrix,

design <- model.matrix(~0+group+donor, data=edgeR.DGElist$samples) colnames(design) <- sub("^Group", "", colnames(design)) I don't see donor1 as a group in the matrix. Am I missing something? groupT0 groupIL2 groupIL2OKT3 groupIL2OKT3ZA groupIL2ZA donor2 donor3 donor4 donor5 1 1 0 0 0 0 0 0 0 0 2 0 1 0 0 0 0 0 0 0 3 0 0 0 0 1 0 0 0 0 4 0 0 1 0 0 0 0 0 0 5 0 0 0 1 0 0 0 0 0 6 1 0 0 0 0 1 0 0 0 7 0 1 0 0 0 1 0 0 0 8 0 0 0 0 1 1 0 0 0 9 0 0 1 0 0 1 0 0 0 10 0 0 0 1 0 1 0 0 0 11 1 0 0 0 0 0 1 0 0 12 0 1 0 0 0 0 1 0 0 13 0 0 0 0 1 0 1 0 0 14 0 0 1 0 0 0 1 0 0 15 0 0 0 1 0 0 1 0 0 16 1 0 0 0 0 0 0 1 0 17 0 1 0 0 0 0 0 1 0 18 0 0 0 0 1 0 0 1 0 19 0 0 1 0 0 0 0 1 0 20 0 0 0 1 0 0 0 1 0 21 1 0 0 0 0 0 0 0 1 22 0 1 0 0 0 0 0 0 1 23 0 0 0 0 1 0 0 0 1 24 0 0 1 0 0 0 0 0 1 25 0 0 0 1 0 0 0 0 1 26 1 0 0 0 0 0 0 0 0 27 0 1 0 0 0 0 0 0 0 28 0 0 0 0 1 0 0 0 0 29 0 0 1 0 0 0 0 0 0 30 0 0 0 1 0 0 0 0 0 donor6 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 16 0 17 0 18 0 19 0 20 0 21 0 22 0 23 0 24 0 25 0 26 1 27 1 28 1 29 1 30 1 attr(,"assign") [1] 1 1 1 1 1 2 2 2 2 2 attr(,"contrasts") attr(,"contrasts")$group
[1] "contr.treatment"

attr(,"contrasts")$donor [1] "contr.treatment" ADD REPLY 0 Entering edit mode You have it correct. Donor 1 is implicitly included in the design because the donor coefficients represent the difference between Donor 1 and the other donors. ADD REPLY 0 Entering edit mode OK, thanks. After I estimate dispersions and test for DE: edgeR.DGElist <- estimateDisp(edgeR.DGElist, design, robust=TRUE) fit <- glmQLFit(edgeR.DGElist, design, robust=TRUE)  my.contrasts <- makeContrasts(IL2vsT0=groupIL2-groupT0, IL2OKT3vsT0=groupIL2OKT3-groupT0, IL2OKT3ZAvsT0=groupIL2OKT3ZA-groupT0, IL2ZAvsT0=groupIL2ZA-groupT0, IL2OKT3vsIL2=groupIL2OKT3-groupIL2, IL2ZAvsIL2=groupIL2ZA-groupIL2, IL2OKT3ZAvsIL2OKT3=groupIL2OKT3ZA-groupIL2OKT3, IL2OKT3ZAvsIL2ZA=groupIL2OKT3ZA-groupIL2ZA, levels=design) res.IL2vsT0 <- glmQLFTest(fit, contrast=my.contrasts[,"IL2vsT0"]) #IL2 minus baseline topTags(res.IL2vsT0) Do I need to also include a contrast argument between donors? ADD REPLY 0 Entering edit mode That's a question to ask yourself. Do you care about differential expression between donors? If so, then yes, you should should assess differential expression between donors. ADD REPLY 0 Entering edit mode Right, I don't, I just want to make sure the donor effect I was seeing in the MDS plot is accounted for in the model: design <- model.matrix(~0+group+donor, data=edgeR.DGElist$samples)

If I remove the 0+ from the design, will all other columns be relative to the first column ("groupT0")? This is the baseline control, and my thought is that this is the best design since I only want to compare treatment groups and remove the baseline from all samples.

1
Entering edit mode

Yes, if you don't include the "0+", that is what you will get: an intercept column representing the expression level in the baseline group, and then a column for each other group representing the difference between that group and the baseline. This is an equally valid approach, and should yield identical p-values to the contrast-based approach that you described above.

0
Entering edit mode

OK. Last question and I promise to quit taking up your time :)

In the (~group + donor) model, the columns are [1] 0 1 1 1 1 2 2 2 2 2. If I want to compare treatments only, then for example, comparing IL2OKT3 vs IL2 looks like:

fit <- glmQLFit(edgeR.DGElist, contrast=c(0,1,-1,0,0,0,0,0,0,0))

Correct?

0
Entering edit mode

Without seeing the actual columns of the design matrix, I can't tell you if that contrast is correct. Just use the makeContrasts function to create your contrasts, and you don't have to worry about it.