Question

edgeR design matrix question

0

Entering edit mode

Kemal Akat • 0

@kemal-akat-7178

Last seen 3.5 years ago

USA/New York/The Rockefeller University

Dear Bioconductor Community,

I have an RNAseq experiment and was wondering if I could get some input on my design matrix (here used for an analysis in edgeR).

Tissue samples from 7 healthy individuals were evaluated after treatment with a drug (XTC) or placebo at two time points, 1 and 2 h after treatment, for all participants, and after 6 h for 3 of the participants. For every time point, except the 6 h time point, we have matching samples from each patient.

The target frame looks like this:

Patient Treatment TimeHours P1XTC2 P1 XTC 2 P1XTC1 P1 XTC 1 P1Placebo2 P1 Placebo 2 P1Placebo1 P1 Placebo 1 P2XTC2 P2 XTC 2 P2XTC1 P2 XTC 1 P2XTC6 P2 XTC 6 P2Placebo2 P2 Placebo 2 P2Placebo1 P2 Placebo 1 P3XTC2 P3 XTC 2 P3XTC1 P3 XTC 1 P3XTC6 P3 XTC 6 P3Placebo2 P3 Placebo 2 P3Placebo1 P3 Placebo 1 P4XTC2 P4 XTC 2 P4XTC1 P4 XTC 1 P4XTC6 P4 XTC 6 P4Placebo2 P4 Placebo 2 P4Placebo1 P4 Placebo 1 P5XTC2 P5 XTC 2 P5XTC1 P5 XTC 1 P5XTC6 P5 XTC 6 P5Placebo2 P5 Placebo 2 P5Placebo1 P5 Placebo 1 P6XTC2 P6 XTC 2 P6XTC1 P6 XTC 1 P6Placebo2 P6 Placebo 2 P6Placebo1 P6 Placebo 1 P7XTC2 P7 XTC 2 P7XTC1 P7 XTC 1 P7Placebo2 P7 Placebo 2 P7Placebo1 P7 Placebo 1

The count matrix like this:

Gene P1XTC2 P1XTC1 P1Placebo2 P1Placebo1 P2XTC2 P2XTC1 P2XTC6 Gene1 130041 294709 693018 392083 271540 149505 450974 Gene2 125 473 548 251 926 327 171 Gene3 0 22 25 16 34 6 12 Gene4 56870 195167 194034 230998 140803 83530 344095 Gene5 256 540 1153 999 808 291 485 Gene6 18813 51631 101162 74585 27923 19618 68625 Gene7 19 27 69 41 88 10 27 Gene8 7817 18459 33485 20042 22944 9367 22605 Gene9 776 1449 2652 1985 1303 911 1448 Gene10 5089 9470 24026 13697 11781 4119 10764

Now, fitting and testing the model:

> library(edgeR) > Group = factor(paste(targets$Treatment, targets$TimeHours, sep = ".")) > Group = relevel(Group, ref = "Placebo.1") > cbind(targets, Group) > d = DGEList(counts, group = Group) > d = calcNormFactors(d) > cpm_d = cpm(d, normalized.lib.sizes = TRUE) > d = d[rowSums(cpm_d > 1) >= 2, ] > d = estimateDisp(d) > patient = factor(targets$Patient) > design = model.matrix(~ 0 + patient + Group) > colnames(design) = gsub("patient|Group", "", colnames(design)) > fit = glmFit(d, design)

If defined like this, the comparisons are based on the 1 h placebo (Placebo.1),
which as expected does not appear as a coefficient in the design matrix.

> colnames(fit)

[1] "P1" "P2" "P3" "P4" "P5" "P6" "P7" "Placebo.2" [9] "XTC.1" "XTC.2" "XTC.6”

Now testing for the different comparisons of interest.

## 1 hour treatment vs. 1 hour placebo

> lrt = glmLRT(fit, coef = 9)

## 2 hour treatment vs. 2 hour placebo

> lrt = glmLRT(fit, contrast = c(-1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0))

## 6 hour treatment vs. 1 hour placebo, no 6 hour placebo available.

> lrt = glmLRT(fit)) # last coefficient, no need to specify.

I have tested a variety of design matrices (with and without intercept, not considering the pairing etc.) and the results are quite robust across all these tests. But I would be thankful for some insightful comments from the community - of course pointing at obvious mistakes/misconceptions, or giving examples of some potentially informative alternative approaches.

Cheers,
Kemal

--
Dr. Kemal Akat
HHMI/The Rockefeller University
Laboratory of RNA Molecular Biology
1230 York Avenue, Box #186
New York, NY 10065

R> sessionInfo() R version 3.1.2 (2014-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] locfit_1.5-9.1 edgeR_3.8.2 limma_3.22.1 colorout_1.0-3 setwidth_1.0-3 loaded via a namespace (and not attached): [1] grid_3.1.2 lattice_0.20-29 tools_3.1.2

edgeR limma rnaseq design and contrast matrix • 1.7k views

ADD COMMENT • link updated 9.3 years ago by Gordon Smyth 50k • written 9.4 years ago by Kemal Akat • 0

score 1 · Answer 1 · 2014-12-16

Your call to estimateDisp should be replaced with:

d = estimateDisp(d, design)

This means that the dispersion estimation will account for the patient blocking factors, rather than assuming that corresponding samples from different patients are direct replicates in a one-way layout.

In addition, for your comparison between 2 hour placebo and treatment, it seems that the correct call should be:

lrt = glmLRT(fit, contrast = c(0, 0, 0, 0, 0, 0, 0, -1, 0, 1, 0))

This should give the desired contrast of XTC.2 - Placebo.2 = 0, according to your design matrix.

score 0 · Answer 2 · 2014-12-16

0

Entering edit mode

Kemal Akat • 0

@kemal-akat-7178

Last seen 3.5 years ago

USA/New York/The Rockefeller University

Thank you Aaron! Much appreciated.

ADD COMMENT • link 9.4 years ago Kemal Akat • 0