Question

Limma contrast matrix question/concern

0

Entering edit mode

choofd • 0

@choofd-7709

Last seen 8.7 years ago

Canada

Dear all,

I would like to confirm that I am using the best model for my experimental design before starting to validate the candidate genes obtained in vitro.

My thread is similar to this previous one but lacking some details on the right contrast to ask which is my concern for this current thread Limma Design for Paired Samples Question.

I have two types of patients: Diseased (n=5) and Normal (n=4) and for each patient, I have two cell populations derived from the same tissue and isolated by flow cytometry sort : Kpos and Kneg .

Which gives me 18 arrays total (Affymetrix).

I applied the Multi Level Experiments design (paragraph 9.7), using a block on patient.

Here is my pData(eset):

	SibShip	Treatment	Condition	FileName	Index
Diseased1Kneg.CEL	1	Kneg	Dis	Diseased1Kneg.CEL	1
Diseased1Kpos.CEL	1	Kpos	Dis	Diseased1Kpos.CEL	2
Diseased2Kneg.CEL	2	Kneg	Dis	Diseased2Kneg.CEL	3
Diseased2Kpos.CEL	2	Kpos	Dis	Diseased2Kpos.CEL	4

etc

Normal5Kpos.CEL	9	Kneg	Nor	Normal5Kpos.CEL	17
Normal5Kneg.CEL	9	Kpos	Nor	Normal5Kneg.CEL	18

And I have the following Design Matrix :

>Person=Targets$SibShip

>Treat=factor(paste(Targets$Condition,Targets$Treatment,sep="."))

>design <- model.matrix(~0+Treat)

>colnames(design) <- levels(Treat)

>design

Dis.Kneg Dis.Kpos Nor.Kneg Nor.Kpos

1 1 0 0 0

2 0 1 0 0

3 1 0 0 0

4 0 1 0 0

5 1 0 0 0

6 0 1 0 0

7 1 0 0 0

8 0 1 0 0

9 1 0 0 0

10 0 1 0 0

11 0 0 1 0

12 0 0 0 1

13 0 0 1 0

14 0 0 0 1

15 0 0 1 0

16 0 0 0 1

17 0 0 1 0

18 0 0 0 1

attr(,"assign")

[1] 1 1 1 1

attr(,"contrasts")

attr(,"contrasts")$Treat

[1] "contr.treatment"

Because I want to make comparisons both within and between patients, I treated patient as a random effect and used duplicateCorrelation to estimate the variations between the measurements made on a same patient.

>corfit <- duplicateCorrelation(eset,design,block=Person)

> corfit$consensus

> fit <- lmFit(eset,design,block=Person,correlation=corfit$consensus)

The comparisons I am the most interested in are :

- Genes differentially expressed in tissue Kpos compared to Kneg in diseased patients

- Genes differentially expressed in tissue Kpos compared to Kneg in normal patients

- The most important for my analysis : genes specifically differentially expressed in tissue Kpos compared to Kneg in diseased patients and not in normal patients.

In other words, my last question is which genes specifically drives my Kpos phenotype in diseased patients but not in the general population represented by normal patients.

I have used the following contrast matrix and I wonder if my last question shouldn’t be answered with a Venn diagram instead ? I am afraid I am just pulling out genes that don’t vary to the same extend in my diseased and normal patients using the Diff contrast.

> cm <- makeContrasts(Dis.KposvsKneg= Dis.Kpos - Dis.Kneg,

Nor.KposvsKneg= Nor.Kpos - Nor.Kneg,

Diff=(Dis.Kpos-Dis.Kneg)-(Nor.Kpos-Nor.Kneg),

levels=design)

Is that the right way to ask my question #3 ?

As a side note, does the order of my coeficient matters in my Design Matrix or should I consider to do an approach similar to 17.3.8 in the Limma userguide where the levels of the factors were reorderd in order to look specifically at the Dis.Kpos

And in general, is there any contraindications to perform all contrasts in a single contrast matrix?

Any help/input would be very (very !) appreciated.

Thanks for making such a powerful tool available to the community.

Best,

Catherine

limma block design and contrast matrix • 1.8k views

ADD COMMENT • link 8.8 years ago choofd • 0

score 3 · Answer 1 · 2015-06-24

For your third research question; if you want to find genes that are DE between cell types in diseased patients, but are not DE in normal patients, it will get messy. This is because the significance of similarly-expressed genes is not easy to define. You'll need to intersect those DE genes from your first contrast with the non-DE genes from your second contrast. I'd probably do something like this (assuming you've run contrasts.fit with cm on fit, and then eBayes to get some object that I'll call refit here):

res.dis <- topTable(refit, coef="Dis.KposvsKneg", sort.by="none", n=Inf)
is.de.dis <- res.dis$adj.P.Val <= 0.05
res.norm <- topTable(refit, coef="Nor.KposvsKneg", sort.by="none", n=Inf, confint=0.95)
is.nde.norm <- res.norm$adj.P.Val >= 0.05 & res.norm$CI.L >= -1 & res.norm$CI.R <= 1
keep <- is.de.dis & is.nde.norm
relevant.genes <- res.dis[keep,]

The idea with non-DE'ness is to find those genes that are not significant and have log-fold changes within the [-1, 1] interval (well, a 95% confidence interval within that range, anyway). This ensures that the lack of DE for a particular gene isn't just due to a lack of statistical power, but is a result of a log-fold change that is actually close to zero.

The third contrast that you specify in the matrix will identify genes that change across cell types differently between normal and diseased patients. There is no guarantee that these genes will be non-DE in the normal patients, so it doesn't really answer your third research question. However, I think that this contrast is still interesting, possibly more so than the question it was (incorrectly) designed to answer. For example, if a gene was DE across cell types in both normal and diseased patients, but was 2-fold up in the former and 10-fold up in the latter, wouldn't you want to know about it?

As for your other questions; no, the order doesn't matter, because you specify the coefficients by name anyway during your construction of the contrast matrix. And no, there's no problem with having everything in one contrast matrix, as you pull out the specific comparisons you're interested in with topTable by specifying coef (either by name or number).

--

Edit: as an aside, your duplicateCorrelation approach is perfectly valid, but with the DE comparisons you're making, you can afford to block explicitly on each patient. The first two contrasts are made between cell types within each patient, so the blocking factor won't interfere with them. Similarly, the last contrast is made between the log-fold changes between cell types - blocking will have no effect as the log-fold changes are computed within patients.

Person <- factor(Person)
design2 <- model.matrix(~0 + Treat + Person)
design2 <- design2[,-ncol(design2)] # to get to full rank
colnames(design2)[1:4] <- levels(Treat)

Linear modelling with design2 may be more accurate as it avoids some assumptions of duplicateCorrelation - namely, that all genes have correlations equal to the consensus value.

score 0 · Answer 2 · 2015-06-25

Thanks a lot Aaron for your input!

I agree that the Diff contrast is also interesting and I will combine both Diff and the list of relevant genes DE for diseased but not for normal to determine the best targets to validate in vitro.

The design2 approach appears to give very similar results than the duplicateCorrelation (which is reassuring).

Thank you for your time and suggestions, really appreciate it.

Catherine