Search
Question: Limma contrast matrix question/concern
0
3.1 years ago by
choofd0
choofd0 wrote:

Dear all,

I would like to confirm that I am using the best model for my experimental design before starting to validate the candidate genes obtained in vitro.

My thread is similar to this previous one but lacking some details on the right contrast to ask which is my concern for this current thread Limma Design for Paired Samples Question.

I have two types of patients: Diseased (n=5) and Normal (n=4) and for each patient, I have two cell populations derived from the same tissue and isolated by flow cytometry sort : Kpos and Kneg .

Which gives me 18 arrays total (Affymetrix).

I applied the Multi Level Experiments design (paragraph 9.7), using a block on patient.

Here is my pData(eset):

 SibShip Treatment Condition FileName Index Diseased1Kneg.CEL 1 Kneg Dis Diseased1Kneg.CEL 1 Diseased1Kpos.CEL 1 Kpos Dis Diseased1Kpos.CEL 2 Diseased2Kneg.CEL 2 Kneg Dis Diseased2Kneg.CEL 3 Diseased2Kpos.CEL 2 Kpos Dis Diseased2Kpos.CEL 4

etc

 Normal5Kpos.CEL 9 Kneg Nor Normal5Kpos.CEL 17 Normal5Kneg.CEL 9 Kpos Nor Normal5Kneg.CEL 18

And I have the following Design Matrix :

>Person=Targets$SibShip >Treat=factor(paste(Targets$Condition,Targets$Treatment,sep=".")) >design <- model.matrix(~0+Treat) >colnames(design) <- levels(Treat) >design Dis.Kneg Dis.Kpos Nor.Kneg Nor.Kpos 1 1 0 0 0 2 0 1 0 0 3 1 0 0 0 4 0 1 0 0 5 1 0 0 0 6 0 1 0 0 7 1 0 0 0 8 0 1 0 0 9 1 0 0 0 10 0 1 0 0 11 0 0 1 0 12 0 0 0 1 13 0 0 1 0 14 0 0 0 1 15 0 0 1 0 16 0 0 0 1 17 0 0 1 0 18 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$Treat

[1] "contr.treatment"

Because I want to make comparisons both within and between patients, I treated patient as a random effect and used duplicateCorrelation to estimate the variations between the measurements made on a same patient.

>corfit <- duplicateCorrelation(eset,design,block=Person)

> corfit$consensus > fit <- lmFit(eset,design,block=Person,correlation=corfit$consensus)

The comparisons I am the most interested in are :

-       Genes differentially expressed in tissue Kpos compared to Kneg in diseased patients

-       Genes differentially expressed in tissue Kpos compared to Kneg in normal patients

-       The most important for my analysis : genes specifically differentially expressed in tissue Kpos compared to Kneg in diseased patients and not in normal patients.

In other words, my last question is  which genes specifically drives my Kpos phenotype in diseased patients but not in the general population represented by normal patients.

I have used the following contrast matrix and I wonder if my last question shouldn’t be answered with a Venn diagram instead ? I am afraid I am just pulling out genes that don’t vary to the same extend in my diseased and normal patients using the Diff contrast.

> cm <- makeContrasts(Dis.KposvsKneg= Dis.Kpos - Dis.Kneg,

Nor.KposvsKneg= Nor.Kpos - Nor.Kneg,

Diff=(Dis.Kpos-Dis.Kneg)-(Nor.Kpos-Nor.Kneg),

levels=design)

Is that the right way to ask my question #3 ?

As a side note, does the order of my coeficient matters in my Design Matrix or should I consider to do an approach similar to 17.3.8 in the Limma userguide where the levels of the factors were reorderd in order to look specifically at the Dis.Kpos

And in general, is there any contraindications to perform all contrasts in a single contrast matrix?

Any help/input would be very (very !) appreciated.

Thanks for making such a powerful tool available to the community.

Best,

Catherine

modified 3.1 years ago • written 3.1 years ago by choofd0
3
3.1 years ago by
Aaron Lun20k
Cambridge, United Kingdom
Aaron Lun20k wrote:

For your third research question; if you want to find genes that are DE between cell types in diseased patients, but are not DE in normal patients, it will get messy. This is because the significance of similarly-expressed genes is not easy to define. You'll need to intersect those DE genes from your first contrast with the non-DE genes from your second contrast. I'd probably do something like this (assuming you've run contrasts.fit with cm on fit, and then eBayes to get some object that I'll call refit here):

res.dis <- topTable(refit, coef="Dis.KposvsKneg", sort.by="none", n=Inf)
is.de.dis <- res.dis$adj.P.Val <= 0.05 res.norm <- topTable(refit, coef="Nor.KposvsKneg", sort.by="none", n=Inf, confint=0.95) is.nde.norm <- res.norm$adj.P.Val >= 0.05 & res.norm$CI.L >= -1 & res.norm$CI.R <= 1
keep <- is.de.dis & is.nde.norm
relevant.genes <- res.dis[keep,]

The idea with non-DE'ness is to find those genes that are not significant and have log-fold changes within the [-1, 1] interval (well, a 95% confidence interval within that range, anyway). This ensures that the lack of DE for a particular gene isn't just due to a lack of statistical power, but is a result of a log-fold change that is actually close to zero.

The third contrast that you specify in the matrix will identify genes that change across cell types differently between normal and diseased patients. There is no guarantee that these genes will be non-DE in the normal patients, so it doesn't really answer your third research question. However, I think that this contrast is still interesting, possibly more so than the question it was (incorrectly) designed to answer. For example, if a gene was DE across cell types in both normal and diseased patients, but was 2-fold up in the former and 10-fold up in the latter, wouldn't you want to know about it?

As for your other questions; no, the order doesn't matter, because you specify the coefficients by name anyway during your construction of the contrast matrix. And no, there's no problem with having everything in one contrast matrix, as you pull out the specific comparisons you're interested in with topTable by specifying coef (either by name or number).

--

Edit: as an aside, your duplicateCorrelation approach is perfectly valid, but with the DE comparisons you're making, you can afford to block explicitly on each patient. The first two contrasts are made between cell types within each patient, so the blocking factor won't interfere with them. Similarly, the last contrast is made between the log-fold changes between cell types - blocking will have no effect as the log-fold changes are computed within patients.

Person <- factor(Person)
design2 <- model.matrix(~0 + Treat + Person)
design2 <- design2[,-ncol(design2)] # to get to full rank
colnames(design2)[1:4] <- levels(Treat)

Linear modelling with design2 may be more accurate as it avoids some assumptions of duplicateCorrelation - namely, that all genes have correlations equal to the consensus value.

Hi Aaron,

I am coming back to this question as I would like some more (theorical) precisions. In your design2 approach, you explained to me that I could block explicitely on each patient because none of my contrast were impacted by my blocking factor.

But what if ones tries to do a contrast comparing conditions belonging to separate blocks, for example in my case, Dis.Kneg with Norm.Kneg. Could you please let me know what would be the problem with such an approach and how the blocking factor would influence the results?

Thanks a lot,

Catherine

In such cases, you need to use duplicateCorrelation in the manner described in the original post. This is because the conditions being compared (e.g., Dis.Kneg and Norm.Kneg) belong to different patients. Blocking on Patient in design2 will confound any comparison between these conditions, because the patient blocking factors will absorb any differences between patients (and hence, any differences between conditions). In fact, if you set a contrast with Dis.Kneg - Norm.Kneg in makeContrasts with design2, you'll actually be comparing between Kneg samples in patients 1 and 9 only. Differences in all other patients are absorbed by their corresponding blocking factors.

On a similar note, variance estimates with design2 do not consider variability in absolute expression between patients, as this is absorbed into the patient blocking factors. However, if you intend to compare between conditions with different patients, this variability is arguably relevant and should be considered during significance calculations. This is possible with the one-way layout in design, described in the original post. Of course, if you're comparing between conditions within the same patient, then this variability is irrelevant and its incorporation into the variance estimate will result in loss of power.

Many thanks for the clarifications!

0
3.1 years ago by
choofd0
choofd0 wrote:

Thanks a lot Aaron for your input!

I  agree that the Diff contrast is also interesting and I will combine both Diff and the list of relevant genes DE for diseased but not for normal to determine the best targets to validate in vitro.

The design2 approach appears to give very similar results than the duplicateCorrelation (which is reassuring).

Thank you for your time and suggestions, really appreciate it.

Catherine