Limma - Two-color group differences *and* correlation with individual differences.
1
0
Entering edit mode
@calin-jageman-robert-6431
Last seen 5.3 years ago

I'm using limma in BioConductor to make a simple paired comparison in gene expression.  Now I'd like to run an auxiliary analysis to see if there are individual differences which correlate with changes in gene expression.  I'm having difficulty understanding how to fold this additional research question in.

First - the simple paired comparison.  I have animals which learn a task on only 1 side of the body.  I then harvest CNS tissue from the trained and untrained sides.  Gene expression due to learning is then identified by comparing the trained and untrained samples on a two-color array (each array is within-subjects with one dye for the trained sample, the other for the untrained sample; dyes are counterbalanced across condition; 16 biological replicates (i.e. 16 different microarrays, each with trained/untrained samples from 1 animal)).  The essential part of the R script I'm using is:

design <- modelMatrix(targets, ref="Control")
fit <- lmFit(MA2.avg, design)
regulated <- treat(fit, lfc=0.137503524)

Even though all the animals have learned to criterion, there are large individual differences in their ability to retain the information.  So each animal has a 'memory score' that measures the degree to which the learned behavior has decayed between training and harvesting the samples.

So: I'd like to find changes in gene expression that relate to memory scores.  To be clear, I'm interested in relating memory scores to the log-fold-change scores (the comparisons of trained/untrained), though I suppose it could also be possible to look for correlations with expression overall (e.g. the average expression across both sides).

I know that limma can handle quantitative conditions.  But none of the examples/papers I've been able to find so far seem to quite fit the design I have.  Hope I'm not asking an obvious our previously answered question.

Any help or pointers much appreciated,

Bob

Just to be clear, here's the targets array for the paired comparison.

SampleNumber    FileName    Cy3    Cy5
1   Animal1.txt    Control    Trained
...
16    Animal16.txt    Trained    Control

So I could extend it with the memory scores:

SampleNumber    FileName    Cy3    Cy5    MScore
1    Animal1.txt    Control    Trained    2.1
...
16    Animal168.txt    Trained    Control    1.3

but then I'm pretty lost as to how to make a proper contrast to find transcripts which have a LFC correlated with the memory scores.

limma regression • 771 views
2
Entering edit mode
@gordon-smyth
Last seen 22 minutes ago
WEHI, Melbourne, Australia

First, I strongly recommend you include a probe-specific dye effect in the design matrix:

design <- cbind(Dye=1, design)

This exploits your dye-swap design and often improves the analysis markedly for Agilent arrays.

For the MScore, you need to make the sign of this variable match the sign of the M-values:

i <- targets\$Cy5=="Control"
MScore[i] <- -MScore[i]

Then just add the MScore to the design matrix:

design <- cbind(design, MScore=Mscore)

Then test for that coefficient:

fit <- lmFit(MA2.avg, design)
fit <- treat(fit, lfc=log2(1.1))
topTreat(fit, coef="MScore")


1
Entering edit mode

Thanks so much.  limma is incredible, and you are extremely generous to answer so many questions.

I tried out the approach you suggested and it seems to work like a charm.  I haven't done this study yet, so I tested using data from a previous study and set the mscores to be the exact LFCs for one transcript over the biological replicates.  limma immediately identified the transcript I had used as the basis of the mscores, and also identified the other transcripts on the array with highly correlated LFCs.  Then I re-ran using a different transcript as the basis of the mscores, and limma again picked it out.  As expected, success depended on having lots of variance in the quantitative factor.  When I based the mscores on a transcript with low variation it was not at the top of the list and no correlations were significant.  More testing to do, but this seems really promising for use with real data.

I also really appreciate the tip on estimating dye effects.  I did some reading about it and then tried it out with a previous study.  As expected, the LFCs didn't change, but p values were generally shifted lower, leading to an expansion of the regulated transcript list by about 20% (using treat for significantly greater than 10% change in either direction; n = 8).  I'll look into how transcripts in the expanded region hold up to qPCR validation.

Finally, I just wanted say the results we've been obtaining using limma have been remarkable.  We just completed an analysis of genes regulated 1 day after learning (n = 8).  We checked the results from limma using qPCR in an independent set of samples (n = 11).  So far, we've run qPCR on 45 transcripts, and the correlation between microarray and qPCR is .87 (figure below).  Of the 45 transcripts tested, 30 were ones predicted by microarray to be strongly regulated.  All 30 of these showed statistically significant and strong regulation in the independent samples via qPCR.  I know that's just a drop in the bucket of the many theoretical and empirical assessments of the approach limma uses.  But I still can't help feeling impressed and reassured to see things working out so well in our hands.  Many thanks!

0
Entering edit mode

Thanks, it's good to hear that limma has been useful for you.