Question

limma design matrix for two-channel Agilent arrays (direct experimental design)

pm2015 • 0

@pm2015-8878

United States

Hi

I am trying to analyse an Agilent two-color dataset with a direct experimental design (no common reference). Each array carries a paired treated and untreated sample for one time point. The data come from two cell lines (denoted M and P). I am trying to construct a design matrix. I am fairly new to limma and R and am still struggling after hours of searching. Any help would be appreciated.

Here are my samples:

| Name    | Cy3          | Cy5         |
|---------|--------------|-------------|
| M_day7  | M_mock_day7  | M_trt_day7  |
| M_day10 | M_mock_day10 | M_trt_day10 |
| P_day7  | P_mock_day7  | P_trt_day7  |
| P_day10 | P_mock_day10 | P_trt_day10 |

Thanks!

Limma limma design matrix agilent twochannel • 1.7k views

ADD COMMENT • link 8.8 years ago pm2015 • 0

For the second and fourth arrays, you have the same mock group for both dyes. Is this actually the case?

ADD REPLY • link 8.8 years ago Aaron Lun ★ 28k

Thanks for pointing it out. Sorry for the confusion. I have edited my post.

ADD REPLY • link 8.8 years ago pm2015 • 0

score 2 · Answer 1 · 2016-01-26

Gordon Smyth 51k

@gordon-smyth

WEHI, Melbourne, Australia

You can't do a differential expression analysis of this data because you don't have any replication. There are four comparisons, and each is done exactly once. There is no extra replication from which to estimate biological variation.

I assume you have already normalized this data. In that case you must have already computed the M-values (log2-fold-changes) for all probes for each of the four comparisons. That is all you do. You could explore MA-plots such as those shown on page 27 of the limma User's Guide.
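As a rough illustration of the M-values and MA-plot mentioned above, here is a minimal sketch in base R. The intensities are simulated, and the object names (`R`, `G`, `M`, `A`) and the spiked-in effect are assumptions made for the example, not the poster's data. With real data, the normalized M- and A-values would come from the normalization step rather than being computed from raw intensities like this.

```r
set.seed(42)
n_probes <- 1000
arrays <- c("M_day7", "M_day10", "P_day7", "P_day10")

# Simulated intensities: Cy3 (mock) in G, Cy5 (treated) in R.
G <- matrix(2^rnorm(n_probes * 4, mean = 9, sd = 1.5), nrow = n_probes,
            dimnames = list(paste0("probe", seq_len(n_probes)), arrays))
R <- G * 2^rnorm(n_probes * 4, sd = 0.2)
R[1:20, ] <- R[1:20, ] * 4          # spike in a 4-fold increase for 20 probes

# M-value: log2 fold change (treated / mock). A-value: average log2 intensity.
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# MA-plot for the first array.
plot(A[, 1], M[, 1], pch = 16, cex = 0.4,
     xlab = "A (average log2 intensity)", ylab = "M (log2 trt/mock)",
     main = arrays[1])
abline(h = 0, col = "red")

# Rank probes on one array by absolute log2 fold change.
ranked <- M[order(-abs(M[, "M_day7"])), , drop = FALSE]
head(ranked)
```

Without replicates, a ranking like `ranked` is descriptive only: it orders probes by the size of the observed change and says nothing about how reliable that change is.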

ADD COMMENT • link 8.8 years ago Gordon Smyth 51k

Thanks for your reply. I do understand that this is not the best dataset, as it has no replicates. Cost constraints prevented the lab from including replicates, and it seems the experiment was designed by someone not very familiar with statistical analysis of microarrays. I am trying my best to get some meaningful results (which will be validated by other methods in the lab). However, such situations are not uncommon, especially when working with a limited number of samples (such as human samples).

I have normalized the data. However, the point where I got stuck is making the design matrix, and that is what I was specifically asking for help with. I realized that I didn't make that clear in my original post and have edited it. Any help with the design matrix for my dataset would be much appreciated. Thanks!

ADD REPLY • link 8.8 years ago pm2015 • 0

Thanks for the clarification, but I already understood your original post.

The reason why I didn't tell you how to make a design matrix is that it can't possibly do you any good. You have no possible use for a design matrix because you can't do any differential expression analysis. I have already told you what you can usefully do.

If you ever analyze a two-color microarray experiment with replication, making a design matrix is easy:

design <- modelMatrix(targets, ref="M_mock_day7")
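To make the replicated case concrete, here is a minimal sketch assuming a hypothetical experiment with four arrays for a single cell line and time point, including dye swaps. The target names `mock` and `trt`, the array names, and the simulated M-values are all assumptions for illustration. `modelMatrix()`, `lmFit()`, `eBayes()` and `topTable()` are limma functions.

```r
library(limma)

# Hypothetical replicated design with dye swaps (not the poster's data).
targets <- data.frame(
  Cy3 = c("mock", "trt", "mock", "trt"),
  Cy5 = c("trt", "mock", "trt", "mock"),
  row.names = paste0("array", 1:4)
)

# One coefficient, "trt", meaning trt versus mock.
# Entries are +1 when trt is in Cy5 and -1 when the dyes are swapped.
design <- modelMatrix(targets, ref = "mock")

# Simulated M-values (log2 Cy5/Cy3): rows are probes, columns are arrays.
set.seed(7)
n_probes <- 100
M <- matrix(rnorm(n_probes * 4, sd = 0.3), n_probes, 4,
            dimnames = list(paste0("probe", seq_len(n_probes)),
                            rownames(targets)))
# Give the first 5 probes a true effect of 1.5; its sign follows dye orientation.
M[1:5, ] <- M[1:5, ] + 1.5 * rep(design[, "trt"], each = 5)

# With replication, variances can be estimated and p-values make sense.
fit <- eBayes(lmFit(M, design))
top <- topTable(fit, coef = "trt", number = 5)
top
```

Here the four arrays all estimate the same comparison, so three residual degrees of freedom are available per probe. That is exactly what the unreplicated design in the question lacks.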

ADD REPLY • link 8.8 years ago Gordon Smyth 51k

score 0 · Answer 2 · 2016-01-26

pm2015 • 0

Thanks again. I appreciate you taking time to answer my questions.

So for my dataset, would fold change be the only criterion for filtering genes to interrogate further? In the past, when analyzing similar datasets with other tools, I used a combination of fold change and p-values. Is there a way to get statistical significance for the comparisons being made (e.g. M_mock_day7 vs. M_trt_day7)? I have mostly worked with single-channel data, and this type of data is new to me.

Regarding the design matrix, my understanding was that using a reference is only appropriate when the same sample (the reference) is hybridized to all arrays. Is the choice of reference appropriate, and is it arbitrary, for my direct experimental design?

ADD COMMENT • link 8.8 years ago pm2015 • 0

You can't get a sensible p-value if you don't have replicates, as you won't be able to gauge the reliability (and thus, significance) of the estimated non-zero log-fold change. You could try to manufacture some replication, e.g., by assuming that the treatment effect is the same across days and/or across cell lines. However, if the treatment effect is genuinely different between your "replicates", that difference will be absorbed into the variance estimate. As a result, you'll end up with larger p-values and (potentially quite severe) conservativeness, so much so that you might have been better off just looking at the log-fold changes instead. And obviously, you won't be able to do specific comparisons between mock and treatment for a given cell line/time point combination; you can only do so for each cell line or for each time point, as you've averaged across the other factor to get your replication.
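A minimal sketch of this "manufactured replication" idea, assuming the treatment effect is the same on both days so that the two arrays within each cell line act as replicates. The M-values are simulated and the object names are assumptions for the example. Because every array has mock in Cy3 and treatment in Cy5, each M-value is already a treated-versus-mock log-fold change, so an ordinary design matrix on the M-values is enough here.

```r
library(limma)

set.seed(1)
n_probes <- 200
arrays <- c("M_day7", "M_day10", "P_day7", "P_day10")

# Simulated M-values (log2 trt/mock): rows are probes, columns are arrays.
M <- matrix(rnorm(n_probes * 4, sd = 0.3), n_probes, 4,
            dimnames = list(paste0("probe", seq_len(n_probes)), arrays))
M[1:10, ] <- M[1:10, ] + 2    # first 10 probes share a treatment effect

# Assumption: same treatment effect on day 7 and day 10 within a cell line.
# One coefficient per cell line, averaging over days.
cell_line <- factor(c("M", "M", "P", "P"))
design <- model.matrix(~ 0 + cell_line)
colnames(design) <- levels(cell_line)

fit <- eBayes(lmFit(M, design))
top_M <- topTable(fit, coef = "M", number = 10)   # trt vs mock in cell line M
top_M
```

This leaves only two residual degrees of freedom per probe, and any real day-to-day difference in the treatment effect inflates the variance estimate, which is the conservativeness described above. Day-specific comparisons are no longer available under this model.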

The best solution is to ~~whine ceaselessly to~~ remind your wet lab people to design experiments with replicates. Never mind whether you can analyze it with limma or not, it's just good scientific practice to show that your results are reproducible.