Question: Dealing with missing samples in limma and design matrix questions
0
3.6 years ago by
clevy10
United States
clevy10 wrote:

Hi Bioconductors,

I am working on analyzing data from an Illumina beadchip microarray with samples from the following experimental set-up:

7 tissue donors x 3 treatments x 3 RNA collection time points

Treatments: Mock infection, virus A infection, virus B infection

RNA collected from tissue at time points 1,2 and 3

At least initially, I am planning to analyze the data separately for each time point, so I am not including time as a factor in the design matrices that I am making. I would like to do an analysis where virus A treatment and virus B treatment are paired with the Mock treatment of their common tissue donor and look at mock vs A and mock vs B and A vs B

Problem: Some of the samples failed on the array, including two of the mock treatments, so for those donors, I am left with just virusA and virusB infection.

So far I have been looking at pages 43 and 44 of the limma manual to figure out how to make appropriate design matrices, but was wondering how limma works when a comparison is indicated in the design or contrast matrix and a  sample is missing one of the replicates necessary to make that comparison.

Claire Levy

University of Washington OBGYN

Fred Hutchinson Cancer Research Center VIDD

modified 3.6 years ago • written 3.6 years ago by clevy10
Answer: How does limma deal with missing samples?
2
3.6 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

Limma just uses whatever observations are available. If some donors only have A and B, then obviously those donors will contribute to the A vs B comparison but not to A vs mock or to B vs mock. This happens automatically.

Generally it is better to analyze all the data at once, rather than do separate analyses for each time point.

Answer: How does limma deal with missing samples?
0
3.6 years ago by
clevy10
United States
clevy10 wrote:

Great, that is kind of what I hoped would happen. By "better to analyze all the data at once", do you mean statistically, or more from an interpretation of results perspective?

I am keen on the "analyze all at once" approach so I will take a closer look at the Time Course Experiments section on pg48. I think I could maybe use a similar approach to the example except have 3 treatments instead of just WT and mutant and add a factor for Tissue donor so any potential differences between donors (which I'm not really interested in) don't confound the results. Thanks for your quick response.

Thanks for the quick response, it is much appreciated

Claire

I meant statistically, but it is obviously better from an interpretation perspective as well.

You don't even need to worry about the special features of time course experiments. You can just view your study as 9 different treatments on each donor, where each treatment is an infection by time point combination.

Fantastic, I have been following the "Multilevel Experiments" section (pg 51) and it more or less makes sense but I am not sure about the  duplicateCorrelation part.  The difference between my experiment and the example is that each of my subjects is exposed to all conditions, while in the example, some subjects have diseased tissues and others have normal (rather than all subjects having both diseased and normal).

I went through, following the example like so:

#In my exp, Treatment is analogous #to "condition" in the example, Time is analogous to Tissue # and TissueID is to subject.

#combine the Treatment and Time parameters into one Treat <-factor(paste(targets$Treatment,targets$Time, sep="."))

design<-model.matrix (~0 + Treat)

#Then we estimate the correlation between #measurements made on the same subject

corfit<- duplicateCorrelation(expressedProbes.lumi,design,                               block=targets$TissueID) corfit$consensus #[1] 0.03368044 #I guess this means the TissueIDs are not significantly correlated?

#Then this inter-subject correlation is #input into the linear model fit

fit <- lmFit(expressedProbes.lumi,design,block=targets$TissueID, correlation=corfit$consensus) #Now we can make any comparisons #between the experimental conditions cm<-makeContrasts(   SD90.3vsMock.3 = TreatA.3-TreatMock.3,   V186.3vsMock.3 = TreatB.3-TreatMock.3,   levels = design )

fit2<-contrasts.fit(fit,cm)

fit2 <-eBayes(fit2)

topTable(fit2,coef="TreatA.3vsMock.3")

But, the final paragraph of the section on pg 52 mentions that

"Here the comparison between tissues can be made within subjects [true for my experiment], and hence should be more precise than the comparison between diseased and normal, which must be made between subjects [not necessarily true for my experiment, since all subjects have all treatments]."

So maybe I need to do something different?

However, even though I can do all my comparisons within subjects, I still want to be able to say something about differences in gene expression between the different treatments in general, not just about these particular tissue donors. If my googling of the definition of random effects has given me the right answer, it seems more right to keep the analysis as is.

1

You don't have a multilevel experiment. You just have an ordinary blocked experiment. You can just follow the suggestion of Section 9.4.2 with

design <- model.matrix(~ Donor + Treat)

It's one of most common and standard experimental designs and I think you might be seeing more complications than there actually are.

Some people like to rearrange the model formula:

design <- model.matrix(~ 0 + Treat + Donor)

This is equivalent to the previous formula, but allows you use use the same contrast definitions as you are using now.

Alternatively you could use duplicateCorrelation to handle the donors, as you seem to have done above (I can't really tell, is TissueID the same as Donor?), instead of putting them in the design matrix. If the donors are not spectacularly different, then that may be a good approach. It can give more information for an unbalanced experiment, when not all treatments are measured on all subjects.

Great, I'm glad things are not always as complicated as they seem. I'll have another go at it.  TissueID is the same thing as donor by the way. Thanks for your help!