limma - complex design matrix / blocks / random effects

0

Entering edit mode

Yannick Wurm ▴ 220

@yannick-wurm-2314

Last seen 9.6 years ago

Hello List, I've got a bit of a complex design - perhaps you can offer me some advice. Thank you very much in advance for taking the time to read this mail. (I've highlighted my questions with *** if you just want to skip down) I have ~100 two-color spotted cDNA chips, making up a 20-timepoint timecourse, replicated with four different strains. (so ~20 timepoints/strain in a loop, plus a few hybs against a reference RNA). But a few timepoints have only three replicated strains because something went wrong in the lab. Until now, I've been inspired by the end of section 8.2 of limma_2.12.0's limmaUsersGuide(), as follows: ### below, my four strains are named A, D, G and H design <- modelMatrix(targets, ref = "REF") fit <- lmFit(MA, design) cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4 - (A0h+D0h +G0h+K0h)/4, T_6h_vs0h= (A6h+D6h+G6h)/3 - (A0h+D0h+G0h+K0h)/4, T_12h_vs0h= (A12h+D12h+G12h)/3 - (A0h+D0h+G0h+K0h)/4, T_18h_vs0h= (A18h+D18h+G18h+K18h)/4 - (A0h+D0h+G0h+K0h)/4, T_24h_vs0h= (A24h+D24h+G24h+K24h)/4 - (A0h+D0h+G0h+K0h)/4, T_36h_vs0h= (A36h+D36h+G36h+K36h)/4 - (A0h+D0h+G0h+K0h)/4, T_48h_vs0h= (A48h+D48h+G48h+K48h)/4 - (A0h+D0h+G0h+K0h)/4, T_3_vs0h= (A3+D3+G3+K3)/4 - (A0h+D0h+G0h+K0h)/4, T_4_vs0h= (A4+D4+G4+K4)/4 - (A0h+D0h+G0h+K0h)/4, T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4, T_6_vs0h= (A6+D6+G6+K6)/4 - (A0h+D0h+G0h+K0h)/4, T_7_vs0h= (A7+D7+G7+K7)/4 - (A0h+D0h+G0h+K0h)/4, T_8_vs0h= (A8+D8+G8+K8)/4 - (A0h+D0h+G0h+K0h)/4, T_9_vs0h= (A9+D9+G9+K9)/4 - (A0h+D0h+G0h+K0h)/4, T_10_vs0h= (A10+D10+G10+K10)/4 - (A0h+D0h+G0h+K0h)/4, T_11_vs0h= (A11+D11+G11+K11)/4 - (A0h+D0h+G0h+K0h)/4, T_12_vs0h= (A12+D12+G12+K12)/4 - (A0h+D0h+G0h+K0h)/4, T_13_vs0h= (A13+D13+G13+K13)/4 - (A0h+D0h+G0h+K0h)/4, T_14_vs0h= (A14+D14+G14+K14)/4 - (A0h+D0h+G0h+K0h)/4, T_15_vs0h= (A15+D15+G15+K15)/4 - (A0h+D0h+G0h+K0h)/4, levels=design) fit2 <- contrasts.fit(fit, cont.matrix) Most RNA samples are used twice (eg: the "3hour" sample from strain A is used for both: A0h->A3h and: A3h->A6h), but some are used three times (this would be the case for A3h if I also had a A3h->REF chip). Limma is unable to estimate coefficients for many of my spots, ie fit2 $coefficients give "NA" rows for too many spots. If I understand correctly, this could be because for example both "K0h" chips are have 5000 flagged spots in common (indeed, several chips have thousands of genepix auto-flagged weak spots). NAs are frustrating. One workaround that came up when consulting my local statistician: to treat technical replicates and biological replicates equally. But technical replicates have higher correlation than biological replicates (I calculated a few correlations by hand): - technical replicate (same biological sample, same dye, different chip): median correlation = 0.55 - biological replicate (different sample, but same timepoint and dye, different chip): median correlation = 0.35 Thus I think it would be incorrect to treat my technical replicates as biological replicates. So I'm examining alternative means of creating a design matrix. Ideally, I'd like to use "strain" as a random effect here. *** Is it correct that the only way to add a random effect is with lmFit's block argument?: lmFit(myMA, design=myDesign, block=factor(targets$Colony), correlation=0.55) duplicateCorrelation cannot be used on a complex design. *** Is it correct to use the block argument on such a design, with a correlation I calculated "by hand"? *** Alternatively, I'll just put everything as fixed factors, but it feels wrong: extraFactors = model.matrix(~0+factor(targets$Strain)+factor(targets $Batch)) # several strains, only two possible batches colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch") design_bothFixed = cbind(modelMatrix(targets, ref="REF"), extraFactors) *** Would the design matrix I just created be a correct way of proceeding? ***Since limma is limited to a single random factor, how should I choose which between Strain and Batch to use as fixed and which as random? Thanks very much :o) Yannick -------------------------------------------- yannick . wurm @ unil . ch Ant Genomics, Ecology & Evolution @ Lausanne http://www.unil.ch/dee/page28685_fr.html

limma limma • 1.6k views

ADD COMMENT • link updated 15.9 years ago by Naomi Altman ★ 6.0k • written 15.9 years ago by Yannick Wurm ▴ 220

1

Entering edit mode

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

maanova purports to handle several random effects. --Naomi At 05:01 PM 6/3/2008, Yannick Wurm wrote: >Hello List, > >I've got a bit of a complex design - perhaps you can offer me some >advice. Thank you very much in advance for taking the time to read >this mail. (I've highlighted my questions with *** if you just want >to skip down) > > > >I have ~100 two-color spotted cDNA chips, making up a 20-timepoint >timecourse, replicated with four different strains. (so ~20 >timepoints/strain in a loop, plus a few hybs against a reference >RNA). But a few timepoints have only three replicated strains because >something went wrong in the lab. > >Until now, I've been inspired by the end of section 8.2 of >limma_2.12.0's limmaUsersGuide(), as follows: > ### below, my four strains are named A, D, G and H > design <- modelMatrix(targets, ref = "REF") > fit <- lmFit(MA, design) > cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4 > - (A0h+D0h +G0h+K0h)/4, > T_6h_vs0h= (A6h+D6h+G6h)/3 - > (A0h+D0h+G0h+K0h)/4, > T_12h_vs0h= (A12h+D12h+G12h)/3 - > (A0h+D0h+G0h+K0h)/4, > T_18h_vs0h= (A18h+D18h+G18h+K18h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_24h_vs0h= (A24h+D24h+G24h+K24h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_36h_vs0h= (A36h+D36h+G36h+K36h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_48h_vs0h= (A48h+D48h+G48h+K48h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_3_vs0h= (A3+D3+G3+K3)/4 - > (A0h+D0h+G0h+K0h)/4, > T_4_vs0h= (A4+D4+G4+K4)/4 - > (A0h+D0h+G0h+K0h)/4, > T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4, > T_6_vs0h= (A6+D6+G6+K6)/4 - > (A0h+D0h+G0h+K0h)/4, > T_7_vs0h= (A7+D7+G7+K7)/4 - > (A0h+D0h+G0h+K0h)/4, > T_8_vs0h= (A8+D8+G8+K8)/4 - > (A0h+D0h+G0h+K0h)/4, > T_9_vs0h= (A9+D9+G9+K9)/4 - > (A0h+D0h+G0h+K0h)/4, > T_10_vs0h= (A10+D10+G10+K10)/4 - > (A0h+D0h+G0h+K0h)/4, > T_11_vs0h= (A11+D11+G11+K11)/4 - > (A0h+D0h+G0h+K0h)/4, > T_12_vs0h= (A12+D12+G12+K12)/4 - > (A0h+D0h+G0h+K0h)/4, > T_13_vs0h= (A13+D13+G13+K13)/4 - > (A0h+D0h+G0h+K0h)/4, > T_14_vs0h= (A14+D14+G14+K14)/4 - > (A0h+D0h+G0h+K0h)/4, > T_15_vs0h= (A15+D15+G15+K15)/4 - > (A0h+D0h+G0h+K0h)/4, > levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > >Most RNA samples are used twice (eg: the "3hour" sample from strain A >is used for both: A0h->A3h and: A3h->A6h), but some are used three >times (this would be the case for A3h if I also had a A3h->REF chip). > >Limma is unable to estimate coefficients for many of my spots, ie >fit2 $coefficients give "NA" rows for too many spots. If I understand >correctly, this could be because for example both "K0h" chips are >have 5000 flagged spots in common (indeed, several chips have >thousands of genepix auto-flagged weak spots). > >NAs are frustrating. One workaround that came up when consulting my >local statistician: to treat technical replicates and biological >replicates equally. But technical replicates have higher correlation >than biological replicates (I calculated a few correlations by hand): > - technical replicate (same biological sample, same dye, different >chip): median correlation = 0.55 > - biological replicate (different sample, but same timepoint and >dye, different chip): median correlation = 0.35 >Thus I think it would be incorrect to treat my technical replicates >as biological replicates. > >So I'm examining alternative means of creating a design matrix. > >Ideally, I'd like to use "strain" as a random effect here. *** Is it >correct that the only way to add a random effect is with lmFit's >block argument?: > lmFit(myMA, design=myDesign, block=factor(targets$Colony), >correlation=0.55) >duplicateCorrelation cannot be used on a complex design. *** Is it >correct to use the block argument on such a design, with a >correlation I calculated "by hand"? *** > >Alternatively, I'll just put everything as fixed factors, but it >feels wrong: > extraFactors = > model.matrix(~0+factor(targets$Strain)+factor(targets $Batch)) # > several strains, only two possible batches > colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch") > design_bothFixed = cbind(modelMatrix(targets, ref="REF"), > extraFactors) >*** Would the design matrix I just created be a correct way of >proceeding? > >***Since limma is limited to a single random factor, how should I >choose which between Strain and Batch to use as fixed and which as >random? > >Thanks very much :o) > >Yannick > > >-------------------------------------------- > yannick . wurm @ unil . ch >Ant Genomics, Ecology & Evolution @ Lausanne > http://www.unil.ch/dee/page28685_fr.html > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111

ADD COMMENT • link 15.9 years ago Naomi Altman ★ 6.0k

Login before adding your answer.