limma - complex design matrix / blocks / random effects
1
0
Entering edit mode
Yannick Wurm ▴ 220
@yannick-wurm-2314
Last seen 10.2 years ago
Hello List, I've got a bit of a complex design - perhaps you can offer me some advice. Thank you very much in advance for taking the time to read this mail. (I've highlighted my questions with *** if you just want to skip down) I have ~100 two-color spotted cDNA chips, making up a 20-timepoint timecourse, replicated with four different strains. (so ~20 timepoints/strain in a loop, plus a few hybs against a reference RNA). But a few timepoints have only three replicated strains because something went wrong in the lab. Until now, I've been inspired by the end of section 8.2 of limma_2.12.0's limmaUsersGuide(), as follows: ### below, my four strains are named A, D, G and H design <- modelMatrix(targets, ref = "REF") fit <- lmFit(MA, design) cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4 - (A0h+D0h +G0h+K0h)/4, T_6h_vs0h= (A6h+D6h+G6h)/3 - (A0h+D0h+G0h+K0h)/4, T_12h_vs0h= (A12h+D12h+G12h)/3 - (A0h+D0h+G0h+K0h)/4, T_18h_vs0h= (A18h+D18h+G18h+K18h)/4 - (A0h+D0h+G0h+K0h)/4, T_24h_vs0h= (A24h+D24h+G24h+K24h)/4 - (A0h+D0h+G0h+K0h)/4, T_36h_vs0h= (A36h+D36h+G36h+K36h)/4 - (A0h+D0h+G0h+K0h)/4, T_48h_vs0h= (A48h+D48h+G48h+K48h)/4 - (A0h+D0h+G0h+K0h)/4, T_3_vs0h= (A3+D3+G3+K3)/4 - (A0h+D0h+G0h+K0h)/4, T_4_vs0h= (A4+D4+G4+K4)/4 - (A0h+D0h+G0h+K0h)/4, T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4, T_6_vs0h= (A6+D6+G6+K6)/4 - (A0h+D0h+G0h+K0h)/4, T_7_vs0h= (A7+D7+G7+K7)/4 - (A0h+D0h+G0h+K0h)/4, T_8_vs0h= (A8+D8+G8+K8)/4 - (A0h+D0h+G0h+K0h)/4, T_9_vs0h= (A9+D9+G9+K9)/4 - (A0h+D0h+G0h+K0h)/4, T_10_vs0h= (A10+D10+G10+K10)/4 - (A0h+D0h+G0h+K0h)/4, T_11_vs0h= (A11+D11+G11+K11)/4 - (A0h+D0h+G0h+K0h)/4, T_12_vs0h= (A12+D12+G12+K12)/4 - (A0h+D0h+G0h+K0h)/4, T_13_vs0h= (A13+D13+G13+K13)/4 - (A0h+D0h+G0h+K0h)/4, T_14_vs0h= (A14+D14+G14+K14)/4 - (A0h+D0h+G0h+K0h)/4, T_15_vs0h= (A15+D15+G15+K15)/4 - (A0h+D0h+G0h+K0h)/4, levels=design) fit2 <- contrasts.fit(fit, cont.matrix) Most RNA samples are used twice (eg: the "3hour" sample from strain A is used for both: A0h->A3h and: A3h->A6h), but some are used three times (this would be the case for A3h if I also had a A3h->REF chip). Limma is unable to estimate coefficients for many of my spots, ie fit2 $coefficients give "NA" rows for too many spots. If I understand correctly, this could be because for example both "K0h" chips are have 5000 flagged spots in common (indeed, several chips have thousands of genepix auto-flagged weak spots). NAs are frustrating. One workaround that came up when consulting my local statistician: to treat technical replicates and biological replicates equally. But technical replicates have higher correlation than biological replicates (I calculated a few correlations by hand): - technical replicate (same biological sample, same dye, different chip): median correlation = 0.55 - biological replicate (different sample, but same timepoint and dye, different chip): median correlation = 0.35 Thus I think it would be incorrect to treat my technical replicates as biological replicates. So I'm examining alternative means of creating a design matrix. Ideally, I'd like to use "strain" as a random effect here. *** Is it correct that the only way to add a random effect is with lmFit's block argument?: lmFit(myMA, design=myDesign, block=factor(targets$Colony), correlation=0.55) duplicateCorrelation cannot be used on a complex design. *** Is it correct to use the block argument on such a design, with a correlation I calculated "by hand"? *** Alternatively, I'll just put everything as fixed factors, but it feels wrong: extraFactors = model.matrix(~0+factor(targets$Strain)+factor(targets $Batch)) # several strains, only two possible batches colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch") design_bothFixed = cbind(modelMatrix(targets, ref="REF"), extraFactors) *** Would the design matrix I just created be a correct way of proceeding? ***Since limma is limited to a single random factor, how should I choose which between Strain and Batch to use as fixed and which as random? Thanks very much :o) Yannick -------------------------------------------- yannick . wurm @ unil . ch Ant Genomics, Ecology & Evolution @ Lausanne http://www.unil.ch/dee/page28685_fr.html
limma limma • 1.7k views
ADD COMMENT
1
Entering edit mode
Naomi Altman ★ 6.0k
@naomi-altman-380
Last seen 3.6 years ago
United States
maanova purports to handle several random effects. --Naomi At 05:01 PM 6/3/2008, Yannick Wurm wrote: >Hello List, > >I've got a bit of a complex design - perhaps you can offer me some >advice. Thank you very much in advance for taking the time to read >this mail. (I've highlighted my questions with *** if you just want >to skip down) > > > >I have ~100 two-color spotted cDNA chips, making up a 20-timepoint >timecourse, replicated with four different strains. (so ~20 >timepoints/strain in a loop, plus a few hybs against a reference >RNA). But a few timepoints have only three replicated strains because >something went wrong in the lab. > >Until now, I've been inspired by the end of section 8.2 of >limma_2.12.0's limmaUsersGuide(), as follows: > ### below, my four strains are named A, D, G and H > design <- modelMatrix(targets, ref = "REF") > fit <- lmFit(MA, design) > cont.matrix = makeContrasts(T_3h_vs0h= (A3h+D3h+G3h+K3h)/4 > - (A0h+D0h +G0h+K0h)/4, > T_6h_vs0h= (A6h+D6h+G6h)/3 - > (A0h+D0h+G0h+K0h)/4, > T_12h_vs0h= (A12h+D12h+G12h)/3 - > (A0h+D0h+G0h+K0h)/4, > T_18h_vs0h= (A18h+D18h+G18h+K18h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_24h_vs0h= (A24h+D24h+G24h+K24h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_36h_vs0h= (A36h+D36h+G36h+K36h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_48h_vs0h= (A48h+D48h+G48h+K48h)/4 > - (A0h+D0h+G0h+K0h)/4, > T_3_vs0h= (A3+D3+G3+K3)/4 - > (A0h+D0h+G0h+K0h)/4, > T_4_vs0h= (A4+D4+G4+K4)/4 - > (A0h+D0h+G0h+K0h)/4, > T_5_vs0h= (A5+G5+K5)/3 - (A0h+D0h+G0h+K0h)/4, > T_6_vs0h= (A6+D6+G6+K6)/4 - > (A0h+D0h+G0h+K0h)/4, > T_7_vs0h= (A7+D7+G7+K7)/4 - > (A0h+D0h+G0h+K0h)/4, > T_8_vs0h= (A8+D8+G8+K8)/4 - > (A0h+D0h+G0h+K0h)/4, > T_9_vs0h= (A9+D9+G9+K9)/4 - > (A0h+D0h+G0h+K0h)/4, > T_10_vs0h= (A10+D10+G10+K10)/4 - > (A0h+D0h+G0h+K0h)/4, > T_11_vs0h= (A11+D11+G11+K11)/4 - > (A0h+D0h+G0h+K0h)/4, > T_12_vs0h= (A12+D12+G12+K12)/4 - > (A0h+D0h+G0h+K0h)/4, > T_13_vs0h= (A13+D13+G13+K13)/4 - > (A0h+D0h+G0h+K0h)/4, > T_14_vs0h= (A14+D14+G14+K14)/4 - > (A0h+D0h+G0h+K0h)/4, > T_15_vs0h= (A15+D15+G15+K15)/4 - > (A0h+D0h+G0h+K0h)/4, > levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > >Most RNA samples are used twice (eg: the "3hour" sample from strain A >is used for both: A0h->A3h and: A3h->A6h), but some are used three >times (this would be the case for A3h if I also had a A3h->REF chip). > >Limma is unable to estimate coefficients for many of my spots, ie >fit2 $coefficients give "NA" rows for too many spots. If I understand >correctly, this could be because for example both "K0h" chips are >have 5000 flagged spots in common (indeed, several chips have >thousands of genepix auto-flagged weak spots). > >NAs are frustrating. One workaround that came up when consulting my >local statistician: to treat technical replicates and biological >replicates equally. But technical replicates have higher correlation >than biological replicates (I calculated a few correlations by hand): > - technical replicate (same biological sample, same dye, different >chip): median correlation = 0.55 > - biological replicate (different sample, but same timepoint and >dye, different chip): median correlation = 0.35 >Thus I think it would be incorrect to treat my technical replicates >as biological replicates. > >So I'm examining alternative means of creating a design matrix. > >Ideally, I'd like to use "strain" as a random effect here. *** Is it >correct that the only way to add a random effect is with lmFit's >block argument?: > lmFit(myMA, design=myDesign, block=factor(targets$Colony), >correlation=0.55) >duplicateCorrelation cannot be used on a complex design. *** Is it >correct to use the block argument on such a design, with a >correlation I calculated "by hand"? *** > >Alternatively, I'll just put everything as fixed factors, but it >feels wrong: > extraFactors = > model.matrix(~0+factor(targets$Strain)+factor(targets $Batch)) # > several strains, only two possible batches > colnames(extraFactors) = c(levels(factor(targets$Strain), "Batch") > design_bothFixed = cbind(modelMatrix(targets, ref="REF"), > extraFactors) >*** Would the design matrix I just created be a correct way of >proceeding? > >***Since limma is limited to a single random factor, how should I >choose which between Strain and Batch to use as fixed and which as >random? > >Thanks very much :o) > >Yannick > > >-------------------------------------------- > yannick . wurm @ unil . ch >Ant Genomics, Ecology & Evolution @ Lausanne > http://www.unil.ch/dee/page28685_fr.html > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111
ADD COMMENT

Login before adding your answer.

Traffic: 498 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6