Nested design in limma

Caroline TRUNTZER

Dear list,

My question is a follow-up to the thread about handling a nested design with limma posted by Tao Shi (please see https://stat.ethz.ch/pipermail/bioconductor/2007-January/015717.html).

I have a data set with a design similar to Tao Shi's: 14 patients (7 in one group, 7 in the other), 2 biological samples for each patient (corresponding to 2 different extractions), each extraction hybridized to 2 arrays, and triplicate sets of probes. I would like to identify genes that are differentially expressed between the 2 groups.

I read the responses to Tao on how to analyse this data set, but there are some things I didn't understand. The advice was to use avedups() to average over the triplicate probes, and then to treat the patients as biological replicates (as blocks, using duplicateCorrelation). But by doing so I do not understand how the two other replication levels, extraction and hybridization, are treated. Is it possible to keep the information from these two replication levels in the analysis? Is it possible to set different levels in blocks (from the help for the duplicateCorrelation function I think it is not possible, but perhaps someone has found a way to do it)? Moreover, I think I'm confused about what should go in the design matrix and what should go in the block vector. I'm sorry for this naive question...

Thanks in advance for your help,
Caroline
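
To make the layout concrete, here is a minimal plain-Python sketch of the design described above for a single gene, plus the probe-averaging step the earlier thread recommends. The constant names and the helper `average_replicate_probes` are hypothetical; the helper is an analogue of the averaging idea only, not limma's avedups(), and it assumes the three replicate probes for a gene sit in adjacent positions on each array.

```python
# Hypothetical layout for one gene, following the design in the question.
N_GROUPS = 2        # two patient groups
N_PATIENTS = 7      # patients per group
N_EXTRACTIONS = 2   # RNA extractions per patient
N_ARRAYS = 2        # arrays hybridized per extraction
N_PROBES = 3        # replicate probes per gene on each array

def n_arrays_total():
    """Number of arrays in the whole experiment."""
    return N_GROUPS * N_PATIENTS * N_EXTRACTIONS * N_ARRAYS

def average_replicate_probes(spots, ndups=N_PROBES):
    """Average each run of `ndups` adjacent replicate spots on one array.

    Assumes replicates for a gene are stored next to each other; this is a
    plain-Python analogue of the averaging step, not limma's avedups().
    """
    if len(spots) % ndups != 0:
        raise ValueError("number of spots must be a multiple of ndups")
    return [sum(spots[i:i + ndups]) / ndups for i in range(0, len(spots), ndups)]

print(n_arrays_total())                                          # 56
print(average_replicate_probes([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))  # [2.0, 5.0]
```

After probe averaging, each patient still contributes 4 arrays (2 extractions x 2 hybridizations), so the 56 arrays come from only 14 independent patients.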

Kasper Daniel Hansen

On Apr 17, 2007, at 2:23 AM, caroline.truntzer at chu-lyon.fr wrote:

> [...]

This will be a quick answer. You are right that you have many levels of dependency in your design: 3 probes measuring the same transcript, 2 samples per patient and 2 hybridizations per sample. From a certain perspective, that should be analyzed using a model with several random effects (i.e. several levels of duplicateCorrelation). Unfortunately limma cannot handle more than one level, so you need to focus on the dependency you think is most important to model. The recommendations in the thread you are referring to (which I only skimmed _very_ quickly) essentially deal with this question.

Kasper
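
A small arithmetic sketch can show why one blocking level loses information. Assume, hypothetically, that for one gene the probe-averaged value on array k of extraction j of patient i is y = mu_group + P_i + E_ij + H_ijk, with independent zero-mean random effects whose variances are sigma_P^2, sigma_E^2 and sigma_H^2. Two arrays from the same extraction then have correlation (sigma_P^2 + sigma_E^2) / total, while two arrays from different extractions of the same patient have correlation sigma_P^2 / total. Blocking on patient with duplicateCorrelation assumes one common within-block correlation, which matches both values only when sigma_E^2 = 0. The class and function names below are illustrative, and the variance values are made up.

```python
from dataclasses import dataclass

@dataclass
class NestedVariances:
    """Hypothetical variance components for one gene on the log scale."""
    patient: float     # between-patient variance (sigma_P^2)
    extraction: float  # between-extraction variance within a patient (sigma_E^2)
    array: float       # between-array variance within an extraction (sigma_H^2)

    def total(self) -> float:
        return self.patient + self.extraction + self.array

def within_patient_correlations(v: NestedVariances) -> dict:
    """Correlations between two arrays of the same patient under the nested model."""
    total = v.total()
    return {
        # the two arrays share both the patient and the extraction effect
        "same_extraction": (v.patient + v.extraction) / total,
        # the two arrays share only the patient effect
        "different_extraction": v.patient / total,
    }

def patient_correlation_matrix(v: NestedVariances, n_extractions=2, n_arrays=2):
    """Correlation matrix of one patient's arrays, ordered by (extraction, array)."""
    c = within_patient_correlations(v)
    labels = [(e, a) for e in range(n_extractions) for a in range(n_arrays)]
    return [
        [1.0 if (e1, a1) == (e2, a2)
         else c["same_extraction"] if e1 == e2
         else c["different_extraction"]
         for (e2, a2) in labels]
        for (e1, a1) in labels
    ]

v = NestedVariances(patient=0.5, extraction=0.25, array=0.25)
for row in patient_correlation_matrix(v):
    print(row)
# [1.0, 0.75, 0.5, 0.5]
# [0.75, 1.0, 0.5, 0.5]
# [0.5, 0.5, 1.0, 0.75]
# [0.5, 0.5, 0.75, 1.0]
```

The block structure of the matrix (0.75 within an extraction, 0.5 across extractions) is exactly what a single consensus correlation cannot represent.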

Naomi Altman

Dear Caroline,

The key to the response about averaging is that in a purely nested, completely balanced design like this one, with random effects at every level except the highest, the analysis of each factor of the design depends only on the averages within the levels below it. So hypotheses about the differences between groups can be answered using the genewise averages of all the observations for each patient.

The levels of subsampling can be used to determine the main sources of variation in the study, which is useful for planning further studies, but not for testing differences between groups. If you need to understand the sources of variance in your study, you could handle this in limma by analyzing each group separately, level by level. Alternatively, you could use SAS to estimate the variance components for each level of replication. I think that MAANOVA in Bioconductor may also do this analysis, but I have not used it.

--Naomi

At 10:47 PM 4/17/2007, Kasper Daniel Hansen wrote:

> [...]

Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111
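
Naomi's central point, that in a balanced nested design the group comparison depends only on the patient averages, can be checked numerically. The minimal sketch below simulates hypothetical probe-averaged data (7 patients per group, 4 arrays per patient, a group shift, a patient effect and array-level noise; an extraction effect is left out for brevity, and the identity does not depend on it). It then shows that the classical pooled two-sample t statistic on the 14 patient means, squared, equals the nested-ANOVA F for groups tested against patients within groups. All helper names are illustrative. This is the ordinary t statistic, not limma's moderated one.

```python
import math
import random
from statistics import mean

def patient_means(data):
    """Collapse {group: {patient: [array values]}} to {group: [patient means]}."""
    return {g: [mean(v) for v in pats.values()] for g, pats in data.items()}

def pooled_t(a, b):
    """Classical two-sample t statistic with pooled variance; returns (t, df)."""
    ma, mb = mean(a), mean(b)
    df = len(a) + len(b) - 2
    sp2 = (sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)) / df
    t = (ma - mb) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
    return t, df

def nested_group_F(data):
    """F = MS(group) / MS(patient within group), computed from array-level data."""
    sizes = {len(v) for pats in data.values() for v in pats.values()}
    if len(sizes) != 1:
        raise ValueError("design must be balanced: same number of arrays per patient")
    m = sizes.pop()
    grand = mean([x for pats in data.values() for v in pats.values() for x in v])
    ss_group = ss_patient = 0.0
    for pats in data.values():
        g_mean = mean([x for v in pats.values() for x in v])
        ss_group += len(pats) * m * (g_mean - grand) ** 2
        ss_patient += sum(m * (mean(v) - g_mean) ** 2 for v in pats.values())
    df_group = len(data) - 1
    df_patient = sum(len(pats) - 1 for pats in data.values())
    return (ss_group / df_group) / (ss_patient / df_patient)

def simulate(rng, n_patients=7, n_arrays=4, effect=1.0):
    """Hypothetical data: group shift + patient effect + array-level noise."""
    data = {}
    for g, shift in (("A", 0.0), ("B", effect)):
        data[g] = {}
        for i in range(n_patients):
            patient_effect = rng.gauss(0, 1)
            data[g][f"{g}{i + 1}"] = [shift + patient_effect + rng.gauss(0, 0.5)
                                      for _ in range(n_arrays)]
    return data

data = simulate(random.Random(2007))
means = patient_means(data)
t, df = pooled_t(means["A"], means["B"])
F = nested_group_F(data)
print(f"t = {t:.3f} on {df} df; t^2 = {t * t:.3f}; nested F = {F:.3f}")
```

The test uses 12 degrees of freedom (14 patients minus 2 groups), not 54, because the arrays within a patient are subsamples rather than independent replicates. With many genes, the same patient-level averages could presumably be passed to limma's lmFit() with a two-group design and then eBayes(), which adds variance moderation across genes; that step is not reproduced here.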
