Microarray experiment design issues


Ali Tofigh ▴ 30

@ali-tofigh-4123

Last seen 9.6 years ago

Our goal is to measure the effects of a treatment on a specific cell line using gene expression microarrays (Agilent 2-color). There are two possible experimental designs:

A) Perform the entire experiment in one day: split cells into 6 groups, treat 3 with compound and leave 3 untreated. This setup minimizes technical variation, but the list of differentially expressed genes will include some that are differentially expressed mainly due to the specific conditions on the day of the experiment (humidity levels, temperature, oxygen levels, etc.).

B) Perform the experiment on three separate occasions: each day, split cells into two groups and treat only one with compound. A paired analysis would be appropriate here. This setup introduces noise (technical noise because of separate handling of the three pairs, and noise from daily variation of the environmental conditions), and so we lose some statistical power. However, since the experiment is performed under slightly different environmental conditions, some of the condition-specific genes will no longer show up as differentially expressed, and the list of genes would in this sense be more robust/reproducible.

Does anyone have experience with both setups? I would like to know whether the amount of variance introduced in setup B can be expected to be low enough that we do not lose too much power while producing a more robust set of differentially expressed genes.

Cheers
/Ali

• 1.0k views

Updated 11.9 years ago by Wolfgang Huber ★ 13k • written 11.9 years ago by Ali Tofigh ▴ 30


David Westergaard ▴ 280

@david-westergaard-5119

Last seen 9.6 years ago

Hi Ali,

I don't think this list is the appropriate place for these questions, since they don't generally involve any Bioconductor packages.

However, I don't really see why you would have a problem in A. Are the treated and untreated groups not exposed to the same variation in humidity levels, temperature, oxygen levels, etc., so that you would expect the same variation due to these factors in both treated and untreated, and thus the net effect of these factors on the treated-versus-untreated comparison would be approximately zero?

Also, in setup B, could the technical noise not make the arrays differ so much from one another that the treatment effect becomes hard to pick up?

I don't really have any experience with experimental setup, so the above comments are just my logical conclusions from working with microarray data.

Best,
David

(In reply to Ali Tofigh, 2012/6/15.)
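The two views in this exchange can be put side by side with a small worked example. The sketch below assumes a simple additive model for one gene: a day shift that affects treated and untreated cells alike (which cancels in the treated-minus-untreated difference, as argued above) and a day-by-treatment term that changes the treatment response itself (which does not cancel, and is the concern raised in the original question). The model and every number in it are hypothetical and chosen only for illustration.

```python
# Hypothetical noise-free model for one gene; all values are made up.
BASELINE = 2.0
TREATMENT_EFFECT = 1.0  # the day-independent ("robust") effect

# Shift shared by treated and untreated cells on a given day.
DAY_SHIFT = {"day1": 0.3, "day2": -0.2, "day3": 0.5}

# Day-specific change in the treatment response itself.
DAY_BY_TREATMENT = {"day1": 0.8, "day2": -0.1, "day3": -0.4}


def expression(day, treated):
    """Expected expression of the gene on a given day."""
    value = BASELINE + DAY_SHIFT[day]
    if treated:
        value += TREATMENT_EFFECT + DAY_BY_TREATMENT[day]
    return value


def observed_difference(day):
    """Treated minus untreated on one day; DAY_SHIFT cancels here."""
    return expression(day, True) - expression(day, False)


# Setup A run on day1 only: the estimate absorbs day1's interaction term.
setup_a = observed_difference("day1")

# Setup B: one pair per day, averaged over the three days.
days = list(DAY_SHIFT)
setup_b = sum(observed_difference(d) for d in days) / len(days)

print(f"setup A (day1 only):       {setup_a:.2f}")
print(f"setup B (mean of 3 days):  {setup_b:.2f}")
```

With these made-up numbers, the shared day shift drops out in both designs, but setup A reports 1.80 against a true day-independent effect of 1.0, while setup B reports 1.10. Whether such day-by-treatment terms are large in a real experiment is an empirical question that this sketch cannot answer.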

Written 11.9 years ago by David Westergaard ▴ 280


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 16 days ago

EMBL European Molecular Biology Laborat…

Hi Ali,

I think it is fine to discuss this type of question on this list (and it seems to be consistent with the statement on http://www.bioconductor.org/help/mailing-list ).

If your aim is to make a general statement about the compound effect, rather than about what happened on a particular day, and all other costs being equal, then B is preferable. I would not term this "loss of power": effects that you see in A but not in B are not easily reproducible, and thus plausibly of lower interest.

However, I am not sure how important this issue is compared to many of your other choices, and the biases and errors that they might introduce, such as: choice of cell line, choice of compound dose and incubation time, the sensitivity and specificity of the particular array platform. If you are worried about robust inference, then perhaps you should also consider which of these factors need to be scanned.

Most importantly, what will the resulting gene list be used for next? Nobody expects these lists to end up as "standalone truths"; they usually have a purpose (e.g. hit picking for subsequent single-gene research; search for biological themes and stories that give you a warm fuzzy feeling; elucidation of the molecular target(s) of the drug, and perhaps in turn their downstream targets; clustering of drugs by similarity of the response; etc.). I often find that once you have sorted out these questions, the data analytic strategy also becomes more apparent.

Best wishes
Wolfgang

(In reply to Ali Tofigh, 06/15/2012 11:35 AM.)

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

Written 11.9 years ago by Wolfgang Huber ★ 13k


Thanks for the feedback!

I'd like to explain what I meant by "loss of power" in setup B. Assume first that I had the resources to replicate setup A on three different days (three replications of the experiment, with three replicates each). For a gene that is "robustly" differentially expressed due to treatment (here robust means that it will be differentially expressed irrespective of environmental conditions), I might get the following values (I'm just making up numbers to make a point):

  day1   T: 4.2, 3.8, 4.0   UnT: 2.0, 2.2, 2.1
  day2   T: 4.5, 4.6, 4.3   UnT: 2.9, 3.2, 3.1
  day3   T: 3.6, 3.4, 3.3   UnT: 1.8, 2.1, 2.2

On any of the days, I would get very strong indications of differential expression and low variances. But were I now to take one pair of arrays from each day, as in setup B, I might end up with the following numbers:

  T: 4.2, 4.5, 3.6   UnT: 2.2, 3.2, 1.8

Although the two groups are still different, we have introduced much more noise. Some genes with subtle but robust differences between treated and untreated cells might then be lost in our analysis, which is why I would call it a loss of power. So I'm not worried about losing day-specific effects, but about having a less sensitive test for genes that are DE across all days.

A paired analysis will probably be a good approach here to increase sensitivity. But another factor is that the handling of the samples on different days will be more variable than the handling of samples within one day, which introduces yet another level of noise.

So I guess my question boils down to whether or not setup B is a sensible alternative if I am more interested in the genes that are robustly differentially expressed. I can't quantify how many of the robustly DE genes I would lose in setup B compared to how many of the day-specific genes would go away. This makes it hard for me to predict the benefit of setup B, so if anyone has experience with these two setups, I would value your input.

/ali

(In reply to Wolfgang Huber, Fri, Jun 15, 2012 at 8:48 AM.)
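To make the remark about paired analysis concrete, here is a minimal sketch that computes an ordinary two-sample t statistic and a paired t statistic on the made-up setup-B values from this post (T: 4.2, 4.5, 3.6; UnT: 2.2, 3.2, 1.8, paired by day). It uses only the Python standard library; the function names are hypothetical, and plain t statistics are used here as a stand-in for whatever test a real microarray analysis would apply.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Made-up setup-B values from the post: one treated/untreated pair per day.
treated = [4.2, 4.5, 3.6]
untreated = [2.2, 3.2, 1.8]


def unpaired_t(x, y):
    """Student's two-sample t statistic with pooled variance (ignores pairing)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


def paired_t(x, y):
    """Paired t statistic: a one-sample t on the within-day differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


print(f"unpaired t = {unpaired_t(treated, untreated):.2f} (4 degrees of freedom)")
print(f"paired t   = {paired_t(treated, untreated):.2f} (2 degrees of freedom)")
```

On these numbers the unpaired statistic is about 3.4 and the paired one about 8.2, because the day-to-day shift common to both members of a pair drops out of the within-day differences. The two statistics are not directly comparable, since the paired test has only 2 degrees of freedom instead of 4, and the additional handling noise mentioned in the post is not removed by pairing. For real array data, a moderated test with day as a blocking factor (as offered by limma, for example) would likely be preferred over the plain t statistics shown here.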

Written 11.9 years ago by Ali Tofigh ▴ 30
