Biological replication (was RNA degradation problem)


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

The question of what is appropriate biological replication is a tough one. The objective is to obtain results that are valid in the population of interest, which usually is not plants grown in a single batch in the greenhouse. But how much variability should we induce? Should each batch of plants be grown separately but in the same building (different growth chambers)? Grown in different labs? Different universities?

In my very first Affy experiment, the investigator did the following: 2 batches of plants grown separately, 2 samples of plants from 1 of the batches, 2 microarrays from one of the samples, for 4 arrays in all. The correlation among the results was: 2 arrays from the same sample > 2 samples from the same batch > 2 batches. This should be no surprise, even though we did not have enough replication to do any formal testing.

I think at minimum that you want to achieve results that would be replicable within your own lab. That would suggest batches of plants grown separately from separate batches of seed.

The best plan is a randomized complete block design, with every condition sampled in every block. If the conditions are tissues, this is readily achieved.

Personally, I look at the density plots of the probes on the arrays. If they have the same "shape" (which is usually a unimodal distribution with a long tail to the right on the log2 scale) then I cross my fingers (that is supposed to bring good luck) and use RMA. Most of the experiments I have been involved with using Arabidopsis arrays have involved tissue differences, and the amount of differential expression has been huge on the probeset scale (over 60% of genes), but the probe densities have been pretty similar.

--Naomi

At 05:02 PM 1/19/2006, Matthew Hannah wrote:

> From: fhong at salk.edu [mailto:fhong at salk.edu]
> Sent: Thu 19/01/2006 21:27
> To: Matthew Hannah
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] RNA degradation problem
>
> > Hi Matthew,
> >
> > Thank you very much for your help.
> >
> > > It's amazing how many lab plant biologists see pooled samples from a bulk of plants grown at the same time as biological replicates when they are clearly not.
> >
> > I would think that all plants under experiment should be grown at the same time without different conditions/treatments. Biological replicates should be tissue samples from different groups of plants, say a sample from 50 plants as replicate 1 and a sample from another 50 as replicate 2.
> > Do you think that biological replicates should be grown at different times?
>
> Absolutely! Biological replication must be either single plants grown in the same experiment (but no one wants to risk single plants for arrays) or large pools of plants from INDEPENDENT experiments (or the pools must be smaller than sample size - doesn't really happen for arrays); otherwise what biological variability are you sampling? Say you have 150 plants growing in the greenhouse and you harvest 3 random pools of 50 as your 3 'biological replicates': then you will have eliminated all variability from them, the arrays will be as good as technical replicates, and any statistical testing is invalid.
>
> > > I find hist, RNA deg, AffyPLM and a simple RMA norm followed by plot(as.data.frame(exprs(eset.rma))) can answer in most cases for why it didn't work, or won't work - in the rare case when someone asks for QC before rather than after they realise the data is strange ;-)
> >
> > This actually raises another question: when the % of differential genes is large, which normalization works better?
>
> I posted on this a lot about 1.5 years ago; you should find it in the archives - but simply no one knows or has tested it.
>
> > http://cactus.salk.edu/temp/QC_t.doc
> > Take a look at the last plot, which clearly indicates homogeneity within replicates and heterogeneity among samples.
> > (1) Will stem top and stem base differ so much? Or is it the preparation process bringing in extra correlation within replicates?
> > (2) When the % of differential genes is large, which normalization works better?
>
> Looking at these scatterplots, I can honestly say I've never seen so much DE. I would be surprised if samples such as different stem positions were so different. Something must be wrong with the samples or sampling in my opinion. The scatterplots are slightly more user friendly if you use pch="."
>
> HTH,
>
> Matt
>
> --------------------
> Fangxin Hong Ph.D.
> Plant Biology Laboratory
> The Salk Institute
> 10010 N. Torrey Pines Rd.
> La Jolla, CA 92037
> E-mail: fhong at salk.edu
> (Phone): 858-453-4100 ext 1105
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman, 814-865-3791 (voice)
Associate Professor
Dept. of Statistics, 814-863-7114 (fax)
Penn State University, 814-865-1348 (Statistics)
University Park, PA 16802-2111
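The ordering of correlations reported for the four-array experiment (same sample > same batch > different batches) is what a nested source-of-variation model would predict. The base R sketch below simulates that design. Every standard deviation in it is an assumed, illustrative value rather than an estimate from any real data set, and the object names are hypothetical.

```r
# Simulated illustration only: all standard deviations are assumed values.
set.seed(1)
n_genes <- 5000

gene_level <- rnorm(n_genes, mean = 8, sd = 2)  # true log2 expression per gene
batch_1    <- rnorm(n_genes, sd = 0.4)          # gene-wise deviation of batch 1
batch_2    <- rnorm(n_genes, sd = 0.4)          # gene-wise deviation of batch 2
sample_a   <- rnorm(n_genes, sd = 0.3)          # first sample from batch 1
sample_b   <- rnorm(n_genes, sd = 0.3)          # second sample from batch 1
sample_c   <- rnorm(n_genes, sd = 0.3)          # sample from batch 2
array_noise <- function() rnorm(n_genes, sd = 0.2)  # fresh technical noise per array

# Four arrays: 2 arrays from one sample, 2 samples from one batch, 2 batches
arrays <- cbind(
  b1_sA_array1 = gene_level + batch_1 + sample_a + array_noise(),
  b1_sA_array2 = gene_level + batch_1 + sample_a + array_noise(),
  b1_sB_array1 = gene_level + batch_1 + sample_b + array_noise(),
  b2_sC_array1 = gene_level + batch_2 + sample_c + array_noise()
)

r <- cor(arrays)
same_sample <- r["b1_sA_array1", "b1_sA_array2"]  # share gene + batch + sample
same_batch  <- r["b1_sA_array1", "b1_sB_array1"]  # share gene + batch
diff_batch  <- r["b1_sA_array1", "b2_sC_array1"]  # share gene level only
print(round(c(same_sample = same_sample,
              same_batch  = same_batch,
              diff_batch  = diff_batch), 3))
```

Each extra shared level adds shared variance, so the correlation drops as the arrays have less in common. With only one pair at each level, as in the real experiment, the ordering can be observed but not formally tested.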

Normalization probe affy affyPLM • 1.9k views

18.3 years ago • Naomi Altman ★ 6.0k


Suresh Gopalan ▴ 60

@suresh-gopalan-932

Last seen 9.6 years ago

I agree that independent replication is the best bet as of now, though it carries the risk of introducing new defects or hidden variables that influence the phenotype in question, which may indeed be relevant or constitute another line of study.

If one decides to take this risk, or for other reasons does experiments in perfectly identical (reproducible?) conditions and takes replicates pooled from a very large population (50 plants each in 3 replicates) as mentioned below: if that removes some variability inherent to each plant, so be it. Isn't the goal to study the variable of interest while masking the irrelevant variables (at least in that study)? How would this make statistical testing invalid?

I wonder if in either case it is any different or worse than the normalization schemes and assumptions used in many of the currently popular analysis or summary schemes?

Suresh (Suresh Gopalan, Ph.D)

----- Original Message -----
From: "Matthew Hannah" <hannah@mpimp-golm.mpg.de>
To: "Naomi Altman" <naomi at stat.psu.edu>; <fhong at salk.edu>
Cc: <bioconductor at stat.math.ethz.ch>
Sent: Friday, January 20, 2006 6:02 AM
Subject: Re: [BioC] Biological replication (was RNA degradation problem)

>> > Say you have 150 plants growing in the greenhouse and you harvest 3 random pools of 50 as your 3 'biological replicates' then you will have eliminated all variability from them and the arrays will be as good as technical replicates and any statistical testing is invalid.
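One way to look at the question of why pooled replicates from a single experiment could invalidate testing is a small simulation. The base R sketch below assumes that the treated-minus-control difference for a gene shifts from one independent experiment to the next, with an average of zero across experiments, so any "significant" result is a false positive for the claim that the effect generalises. All variance values, the pool size and the function name `one_p_value` are illustrative assumptions, and the unpaired Welch t-test stands in for whatever test would actually be used.

```r
# Simulated illustration only: every parameter below is an assumed value.
set.seed(1)
n_sim     <- 2000   # simulated genes/experiments per scenario
n_pools   <- 3      # replicate pools per condition
pool_size <- 50     # plants per pool
sd_plant  <- 1      # plant-to-plant SD within one experiment
sd_expt   <- 0.5    # SD of the experiment-specific treatment shift (mean zero)

one_p_value <- function(independent) {
  # Independent experiments: each pool gets its own shift.
  # Single experiment: all pools share one shift.
  shift <- if (independent) {
    rnorm(n_pools, mean = 0, sd = sd_expt)
  } else {
    rep(rnorm(1, mean = 0, sd = sd_expt), n_pools)
  }
  pool_sd <- sd_plant / sqrt(pool_size)  # pooling averages out plant variation
  control <- rnorm(n_pools, mean = 0,     sd = pool_sd)
  treated <- rnorm(n_pools, mean = shift, sd = pool_sd)
  t.test(treated, control)$p.value       # Welch two-sample t-test
}

fp_same  <- mean(replicate(n_sim, one_p_value(FALSE)) < 0.05)
fp_indep <- mean(replicate(n_sim, one_p_value(TRUE))  < 0.05)
print(c(pools_from_one_experiment = fp_same,
        independent_experiments   = fp_indep))
```

Under these assumptions the pools from one experiment see only the small pool-to-pool variance, so the test treats an experiment-specific shift as a real effect far more often than the nominal 5%. The test is not wrong about that one experiment; the issue is the population the conclusion is supposed to apply to.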

18.3 years ago • Suresh Gopalan ▴ 60


An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060120/d95d417c/attachment.pl

18.3 years ago • Naomi Altman ★ 6.0k


Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 9.6 years ago

> The question of what is appropriate biological replication is a tough one. The objective is to obtain results that are valid in the population of interest, which usually is not plants grown in a single batch in the green house. But how much variability should we induce? Each batch of plants grown separately but in the same building (different growth chambers), grown in different labs? different universities?

Yes, but this is more a question of 'some' biological replication versus none. Obviously, if you have perfect reproducibility of your growth conditions then repeat experiments will have little influence, but in my experience independent experiments suitably account for slight environmental and sampling (e.g. time) variability. Plants grown under the same conditions are highly reproducible, so even the randomized block design might not be ideal depending on what the environmental factors are - light, water, temp, etc. I would always favour separate, independent experiments.

As for reproducibility in general, this is a problem. I'm sure that in all fields some patterns found by a certain lab, labelling, scanning, etc. will not be reproducible. For example, I wonder how many training set - sample set molecular diagnosis studies would continue to work if new independent data were introduced without updating the whole study.

> In my very first Affy experiment, the investigator did the following: 2 batches of plants grown separately, 2 samples of plants from 1 of the batches, 2 microarrays from one of the samples, for 4 arrays in all.
> The correlation among the results was 2 arrays from same sample > 2 samples from same batch > 2 batches. This should be no surprise, even though we did not have enough replication to do any formal testing.
>
> I think at minimum that you want to achieve results that would be replicable within your own lab. That would suggest batches of plants grown separately from separate batches of seed.
>
> The best plan is a randomized complete block design, with every condition sampled in every block. If the conditions are tissues, this is readily achieved.

I assume you mean random in each independent experiment, and then independently repeated, in which case this is the best approach.

> Personally, I look at the density plots of the probes on the arrays. If they have the same "shape" (which is usually a unimodal distribution with long tail to the right on the log2 scale) then I cross my fingers (that is supposed to bring good luck) and use RMA. Most of the experiments I have been involved with using arabidopsis arrays have involved tissue differences, and the amount of differential expression has been huge on the probeset scale (over 60% of genes), but these probe densities have been pretty similar.

I always look at RNAdeg and PLM as well, but in most cases this is also seen on the density plots.

Cheers,
Matt
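The density-plot check and the pairwise scatterplots with pch="." mentioned in this thread can be sketched in base R. The example below runs on simulated log2 intensities so that it is self-contained. The array names, the number of probes and the noise level are all made up for illustration. With real data, the matrix `log2_int` would presumably be replaced by something like `exprs(eset.rma)` from the thread; probe-level densities, RNA degradation plots and PLM-based checks would come from the affy and affyPLM packages and are not reproduced here.

```r
# Simulated stand-in for a matrix of log2 intensities (rows = probes,
# columns = arrays). All names and parameters are illustrative assumptions.
set.seed(7)
n_probes    <- 2000
array_names <- c("top_1", "top_2", "base_1", "base_2")

# Right-skewed signal shared by all arrays, plus independent array noise
signal   <- 6 + rexp(n_probes, rate = 0.5)
log2_int <- sapply(array_names, function(a) signal + rnorm(n_probes, sd = 0.3))

# 1. Overlaid density curves: do all arrays have the same "shape"?
dens  <- lapply(array_names, function(a) density(log2_int[, a]))
y_max <- max(sapply(dens, function(d) max(d$y)))
plot(dens[[1]], ylim = c(0, y_max),
     main = "log2 intensity by array", xlab = "log2 intensity")
for (i in seq_along(dens)[-1]) lines(dens[[i]], col = i)
legend("topright", legend = array_names, col = seq_along(dens), lty = 1)

# 2. All pairwise scatterplots; pch = "." keeps dense plots readable
plot(as.data.frame(log2_int), pch = ".")
```

A single curve with a different shape, or one array whose scatterplots against all the others are unusually wide, would be the kind of problem these plots are meant to flag before normalization.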

18.3 years ago • Matthew Hannah ▴ 940


Henk van den Toorn ▴ 10

@henk-van-den-toorn-1575

Last seen 9.6 years ago

Maybe this is a problem that is not especially suited for the Bioconductor mailing list, but it is an interesting problem that is probably useful for many of the participants. So, here's my 2 cents.

> arrays) or large pools of plants from INDEPENDENT experiments (or the pools must be smaller than sample size - doesn't really happen for arrays) otherwise what biological variability are you sampling?

I think the last question is very important. I guess you don't need to try to INCREASE the biological variability at all costs for a single experiment. If you are interested in combining experiments from the same lab in a single analysis, it's probably wise to follow Naomi's advice to take different replicates. A problem that might arise is that the "biological" variation is influenced by many circumstances inside a greenhouse or growth chamber. In our lab practice, it's clear that influences like the weather and the season have a profound influence on the biology of the plant, even though our plants are kept in climate-controlled growth chambers. If you use different batches of plants, you are actually confounding these factors with the batches of plants. By using different samples of plants, although unfortunately pooled in the same circumstances, you might actually block the circumstances for later analysis, if you're willing to go that far.

I'm very interested to see what other people have to say about this!

Henk van den Toorn, MSc
bioinformatician, Molecular Genetics group, Utrecht University
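The randomized complete block design discussed in this thread, with every condition sampled in every block and each block being one independently grown batch, can be laid out and analysed in a few lines of base R. This is only a sketch for a single simulated gene: the condition names, the number of blocks, the effect size and the variance values are assumptions chosen for illustration, and a real array analysis would fit such a model gene by gene with dedicated tools.

```r
# Simulated illustration only: names, effect size and SDs are assumed values.
set.seed(42)
conditions <- c("stem_base", "stem_top")
n_blocks   <- 4   # e.g. four independently grown batches

# Every condition appears exactly once in every block
design <- expand.grid(
  condition = factor(conditions, levels = conditions),
  block     = factor(seq_len(n_blocks))
)

# Randomize the processing order of the conditions within each block
design$order <- ave(seq_len(nrow(design)), design$block,
                    FUN = function(i) sample(length(i)))

# One simulated gene: block (batch) shifts plus a condition effect of 1
block_shift <- rnorm(n_blocks, sd = 0.5)
design$y <- 8 + block_shift[as.integer(design$block)] +
  1.0 * (design$condition == "stem_top") +
  rnorm(nrow(design), sd = 0.2)

# Including the block term removes batch-to-batch variation from the
# comparison of conditions
fit <- lm(y ~ block + condition, data = design)
print(design)
print(anova(fit))
```

Because both conditions are measured within each batch, batch-level influences such as season or chamber cancel out of the condition comparison instead of being confounded with it.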

18.3 years ago • Henk van den Toorn ▴ 10


Thank you all for the useful input and interesting discussion.

I agree with Henk: "biological replicates" are meant to capture biological variation among individual plants, not environmental factors such as growth chamber and climate. Batch effects and lab effects are known to be profound, and can sometimes obscure the true signals. Because array experiments are still relatively expensive, we would prefer to eliminate environmental factors (conduct the experiments at the same time, in the same growth room) and include biological variation (different plant samples as biological replicates).

Fangxin

--------------------
Fangxin Hong Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong at salk.edu
(Phone): 858-453-4100 ext 1105

ADD REPLY • link 18.3 years ago Fangxin Hong ▴ 810


Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 9.6 years ago

Lots of discussion here. A couple of points to make, mainly looking at the practical side.

Perfectly reproducible growth conditions are almost non-existent. Climate controls are slightly influenced by the seasons, and water status and light conditions depend on position (edge effects) and on the presence of other plants (light absorbance/humidity). Finally, no big facility is immune to the odd aphid or spot of mildew from our biotic friends. However, most of these factors (especially after pooling) vary more between 2 separate experiments than between 2 batches grown at the same time. To be confident that an 'effect' is not dependent on these factors I would like to know it can be reproduced.

This also applies to sampling - a certain stage or tissue should be identifiable and sampled at separate times on different plants, to be confident that its definition is valid and that the results obtained would be reproducible by someone repeating the experiment. I.e. the variability would also measure how well YOU can sample what you are claiming to be looking at!

As for pooling from the same large experiment, just to make things clear - I'm talking about large pools from 1 big batch of plants. A randomised block design (Naomi's discussion) can obviously be valid if all plants are grown at the same time, but practically it also depends on the variable factors. If there are 3 trays next to each other, are there real block effects? E.g. water should be a block effect, but what if light is more variable (e.g. front-middle-back) and the watering is highly controlled? The best way to avoid this is different positions or chambers or, better, different times.

Also, from a cost point of view it seems a waste of money to hybridise the same plant sample to replicate Affy arrays when Affy technical replicates are no longer deemed useful. If you harvest 2 random sets of 50 plants from the same group then you will get an R2 of >0.995 unless there was a technical problem; save the money and design a better experiment. To my understanding a statistical test on such material will be invalid, as you are allowing the test to use essentially technical variability (post-plant growth) as an estimate of biological variability.

Cheers,
Matt

> -----Original Message-----
> From: Suresh Gopalan [mailto:gopalans at comcast.net]
> Sent: 20 January 2006 15:21
> To: Matthew Hannah; Naomi Altman; fhong at salk.edu
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Biological replication (was RNA degradation problem)
>
> I agree that independent replication is the best bet as of now, though it has the risk of introduction of new defects/hidden variables that influence the phenotype in question, which may indeed be relevant or constitute another line of study.
>
> If one decides to take this risk or for other reasons does experiments in perfectly identical (reproducible?) conditions and takes replicates pooled from a very large population (50 plants each in 3 replicates) as mentioned below: if that removes some variability inherent to each plant, so be it. Isn't the goal to study the variable of interest, masking the irrelevant variables (at least in that study)? How would this make statistical testing invalid?
>
> I wonder if in either case it is any different or worse than the normalization schemes and assumptions used in many of the currently used popular analysis or summary schemes?
>
> Suresh
>
> (Suresh Gopalan, Ph.D)
>
> ----- Original Message -----
> From: "Matthew Hannah" <hannah at mpimp-golm.mpg.de>
> To: "Naomi Altman" <naomi at stat.psu.edu>; <fhong at salk.edu>
> Cc: <bioconductor at stat.math.ethz.ch>
> Sent: Friday, January 20, 2006 6:02 AM
> Subject: Re: [BioC] Biological replication (was RNA degradation problem)
>
> >> The question of what is appropriate biological replication is a tough one. The objective is to obtain results that are valid in the population of interest, which usually is not plants grown in a single batch in the green house. But how much variability should we induce? Each batch of plants grown separately but in the same building (different growth chambers), grown in different labs? different universities?
> >
> > Yes, but this is more a question of 'some' biological replication versus none. Obviously, if you have perfect reproducibility of your growth conditions then repeat experiments will have little influence, but in my experience independent experiments suitably account for slight environmental and sampling (e.g. time) variability. Plants grown under the same conditions are highly reproducible, so even the random block design might not be ideal depending on what the environmental factors are - light, water, temp etc. I would always favour separate, independent experiments.
> >
> > As for reproducibility in general, this is a problem. I'm sure in all fields that some patterns found by a certain lab, labelling, scanning etc. will not be reproducible. For example, I wonder how many training set - sample set molecular diagnosis studies would continue to work if new independent data is introduced without updating the whole study.
> >
> >> In my very first Affy experiment, the investigator did the following: 2 batches of plants grown separately, 2 samples of plants from 1 of the batches, 2 microarrays from one of the samples, for 4 arrays in all. The correlation among the results was 2 arrays from same sample > 2 samples from same batch > 2 batches. This should be no surprise, even though we did not have enough replication to do any formal testing.
> >>
> >> I think at minimum that you want to achieve results that would be replicable within your own lab. That would suggest batches of plants grown separately from separate batches of seed.
> >>
> >> The best plan is a randomized complete block design, with every condition sampled in every block. If the conditions are tissues, this is readily achieved.
> >
> > I assume you mean random in each independent experiment, and then independently repeated, in which case this is the best approach.
> >
> >> Personally, I look at the density plots of the probes on the arrays. If they have the same "shape" (which is usually a unimodal distribution with long tail to the right on the log2 scale) then I cross my fingers (that is supposed to bring good luck) and use RMA. Most of the experiments I have been involved with using arabidopsis arrays have involved tissue differences, and the amount of differential expression has been huge on the probeset scale (over 60% of genes), but these probe densities have been pretty similar.
> >
> > I always look at RNAdeg and PLM as well, but in most cases this is also seen on the density plots.
> >
> > Cheers,
> > Matt
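The claim in this post, that a test on pools drawn from a single batch uses essentially technical variability as its estimate of biological variability, can be illustrated with a small simulation. What follows is only a sketch under assumed numbers, not an analysis of anyone's data. It is written in Python with the standard library only, and every name in it (`pool_value`, `t_statistic`, `false_positive_rates`) and every parameter value (`plant_sd`, `env_sd`, 50 plants per pool, 3 replicates per group) is a hypothetical choice made for illustration. The model assumes no true treatment effect, and assumes that each treatment group in each experiment picks up one random environmental shift (for example, a patch of mildew or a position effect).

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (two groups of 3 replicates each).
T_CRIT_DF4 = 2.776


def pool_value(env_shift, n_plants, plant_sd, rng):
    """Measured value of one pool: a shared environmental shift plus the
    average of n_plants independent plant-to-plant deviations."""
    plants = [rng.gauss(0.0, plant_sd) for _ in range(n_plants)]
    return env_shift + statistics.fmean(plants)


def t_statistic(x, y):
    """Equal-variance two-sample t statistic for two groups of equal size."""
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2.0
    diff = statistics.fmean(x) - statistics.fmean(y)
    return diff / math.sqrt(2.0 * pooled_var / n)


def false_positive_rates(n_genes=2000, n_plants=50, plant_sd=1.0,
                         env_sd=0.3, seed=2):
    """Return (single_batch_rate, independent_rate) when there is NO true
    treatment effect, so every rejection is a false positive."""
    rng = random.Random(seed)
    hits_single = 0
    hits_indep = 0
    for _ in range(n_genes):
        # Design A (assumed): one experiment. Each treatment group gets ONE
        # environmental shift that all three of its pools share.
        env_a = rng.gauss(0.0, env_sd)
        env_b = rng.gauss(0.0, env_sd)
        a = [pool_value(env_a, n_plants, plant_sd, rng) for _ in range(3)]
        b = [pool_value(env_b, n_plants, plant_sd, rng) for _ in range(3)]
        hits_single += int(abs(t_statistic(a, b)) > T_CRIT_DF4)

        # Design B (assumed): three independent experiments. Every pool gets
        # its own environmental shift, so the replicates sample that variation.
        c = [pool_value(rng.gauss(0.0, env_sd), n_plants, plant_sd, rng)
             for _ in range(3)]
        d = [pool_value(rng.gauss(0.0, env_sd), n_plants, plant_sd, rng)
             for _ in range(3)]
        hits_indep += int(abs(t_statistic(c, d)) > T_CRIT_DF4)
    return hits_single / n_genes, hits_indep / n_genes


if __name__ == "__main__":
    single, indep = false_positive_rates()
    print(f"single batch: {single:.2f}, independent experiments: {indep:.2f}")
```

Under these assumptions the single-batch design should reject far more often than the nominal 5%, because the spread among its pools reflects only pool-sampling noise, while the independent-experiment design should stay close to 5%. How large the gap is depends entirely on the assumed `env_sd`, which no one in this thread has measured.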

ADD COMMENT • link 18.3 years ago Matthew Hannah ▴ 940


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

PCR is also noisy.

--Naomi

At 09:51 AM 1/22/2006, Matthew Hannah wrote:
>Having said all that, 'IF' you just want to identify a few, highly changed, candidate genes that will be followed up (in independent experiments), then independent array experiments are obviously not essential. However, on the 'arrays are expensive' point I would be interested if anyone had data to show how cost-effective using pooled samples from the same experiment is in reducing the work for Q-PCR verification. ie: the % confirmation rate for using genes selected based on 1, 2 or 3 arrays.

Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111

ADD COMMENT • link 18.3 years ago Naomi Altman ★ 6.0k


Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 9.6 years ago

This is interesting and there are certainly contrasting views. It is also certainly not a bioC issue but is of interest. At the risk of dragging it on I don't think this statement should be left without the comments below. > -----Original Message----- > From: fhong at salk.edu [mailto:fhong at salk.edu] > Sent: 20 January 2006 20:12 > To: Henk van den Toorn > Cc: 'Naomi Altman'; Matthew Hannah; bioconductor at stat.math.ethz.ch > Subject: RE: [BioC] Biological replication (was RNA > degradation problem) > > Thank you all for the useful input and interesting discussion. > > I agree with Henk, "biologicla replicates" means to include > biological variation among individual plants, not the > enviromental factors, such as growth chamber and climate. It > is known that batch effect and lab effect are profound > factors, which might, sometimes, block the true signals. > Array experiments are still relatively expensive, we would > prefer to eliminate enviromental factors (conduct experiments > at the same time, same growth room) and include biological > variation ( different plant samples as biological replicates). I strongly believe that if you cannot prove that your results are reproducible in at least 2 independent experiments then any interpretation of such results is far from conclusive. Microarrays are expensive, but many other methods are also expensive and time-consuming but this does not exempt them from being shown to be reproducible, it is also no more expensive (array-cost) to use 3 experiments versus 3 samples from one experiment. If biological reproducibility was not an issue then why would we be using replicate plants or experiments to measure metabolites, plant growth, etc... rather than taking a single measurement on a huge pool of plants? What I do agree with is that you are interested in the biological variation and not studying environmental factors having spurious effects on your results. 
But what is also obvious is that one batch of plants grown at a single time in a single place is much more likely to yield results where the 'biological factor' of interest is affected by a biological factor-environment interaction. eg: plants with higher sugar content may be more attractive to aphid attack, different Arabidopsis ecotypes have differential sensitivity to mildew, or stress such as poor watering + many other less observable interactions. The final point is on when biological replication becomes technical replication. It is obvious that if you take sufficiently large repeated samples from a population that those samples will have an extremely high probability of being 'almost' identical or put differently - essentially the same as harvesting all of them together, grinding them and then taking 2 aliquots of the material (ie:technical replica). Eg: if you split a group of 1000 plants into 2 pools of 500 do you believe there would be any difference between them compared to aliquoting the 1000 once ground? I think that 50 plants is already far beyond the point where two pools of plants are essentially identical. In my experience, when grown in randomised blocks in the same batch, 5-10 replicate plants are usually sufficient to get virtually identical mean values for many biological measurements. So does it then make sense to hybridise 'identical' samples and call them 'biological replicates', which in addition could be misleading to the reader who understands that to mean something quite different. Having said all that, 'IF' you just want to identify a few, highly changed, candidate genes that will be followed up (in independent experiments), then independent array experiments are obviously not essential. However, on the 'arrays are expensive' point I would be interested if anyone had data to show how cost-effective using pooled samples from the same experiment is in reducing the work for Q-PCR verification. 
i.e. the % confirmation rate when using genes selected based on 1, 2 or 3 arrays.

Cheers,
Matt

> > I think the last question is very important. I guess you don't need to
> > try to INCREASE the biological variability at all cost for a single
> > experiment.
> > If you would be interested in combining experiments of the same lab in
> > a single analysis, it's probably wise to follow Naomi's advice to take
> > different replicates. A problem that might arise is that the "biological"
> > variation is influenced by many circumstances inside a greenhouse or
> > growth chamber. In our lab practice, it's clear that influences like
> > the weather and the season have a profound influence on the biology of
> > the plant, even though our plants are kept in climate-controlled
> > growth chambers. If you would use different batches of plants, you are
> > actually confounding these factors with the batches of plants. By using
> > different samples of plants, although unfortunately pooled in the same
> > circumstances, you might actually block the circumstances for later
> > analysis, if you're willing to go that far.
> >
> > I'm very interested to see what other people have to say about this!
> >
> > Henk van den Toorn, MSc
> > bioinformatician, Molecular Genetics group, Utrecht University
>
> --------------------
> Fangxin Hong Ph.D.
> Plant Biology Laboratory
> The Salk Institute
> 10010 N. Torrey Pines Rd.
> La Jolla, CA 92037
> E-mail: fhong at salk.edu
> (Phone): 858-453-4100 ext 1105
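
The blocking versus confounding point raised in this thread can be made concrete by comparing design matrices. The following is only a minimal sketch in base R, not anything posted by the participants: the layouts, the factor names (`tissue`, `batch`) and the level names (`stem_top`, `stem_base`, `b1` to `b3`) are hypothetical. It shows that when every condition is sampled in every batch, batch and condition effects can be estimated separately, whereas when each condition comes from its own batch the two cannot be told apart.

```r
# Hypothetical randomized complete block layout:
# 2 tissues, each sampled in 3 independently grown batches (blocks).
rcbd <- expand.grid(tissue = c("stem_top", "stem_base"),
                    batch  = c("b1", "b2", "b3"))

# Additive model with a block term: columns are
# intercept, two batch contrasts and one tissue contrast.
X_block <- model.matrix(~ batch + tissue, data = rcbd)
rank_block <- qr(X_block)$rank   # equals ncol(X_block): full rank

# Hypothetical confounded layout: each tissue comes from its own batch.
confounded <- data.frame(
  tissue = factor(c("stem_top", "stem_top", "stem_base", "stem_base")),
  batch  = factor(c("b1", "b1", "b2", "b2"))
)
X_conf <- model.matrix(~ batch + tissue, data = confounded)
rank_conf <- qr(X_conf)$rank     # smaller than ncol(X_conf): rank deficient

# Full rank means batch and tissue effects are separately estimable;
# rank deficiency means they are aliased.
cat("blocked:   rank", rank_block, "of", ncol(X_block), "columns\n")
cat("confounded: rank", rank_conf, "of", ncol(X_conf), "columns\n")
```

In the confounded layout any tissue difference could equally be a batch difference, which is why no model fitted afterwards can separate them.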

18.3 years ago Matthew Hannah ▴ 940
