Question

Expt. design question on optimal number of replicates (in edgeR or elsewhere)

Gordon Smyth 53k

WEHI, Melbourne, Australia

Dear Gowthaman,

There's no rigorous answer to this question, because it depends on the variability of your population, how large the fold changes are that you want to detect, how many genes you need to find, what FDR you can tolerate, etc., and it's impossible to know all these things in advance.

However, we can learn from experience. At our institute, we regularly undertake mRNA-seq experiments using genetically identical mice for which we can keep the biological coefficient of variation between replicates down to about 10%. For these experiments, three or four biological replicates per group work well. If you can keep the same consistency, and the expression changes you want to detect are not too small, then three or four can work well for you also.

If you work with more heterogeneous organisms (like humans), or use less well controlled protocols (like aggressive RNA amplification), or look for subtle fold changes, larger numbers will likely be needed.

Best wishes
Gordon

> Date: Fri, 25 May 2012 09:33:45 -0700
> From: gowtham <ragowthaman at gmail.com>
> To: bioconductor <bioconductor at r-project.org>
> Subject: [BioC] Expt. design question on optimal number of replicates (in edgeR or elsewhere)
>
> Hi Everyone,
>
> Thanks to a recent Bioconductor workshop I attended (and of course thanks to Martin Morgan's inspiration), I am stepping beyond the hist/plot functions in R to use Bioconductor for more powerful analysis. We have many RNA-seq libraries without replicates. I have read the edgeR documentation and understand that there is not much use in attempting any significance analysis on them.
>
> Now we are in a position to have biological replicates, and we are trying to decide on the number. I understand that the more the merrier, but what is a good number? If that is too vague to answer: we plan for 4 biological replicates of each condition. Is that good enough?
>
> A few more details on the project:
> 1) The aim of the project is to identify mRNAs that are bound to one translational factor (compared to another factor).
> 2) Our organism has 8,000 genes.
> 3) We use a modified RNA-seq protocol in which each read represents one mRNA transcript.
> 4) Our libraries usually contain 10 or more transcripts per gene (in >80% of cases) per million mapped reads.
> 5) This is a first-step survey experiment to see what classes of genes are differentially bound.
>
> I appreciate your help and pointers. If this has been discussed before, could you please point me towards that discussion?
>
> gowthaman
>
> --
> Gowthaman
> Bioinformatics Systems Programmer.
> SBRI, 307 West lake Ave N Suite 500
> Seattle, WA. 98109-5219
> Phone : LAB 206-256-7188 (direct).
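As a rough illustration of how the factors listed in this answer interact (variability, fold change, tolerated error rate), here is a minimal Python sketch of a commonly used normal-approximation sample-size formula for a single gene in a two-group comparison. It is not what edgeR does internally. The function name `replicates_per_group` and its parameters are hypothetical, and it assumes the variance of a log count is roughly `1/mean_count + bcv**2`. Note that `alpha` here is a per-gene significance level, not an FDR, so a genome-wide analysis would need a stricter value.

```python
from math import log
from statistics import NormalDist


def replicates_per_group(fold_change, bcv, mean_count, alpha=0.05, power=0.8):
    """Approximate number of biological replicates per group for one gene.

    Hypothetical helper based on a normal approximation:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * (1/mean_count + bcv^2)
            / ln(fold_change)^2
    Returns a float; round up to get a whole number of replicates.
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    # Assumed variance of a log count: sampling noise plus biological noise.
    var_log = 1.0 / mean_count + bcv ** 2
    return 2 * (z_alpha + z_power) ** 2 * var_log / log(fold_change) ** 2


if __name__ == "__main__":
    # Compare a low-variability setting (BCV 0.1) with a noisier one (BCV 0.4)
    # for a gene with about 10 reads, at 4-fold and 2-fold changes.
    for bcv in (0.1, 0.4):
        for fc in (4, 2):
            n = replicates_per_group(fc, bcv, mean_count=10)
            print(f"BCV={bcv}, fold change={fc}: n per group ~ {n:.2f}")
```

The formula returns very small values when the BCV is low and the fold change is large. In practice at least two, and preferably three or more, replicates per group are still needed so that the dispersion can be estimated at all.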

RNASeq Organism edgeR

updated 13.7 years ago by gowtham • written 13.7 years ago by Gordon Smyth

score 0 · Answer 1 · 2012-05-26

Thanks very much for the reply and explanation, Gordon.

I work with trypanosomatid parasites, which we think have less variability than humans, but we do not know what their biological coefficient of variation would be. We may have some data from which to calculate it and know for sure. We use 10 to 15 rounds of amplification; I hope that is not too aggressive?

At this point, at least, we are looking for major changes (4-fold), and hoping to find some. But, as you mentioned, it may be hard to assume anything before generating the data. It sounds like 4 replicates are a good start, and after the initial analysis we can increase the number should that be needed.

Once again, thanks for educating me.

Gowthaman
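For the point about calculating the biological coefficient of variation from existing data, a crude method-of-moments estimate can be sketched in plain Python, given counts from biological replicates of a single condition. This is only a stand-in for edgeR's dispersion estimation, which uses likelihood-based methods with empirical Bayes moderation. The function name `rough_bcv` and the `min_mean` filter are hypothetical, and the sketch assumes a negative binomial model with variance `mu + phi * mu**2`, so that BCV = sqrt(phi).

```python
from math import sqrt
from statistics import mean, median, variance


def rough_bcv(counts, min_mean=10.0):
    """Crude common BCV from replicate counts of one condition.

    counts: list of per-gene lists, one count per biological replicate.
    Assumes var = mu + phi * mu^2 and returns sqrt(median phi).
    """
    n_libs = len(counts[0])
    if n_libs < 2:
        raise ValueError("at least two replicates are needed")
    # Scale every library to the average library size (approximate normalisation).
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_libs)]
    target = mean(lib_sizes)
    dispersions = []
    for gene in counts:
        scaled = [c * target / s for c, s in zip(gene, lib_sizes)]
        mu = mean(scaled)
        if mu < min_mean:
            continue  # low counts are dominated by sampling noise
        phi = (variance(scaled) - mu) / mu ** 2
        dispersions.append(max(phi, 0.0))  # truncate negative estimates at zero
    if not dispersions:
        raise ValueError("no genes pass the expression filter")
    return sqrt(median(dispersions))


if __name__ == "__main__":
    # Toy data: two genes, two replicates, equal library sizes.
    toy = [[80, 120], [120, 80]]
    print(f"rough BCV ~ {rough_bcv(toy):.3f}")
```

With only a handful of replicates the per-gene estimates are very noisy, which is why the sketch takes the median across genes rather than reporting gene-wise values.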