Experimental design for RNA-Seq


michael watson IAH-C ★ 3.4k


Dear List,

I'm about to design a simple experiment (knockout vs wild-type) and we plan to use RNA-Seq. We're interested in gene expression, for mRNA and microRNAs in particular, and in calculating statistics for differential expression.

I'm aware of DESeq, DEGseq and edgeR. I wanted to ask those who have a lot of experience with this type of analysis whether they have any advice on experimental design, in particular the number of replicates they have used and why (I was planning on going for all biological replicates, no technical).

Thanks
Mick

edgeR DEGseq DESeq • 2.0k views

updated 13.9 years ago by Naomi Altman ★ 6.0k • written 13.9 years ago by michael watson IAH-C ★ 3.4k


Naomi Altman ★ 6.0k


At least from the stat theory point of view, the best design is equal numbers of biological samples (the more the better) for each condition and no technical reps.

So far, there is little indication that there are flowcell effects. However, to be on the safe side, you should use the blocking principle: as much as possible, distribute the reps from the different conditions across different flow cells (unless the whole experiment fits on a single flow cell).

--Naomi

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                            814-863-7114 (fax)
Penn State University                          814-865-1348 (Statistics)
University Park, PA 16802-2111
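The blocking principle described above can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from the thread: the function name `assign_flow_cells`, the sample labels and the choice of two flow cells are all assumptions made for the example.

```python
from collections import Counter
from itertools import cycle


def assign_flow_cells(samples, n_flow_cells):
    """Spread each condition's replicates across flow cells as evenly as possible.

    `samples` is a list of (sample_id, condition) pairs; the result maps
    sample_id -> flow cell index (0-based).
    """
    assignment = {}
    cells = cycle(range(n_flow_cells))
    # Sorting by condition means consecutive replicates of one condition are
    # dealt to consecutive flow cells, like dealing cards round a table.
    for sample_id, _condition in sorted(samples, key=lambda s: s[1]):
        assignment[sample_id] = next(cells)
    return assignment


# Hypothetical example: 4 knockout and 4 wild-type libraries on 2 flow cells.
samples = [(f"KO{i}", "KO") for i in range(1, 5)] + [(f"WT{i}", "WT") for i in range(1, 5)]
layout = assign_flow_cells(samples, n_flow_cells=2)
per_cell = Counter((layout[sample_id], condition) for sample_id, condition in samples)
for (cell, condition), n in sorted(per_cell.items()):
    print(f"flow cell {cell}: {n} x {condition}")
```

In a real experiment one would probably also randomise the order of the replicates within each condition before dealing them out; the deterministic order here just keeps the sketch easy to follow.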

written 13.9 years ago by Naomi Altman ★ 6.0k


Hello list:

I am posting my questions here again to get help with my experimental design, as I am new to this and have been struggling with my analysis. I could not find a matching example design in the LIMMA User's Guide.

The first part of my experiment is a loop design comparing gene expression at different developmental stages (10DAP, 22DAP and 35DAP; DAP = days after pollination) of the same Brassica line. The purpose is to see the differential expression between developmental stages within this single line. A pooled sample of the same line was used for each stage and treated as a biological replicate. I have dye swaps plus two technical replicates. The targets file has three columns: file name, Cy3 and Cy5. Here is the targets file:

FileName                          Cy3    Cy5
AT Oligo 02.11.02.176.gpr.fixed   10DAP  22DAP
AT Oligo 02.11.02.177.gpr.fixed   22DAP  10DAP
AT Oligo 02.11.02.178.gpr.fixed   22DAP  10DAP
AT Oligo 02.11.02.179.gpr.fixed   10DAP  22DAP
AT Oligo 02.11.02.180.gpr.fixed   22DAP  35DAP
AT Oligo 02.11.02.181.gpr.fixed   22DAP  35DAP
AT Oligo 02.11.02.182.gpr.fixed   35DAP  22DAP
AT Oligo 02.11.02.183.gpr.fixed   35DAP  22DAP
AT Oligo 02.11.02.184.gpr.fixed   10DAP  35DAP
AT Oligo 02.11.02.185.gpr.fixed   10DAP  35DAP
AT Oligo 02.11.02.186.gpr.fixed   35DAP  10DAP
AT Oligo 02.11.02.187.gpr.fixed   35DAP  10DAP

This experiment is very similar to the design in LIMMA User's Guide section 7.4, except that I have technical replicates. Following the Guide, do I have to use one sample such as "10DAP" as the reference, or can any sample serve as the reference? My goal is to see which genes are differentially expressed between 10DAP, 22DAP and 35DAP. How do I get the following results:
1) Which genes are consistently up/down-regulated across the 3 stages?
2) Which genes are up/down-regulated at each developmental stage?
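To make the question about the reference concrete, here is a rough Python sketch of how a design matrix for two-colour log-ratios can be written down from a targets table like the one above. It is an illustration only: the helper `two_colour_design` is a made-up name, the convention M = log2(Cy5/Cy3) is assumed, and this is not a substitute for what the limma functions themselves compute.

```python
def two_colour_design(targets, ref):
    """Build a design matrix for log-ratios M = log2(Cy5 / Cy3).

    `targets` is a list of (cy3, cy5) labels, one pair per array.
    Every non-reference sample gets one column, interpreted as its
    log-expression relative to `ref`. A row holds +1 for the Cy5 sample
    and -1 for the Cy3 sample; the reference contributes nothing.
    """
    labels = sorted({label for pair in targets for label in pair})
    columns = [label for label in labels if label != ref]
    design = []
    for cy3, cy5 in targets:
        row = [0] * len(columns)
        if cy5 != ref:
            row[columns.index(cy5)] += 1
        if cy3 != ref:
            row[columns.index(cy3)] -= 1
        design.append(row)
    return columns, design


# The twelve arrays of the loop design, as (Cy3, Cy5) pairs.
targets = [
    ("10DAP", "22DAP"), ("22DAP", "10DAP"), ("22DAP", "10DAP"), ("10DAP", "22DAP"),
    ("22DAP", "35DAP"), ("22DAP", "35DAP"), ("35DAP", "22DAP"), ("35DAP", "22DAP"),
    ("10DAP", "35DAP"), ("10DAP", "35DAP"), ("35DAP", "10DAP"), ("35DAP", "10DAP"),
]
columns, design = two_colour_design(targets, ref="10DAP")
print(columns)
for row in design:
    print(row)
```

Under this coding the choice of reference only changes how the coefficients are labelled. With 10DAP as the reference, the 35DAP vs 22DAP comparison is the difference between the two fitted columns, so any stage could in principle play the role of reference.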
The second part of my experiment is:

FileName                         DPA    Cy3     Cy5
2009-07-10-atq3.7.3.145-15.gpr   15DPA  WT      MUTANT
2009-07-15-atq3.7.3.146-15.gpr   15DPA  MUTANT  WT
2009-07-15-atq3.7.3.147-15.gpr   15DPA  MUTANT  WT
2009-07-15-atq3.7.3.148-15.gpr   15DPA  WT      MUTANT
2009-07-15-atq3.7.3.149-15.gpr   15DPA  WT      MUTANT
2009-07-17-atq3.7.3.151-20.gpr   20DPA  MUTANT  WT
2009-07-17-atq3.7.3.152-20.gpr   20DPA  WT      MUTANT
2009-07-17-atq3.7.3.153-25.gpr   25DPA  MUTANT  WT
2009-07-17-atq3.7.3.154-25.gpr   25DPA  WT      MUTANT
2009-07-17-atq3.7.3.155-10.gpr   10DPA  MUTANT  WT
2009-07-17-atq3.7.3.156-10.gpr   10DPA  WT      MUTANT
2009-07-17-atq3.7.3.157-30.gpr   30DPA  MUTANT  WT
2009-07-17-atq3.7.3.158-30.gpr   30DPA  WT      MUTANT
2009-07-21-atq3.7.3-159-10.gpr   10DPA  MUTANT  WT
2009-07-21-atq3.7.3-160-20.gpr   20DPA  MUTANT  WT
2009-07-21-atq3.7.3-164-25.gpr   25DPA  MUTANT  WT
2009-07-21-atq3.7.3-256-30.gpr   30DPA  MUTANT  WT
2009-07-22-atq3.7.3-115-10.gpr   10DPA  MUTANT  WT
2009-07-22-atq3.7.3-116-20.gpr   20DPA  MUTANT  WT
2009-07-22-atq3.7.3-117-25.gpr   25DPA  MUTANT  WT
2009-07-22-atq3.7.3-118-30.gpr   30DPA  MUTANT  WT
2009-07-22-atq3.7.3-119-10.gpr   10DPA  WT      MUTANT
2009-07-22-atq3.7.3-120-20.gpr   20DPA  WT      MUTANT
2009-07-22-atq3.7.3-124-25.gpr   25DPA  WT      MUTANT
2009-07-22-atq3.7.3-125-30.gpr   30DPA  WT      MUTANT

The targets file has four columns: file name, time, Cy3 and Cy5. This is a time course experiment. Again, I want to see the differential expression across the stages (10, 15, 20, 25 and 30DPA).
1) Can I split the analysis into five sub-groups by time point (say, 10, 15, 20, 25 and 30DPA separately) instead of analysing the whole experiment together?
2) If I split the 25 slides into 5 sub-experiments, my feeling is that the variance and normalization would differ between them. Is this correct?
3) How do I set up the biolrep, given that I treated the pooled samples as biological replicates?

I would appreciate it very much if you could give me some suggestions on these questions. Thanks a lot!

Yifang

written 13.9 years ago by Tan, Yifang ▴ 50


Hi Yifang,

Out of curiosity, what does this have to do with "Experimental design for RNA-Seq"?

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

written 13.9 years ago by Steve Lianoglou ★ 13k


Hi,

I just wanted to ask/make one point.

On Fri, May 28, 2010 at 9:17 AM, Naomi Altman <naomi at stat.psu.edu> wrote:
> At least from the stat theory point of view, the best design is equal
> numbers of biological samples (the more the better) for each condition and
> no technical reps.

Can you clarify a bit what you are referring to as a "technical replicate" in this sense?

You could consider two lanes that are sequenced from the same library as technical replicates, no? Or, by "technical replicate", do you mean creating two libraries out of one sample?

If we're talking about the former, then I think there is lots of value to be gained from (and it is perhaps necessary?) running more than one lane per library preparation, and maybe the question would rather be "how many lanes to run per library"?

What does the court think?

-steve

written 13.9 years ago by Steve Lianoglou ★ 13k


Great stuff, thanks Steve and Naomi.

I guess I was thinking of technical replicates simply as sequencing the same library on multiple occasions, though creating two libraries out of one sample adds an extra layer of complexity.

What is the evidence (if any) that lane and/or library preparation can have an effect?

To adjust for lane effects, I guess one could multiplex each sample so that they're run on all lanes, and combine the counts at the end?

Hmmm

Mick
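The "combine the counts at the end" idea can be sketched as follows. This is a minimal illustration, assuming per-lane counts are available as (sample, lane, gene, count) records; the function name `combine_lanes`, the record layout and the toy gene names are all hypothetical.

```python
from collections import defaultdict


def combine_lanes(lane_counts):
    """Sum per-gene read counts over lanes, separately for each sample.

    `lane_counts` is an iterable of (sample, lane, gene, count) records.
    Returns {sample: {gene: total_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample, _lane, gene, count in lane_counts:
        totals[sample][gene] += count
    return {sample: dict(genes) for sample, genes in totals.items()}


# Toy records: two multiplexed samples, each sequenced on two lanes.
records = [
    ("KO1", 1, "geneA", 120), ("KO1", 2, "geneA", 131),
    ("KO1", 1, "geneB", 8),   ("KO1", 2, "geneB", 5),
    ("WT1", 1, "geneA", 60),  ("WT1", 2, "geneA", 72),
]
combined = combine_lanes(records)
print(combined)
```

Summing keeps the data as integer counts, which is what count-based methods expect. It does discard the lane labels, so any check for lane effects has to be done on the per-lane counts before this step.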

written 13.9 years ago by michael watson IAH-C ★ 3.4k


Hi,

On Fri, May 28, 2010 at 12:04 PM, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote:
> Great stuff, thanks Steve and Naomi.
>
> I guess I was thinking of technical replicates simply as sequencing the same library on multiple occasions; though creating two libraries out of one sample adds an extra layer of complexity.
>
> What is the evidence (if any) that lane and/or library preparation can have an effect?

I'm sure that this has hit your radar already, but I should point out that at least some part of the recent Bullard et al. paper:

http://www.biomedcentral.com/1471-2105/11/94

talks about lane, flow-cell, and library prep effects. In their concluding remarks, they mention that:

"""We find that technical variation is quite low across lanes and flow-cells and slightly larger across library preparations. In all cases, however, the effect on differential expression results is minimal. As noted above, the MAQC datasets are unusual, in that we expect extremely large differences in expression between Brain and UHR and only small library preparation effects because of the high quality of the RNA. In practice, library preparation effects may be closer in magnitude to biological effects."""

I feel like their original "working draft" paper that you can get from bepress:

http://www.bepress.com/ucbbiostat/paper247

talks about lane effects too (if the BMC Bioinformatics one doesn't) .. I can't recall .. it might be worth looking at, too.

I'm also reminded of some comparisons that were performed in the "FRT-seq" paper:

http://www.nature.com/nmeth/journal/v7/n2/full/nmeth.1417.html

They were presenting a modified Illumina sequencing protocol that skips the traditional amplification step, and prepared two libraries from the same sample, one using their FRT-seq protocol and one using the "standard" Illumina RNA-seq protocol, to show the pros/cons of each. The data is also made available for you to play with, if you're so inclined.

If you have any vote on how the sequencing is done, though, and you guys are using an Illumina sequencer, it seems like doing FRT-seq would be a big win (if you were to ask me :-)

-steve

written 13.9 years ago by Steve Lianoglou ★ 13k


Hi,

We have just completed the sequencing of RNA-Seq libraries from a porcine challenge experiment: two treatments (bacteria) and 5 time points after challenge (T0, T6, T12, T24, T48 hours pi), a total of 48 samples (5-6 samples per treatment x time combination). A single RNA-Seq library has been generated from each sample (so no true technical replication), and the 48 libraries have been sequenced as 12-plex in four flowcells (4 lanes of 12-plexed samples per flowcell, all 48 samples sequenced in each flowcell) using the Illumina index system. In each 12-plex, the samples have been mixed so that the treatment x time combinations are balanced within each plex.

When we started the experiment, Illumina did not recommend going below 12-plex. Since then, Illumina have changed their recommendation, so it is possible to do 2-, 3-, 6- and 12-plex indexing. The experiment could hence have been conducted by 3-plexing instead (so each sample would have been sequenced once instead of four times in four runs), but I still like the idea of sequencing all samples in each run....

Following mapping, the counts from each library have been combined from the three runs, generating more than 4 million sequences per sample.

Starting the analysis, I have found that the available packages (DESeq, DEGseq and edgeR) present examples of the analysis of simple experiments (e.g. control vs challenge), but I wonder how to analyse a time-point experiment with two treatments. Initially, I am going to compare each time point to the control (within and across treatment), but it would be nice to take the interactions into account as well.

Best regards,
Jakob
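One way to picture "taking the interactions into account" is to write out the design matrix of a two-factor model with a treatment x time interaction. The Python sketch below does only that; it does not fit anything. The treatment labels "A" and "B", the helper name `interaction_design` and the use of treatment-contrast coding with the first level as baseline are assumptions made for the illustration.

```python
from itertools import product


def interaction_design(samples, treatments, times):
    """Design matrix with main effects and a treatment x time interaction.

    `samples` is a list of (treatment, time) pairs, one per library.
    Baselines are treatments[0] and times[0] (treatment-contrast coding).
    Returns (column_names, rows).
    """
    pairs = list(product(treatments[1:], times[1:]))
    names = ["Intercept"]
    names += [f"trt{t}" for t in treatments[1:]]
    names += [f"time{h}" for h in times[1:]]
    names += [f"trt{t}:time{h}" for t, h in pairs]
    rows = []
    for trt, time in samples:
        row = [1]
        row += [int(trt == t) for t in treatments[1:]]
        row += [int(time == h) for h in times[1:]]
        # An interaction column is 1 only when both of its factors match.
        row += [int(trt == t and time == h) for t, h in pairs]
        rows.append(row)
    return names, rows


treatments = ["A", "B"]       # hypothetical labels for the two bacteria
times = [0, 6, 12, 24, 48]    # hours after challenge
samples = [("A", 0), ("A", 12), ("B", 0), ("B", 12), ("B", 48)]
names, rows = interaction_design(samples, treatments, times)
print(names)
for sample, row in zip(samples, rows):
    print(sample, row)
```

With two treatments and five time points this gives 1 + 1 + 4 + 4 = 10 columns, one per treatment x time cell. Each interaction column then measures how much the time effect at that time point differs between the two treatments, which is the kind of question a simple two-group comparison cannot express.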

written 13.9 years ago by Jakob Hedegaard ▴ 170


Jakob

An excellent answer, thank you.

You say each sample has been sequenced four times due to the 12-plexing. What kind of variation do you see for the counts across those four times?

Thanks
Mick
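One simple way to look at the variation across repeated runs of the same library is a per-gene Poisson goodness-of-fit statistic, which asks whether the counts differ by more than sampling noise once sequencing depth is accounted for. The sketch below is an illustration under that assumption, not an analysis reported in this thread; the function name `poisson_chi2` and the example numbers are made up.

```python
def poisson_chi2(counts, depths):
    """Chi-square statistic for one gene of one library sequenced in several runs.

    counts[i] is the number of reads for the gene in run i and depths[i] is
    the total number of reads for that library in run i. If the runs differ
    only by Poisson sampling noise, the statistic is approximately
    chi-square distributed with len(counts) - 1 degrees of freedom.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    total_depth = sum(depths)
    # Expected counts share the gene's total in proportion to run depth.
    expected = [total * depth / total_depth for depth in depths]
    return sum((y - e) ** 2 / e for y, e in zip(counts, expected))


# Made-up counts for one gene across four runs of equal depth.
counts = [100, 110, 90, 100]
depths = [1_000_000] * 4
stat = poisson_chi2(counts, depths)
print(f"chi-square = {stat:.2f} on {len(counts) - 1} degrees of freedom")
```

Statistics that are consistently much larger than the degrees of freedom across many genes would point to run-to-run variation beyond plain sampling noise.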

written 13.9 years ago by michael watson IAH-C ★ 3.4k
