Need help with the study design of an RNA-Seq project


shirley zhang ★ 1.0k

@shirley-zhang-2038

Last seen 9.7 years ago

Dear list,

I am not sure whether this list is the right place to ask a study design question, but I have learned a lot here about analyzing RNA-Seq data, so I would like to give it a try.

We are going to do RNA sequencing on an Illumina HiSeq for 200 samples. Given that the sample size and the budget are both fixed, the following three options were proposed:

1. 50bp paired-end reads, one sample per lane --> ~100 million reads per sample
2. 75bp paired-end reads, two samples per lane --> ~50-60 million reads per sample
3. 100bp paired-end reads, four samples per lane --> ~30-40 million reads per sample

Based on your experience, which option is best, or do you have other suggestions? We would like to do several kinds of analysis on these data, e.g., novel transcripts, lncRNA, splicing, SNPs, etc. If we have to sort them by priority (from high to low), I would say: novel transcripts, long non-coding RNAs, splicing, and differential expression.

Currently, the majority of labs sequence 100bp paired-end, right? But I was told that even if you sequence 100bp, the quality after 75bp is very bad because of the sequencer itself, that is, it has nothing to do with the RNA quality of the samples. If this is true, why is the 100bp read length becoming more popular?

Many thanks,
Shirley
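For comparison, the raw sequence yield behind each of the three options can be worked out with simple arithmetic. The sketch below is only illustrative: it assumes the quoted "reads per sample" figures are read pairs (the `counts_are_pairs` flag flips that assumption), and the class, function, and variable names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    read_length: int  # bases per mate
    reads_low: int    # lower quoted estimate, reads per sample
    reads_high: int   # upper quoted estimate, reads per sample


# Figures quoted in the question; the labels are hypothetical.
OPTIONS = [
    Option("1: 50bp PE, 1 sample/lane", 50, 100_000_000, 100_000_000),
    Option("2: 75bp PE, 2 samples/lane", 75, 50_000_000, 60_000_000),
    Option("3: 100bp PE, 4 samples/lane", 100, 30_000_000, 40_000_000),
]


def bases_per_sample(read_length: int, n_reads: int,
                     counts_are_pairs: bool = True) -> int:
    """Total sequenced bases for one sample.

    ASSUMPTION: if counts_are_pairs is True, n_reads counts read pairs,
    so each counted unit contributes two mates of read_length bases.
    """
    mates = 2 if counts_are_pairs else 1
    return read_length * n_reads * mates


def summarize(options, counts_are_pairs: bool = True) -> dict:
    """Map each option name to its (low, high) yield in bases per sample."""
    return {
        o.name: (
            bases_per_sample(o.read_length, o.reads_low, counts_are_pairs),
            bases_per_sample(o.read_length, o.reads_high, counts_are_pairs),
        )
        for o in options
    }


if __name__ == "__main__":
    for name, (low, high) in summarize(OPTIONS).items():
        print(f"{name}: {low / 1e9:.1f}-{high / 1e9:.1f} Gb per sample")
```

Under the read-pair assumption this works out to roughly 10 Gb, 7.5-9 Gb, and 6-8 Gb per sample for options 1 to 3. Raw yield is only one side of the trade-off: it says nothing about mapping ambiguity or splice-junction coverage, which depend on read length.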

SNP Sequencing • 1.5k views

ADD COMMENT • link 11.1 years ago shirley zhang ★ 1.0k


Sean Davis 21k

@sean-davis-490

Last seen 4 months ago

United States

On Thu, Apr 11, 2013 at 4:32 PM, shirley zhang wrote:
> Dear list, [...]

Hi, Shirley.

I don't mean this as MY answer to your question, but this blog post has a few statements that might be interesting to you:

http://core-genomics.blogspot.com/2013/04/encodes-rna-seq-recommendations-need.html

You'll note that it refers to the ENCODE RNA-seq guidelines, which might also be instructive.

Sean

ADD COMMENT • link 11.1 years ago Sean Davis 21k


Dario Strbenac ★ 1.5k

@dario-strbenac-5916

Last seen 4 days ago

Australia

This is a question to ask at Biostars: http://www.biostars.org/

ADD COMMENT • link 11.1 years ago Dario Strbenac ★ 1.5k


Dear Wei, Sean and Dario,

Many thanks for all of your replies and suggestions. I really appreciate them. I will check the ENCODE RNA-seq guidelines. I also posted my question at Biostars.

Thanks again,
Shirley

On Thu, Apr 11, 2013 at 9:59 PM, Dario Strbenac wrote:
> It is a question to ask at Biostars which has the address
> http://www.biostars.org/

ADD REPLY • link 11.1 years ago shirley zhang ★ 1.0k


Hi Shirley,

I would say it depends. If you are investigating an organism with no good transcriptomic or genomic sequence resources, I would also recommend 100bp PE, because it is certainly beneficial for a subsequent assembly of the reads. If you have plenty of genomic resources available, I would not discard the 50bp option out of hand. I attached a paper that partially covers the read length / precision debate. Maybe it is helpful for you.

Best regards,
Moritz

--
Moritz Heß
PhD Candidate, Research Associate
Forest Research Institute of Baden-Württemberg (FVA)

Attachment (scrubbed by the list archive): Li, Dewey - 2011 - RSEM accurate transcript quantification from RNA-Seq data with or without a reference genome.pdf

ADD REPLY • link 11.1 years ago Moritz Hess ▴ 60


Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 4 weeks ago

Australia/Melbourne/Olivia Newton-John …

Hi Shirley,

You should use 100bp PE reads. Longer reads will help reduce mapping ambiguity, and thus give you more power to detect new transcripts and SNPs. They will also enable you to quantify gene expression levels more accurately.

Cheers,
Wei

ADD COMMENT • link 11.1 years ago Wei Shi ★ 3.6k


shirley zhang ★ 1.0k

@shirley-zhang-2038

Last seen 9.7 years ago

Dear Moritz,

Thanks a lot for your suggestions and for circulating the paper. I will read it.

Sorry, I forgot to mention in my original question that we are working on human samples.

During the design, the RNA fragment size has to be taken into account as well. If the fragments are 150-200bp, then 100bp paired-end is a waste, as the reads will frequently overlap. Am I right? So I might go with option 2 (76bp PE), which is also recommended by the ENCODE guidelines.

I hope to hear more comments and suggestions.

Many thanks,
Shirley
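The overlap concern can be checked with a little arithmetic. A minimal sketch, assuming "fragment size" means the sequenced insert without adapters and that both mates are read to full length from opposite ends; the function names are hypothetical:

```python
def mate_overlap(fragment_length: int, read_length: int) -> int:
    """Bases covered by both mates of a paired-end read.

    ASSUMPTION: fragment_length is the insert size (no adapters) and each
    mate is read_length bases long, read inward from opposite ends.
    """
    if fragment_length <= 0 or read_length <= 0:
        raise ValueError("lengths must be positive")
    # No overlap once the fragment is at least twice the read length;
    # a fragment shorter than one read is covered entirely by both mates.
    return max(0, min(fragment_length, 2 * read_length - fragment_length))


def unique_bases(fragment_length: int, read_length: int) -> int:
    """Distinct fragment bases covered by at least one mate."""
    if fragment_length <= 0 or read_length <= 0:
        raise ValueError("lengths must be positive")
    return min(fragment_length, 2 * read_length)


if __name__ == "__main__":
    for read_length in (50, 75, 100):
        for fragment_length in (150, 200):
            print(
                f"{read_length}bp PE, {fragment_length}bp fragment: "
                f"overlap={mate_overlap(fragment_length, read_length)}bp, "
                f"unique={unique_bases(fragment_length, read_length)}bp"
            )
```

For 150-200bp fragments, 100bp mates overlap by up to 50 bases, while 75bp mates only just meet on a 150bp fragment and 50bp mates leave an unsequenced gap. Real libraries have a spread of fragment sizes, so these are point estimates rather than a description of a whole library.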

ADD COMMENT • link 11.1 years ago shirley zhang ★ 1.0k
