Question

shan

wang peter ★ 2.0k

@wang-peter-4647

Last seen 11.4 years ago

Dear all,

I want to find differentially expressed (DE) genes between two conditions. Each condition has 6 samples, but of those 6, 3 contain 51 bp reads and 3 contain 100 bp reads. That is because the 51 bp reads were too short, so we sequenced the same samples again at 100 bp.

I know that with DESeq I should combine the 51 bp and 100 bp reads from each sample as technical replicates, but I do not want to lose the differences between them by combining them. Can anyone help me come up with a good experimental design?

Thank you in advance,
Shan

--
shan gao
Room 231 (Dr. Fei lab)
Boyce Thompson Institute, Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267 (day)
Official email: sg839 at cornell.edu
Facebook: http://www.facebook.com/profile.php?id=100001986532253

DESeq DESeq • 1.1k views

ADD COMMENT • link updated 13.3 years ago by Steve Lianoglou ★ 13k • written 13.3 years ago by wang peter ★ 2.0k

score 0 · Answer 1 · 2012-10-01

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 11 weeks ago

United States

Hi Shan,

On Mon, Oct 1, 2012 at 1:30 PM, wang peter <wng.peter at gmail.com> wrote:
> dear ALL:
> i want to find DE genes between two conditions,
> each condition has 6 samples, but among those 6 samples,
> 3 samples contains 51 bp reads and 3 samples contains 100 bp reads.
> that is because 51 bp reads are too short, so we sequenced
> 100 bp on the same samples.
> i know using DESeq, i should combine 51 bp with 100 bp reads
> on each sample as technical replicate. but i donot want to remove
> those difference by combination of them.
> who can help me design a good experimental design?

This is a somewhat interesting situation ... no answers from me, but I do have some questions :-)

Have you done some exploratory analysis to see what kind of effects you get from the different read lengths? For example, you might cluster the samples to see whether they group by sample or by read length. In edgeR, you could try the plotMDS function.

Also, did you use a "splice-aware" aligner (like tophat2, GSNAP, ...), or something else?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
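For anyone who wants to try the "by sample or by read length?" check outside of R, here is a minimal sketch in plain Python. It is not the edgeR `plotMDS` approach mentioned above; it only compares average Pearson correlations of log-transformed counts between library pairs. The function names, the `(sample, read_length)` keys, and the toy count vectors are all made up for illustration.

```python
import math
from itertools import combinations


def log_counts(counts):
    """log2(count + 1), so a few highly expressed genes do not dominate."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def grouping_summary(libraries):
    """libraries: {(sample, read_length): [gene counts]}.

    Returns the mean correlation for each kind of library pair:
    same sample, same read length (different sample), or neither.
    """
    buckets = {"same_sample": [], "same_read_length": [], "neither": []}
    for (k1, c1), (k2, c2) in combinations(libraries.items(), 2):
        r = pearson(log_counts(c1), log_counts(c2))
        if k1[0] == k2[0]:
            buckets["same_sample"].append(r)
        elif k1[1] == k2[1]:
            buckets["same_read_length"].append(r)
        else:
            buckets["neither"].append(r)
    return {name: sum(v) / len(v) for name, v in buckets.items() if v}


# Hypothetical toy counts for five genes; real input would be a count table.
libraries = {
    ("DLCK1", 51): [10, 200, 35, 0, 80],
    ("DLCK1", 100): [12, 190, 40, 1, 75],
    ("DLCK2", 51): [50, 20, 300, 5, 8],
    ("DLCK2", 100): [45, 25, 280, 4, 10],
}
summary = grouping_summary(libraries)
for kind, mean_r in summary.items():
    print(f"{kind}: mean r = {mean_r:.3f}")
```

If the `same_read_length` average came out close to or above the `same_sample` average on real data, that would hint at a read-length effect worth a closer look.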

ADD COMMENT • link 13.3 years ago Steve Lianoglou ★ 13k

Thanks, Steve.

I checked the correlation between the 51 bp libraries and the 100 bp libraries: it is above 0.9 on average, so they look like very good technical replicates.

Shan

ADD REPLY • link 13.3 years ago wang peter ★ 2.0k

Hi Shan,

On Mon, Oct 1, 2012 at 4:08 PM, wang peter <wng.peter at gmail.com> wrote:
> thx steve
> i trid the correlation of 51 library with 100 library
> it is more than 0.9 on average
>
> so very good technical replicate

Well, unless you tell us how that number compares to the correlation between two 100bp libraries, we don't really know. If the correlation between a 51bp and a 100bp library is really no different from the correlation between two 100bp libraries, then I guess the original 51bp wasn't too short after all, right? ;-)

Honestly curious, though: what was the ultimate reason the powers that be decided 50bp was too short? Was there a particular gene (or set of genes) that is highly unmappable at 50bp? Better splice-junction mapping? Are you trying to do gene fusion detection, or maybe transcript assembly?

Anyway, I'd still dig deeper to see if you find systematic bias between your samples (do they cluster together, or similar).

Without having done any of that, I'll take a shot in the dark at what I'm *guessing* might be "just fine" if I were using this data for differential expression analysis: I bet that if you trim your 100bp reads back to 50bp and run the analysis treating the appropriate libraries as technical replicates of each other, you'll probably end up "playing" with the same sets of genes as with any other method (more or less). I say that because treating the 100bp and 50bp libraries as biological replicates (when they're not) isn't exactly correct either, so I'm erring on the side of being more conservative.

Hopefully you'll get some better ideas from others, too ...

-steve
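A rough sketch of the two mechanical steps in that suggestion, in plain Python: truncating reads to a fixed length, and summing per-gene counts across libraries made from the same sample. It assumes plain four-line FASTQ records and per-library count vectors in the same gene order. The function names and the toy data are hypothetical, and the re-alignment and counting that would sit between the two steps are not shown.

```python
def trim_fastq_records(lines, max_len=50):
    """Yield FASTQ lines with sequence and quality truncated to max_len.

    Assumes plain 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # sequence line and quality line
            line = line[:max_len]
        yield line


def collapse_technical_replicates(counts_by_library):
    """counts_by_library: {(sample, read_length): [gene counts]}.

    Sums the counts of all libraries that come from the same sample.
    """
    collapsed = {}
    for (sample, _read_length), counts in counts_by_library.items():
        if sample in collapsed:
            collapsed[sample] = [a + b for a, b in zip(collapsed[sample], counts)]
        else:
            collapsed[sample] = list(counts)
    return collapsed


# One made-up 100 bp read, trimmed back to 50 bp.
record = ["@read1", "ACGT" * 25, "+", "I" * 100]
trimmed = list(trim_fastq_records(record))
print(len(trimmed[1]), len(trimmed[3]))

# Made-up counts for three genes; the two DLCK1 libraries are summed.
counts = {
    ("DLCK1", 51): [10, 200, 35],
    ("DLCK1", 100): [12, 190, 40],
    ("LCK5", 51): [3, 4, 5],
}
collapsed = collapse_technical_replicates(counts)
print(collapsed)
```

Summing leaves one count column per biological sample, so the 51bp and 100bp runs of the same sample are no longer counted as separate replicates.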

ADD REPLY • link 13.3 years ago Steve Lianoglou ★ 13k

Dear Steve,

You are right.

1. The 51 bp libraries are too short for assembly.
2. We have samples from two conditions, sequenced at 51 bp: (DLCK1-51bp, DLCK2-51bp, DLCK3-51bp) vs (LCK5-51bp, LCK6-51bp, LCK7-51bp). We then sequenced the same samples at 100 bp: (DLCK1, DLCK2, DLCK3) vs (LCK5, LCK6, LCK7).
3. I want to find DE genes between DLCK and LCK, but I do not want to combine the 51 bp data with the 100 bp data, because that removes the replicate variation: (DLCK1+DLCK1-51bp DLCK2.... DLCK3.....) vs ....

Shan

ADD REPLY • link 13.3 years ago wang peter ★ 2.0k

Hi,

On Wed, Oct 3, 2012 at 9:48 AM, wang peter <wng.peter at gmail.com> wrote:
> dear steve:
> you are right,
>
> 1. 51 bp lib is too short for assembly

Does that mean you *need* to do assembly for downstream analysis? Are you not aligning to a genome? Or do you have a genome, but also want to do transcript assembly to identify novel transcripts? Maybe your genome is poorly annotated? What type of data are you working with here? Plant? Animal? Which plant/animal, etc ...

I'm not sure what to tell you, partly because you're not giving much information in your questions/answers. You are giving facts about your data, but no motivation regarding the question you're trying to answer, or what is stopping you from answering it.

You also say:

> 3 i want to find DE genes between DLCK vs LCK
>
> BUT i donot want to combine 51 bp with 100 bp data, which removing the
> replicate variation
> (DLCK1+DCLK1-51bp DLCK2.... DLCK3.....) vs ....

OK. So run your analysis twice, once w/ the 51bp reads and again w/ the 100bp reads, and see how they compare. If you need to assemble transcripts first, do so w/ the 100bp reads, then quantify their expression with each library separately.

--
Steve Lianoglou
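One way to "see how they compare" is to look at how much the two DE gene lists overlap and whether the shared genes change in the same direction. The sketch below is plain Python and assumes each analysis has already produced, per gene, a log2 fold change and an adjusted p-value. The `compare_de_calls` name, the 0.1 cutoff, and the gene names and values are all hypothetical.

```python
def compare_de_calls(results_a, results_b, alpha=0.1):
    """Compare two DE result sets.

    results_*: {gene: (log2_fold_change, adjusted_p_value)}.
    alpha is an arbitrary significance cutoff chosen for illustration.
    """
    sig_a = {g for g, (_, p) in results_a.items() if p < alpha}
    sig_b = {g for g, (_, p) in results_b.items() if p < alpha}
    shared = sig_a & sig_b
    union = sig_a | sig_b
    # Shared genes whose fold changes have the same sign in both runs.
    same_direction = {
        g for g in shared if results_a[g][0] * results_b[g][0] > 0
    }
    return {
        "shared": shared,
        "only_a": sig_a - sig_b,
        "only_b": sig_b - sig_a,
        "same_direction": same_direction,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up results from the 51bp-only and 100bp-only analyses.
res_51 = {
    "geneA": (2.1, 0.001),
    "geneB": (-1.5, 0.03),
    "geneC": (0.2, 0.8),
    "geneD": (1.1, 0.04),
}
res_100 = {
    "geneA": (1.9, 0.002),
    "geneB": (-1.7, 0.01),
    "geneC": (0.1, 0.9),
    "geneD": (0.6, 0.2),
}
comparison = compare_de_calls(res_51, res_100)
print(sorted(comparison["shared"]), sorted(comparison["only_a"]))
print(round(comparison["jaccard"], 3))
```

Genes that show up in only one of the two lists are the natural place to start looking for read-length effects.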

ADD REPLY • link 13.3 years ago Steve Lianoglou ★ 13k