Question

deseq2 for pair-wise analysis

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 11.4 years ago

I have just started analyzing my transcriptome assembly data by deseq2. I am finding that my two control samples do not coincide very well. This may be affecting the number of differentially expressed genes if I use the average of two controls to compare with the treated. Is there a way I can minimize the effects of biological variation on the analysis? The overall counts are almost the same in all four files (~59-65 million, with an average of 62 million and SD of 3 million). C1 and C2 (untreated) are two different batches of experimental material and one half of them are dosed (D1 and D2; treated). They are kind of paired that way, although they are distinct set of individuals between C1 & D1 or C2 & D2. We actually started with 3 reps but decided to pool the 2nd and 3rd reps to have sufficient RNA for follow up qPCR assays. For some reason (could be biological differences between batches), the counts in C2 are 3-5 fold high compared to C1 for only some transcripts. There is a faintly similar trend between D2 and D1, although the treatment has wiped off this difference in many of those transcripts. I am wondering if I can use the difference between the pairs (C1&D1 and C2&D2), at least some of the biological variability may be masked allowing us to see the effects of the treatment more clearly on the transcriptome. I find from the FAQs that there is a way to do pair-wise analysis in deseq2. I appreciate if you can elaborate the answer a little more. I have also another question related to this study. We actually used another closely related species for the study. Since this is small in size, we had to pool all three reps into one. Transcripts are >75% identical at the DNA sequence level and the gene expression changes are similar between the species in response to the treatment. I am thinking of combining this data as a third rep of the study for deseq2 analysis, instead of throwing out the data. I appreciate to have your input about this step as well. I am also attaching the code I used (thanks to Dave Wheeler). Thank you very much for your help. Subbaiah Chalivendra -- output of sessionInfo(): R version 3.1.1 (2014-07-10) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] BiocGenerics_0.10.0 GenomeInfoDb_1.0.2 GenomicRanges_1.16.3 [4] IRanges_1.22.9 parallel_3.1.1 stats4_3.1.1 [7] XVector_0.4.0 -- Sent via the guest posting facility at bioconductor.org.

qPCR qPCR • 1.7k views

ADD COMMENT • link updated 11.5 years ago by Michael Love 43k • written 11.5 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-08-12

hi Subbaiah, On Tue, Aug 12, 2014 at 10:33 AM, Subbaiah Chalivendra [guest] <guest at="" bioconductor.org=""> wrote: > I have just started analyzing my transcriptome assembly data by deseq2. I am finding that my two control samples do not coincide very well. This may be affecting the number of differentially expressed genes if I use the average of two controls to compare with the treated. Is there a way I can minimize the effects of biological variation on the analysis? > > The overall counts are almost the same in all four files (~59-65 million, with an average of 62 million and SD of 3 million). C1 and C2 (untreated) are two different batches of experimental material and one half of them are dosed (D1 and D2; treated). They are kind of paired that way, although they are distinct set of individuals between C1 & D1 or C2 & D2. We actually started with 3 reps but decided to pool the 2nd and 3rd reps to have sufficient RNA for follow up qPCR assays. > > For some reason (could be biological differences between batches), the counts in C2 are 3-5 fold high compared to C1 for only some transcripts. There is a faintly similar trend between D2 and D1, although the treatment has wiped off this difference in many of those transcripts. > > I am wondering if I can use the difference between the pairs (C1&D1 and C2&D2), at least some of the biological variability may be masked allowing us to see the effects of the treatment more clearly on the transcriptome. > > I find from the FAQs that there is a way to do pair-wise analysis in deseq2. I appreciate if you can elaborate the answer a little more. > Do you have a more specific question? We have in the vignette: "FAQ: Can I use DESeq2 to analyze paired samples? Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula. See Section 1.5." so you would have a design of ~ sample + treatment where the column data should look like sample treatment 1 C 2 C 1 D 2 D > I have also another question related to this study. We actually used another closely related species for the study. Since this is small in size, we had to pool all three reps into one. Transcripts are >75% identical at the DNA sequence level and the gene expression changes are similar between the species in response to the treatment. I am thinking of combining this data as a third rep of the study for deseq2 analysis, instead of throwing out the data. I appreciate to have your input about this step as well. > I wouldn't recommend adding this sample from the different species, because the differences in counts due to the different transcripts of the other species would throw off the analysis (whether you align the reads to the genome/transcriptome of that species or to the genome/transcriptome of the species with the 4 samples you are already using). Basically, the feature-level effects on counts would not be expected to cancel out across samples. Mike > I am also attaching the code I used (thanks to Dave Wheeler). > > Thank you very much for your help. > > Subbaiah Chalivendra > > -- output of sessionInfo(): > > R version 3.1.1 (2014-07-10) > Platform: i386-w64-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > loaded via a namespace (and not attached): > [1] BiocGenerics_0.10.0 GenomeInfoDb_1.0.2 GenomicRanges_1.16.3 > [4] IRanges_1.22.9 parallel_3.1.1 stats4_3.1.1 > [7] XVector_0.4.0 > > > -- > Sent via the guest posting facility at bioconductor.org.