Question: DESeq2 for pairwise comparison among many pairs
Hi Michael,

I have a dataset of five different conditions with three replicates each, and my aim is to run a few pairwise comparisons between the conditions. I successfully followed your instructions from Wed Jun 11 15:32:11 on how to do this in DESeq2. However, while analyzing the results I saw some strange numbers: large differences did not get a significant padj. When I repeated the analysis after loading only the relevant conditions, the padj dropped dramatically.

One example:

* Pairwise comparison from the full dataset. Normalized reads of condition a: 216,241; 204,221; 229,866. Condition b: 11,192; 10,840; 9,172. Padj: 0.132
* Comparison in a dataset of the relevant conditions only. Normalized reads of condition a: 220,126; 205,579; 230,317. Condition b: 11,341; 10,897; 9,233. Padj: 1.74E-215

Can you think of what might have caused this problem? I'm wondering if I should go back and analyze each comparison separately.

The basic script I have used:

    library(DESeq2)

    # 'directory' is the folder holding the HTSeq count files (not shown in the post)
    sampleFiles <- c("P11.txt", "P12.txt", "P13.txt", "P21.txt", "P22.txt", "P23.txt",
                     "L11.txt", "L12.txt", "L13.txt", "L21.txt", "L22.txt", "L23.txt",
                     "L31.txt", "L32.txt", "L33.txt")
    sampleCondition <- c("a", "a", "a", "e", "e", "e", "ILa", "ILa", "ILa",
                         "ILe", "ILe", "ILe", "ILh", "ILh", "ILh")
    sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles,
                              condition = sampleCondition)
    ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                           directory = directory, design = ~condition)
    colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition,
                                          levels = c("a", "e", "ILa", "ILe", "ILh"))
    dds <- DESeq(ddsHTSeq)
    # Contrast: condition a versus condition ILa
    res <- results(dds, contrast = c("condition", "a", "ILa"))
    res <- res[order(res$padj), ]
    mcols(res, use.names = TRUE)
    write.csv(as.data.frame(res), file = "a-ILa.csv")

All the best,
Avichai

(written 4.2 years ago by avichai.amrad@ips.unibe.ch; modified 4.2 years ago by Michael Love)

4.2 years ago, Michael Love (United States) wrote:

hi Avichai,

On Mon, Jul 21, 2014 at 11:18 AM, <avichai.amrad at ips.unibe.ch> wrote:
> However, while analyzing the results I saw some strange numbers where big
> differences did not get significant Padj- when I repeated the analysis with
> uploading only the relevant conditions the Padj dropped dramatically.

This can happen when the variance of counts for the two conditions is much smaller than the variance for the other conditions. The DESeq model has a single dispersion estimate for each gene, so the other conditions can pull up the dispersion estimate (both for a single gene and for the trend line, which pulls up all the estimates). In this case it might make sense to analyze the conditions separately.

You might find it useful to examine the PCA plot for these samples, which may or may not show the global differences in variances.

Mike
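To make the point about the single per-gene dispersion concrete, here is a minimal sketch on simulated counts. It does not use the poster's data. The condition labels are borrowed from the script in the question, but the dispersion values (0.01 for "a" and "ILa", 1 for the others), the base means, and the 5-fold change in the first 50 genes are all assumptions chosen only to make the effect visible. It fits all five conditions together, then only the two conditions of interest, and compares the dispersion estimates and the number of significant genes.

```r
# Sketch on simulated counts (not the poster's data); all values are assumptions.
library(DESeq2)

set.seed(1)
n_genes <- 500
conds <- rep(c("a", "e", "ILa", "ILe", "ILh"), each = 3)

# Assumed dispersions: tight replicates in "a" and "ILa", noisy ones elsewhere.
disp <- ifelse(conds %in% c("a", "ILa"), 0.01, 1)

# Per-gene base means; the first 50 genes are 5-fold lower in "ILa".
base <- 2^runif(n_genes, 6, 12)
mu <- matrix(base, nrow = n_genes, ncol = length(conds))
mu[1:50, conds == "ILa"] <- mu[1:50, conds == "ILa"] / 5

# Negative binomial counts with size = 1 / dispersion (column-major fill).
size <- matrix(rep(1 / disp, each = n_genes), nrow = n_genes)
counts <- matrix(rnbinom(length(mu), mu = mu, size = size), nrow = n_genes)
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)),
                         paste0(conds, "_", 1:3))

coldata <- data.frame(
  condition = factor(conds, levels = c("a", "e", "ILa", "ILe", "ILh")),
  row.names = colnames(counts)
)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ condition)

# Fit 1: all five conditions share one dispersion estimate per gene.
dds_all <- DESeq(dds)
res_all <- results(dds_all, contrast = c("condition", "a", "ILa"))

# Fit 2: keep only the two conditions being compared, drop unused levels.
dds_sub <- dds[, dds$condition %in% c("a", "ILa")]
dds_sub$condition <- droplevels(dds_sub$condition)
dds_sub <- DESeq(dds_sub)
res_sub <- results(dds_sub, contrast = c("condition", "a", "ILa"))

# Compare final dispersion estimates and the number of genes with padj < 0.05.
median_disp <- c(all = median(dispersions(dds_all), na.rm = TRUE),
                 sub = median(dispersions(dds_sub), na.rm = TRUE))
n_sig <- c(all = sum(res_all$padj < 0.05, na.rm = TRUE),
           sub = sum(res_sub$padj < 0.05, na.rm = TRUE))
print(median_disp)
print(n_sig)
```

In a setup like this, the noisy conditions should inflate the dispersion estimates in the joint fit, which widens the standard errors for the a versus ILa contrast. Whether splitting is appropriate for real data depends on whether the within-group variability genuinely differs between conditions, which is what the PCA plot is meant to help judge.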