DESeq (normalize using all samples?)


Guest User ★ 13k

@guest-user-4897


I have a file of read counts for eight samples (A1, A2, B1, B2, C1, C2, D1, D2), and I want to find the genes that are differentially expressed between A and B. Normally I would extract only samples A1, A2, B1 and B2 from the file. Instead, I have now used all samples to normalize the read counts and fit the model, and I find more DE genes. Is my code correct, and why do the results differ? I find 322 genes using all samples, but only 77 genes using only the A and B samples.

-- output of sessionInfo():

###each###
###analysis with specific columns###
library('DESeq')
x=read.delim("readcount.xls",row.names=1)
x=round(x[,1:4])
group=factor(c("A","A","B","B"))
cds <- newCountDataSet(x, group)
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds)
res <- nbinomTest(cds,'A','B')
a<-subset(res,padj<0.05)
dim(a)
write.table(a[,1],"each.txt",quote=F,col.names=F,row.names=F)

###union###
###analysis with all columns###
library('DESeq')
x=read.delim("readcount.xls",row.names=1)
x=round(x)
group=c("A","A","B","B","C","C","D","D")
cds <- newCountDataSet(x, group)
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds)
res <- nbinomTest(cds,'A','B')
a<-subset(res,padj<0.05)
dim(a)
write.table(a[,1],"union.txt",quote=F,col.names=F,row.names=F)

-- Sent via the guest posting facility at bioconductor.org.

Updated 10.4 years ago by Michael Love • written 10.4 years ago by Guest User


Michael Love 41k

@mikelove


United States

hi Hui Zhao,

I can explain why this happens, but it's hard to say which result is 'true'.

Including the other samples affects the number of genes passing an FDR threshold mostly through the estimation of dispersion. If the other samples tend to have counts with small within-group variance, then the estimate of dispersion for each gene will be reduced. If the other samples had larger within-group variance, you would expect the opposite effect: higher estimates of dispersion and fewer genes passing an FDR threshold.

We recommend using all samples to estimate the dispersion, as more samples generally reduce the variance of estimators. Note, though, that the model assumes the dispersion parameter for a given gene is the same across the groups.

Mike

On Sun, Dec 8, 2013 at 10:38 PM, Hui Zhao [guest] <guest@bioconductor.org> wrote:

> I have a file about readcount values with eight
> samples (A1,A2,B1,B2,C1,C2,D1,D2). I want to know the differential genes
> between A and B. Normally, I should extract samples A1,A2,B1,B2 from the
> file. Now I use all samples to normalize the readcounts and fit the model, and I
> find more DE genes. I want to know if my code is true and why?
> In this part I find 322 genes using all samples, while I find 77 genes
> using specific samples.
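Mike's point about dispersion can be made concrete with a small base-R sketch. It is only an illustration under stated assumptions: the counts for the single gene are made up, all size factors are taken to be 1, and `pooled_var` and `mom_dispersion` are hypothetical helper names. The estimate used is a plain method-of-moments value under var = mu + alpha * mu^2, not DESeq's actual procedure, which also fits a mean-dispersion trend across genes.

```r
# Hypothetical normalized counts for ONE gene (made-up numbers, not from the thread).
# C and D replicates are deliberately much tighter than A and B replicates.
counts <- c(A1 = 90, A2 = 130, B1 = 180, B2 = 240,
            C1 = 100, C2 = 104, D1 = 150, D2 = 154)
group <- factor(c("A", "A", "B", "B", "C", "C", "D", "D"))

# Pooled within-group variance: the average of the per-group sample variances.
# (A plain average is a valid pooled estimate here because all groups have equal size.)
pooled_var <- function(y, g) {
  g <- droplevels(g)              # drop groups that are absent from this subset
  mean(tapply(y, g, var))
}

# Simplified method-of-moments dispersion, assuming var = mu + alpha * mu^2
# and size factors of 1. Negative estimates are truncated at zero.
mom_dispersion <- function(y, g) {
  mu <- mean(y)
  max((pooled_var(y, g) - mu) / mu^2, 0)
}

ab <- group %in% c("A", "B")

disp_ab  <- mom_dispersion(counts[ab], group[ab])   # A and B samples only
disp_all <- mom_dispersion(counts, group)           # all eight samples

# Residual degrees of freedom behind each variance estimate
df_ab  <- sum(ab) - nlevels(droplevels(group[ab]))
df_all <- length(counts) - nlevels(group)

print(c(AB_only = disp_ab, all_samples = disp_all))
print(c(df_AB_only = df_ab, df_all_samples = df_all))
```

With these made-up numbers the tight C and D replicates pull the pooled estimate below the A/B-only estimate, and the estimate rests on twice as many residual degrees of freedom. Swapping in noisier C and D replicates would push it the other way, which is the opposite case described above.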
