number of genes for DESeq analysis

vladimir mashanov ▴ 10

@vladimir-mashanov-5118

Last seen 9.6 years ago

Dear All,

I have carried out an RNA-seq experiment with 4 conditions and 2 biological replicates of each. At the moment, I am interested in how my conditions differ in the expression of a subset of 36 genes. The idea is to count only the reads that correspond to those 36 genes and use these data to analyze their differential expression across the conditions. Will this approach be valid? What is the minimum number of genes required by the statistical model implemented in DESeq? I apologize if the question is too naive.

Thank you,
Vladimir

RNASeq RNASeq • 1.6k views

updated 12.2 years ago by Simon Anders ★ 3.7k • written 12.2 years ago by vladimir mashanov ▴ 10

Abhishek Pratap ▴ 410

@abhishek-pratap-5083

Last seen 9.6 years ago

Hi Vladimir,

One way to do this without cutting down your gene set is to run the dispersion estimation and the negative binomial test on all the genes in your annotation, and then take the genes you are interested in as a subset of the data frame returned by the test.

I am not sure what impact reducing the number of genes would have on the statistical model. I guess the estimates are made for each gene, but since your gene set would be very small, the model may have trouble estimating the dispersion parameter. I am not sure; Simon or Wolfgang can best answer that.

HTH,
-Abhi

On Tue, Feb 14, 2012 at 6:37 AM, vladimir mashanov <mashanovvlad at googlemail.com> wrote:
> Dear All,
>
> I have carried out an RNA-seq experiment with 4 conditions and 2
> biological replicates of each. At the moment, I am interested in how
> my conditions differ in the expression of a subset of 36 genes.
> The idea is to count only the reads that correspond to those 36
> genes and use these data to analyze their differential expression
> across the conditions. Will this approach be valid? What is the
> minimum number of genes required by the statistical model implemented
> in DESeq? I apologize if the question is too naive.
>
> Thank you,
> Vladimir
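As a rough illustration of the "test all genes, then subset" idea in base R: this is only a sketch, and `res`, its `id` and `pvalue` columns, and `genes_of_interest` are hypothetical stand-ins for the result table of your own DESeq run and your own gene list. The random p values are placeholders, not real results.

```r
# Hypothetical stand-in for the result table obtained by testing ALL genes.
# In a real analysis this data frame would come from the DESeq run itself.
set.seed(1)
n_genes <- 1000
res <- data.frame(
  id     = paste0("gene", seq_len(n_genes)),
  pvalue = runif(n_genes),          # placeholder p values
  stringsAsFactors = FALSE
)

# Hypothetical list of 36 genes of interest, fixed before looking at the data.
genes_of_interest <- paste0("gene", seq(10, 360, by = 10))

# Keep only the rows that belong to the genes of interest.
res_subset <- res[res$id %in% genes_of_interest, ]
nrow(res_subset)
```

The point of this order of operations is that the dispersion estimation and testing still see every gene; only the final table is reduced.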

12.2 years ago by Abhishek Pratap ▴ 410

Simon Anders ★ 3.7k

@simon-anders-3855

Last seen 3.7 years ago

Zentrum für Molekularbiologie, Universi…

Dear Vladimir,

> I have carried out an RNA-seq experiment with 4 conditions and 2
> biological replicates of each. At the moment, I am interested in how
> my conditions differ in the expression of a subset of 36 genes.
> The idea is to count only the reads that correspond to those 36
> genes and use these data to analyze their differential expression
> across the conditions. Will this approach be valid? What is the
> minimum number of genes required by the statistical model implemented
> in DESeq? I apologize if the question is too naive.

What is wrong with doing the analysis for all genes and then looking only at those you are interested in? For the dispersion estimation, you should use all available genes.

However, if you really selected the list of 36 genes before your experiment, or at least independently of your RNA-Seq data, and you do not intend to look at any further genes to decide on the hypothesis you currently have in mind, you might be justified in performing the multiple testing adjustment on the raw p values of only those 36 genes, which would surely improve your power. To do so, subset them from the "pvalue" column of the final result and hand them to the 'p.adjust' function (with 'method="BH"').

Simon
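A minimal sketch of the adjustment described above, using base R's `p.adjust`. The data frame `res`, its column names, and the gene list are hypothetical placeholders (the p values are random numbers, and the p value column in your own result table may be named differently), so treat this as an illustration under those assumptions rather than a recipe.

```r
# Hypothetical result table for ALL genes (placeholder p values).
set.seed(42)
res <- data.frame(
  id     = paste0("gene", seq_len(1000)),
  pvalue = runif(1000),
  stringsAsFactors = FALSE
)

# Hypothetical list of 36 genes chosen independently of the RNA-Seq data.
genes_of_interest <- sample(res$id, 36)
selected <- res$id %in% genes_of_interest

# Benjamini-Hochberg adjustment over the 36 preselected genes only.
padj_36 <- p.adjust(res$pvalue[selected], method = "BH")

# For comparison: the same genes when the adjustment runs over all genes.
padj_all <- p.adjust(res$pvalue, method = "BH")

comparison <- data.frame(
  id       = res$id[selected],
  pvalue   = res$pvalue[selected],
  padj_36  = padj_36,
  padj_all = padj_all[selected]
)
head(comparison[order(comparison$pvalue), ])
```

Note that the raw p values still come from the analysis of all genes; only the multiple testing correction is restricted to the preselected list.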

12.2 years ago by Simon Anders ★ 3.7k
