Dear All,
I have carried out an RNAseq experiment with 4 conditions, 2
biological replicates of each. At the moment, I am interested in how
my conditions differ in terms of expression of a subset of 36 genes.
The idea is to count only the reads that correspond to those 36
genes and use these data to analyze their
differential expression across the conditions. Will this approach be
valid? What is the minimum number of genes required by the statistical
model implemented in DESeq? I apologize if the question is too naive.
Thank you
Vladimir.
Hi Vladimir
One way to do this without cutting down your gene set is to run the
dispersion estimation and the negative binomial test on all the genes in
your annotation model, and then take the subset of genes you are
interested in from the data frame returned by the test.
I am not sure what impact reducing the number of genes will have on the
statistical model. I guess the estimates are made for each gene, but
since your gene sample would be very small, the model may have trouble
estimating the dispersion parameter. I am not sure. Simon or Wolfgang
can best answer that.
HTH,
-Abhi
On Tue, Feb 14, 2012 at 6:37 AM, vladimir mashanov
<mashanovvlad at googlemail.com> wrote:
> Dear All,
>
> I have carried out an RNAseq experiment with 4 conditions, 2
> biological replicates of each. At the moment, I am interested in how
> my conditions differ in terms of expression of a subset of 36 genes.
> The idea is to count only the reads that correspond to those 36
> genes and use these data to analyze their
> differential expression across the conditions. Will this approach be
> valid? What is the minimum number of genes required by the statistical
> model implemented in DESeq? I apologize if the question is too naive.
>
> Thank you
>
> Vladimir.
Dear Vladimir
> I have carried out an RNAseq experiment with 4 conditions, 2
> biological replicates of each. At the moment, I am interested in how
> my conditions differ in terms of expression of a subset of 36 genes.
> The idea is to count only the reads that correspond to those 36
> genes and use these data to analyze their
> differential expression across the conditions. Will this approach be
> valid? What is the minimum number of genes required by the statistical
> model implemented in DESeq? I apologize if the question is too naive.
What is wrong with doing the analysis for all genes, and then looking
only at those that you are interested in?
For the dispersion estimation, you should use all available genes.
However, if you really selected the list of 36 genes prior to your
experiment, or at least independently of your RNA-Seq data, and do not
intend to look at any further genes to decide on the hypothesis you
currently have in mind, you might be justified in performing the
multiple testing adjustment on the raw p values of only those 36 genes,
which would surely improve your power. To do so, subset them from the
"pvalue" column of the final result and hand them to the 'p.adjust'
function (with 'method="BH"').
Simon
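
A minimal sketch of the subset-then-adjust step described above, in base R
only. The data frame 'res' is a simulated, hypothetical stand-in for the
result table of the test run on all genes; the gene IDs, the 'id' column,
and the 'padj36' column are made up for illustration. The "pvalue" column
name follows the wording in this thread, so check the actual column names
of your own result table.

```r
# Hypothetical stand-in for the full result table (one row per gene).
# In a real analysis this would come from the test run on ALL genes.
set.seed(1)
n_genes <- 10000
res <- data.frame(
  id     = paste0("gene", seq_len(n_genes)),
  pvalue = runif(n_genes),
  stringsAsFactors = FALSE
)

# Hypothetical list of 36 genes, chosen independently of the RNA-Seq data.
genes_of_interest <- paste0("gene", 1:36)

# Keep only the pre-selected genes and apply the Benjamini-Hochberg
# adjustment to their raw p values alone.
sub <- res[res$id %in% genes_of_interest, ]
sub$padj36 <- p.adjust(sub$pvalue, method = "BH")

# For comparison: adjust over all genes, then look at the same 36.
padj_all <- p.adjust(res$pvalue, method = "BH")[res$id %in% genes_of_interest]

# Show the genes of interest, smallest adjusted p value first.
print(head(sub[order(sub$padj36), ]))
```

Dispersions and raw p values still come from the full gene set; only the
multiple testing adjustment is restricted to the 36 genes. If the result
table contains NA p values, they may need to be handled before adjusting.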