Dear Iddo,
No, it is not valid to use a different design matrix for the dispersion
estimation.
edgeR will handle your model with 400 samples, but it will admittedly be
slow. If this is too slow, then switch to voom() in the limma package,
which will be very fast, or to glmQLFTest() in the edgeR package, which
will still be relatively slow but faster than the glm routines in edgeR
(or DESeq2).
Best wishes
Gordon
> From: Iddo Ben-dov <iddobe at ekmd.huji.ac.il>
> Subject: edgeR and DESeq2: model design and estimation of dispersion
> Date: June 12, 2014 at 4:51:51 PM GMT+3
> To: bioconductor at r-project.org
>
> hi,
>
> in both edgeR and DESeq2, estimation of dispersion precedes negative
> binomial GLM fitting.
>
> my question is, can I use a design formula when estimating dispersion
> which is different from the formula used for GLM fitting? specifically,
> I would like to use a simplified design when estimating dispersion and a
> full design for GLM fitting.
>
> my motivation for doing so is that with the full design estimation of
> dispersion is too demanding for my computer and time.
>
> my dataset includes 400 mRNAseq profiles (~22,000 genes). there are 100
> controls and 100 cases, and each was sampled twice - before and after
> intervention.
>
> thus, the full design is:
> ~ group*intervention + individual:group (blocking factor)
>
> as I mentioned, estimation of dispersion with the above design is not
> practical, and I thus would like to simplify to: ~ group*intervention
>
> and introduce the 'individual' blocking factor only for NB GLM fitting.
>
> is this statistically valid?
>
> appreciate any help,
> iddo