
Dear Gordon, Ryan and Nicolas,
Thank you all for the detailed advice.
I have one more question regarding the blocking factor model. In my case I
actually have two external factors to consider: one is the platform, the
other is the subject.
My sample matrix is the following (I've attached the CSV in case you can't
view the image):
I am only interested in comparing treatments B, C and D to A (A being the
control). So far I've never had a model with more than one external
factor. I imagine it should be OK to have more; is this correct? If so,
could you perhaps check whether I am setting up the model matrix correctly?
(Apologies if this sounds too trivial.) I imagine it should be defined as:
Platform <- factor(targets$Platform)
Subject <- factor(targets$Subject)
Treatment <- factor(targets$Treatment)
design <- model.matrix(~Platform+Subject+Treatment)
..
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef=24)  # for comparing Treatment B to Treatment A
Is this correct?
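Before settling on a column number such as 24, it can help to build the design on a toy targets table and look at colnames(design). The sketch below uses base R only and a hypothetical layout (three subjects, four treatments, every sample sequenced on both platforms); the real layout is in the attached CSV, so the number and position of columns will differ.

```r
# Hypothetical targets table: 3 subjects x 4 treatments x 2 platforms.
# The real layout comes from the attached CSV.
targets <- expand.grid(
  Treatment = c("A", "B", "C", "D"),
  Subject   = c("S1", "S2", "S3"),
  Platform  = c("HiSeq", "MiSeq")
)

Platform  <- factor(targets$Platform)
Subject   <- factor(targets$Subject)
# Make the control A the reference level, so each Treatment
# coefficient is a comparison against A.
Treatment <- relevel(factor(targets$Treatment), ref = "A")

# Additive model: platform and subject act as blocking factors.
design <- model.matrix(~ Platform + Subject + Treatment)

# Column names for this toy layout:
# "(Intercept)" "PlatformMiSeq" "SubjectS2" "SubjectS3"
# "TreatmentB" "TreatmentC" "TreatmentD"
print(colnames(design))

# Every coefficient must be estimable, i.e. the design has full column rank.
stopifnot(qr(design)$rank == ncol(design))

# Look the coefficient up by name instead of hard-coding its position.
coefB <- which(colnames(design) == "TreatmentB")
print(coefB)
```

With the real targets table, colnames(design) shows whether column 24 really is TreatmentB. edgeR documents the coef argument of glmLRT() as an integer or character index, so coef = "TreatmentB" should also work (worth confirming in ?glmLRT for the installed version), and passing all three Treatment columns at once would test B, C and D against A jointly.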
On Sun, Aug 31, 2014 at 12:44 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear Nick,
>
> If you go back to the post from 2010 that you give the URL for, you will
> see that I was giving, very briefly, the same advice about checking Poisson
> variability that Ryan has explained in greater detail.
>
> You don't give any information about read lengths, sequencing depths or
> alignment methods. I would be surprised if MiSeq and HiSeq generated
> perfect Poisson replicates of one another, especially if the read lengths
> from the two platforms are different or if the alignment and counting
> software has been varied. So you may well end up back at the blocking idea.
>
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
> On Sun, 31 Aug 2014, Ryan wrote:
>
>> Thanks to the underlying theory behind dispersion estimation, you can
>> easily test whether your "technical replicates" really do represent
>> technical replicates. Specifically, read counts in technical replicates
>> should follow a Poisson distribution, which is a special case of the
>> negative binomial with zero dispersion. So, simply fit a model using edgeR
>> or DESeq2 with a separate coefficient for each group of technical
>> replicates. All the experimental variation will then be absorbed into the
>> model coefficients, and the only thing left will be the technical
>> variability of the replicates. For true technical replicates, the
>> dispersion should be zero for all genes. So if you estimate dispersions
>> using this model, and plotBCV/plotDispEsts shows the dispersion very near
>> to zero, then you can be confident that you really have technical
>> replicates. If the dispersion is nonzero, then there is some additional
>> source of unaccounted-for variation.
>>
>> I have used this method on a pilot dataset with several technical
>> replicates for each condition. edgeR said the dispersion was something
>> like 10^-3 or less for all genes except for the very lowly expressed ones.
>>
>> -Ryan
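The idea behind the check Ryan describes can be illustrated without edgeR. The sketch below simulates counts for one group of technical replicates and computes a crude per-gene dispersion from the negative binomial variance mu + phi * mu^2, i.e. phi = (var - mean) / mean^2. This method-of-moments estimator is a stand-in chosen for illustration only; it is not the estimator edgeR or DESeq2 use (in edgeR one would fit the per-group design, call estimateDisp() and inspect plotBCV()). The number of genes, number of replicates and the gene means are all made up.

```r
set.seed(1)

n_genes <- 2000
n_reps  <- 4                                    # hypothetical: 4 runs of one library
mu      <- rexp(n_genes, rate = 1 / 200) + 20   # hypothetical true gene means

# Pure technical replicates: Poisson sampling around the same means.
# Column-wise filling gives gene i the mean mu[i] in every column.
pois_counts <- matrix(rpois(n_genes * n_reps, lambda = rep(mu, n_reps)),
                      nrow = n_genes)

# Replicates with extra variation: negative binomial with dispersion
# phi = 0.1 (size = 1 / phi = 10).
nb_counts <- matrix(rnbinom(n_genes * n_reps, mu = rep(mu, n_reps), size = 10),
                    nrow = n_genes)

# Method-of-moments dispersion per gene: (sample variance - mean) / mean^2.
mom_dispersion <- function(counts) {
  m <- rowMeans(counts)
  v <- apply(counts, 1, var)
  (v - m) / m^2
}

# Poisson replicates: median estimate near 0.
print(median(mom_dispersion(pois_counts)))
# Over-dispersed replicates: clearly above 0 (true value 0.1; per-gene
# estimates from only 4 replicates are noisy and skewed).
print(median(mom_dispersion(nb_counts)))
```

The contrast between the two medians is the point: technical replicates that behave like Poisson samples leave essentially no dispersion once the group means are fitted, while any extra source of variation shows up as a clearly positive value.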
>>
>> On 8/28/14, 9:23 AM, Nick N wrote:
>>
>>> Hi,
>>>
>>> I have a study where a fraction of the samples have been replicated on 2
>>> Illumina platforms (HiSeq and MiSeq). These are technical replicates: the
>>> library preparation is the same, using the same biological replicates;
>>> it's only the sequencing which is different.
>>>
>>> My hunch was that I should introduce the platform as an additional
>>> (blocking) factor in the analysis. Then I stumbled upon this post:
>>>
>>> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>>>
>>> It recommends pooling the replicates. The post seems to apply to a
>>> different case ("pure" technical replicates, i.e. no differences in the
>>> sequencing platform used), so I should probably ignore it. But I still
>>> feel a bit uncertain about the best way to treat the technical
>>> replicates. Can you please advise me on this?
>>>
>>> Many thanks!
>>> Nick
>>>
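Pooling, as recommended in the 2010 post, amounts to summing the counts of each library across its sequencing runs before the analysis. A minimal base-R sketch with a made-up count matrix and hypothetical sample and library names; edgeR's sumTechReps() is intended for the same task.

```r
# Hypothetical counts: 3 genes, two libraries (L1, L2), each sequenced
# on both HiSeq and MiSeq.
counts <- matrix(c(10, 20, 30,   # L1 on HiSeq
                    4,  9, 12,   # L1 on MiSeq
                    7,  0, 15,   # L2 on HiSeq
                    3,  1,  6),  # L2 on MiSeq
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3),
                                 c("L1_HiSeq", "L1_MiSeq", "L2_HiSeq", "L2_MiSeq")))

# Which library each column (sequencing run) came from.
library_id <- c("L1", "L1", "L2", "L2")

# rowsum() sums rows by group, so transpose, sum, and transpose back
# to get one column per library.
pooled <- t(rowsum(t(counts), group = library_id))
print(pooled)
#       L1 L2
# gene1 14 10
# gene2 29  1
# gene3 42 21
```

Summing is only reasonable if the HiSeq and MiSeq runs behave like Poisson replicates of each other; if they do not, the platform belongs in the design as a blocking factor instead.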
>>
>