Dear Mike,
On Mon, 19 May 2014, Mike Miller wrote:
> Dear Gordon,
>
>
> Thank you for your help.
>
> The contrast was: (control_1.localization_A) minus
> (control_1.localization_B). It was not a simple comparison between
> groups, the complete design looks like (in previous email I tried to
> simplify the question, for that reason, I excluded some factors in
the
> question):
Unfortunately, the phenomenon that you say you have observed could not
have occured in the simplified situation you reported, which made
answering the question difficult.
> design_combi=model.matrix(~0+ ctrl_loc + Gender+Sample_blocking,
data=combi)
>
>> colnames(design_combi)
> [1] "control_0.localization_A" "control_0.localization_B"
> "control_1.localization_A"
> [4] "control_1.localization_B" "Gender1"
> "Sample_blocking15"
> [7] "Sample_blocking16" "Sample_blocking17" "Sample_blocking18"
> [10] "Sample_blocking19" "Sample_blocking2" "Sample_blocking20"
> [13] "Sample_blocking21" "Sample_blocking22" "Sample_blocking23"
> [16] "Sample_blocking24" "Sample_blocking27" "Sample_blocking28"
> [19] "Sample_blocking3" "Sample_blocking33" "Sample_blocking34"
> [22] "Sample_blocking35" "Sample_blocking36" "Sample_blocking37"
> [25] "Sample_blocking38" "Sample_blocking39" "Sample_blocking6"
> [28] "Sample_blocking8"
>
> Factor "ctrl_loc" contains combinations of 2 factors (thus has 4
levels):
> 1. disease_state (levels: control_0 and control_1)
> 2. localization (levels: A and B)
> Levels: control_0.localization_A, control_1.localization_A,
> control_0.localization_B, control_1.localization_B.
>
> Factor "Gender" has 2 levels: 0 and 1.
> Factor "Sample_blocking" is here to account for the batch effect
(some
> samples are coming from the same patient).
> Since the "Design matrix not of full rank" I excluded 2 columns from
the
> "design_combi" (columns number 25 and 26) to be able to compute
dispersions.
This means that you are treating blocks 38 and 39 as if they were the
same
as block 1. You seem to be trying to adjust for too many things.
> After fitting the model, I defined contrasts:
> my_contrasts=makeContrasts(disease.A_minus_disease.B=control_1.local
ization_A
> - control_1.localization_B, levels=design_combi).
>
> lrt=glmLRT( fit, contrast=my_contrasts[,
"disease.A_minus_disease.B"])
>
> I was wondering how is logFC computed in this case (in this
contrast), and
> I wanted just to confirm that when the logFC >0, that means that the
gene
> is up-regulated in control_1.localization_A compared to
> control_1.localization_B.
What it means is that the gene would be higher in localization A than
in
B, if both localizations were applied to cases of the same gender and
the
same block.
Given that the localizations A and B might actually be unbalanced by
gender and block, the observed marginal differences between A and B
could
be quite different.
> To check that, I tried to manually calculate just for one gene the
logFC
> value, but obviously I do not know how it is computed in this case.
You can't compute it manually. The whole point of a package like
edgeR is
to do computations you can't do easily by yourself.
Best wishes
Gordon
> Thank you very much in advance!
>
> Best regards,
> Mike
>
>
> On Sun, May 18, 2014 at 5:55 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote:
>
>> Dear Mike,
>>
>> edgeR does compute logFC correctly, and a positive logFC does
obviously
>> mean that the counts are higher, other things being equal, in the
second
>> group than the first group.
>>
>> If you are making the comparison B-A (B minus A), then a positive
logFC
>> means expression is higher in B than in A. The topTable() function
always
>> output the contrast you are making at the top of the table, so that
the
>> table is unambiguous.
>>
>> I can't tell from your email what comparison you actually made.
What is
>> the non-ascii character is your email between
"control_1.localization_A"
>> and "control_1.localization_B"? Was it supposed to be a minus sign
or
>> something else?
>>
>> I don't know whether you are making a simple comparison between two
>> groups, or whether you have fitted a more complex linear model.
This can
>> affect the interpration of logFCs.
>>
>> Best wishes
>> Gordon
>>
>>
>> Date: Fri, 16 May 2014 12:06:14 +0200
>>> From: Mike Miller <mike.bioc32 at="" gmail.com="">
>>> To: <bioconductor at="" stat.math.ethz.ch="">
>>> Subject: [BioC] edgeR, logFC calculation in factor combination
>>> Message-ID:
>>> <cankskzosyxfqkn3y=itkq4qj7dpp+da+umgpjku0de98ruu5bg at="" mail.="">>> gmail.com>
>>> Content-Type: text/plain
>>>
>>> Dear All,
>>>
>>>
>>> I am using edgeR for an RNASeq experiment (~30 samples), where I
need to
>>> explore the influence of 2 factors with 2 levels each (there are
actually
>>> more factors, but for simplicity I put only 2 here):
>>>
>>> 1. disease_state (levels: control_0 and control_1)
>>>
>>> 2. localization (levels: A and B)
>>>
>>>
>>> I combined the factors and got 4 combinations:
>>>
>>> control_0.localization_A, control_1.localization_A,
>>> control_0.localization_B, control_1.localization_B.
>>>
>>> I am specifically interested in a list of diff. expressed genes
between
>>> these 2 combinations: control_1.localization_A ???
>>> control_1.localization_B.
>>>
>>> When separating up- and down-regulated genes, I did it simply
according to
>>> logFC column (after filtering for genes with padj < 0.05). If the
logFC<0,
>>> could I infer that the gene is DOWN-regulated in
control_1.localization_A?
>>> I assume that this is true.
>>>
>>> However, when I take the raw counts for one of the diff. expressed
genes
>>> (for which logFC<0), in control_1.localization_A and in
>>> control_1.localization_B conditions, it turned out to be the
following:
>>>
>>> mean(control_1.localization_A) / mean(control_1.localization_B) =
4.2
>>>
>>> (Formula explained in words: mean of raw counts for the diff.
expressed
>>> gene X in samples which are control_1 and localization_A, divided
by mean
>>> of raw counts for the diff. expressed gene X in samples which are
>>> control_1
>>> and localization_B)
>>>
>>> This is contrary to the conclusion I got from logFC, since
according to
>>> raw counts that gene is UP-regulated in control_1.localization_A.
>>>
>>> I know that logFC is calculated differently, but shouldn't the
ratio
>>> (whether the gene is up- or down-regulated) stay nevertheless the
same (if
>>> library sizes are very similar in all samples, which is the case
here)?
>>>
>>> Maybe a more precise question would be: how is logFC calculated in
this
>>> case, when we have a combination of different factors?
>>>
>>> Thank you very much in advance for any piece of clarification!
>>>
>>> Mike
>>>
>>
>>
______________________________________________________________________
>> The information in this email is confidential and intended solely
for the
>> addressee.
>> You must not disclose, forward, print or use it without the
permission of
>> the sender.
>>
______________________________________________________________________
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:5}}