Question: Regression analysis problem

0

khadeeja ismail •

**400**wrote:Hi everyone,
I am wondering if anyone can help me understand the following output
from limma.
The problem goes like this:
Say I have methylation values for four genes from three twin pairs. I
created some dummy data so that values of gene1 is clearly different
within twin pairs. My interest here is to find out the genes that are
significantly different within the pairs (i.e. between T1a and T1b,
T2a and T2b and T3a and T3b)
T1a T1b T2a T2b T3a T3b
"gene1" 16 4 18 4 19 5
"gene2" 3 1 4 5 2 6
"gene3" 2 1 4 3 4 5
"gene4" 3 1 2 3 3 4
I created a design matrix:
> Pair<-c("TWIN1","TWIN1","TWIN2","TWIN2","TWIN3","TWIN3")
>Classes<-c(2,1,2,1,2,1)
>BMI<-c(45,20,34,12,25,19)
>Pair<-factor(Pair)
>Classes<-factor(Classes)
>design <- model.matrix(~Pair+Classes)
> design
(Intercept) PairTWIN2 PairTWIN3 Classes2
1 1 0 0 1
2 1 0 0 0
3 1 1 0 1
4 1 1 0 0
5 1 0 1 1
6 1 0 1 0
**********
...and fitted a linear model to the data to get the following results:
************************************************************
ID X.Intercept. PairTWIN2 PairTWIN3 Classes2 AveExpr F
1 gene1 3.333333 1.0 2.0 1.333333e+01 11.000000
106.298724
2 gene2 2.500000 2.5 2.0 -1.000000e+00 3.500000
8.745648
3 gene3 1.333333 2.0 3.0 3.333333e-01 3.166667
7.430245
4 gene4 2.000000 0.5 1.5 9.064933e-16 2.666667
4.799441
P.Value adj.P.Val
1 9.993042e-91 3.997217e-90
2 4.683758e-07 9.367515e-07
3 5.578117e-06 7.437489e-06
4 7.186523e-04 7.186523e-04
*************************************************************
As you can see, all the genes have adjusted p values less than 0.05.
But if I just compute the differnces between the pairs
T1 T2 T3
gene1 12 14 14
gene2 2 -1 -4
gene3 1 1 -1
gene4 2 -1 -1
....and used the design matrix (1,1,1),
gene 1 comes up as significantly different within the pairs.
ID logFC t P.Value adj.P.Val B
1 gene1 13.3333333 10.6660452 5.234557e-06 2.093823e-05 46.016217
2 gene2 -1.0000000 -0.7999534 4.468388e-01 8.936777e-01 -5.513313
3 gene3 0.3333333 0.2666511 7.964821e-01 1.000000e+00 -5.772418
4 gene4 0.0000000 0.0000000 1.000000e+00 1.000000e+00 -5.804806
So, my questions are:
Why am I not getting the same (expected) result from my first
approach? Or am I misinterpreting the output?
What does the p value in the 1st approach signify?
And most importantly, am I applying the right model?
Any help on this would be most appreciated.
Thanks in advance,
Khadeeja
[[alternative HTML version deleted]]

ADD COMMENT
• link
•
modified 7.4 years ago
by
Sean Davis ♦

**21k**• written 7.4 years ago by khadeeja ismail •**400**