Entering edit mode
Marcus
▴
150
@marcus-410
Last seen 10.3 years ago
Hi limma users.
I have some general questions regarding the use of dupcor.series and
gls.series functions.
I use an experimental design that compares three different conditions
to a
fourth (which I will call a reference).
So the design of the experiment looks something like:
Cond 1 Cond2 Cond2
Ref
(There are supposed to be double arrows between the different
conditions
and the reference.)
The experiment is done in four (and between some conditions, five)
biological replicates, so in total I have made 4 times 6 (dye-swap) =
24
hybridizations which compares all the conditions with the reference,
plus
four additional hybridizations which is a dye-swap between cond2 and
reference plus cond3 and the reference. This gives in total 27 slides
since
one slide was excluded due to a terrible hybridization. Each slide has
a
duplicate spot.
This data has been filtered using different filter cut-offs and the
remaining data has been put into a matrix of M-values. The data has
been
normalized and dye-swapped.
From this point I have made two different analysis paths.
One analysis path was to divide the M-matrix into three different
matrices
which contains data from the different conditions and I continued
using the
matrices with the following
command:
dupcor.series(M.matrix.cond1,design,ndups=2,spacing=13824)->corr
Where the design was something like: design <- c(1,1,1,1,1,1,1,1)
The correlation is later used in the function:
gls.series(M.matrix.cond1,
design, ndups=2,spacing=13824,corr$cor)->fit
and continuing with: eb<-ebayes(fit)
the data from ebayes is later used in the toptable
function:
toptable(number=100,genelist=list[,],fit=fit,eb=eb,adjust="fdr")
When I make this analysis I get a list of 82 genes for cond1 that has
a
B-value larger than zero.
But if I use another strategy where I use all the slides in the same
analysis and use a design matrix which looks something like:
Con1 Con2 Con3
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 1 0 0
6 1 0 0
7 1 0 0
8 1 0 0
9 0 1 0
10 0 1 0
11 0 1 0
12 0 1 0
13 0 1 0
14 0 1 0
15 0 1 0
16 0 1 0
17 0 1 0
18 0 1 0
19 0 0 1
20 0 0 1
21 0 0 1
22 0 0 1
23 0 0 1
24 0 0 1
25 0 0 1
26 0 0 1
27 0 0 1
And make the function calls like the procedures above except the
toptable
where I specify the coef number to be the condition which I want to
investigate. The thing is, when I use this approach I get more genes
that
have a b-value greater than zero, namely 110.
My guess is that when I run the dupcor.series the correlations is
based on
all the slides in some way, since corr$cor is a lonely number and not
a
correlation for every comparison in the experimental design. The cor
numbers from the different analysis pathways are not equal either.
When I run the gls.series function I get a fit$sigma that is based on
all
the slides or?
What is actually happening when the sigma is calculated. The sigma is
also
generally smaller when all the slides are included compared to the
analysis
path where I used one condition at a time, which I guess should be the
case
if all slides are included.
The lods which is calculated in the ebayes function is larger then the
lods
which originates from the analysis path where I examined one condition
at a
time, so this result in more genes that is larger than zero.
So to my questions (sorry for the long introduction):
Is the design matrix correct when the experiment is performed the way
it
is? According to the user guide, this is an optional way when you have
a
reference design but since the result is deviating from the one
comparison
at a time I am not sure.
If the design is correct, which of the two analysis pathways is the
correct/better one?
I dont consider myself as a statistician, more like a biologist, but I
reckon the sigma factor as a crucial variable in these analyses. I
changed
the fit$sigma for the "one condition" analysis path to the sigma that
originates from the analysis that contains all the slides and the
number of
genes that reach above B-value zero increased to 111 which is quite
like
the 110 that I got from the other analysis pathway. So I guess that
the
sigma factor is quite responsible for the B-value spectra.
If I would like to compare, lets say the cond2 with cond3 can I then
add a
another column to the design matrix, and if so, how would it look
like.
During these analysis I have used R-version 1.8 and limma version
1.3.1
I would be very grateful for some response.
Regards
Marcus
**********************************************************************
*********************
Marcus Gry Bj?rklund
Royal Institute of Technology
AlbaNova University Center
Stockholm Center for Physics, Astronomy and Biotechnology
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
Phone (office): +46 8 553 783 45
Fax: + 46 8 553 784 81
Visiting adress: Roslagstullsbacken 21, Floor 3
Delivery adress: Roslagsv?gen 30B