Dear group,
I am sorry to ask another design-related question. The data are from SMD.
Two or three different samples have been obtained from each patient.
For example:
from patient 1 - (A) normal tissue, (B) inflamed tissue and (C) cancer
tissue were extracted;
from patient 2 - only (A) normal tissue and (B) cancer tissue were
extracted, and likewise for the others.
A universal reference sample was hybridized on the green channel.
This is a paired design and a reference design. Limma manual describes
examples unique to one specific design.
I do not know how to combine two different designs.
My targets file:
FileName    Cy3   Cy5   SibShip (patient)
61453.xls   Ref   B     12
61454.xls   Ref   ACA   12
61459.xls   Ref   N     15
61460.xls   Ref   ACA   15
61461.xls   Ref   N     16
61462.xls   Ref   ACA   16
61463.xls   Ref   N     17
61464.xls   Ref   B     17
61465.xls   Ref   ACA   17
I want to identify the contrasts BvsN, ACAvsN and ACAvsB.
How can I get a design matrix for this type of design?
This is one of those studies where rare cancers have been studied (in
2003).
Unfortunately, this is a public dataset (published in Oncogene) where the
experiments were done using the Stanford Microarray Database.
Thank you in advance.
Adrian.
Hi Adrian,
Adrian Johnson wrote:
> This is a paired design and a reference design. Limma manual describes
> examples unique to one specific design.
Yes, but the 'limma User's Guide' also notes that the reference design
is pretty much the same as a one-color analysis, but that you have to
account for dye-swaps. Since you don't have dye-swaps, then it _is_ the
same as a one-color analysis. The only wrinkle here is that you have
blocked data (which is also covered in the limma User's Guide).
If you had doubts, you could have approached this iteratively. First
let's see what limma thinks you should be using:
> modelMatrix(targets, ref="Ref")
Found unique target names:
 ACA B N Ref
      ACA B N
 [1,]   0 1 0
 [2,]   1 0 0
 [3,]   0 0 1
 [4,]   1 0 0
 [5,]   0 0 1
 [6,]   1 0 0
 [7,]   0 0 1
 [8,]   0 1 0
 [9,]   1 0 0
So this is a pretty simple model matrix, but it doesn't account for the
blocks.
> Cy5 <- factor(c("B","ACA","N","ACA","N","ACA","N","B","ACA"))
> sibship <- factor(rep(c(12,15,16,17), c(2,2,2,3)))
> model.matrix(~0 + Cy5 + sibship)
  Cy5ACA Cy5B Cy5N sibship15 sibship16 sibship17
1      0    1    0         0         0         0
2      1    0    0         0         0         0
3      0    0    1         1         0         0
4      1    0    0         1         0         0
5      0    0    1         0         1         0
6      1    0    0         0         1         0
7      0    0    1         0         0         1
8      0    1    0         0         0         1
9      1    0    0         0         0         1
Now this is identical to the above, but with three extra columns to
capture the sib-specific means. Note that you could have simply added
the three extra columns for the sibs to the previous model matrix.
Also note that your contrast matrix will have to have 6 rows (with the
last three being all zeros).
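
To make the point about the contrast matrix concrete, here is a minimal
sketch in base R for the three comparisons asked about (BvsN, ACAvsN,
ACAvsB). The object names `design` and `cont` are only illustrative, and
the fitting calls at the end are left as comments because they need the
actual log-ratio data, here assumed to live in an object called `MA`.

```r
# Factors copied from the targets file in this thread
Cy5     <- factor(c("B", "ACA", "N", "ACA", "N", "ACA", "N", "B", "ACA"))
sibship <- factor(rep(c(12, 15, 16, 17), c(2, 2, 2, 3)))

# Tissue means plus patient (block) columns, as in the output above
design <- model.matrix(~0 + Cy5 + sibship)

# One column per comparison, one row per design column.
# Rows 4-6 (the patient columns) are all zero.
cont <- cbind(
  BvsN   = c(0,  1, -1, 0, 0, 0),
  ACAvsN = c(1,  0, -1, 0, 0, 0),
  ACAvsB = c(1, -1,  0, 0, 0, 0)
)
rownames(cont) <- colnames(design)
print(cont)

# Sanity check: the design has full column rank, so all coefficients
# (and hence all three contrasts) are estimable
stopifnot(qr(design)$rank == ncol(design))

# With limma loaded and the log-ratios in `MA` (an assumption, not shown):
#   fit  <- lmFit(MA, design)
#   fit2 <- eBayes(contrasts.fit(fit, cont))
#   topTable(fit2, coef = "ACAvsN")
```

limma's makeContrasts() should give the same matrix if you prefer to
write the contrasts as expressions such as Cy5B - Cy5N.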
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
Hi Jim,
I've seen you suggest this way to account for blocks, by fitting
extra columns in the design matrix, before. I'm just wondering how
this differs from the suggestion in the limma vignette (Section 8.2
Technical Replication) to use duplicateCorrelation() to estimate the
average correlation within blocks. I know they are not
mathematically equivalent; the coefficients for the treatment groups
are slightly different, they use different DF, and the p-values tend
to be larger using the duplicateCorrelation() method (at least for
the one experiment I'm using). So, is one more "correct" than the
other? Or are blocks of technical replicates different somehow than
blocks of patients or cell lines, etc.?
Thanks,
Jenny
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
Hi Jenny,
The way I understand it, the difference is that the way I suggested is
simply a fixed-effects model, where we assume that the variance is
constant for all of the groups.
If you compute the intra-group correlation using duplicateCorrelation(),
you will then fit a mixed linear model that allows for different
variance (or correlation) within the groups.
I don't know if one is more correct than the other. Certainly the fixed
effects model makes more assumptions. I think you can use
duplicateCorrelation() to see if the intra-group correlation is high,
which would argue for fitting a mixed linear model instead.
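
For reference, a rough sketch of what the duplicateCorrelation() route
could look like for the design in this thread. The matrix `M` below is
random noise standing in for the real log-ratios (an assumption made
only so that the code runs), so the estimated correlation itself means
nothing here; only the sequence of calls is the point.

```r
library(limma)

# Factors copied from the targets file in this thread
Cy5     <- factor(c("B", "ACA", "N", "ACA", "N", "ACA", "N", "B", "ACA"))
sibship <- factor(rep(c(12, 15, 16, 17), c(2, 2, 2, 3)))

# Placeholder data: 200 "genes" x 9 arrays of pure noise (NOT real data)
set.seed(1)
M <- matrix(rnorm(200 * 9), nrow = 200, ncol = 9)

# Tissue-only design: the patient is NOT a column of the design here
design <- model.matrix(~0 + Cy5)
colnames(design) <- levels(Cy5)

# Estimate one consensus within-patient correlation across all genes
corfit <- duplicateCorrelation(M, design, block = sibship)
print(corfit$consensus.correlation)

# Fit with patient as a random block, using that correlation
fit  <- lmFit(M, design, block = sibship,
              correlation = corfit$consensus.correlation)
cont <- makeContrasts(BvsN = B - N, ACAvsN = ACA - N, ACAvsB = ACA - B,
                      levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))
```

Comparing topTable() output from this fit with the fixed-block fit on
the same data is one way to see how much the two approaches differ in
practice.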
Best,
Jim
Hi Jenny,
Should blocks be fixed (in the design matrix) or treated as random (and
hence enter the covariance matrix as correlations)? This question has a
long history in mathematical statistics, so long that you can be sure
that the answer is somewhat subtle.
Neither approach is right or wrong. The random approach makes more
assumptions and allows you, in some circumstances, to extract more
information. The limma approach with duplicateCorrelation() etc. makes
even more assumptions than classical random effects models. If the
blocks are treated as fixed, then treatments can only be compared within
blocks. If blocks are treated as random, then it is possible to compare
treatments between blocks as well as within.
So the first key issue is whether treatment comparisons are made between
blocks or within blocks.
Suppose you do an experiment on random samples of subjects from two
groups, in which each subject is subjected to several tests. The
subjects are blocks. The total sum of squares can be divided into
between-subject and within-subject sums of squares. In other words, the
information in the data can be divided into a between-subject error
stratum and a within-subject stratum.
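
A tiny numeric illustration of that split, in base R. The numbers are
invented purely for illustration (three hypothetical subjects with three
measurements each); the only point is that the total sum of squares is
exactly the between-subject part plus the within-subject part.

```r
# Invented toy data: 3 subjects (blocks), 3 measurements per subject
y       <- c(5.1, 5.9, 6.4,  7.2, 8.1, 8.0,  3.9, 4.4, 5.3)
subject <- factor(rep(c("s1", "s2", "s3"), each = 3))

grand_mean    <- mean(y)
subject_means <- ave(y, subject)   # each value replaced by its subject's mean

ss_total   <- sum((y - grand_mean)^2)              # all variation
ss_between <- sum((subject_means - grand_mean)^2)  # between-subject stratum
ss_within  <- sum((y - subject_means)^2)           # within-subject stratum

print(c(total = ss_total, between = ss_between, within = ss_within))

# The two strata add up to the total
stopifnot(isTRUE(all.equal(ss_total, ss_between + ss_within)))
```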
Suppose you want to compare the two groups. All the information is in
the between-subject error stratum. You cannot do any statistical test
unless you treat the subjects as random.
Suppose now you want to compare the treatments. If the experiment is
balanced (all subjects do all tests), then all the information about the
treatments is in the within-block stratum. So you may as well treat the
subjects as fixed effects (as for example is done in a paired t-test).
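
The paired t-test remark can be checked directly. The sketch below uses
simulated data (eight hypothetical subjects, two treatments each, all
numbers made up) and compares the paired t-statistic with the treatment
t-statistic from a linear model that has subjects as fixed effects.

```r
set.seed(42)
n       <- 8
subject <- factor(rep(seq_len(n), each = 2))
treat   <- factor(rep(c("A", "B"), times = n))

# Simulated response: subject effect + treatment effect of 1 for B + noise
y <- rep(rnorm(n, sd = 2), each = 2) +
     ifelse(treat == "B", 1, 0) +
     rnorm(2 * n, sd = 0.5)

# Paired t-test on the within-subject differences B - A
t_paired <- t.test(y[treat == "B"], y[treat == "A"], paired = TRUE)$statistic

# Same comparison with subjects as fixed block effects
fit     <- lm(y ~ subject + treat)
t_fixed <- coef(summary(fit))["treatB", "t value"]

print(c(paired = unname(t_paired), fixed_blocks = t_fixed))
stopifnot(isTRUE(all.equal(unname(t_paired), t_fixed)))
```

Both use n - 1 residual degrees of freedom, so the p-values agree as
well.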
If the experiment is unbalanced (each subject does only a subset of the
tests, or subjects do tests a different number of times), then you can
extract more information about the treatment comparisons from the
between-subject error stratum. To do this, you have to treat the blocks
as random.
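
For the targets file posted at the start of this thread, a quick
tabulation in base R shows how unbalanced the design is. The object
names (`tab`, `present`, `shared`) are just illustrative; `shared`
counts, for each pair of tissues, how many patients contributed both.

```r
# Factors copied from the targets file in this thread
Cy5     <- factor(c("B", "ACA", "N", "ACA", "N", "ACA", "N", "B", "ACA"))
sibship <- factor(rep(c(12, 15, 16, 17), c(2, 2, 2, 3)))

# Which tissues were sampled from which patient
tab <- table(patient = sibship, tissue = Cy5)
print(tab)

# Number of patients that have both tissues of each pair
present <- unclass(tab) > 0
storage.mode(present) <- "numeric"
shared <- t(present) %*% present
print(shared)
```

Only one patient (17) has both B and N, so a direct within-patient
comparison of B against N rests on a single pair of arrays.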
The second key issue to consider is whether it makes sense scientifically
to treat the blocks as random. If there are only two or three blocks,
then there is little to be gained by treating them as random. If the
blocks have large unpredictable effects, then it is much safer to treat
them as fixed. If you want to make specific conclusions about each of
the blocks, then it doesn't make sense to treat them as random. In
general, random is natural if there are lots of blocks with relatively
small effects that are not of interest in themselves. Sometimes you can
go either way.
Hope this helps
Gordon
On Tue, 25 Nov 2008, Jenny Drnevich wrote:
> So, is one more "correct" than the other? Or are blocks of technical
> replicates different somehow than blocks of patients or cell lines,
> etc.?