limma design question


Adrian Johnson ▴ 330

@adrian-johnson-2728

Last seen 4.7 years ago

Dear group,

I am sorry to ask another design-related question. The data are from SMD (the Stanford Microarray Database). Two or three different samples were obtained from each patient. For example:

    from patient 1 - (A) normal tissue, (B) inflamed tissue and (C) cancer tissue were extracted
    from patient 2 - only (A) normal tissue and (B) cancer tissue were extracted, and likewise for the others

A universal reference sample was hybridized on the green channel.

This is both a paired design and a reference design. The limma manual describes examples that are each specific to one design, and I do not know how to combine two different designs.

My targets file:

    FileName    Cy3   Cy5   SibShip (patient)
    61453.xls   Ref   B     12
    61454.xls   Ref   ACA   12
    61459.xls   Ref   N     15
    61460.xls   Ref   ACA   15
    61461.xls   Ref   N     16
    61462.xls   Ref   ACA   16
    61463.xls   Ref   N     17
    61464.xls   Ref   B     17
    61465.xls   Ref   ACA   17

I want to identify genes for B vs N, ACA vs N and ACA vs B. How can I get a design matrix for this type of design?

This is one of those studies where rare cancers were studied (in 2003). Unfortunately, this is a public dataset (published in Oncogene) for which the experiments were done using the Stanford Microarray Database.

Thank you in advance.

Adrian.

Microarray Cancer limma • 1.1k views

ADD COMMENT • link updated 16.1 years ago by James W. MacDonald 67k • written 16.1 years ago by Adrian Johnson ▴ 330


James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 5 days ago

United States

Hi Adrian,

Adrian Johnson wrote:

> This is a paired design and a reference design. Limma manual describes
> examples unique to one specific design.

Yes, but the limma User's Guide also notes that a reference design is pretty much the same as a one-color analysis, except that you have to account for dye-swaps. Since you don't have dye-swaps, it _is_ the same as a one-color analysis. The only wrinkle here is that you have blocked data (which is also covered in the limma User's Guide).

If you had doubts, you could have approached this iteratively. First let's see what limma thinks you should be using:

    > modelMatrix(targets, ref="Ref")
    Found unique target names:
     ACA B N Ref
          ACA B N
     [1,]   0 1 0
     [2,]   1 0 0
     [3,]   0 0 1
     [4,]   1 0 0
     [5,]   0 0 1
     [6,]   1 0 0
     [7,]   0 0 1
     [8,]   0 1 0
     [9,]   1 0 0

So this is a pretty simple model matrix, but it doesn't account for the blocks.

    > Cy5 <- factor(c("B","ACA","N","ACA","N","ACA","N","B","ACA"))
    > sibship <- factor(rep(c(12,15,16,17), c(2,2,2,3)))
    > model.matrix(~0 + Cy5 + sibship)
      Cy5ACA Cy5B Cy5N sibship15 sibship16 sibship17
    1      0    1    0         0         0         0
    2      1    0    0         0         0         0
    3      0    0    1         1         0         0
    4      1    0    0         1         0         0
    5      0    0    1         0         1         0
    6      1    0    0         0         1         0
    7      0    0    1         0         0         1
    8      0    1    0         0         0         1
    9      1    0    0         0         0         1

Now this is identical to the above, but with three extra columns to capture the sib-specific means. Note that you could have simply added the three extra columns for the sibs to the previous model matrix.

Also note that your contrast matrix will have to have 6 rows, one per column of the design matrix (with the last three rows being all zeros).

Best,

Jim

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
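The six-row contrast matrix mentioned in the reply above can be sketched as follows. This is only a minimal illustration under stated assumptions, not code from the thread: the shortened column names (`sib15` and so on), the contrast labels, and the simulated matrix `y` (noise standing in for the real log-ratios) are all made up for the example, and the limma calls run only if the package is installed.

```r
# Tissue and patient (block) factors, in the order of the targets file
Cy5     <- factor(c("B", "ACA", "N", "ACA", "N", "ACA", "N", "B", "ACA"))
sibship <- factor(rep(c(12, 15, 16, 17), c(2, 2, 2, 3)))

# Fixed-effects design: three tissue columns plus three patient columns
design <- model.matrix(~0 + Cy5 + sibship)
colnames(design) <- c("ACA", "B", "N", "sib15", "sib16", "sib17")  # shortened names (assumption)

# One row per design column; the three patient rows are all zero
cont <- cbind(BvsN   = c(0,  1, -1, 0, 0, 0),
              ACAvsN = c(1,  0, -1, 0, 0, 0),
              ACAvsB = c(1, -1,  0, 0, 0, 0))
rownames(cont) <- colnames(design)

if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  y    <- matrix(rnorm(200 * 9), nrow = 200)  # placeholder data, NOT the real arrays
  fit  <- limma::lmFit(y, design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont))
  print(limma::topTable(fit2, coef = "BvsN", number = 5))
}
```

With limma loaded, `makeContrasts(BvsN = B - N, ACAvsN = ACA - N, ACAvsB = ACA - B, levels = design)` should build the same matrix without typing the zeros by hand.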

ADD COMMENT • link 16.1 years ago James W. MacDonald 67k


Hi Jim,

I've seen you suggest this way of accounting for blocks, by fitting extra columns in the design matrix, before. I'm just wondering how this differs from the suggestion in the limma vignette (Section 8.2 Technical Replication) to use duplicateCorrelation() to determine the average correlation within blocks. I know they are not mathematically equivalent: the coefficients for the treatment groups are slightly different, they use different DF, and the p-values tend to be larger using the duplicateCorrelation() method (at least for the one experiment I'm using). So, is one more "correct" than the other? Or are blocks of technical replicates somehow different from blocks of patients or cell lines, etc.?

Thanks,

Jenny

At 08:05 AM 11/25/2008, James W. MacDonald wrote:

> Now this is identical to the above, but with three extra columns to
> capture the sib-specific means. Note that you could have simply
> added the three extra columns for the sibs to the previous model matrix.

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu

ADD REPLY • link 16.1 years ago Jenny Drnevich ★ 2.0k


Hi Jenny,

Jenny Drnevich wrote:

> So, is one more "correct" than the other? Or are blocks of
> technical replicates different somehow than blocks of patients or cell
> lines, etc.?

The way I understand it, the difference is that what I suggested is simply a fixed-effects model, where we assume that the variance is constant for all of the groups. If you compute the intra-group correlation using duplicateCorrelation(), you then fit a mixed linear model that allows for different variance (or correlation) within the groups.

I don't know if one is more correct than the other. Certainly the fixed-effects model makes more assumptions. I think you can use duplicateCorrelation() to see if the intra-group correlation is high, which would argue for fitting a mixed linear model instead.

Best,

Jim
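The two ways of handling the patient blocks compared in this sub-thread might be set up roughly as below. This is a hedged sketch rather than anyone's actual analysis: `y` is simulated noise plus a made-up patient effect (not the real log-ratios), the object names are illustrative, and the limma part runs only if both limma and statmod (which `duplicateCorrelation()` relies on) are installed.

```r
# Tissue and patient (block) factors, in the order of the targets file
Cy5     <- factor(c("B", "ACA", "N", "ACA", "N", "ACA", "N", "B", "ACA"))
sibship <- factor(rep(c(12, 15, 16, 17), c(2, 2, 2, 3)))

# Random-block approach: tissue columns only; patients enter via the correlation
design_rand <- model.matrix(~0 + Cy5)
colnames(design_rand) <- levels(Cy5)

# Fixed-block approach: patients get their own columns in the design
design_fixed <- model.matrix(~0 + Cy5 + sibship)

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  set.seed(1)
  # Placeholder data: noise plus a per-patient shift, NOT the real arrays
  y <- matrix(rnorm(200 * 9), nrow = 200) +
       matrix(rnorm(200 * 4), nrow = 200)[, as.integer(sibship)]

  corfit <- limma::duplicateCorrelation(y, design_rand, block = sibship)
  cat("consensus correlation:", corfit$consensus.correlation, "\n")

  fit_rand  <- limma::lmFit(y, design_rand, block = sibship,
                            correlation = corfit$consensus.correlation)
  fit_fixed <- limma::lmFit(y, design_fixed)

  # The two fits leave different residual degrees of freedom per gene
  print(c(random = fit_rand$df.residual[1], fixed = fit_fixed$df.residual[1]))
}
```

The random-block design estimates three coefficients from nine arrays while the fixed-block design estimates six, which is one reason the two approaches report different degrees of freedom.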

ADD REPLY • link 16.1 years ago James W. MacDonald 67k


Hi Jenny,

Should blocks be fixed (in the design matrix) or treated as random (and hence enter the covariance matrix as correlations)? This question has a long history in mathematical statistics, so long that you can be sure that the answer is somewhat subtle. Neither approach is right or wrong. The random approach makes more assumptions and allows you, in some circumstances, to extract more information. The limma approach with dupcor etc. makes even more assumptions than classical random-effects models.

If the blocks are treated as fixed, then treatments can only be compared within blocks. If blocks are treated as random, then it is possible to compare treatments between blocks as well as within. So the first key issue is whether treatment comparisons are made between blocks or within blocks.

Suppose you do an experiment on random samples of subjects from two groups, in which each subject is subjected to several tests. The subjects are blocks. The total sum of squares can be divided into between-subject and within-subject sums of squares. In other words, the information in the data can be divided into a between-subject error stratum and a within-subject stratum.

Suppose you want to compare the two groups. All the information is in the between-subject error stratum. You cannot do any statistical test unless you treat the subjects as random.

Suppose now you want to compare the treatments. If the experiment is balanced (all subjects do all tests), then all the information about the treatments is in the within-block stratum. So you may as well treat the subjects as fixed effects (as is done, for example, in a paired t-test). If the experiment is unbalanced (each subject does only a subset of the tests, or subjects do tests a different number of times), then you can extract more information about the treatment comparisons from the between-subject error stratum. To do this, you have to treat the blocks as random.

The second key issue to consider is whether it makes sense scientifically to treat the blocks as random. If there are only two or three blocks, then there is little to be gained by treating them as random. If the blocks have large unpredictable effects, then it is much safer to treat them as fixed. If you want to make specific conclusions about each of the blocks, then it doesn't make sense to treat them as random. In general, random is natural if there are lots of blocks with relatively small effects that are not of interest in themselves. Sometimes you can go either way.

Hope this helps

Gordon

On Tue, 25 Nov 2008, Jenny Drnevich wrote:

> So, is one more "correct" than the other?
> Or are blocks of technical replicates different somehow than blocks of
> patients or cell lines, etc.?
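The remark that a balanced design with fixed blocks amounts to a paired t-test can be checked numerically in base R. The sketch below uses simulated data with hypothetical names (`subject`, `treat`, an arbitrary treatment shift of 1); it only illustrates the balanced two-treatment case, not the unbalanced design from the original question.

```r
set.seed(42)
n       <- 6                                      # number of subjects (blocks)
subject <- factor(rep(seq_len(n), each = 2))      # each subject measured twice
treat   <- factor(rep(c("A", "B"), times = n))    # balanced: every subject gets A and B

# Simulated response: subject effect + treatment shift + noise
y <- rep(rnorm(n, sd = 2), each = 2) + ifelse(treat == "B", 1, 0) + rnorm(2 * n)

# Paired t-test on the within-subject differences (B minus A)
t_paired <- unname(t.test(y[treat == "B"], y[treat == "A"], paired = TRUE)$statistic)

# Linear model with subjects as fixed block effects
fit     <- lm(y ~ treat + subject)
t_block <- coef(summary(fit))["treatB", "t value"]

# Both statistics use n - 1 residual degrees of freedom and should agree
print(c(paired = t_paired, fixed_block = t_block))
print(all.equal(t_paired, t_block))
```

Once subjects have unequal sets of treatments, as in the patient data above, this equivalence no longer applies and the choice between fixed and random blocks starts to matter.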

ADD REPLY • link 16.1 years ago Gordon Smyth 52k
