design matrix with technical and biological replicates


Manuela Di Russo ▴ 70

@manuela-di-russo-4778

Last seen 9.6 years ago

Dear list,

I'm working with microarray expression data and I am using limma to detect differentially expressed genes. I have some questions about the design matrix and the handling of biological and technical replicates.

The targets file is:

    Sample_name  sample_type  sample_replicate  disease_status
    MPM_07       1            1                 1
    MPM_08       1            2                 1
    MPM_09       1            3                 1
    MPM_10_a     1            4                 1
    MPM_10_b     1            4                 1
    MPM_11       1            5                 1
    MPM_12       1            6                 1
    PP_01_a      2            7                 0
    PP_01_b      2            7                 0
    PP_02        2            8                 0
    PP_03        2            9                 0
    PP_04        2            10                0
    PP_05        2            11                0
    PP_06        2            12                0
    PV_02        3            13                0
    PV_03        3            14                0
    PV_04        3            15                0
    PV_05        3            16                0

Each sample is hybridized on an Affymetrix HG-U133-Plus2 array.

So I have 7 mesothelioma samples (sample_type = 1), of which 2 are from the same patient (MPM_10 a and b); 7 parietal pleural samples (sample_type = 2), of which 2 are from the same patient (PP_01 a and b); and 4 visceral pleural samples (sample_type = 3). In reality, 4 parietal pleural samples (PP_02, PP_03, PP_04 and PP_05) and 4 visceral pleural samples (PV_02, PV_03, PV_04 and PV_05) come from the same patients.

    pd <- data.frame(sample_type      = c(rep(1, 7), rep(2, 7), rep(3, 4)),
                     sample_replicate = c(1:4, 4, 5, 6, 7, 7, 8:12, 13:16),
                     disease_status   = c(rep(1, 7), rep(0, 11)))
    biolrep <- pd$sample_replicate
    f <- factor(pd$sample_type)
    design <- model.matrix(~0 + f)
    colnames(design) <- c("MPM", "PP", "PV")

I tried to handle the technical replicates using the block argument of duplicateCorrelation() as follows:

    # eset_norm_genes_ff_filtered is an ExpressionSet object containing
    # pre-processed and filtered data
    corfit <- duplicateCorrelation(eset_norm_genes_ff_filtered, design,
                                   ndups = 1, block = biolrep)

I am interested in identifying differentially expressed genes between MPM and PP and between PV and PP.

    contrast.matrix_all.contrasts <- makeContrasts(MPMvsPP = MPM - PP,
                                                   PVvsPP  = PV - PP,
                                                   levels  = design)
    fit_ff <- lmFit(eset_norm_genes_ff_filtered, design, block = biolrep,
                    ndups = 1, cor = corfit$consensus)
    fit2_ff <- contrasts.fit(fit_ff, contrast.matrix_all.contrasts)
    fit2e_ff <- eBayes(fit2_ff)

I think that my approach is correct for the first contrast (MPM vs PP) but not for the second one, because biolrep does not take into account that some PP and PV samples are paired. Am I correct?

What about defining biolrep <- c(1:4, 4, 5, 6, 7, 7, 8:12, 8:11)? Is there a method to handle such an experimental design?

Sorry for my long post! Any suggestion/comment is welcome.

Cheers,
Manuela

Manuela Di Russo, Ph.D. Student
Department of Experimental Pathology, MBIE
University of Pisa, Pisa, Italy
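For readers following along, the alternative blocking vector proposed at the end of the question can be inspected in base R before it is passed to any limma function. This is only a sketch: `sample_name`, `tissue` and `patient` are illustrative names that do not appear in the original post, and the code shows what the vector encodes, not whether it is the right model for these data.

```r
# Sample names and tissue types, in the order of the targets file
sample_name <- c("MPM_07", "MPM_08", "MPM_09", "MPM_10_a", "MPM_10_b",
                 "MPM_11", "MPM_12",
                 "PP_01_a", "PP_01_b", "PP_02", "PP_03", "PP_04", "PP_05",
                 "PP_06",
                 "PV_02", "PV_03", "PV_04", "PV_05")
tissue <- factor(rep(c("MPM", "PP", "PV"), times = c(7, 7, 4)))

# Blocking vector proposed in the question: PV_02..PV_05 reuse the
# block ids of PP_02..PP_05 (8 to 11) instead of getting new ids
patient <- c(1:4, 4, 5, 6, 7, 7, 8:12, 8:11)
stopifnot(length(patient) == length(sample_name))

# Which arrays end up in the same block?
blocks <- split(sample_name, patient)
multi  <- blocks[lengths(blocks) > 1]
print(multi)

# Cross-tabulate block against tissue to see the pairing structure
print(table(patient, tissue))
```

Listing the blocks this way makes it visible that a single vector puts the two technical-replicate pairs (MPM_10 and PP_01) and the four PP/PV patient pairs into the same kind of block, so one consensus correlation would be shared by both kinds of pairing.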

Microarray limma

ADD COMMENT • link updated 12.0 years ago by James W. MacDonald 65k • written 12.0 years ago by Manuela Di Russo ▴ 70


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 15 hours ago

United States

Hi Manuela,

On 4/18/2012 7:52 AM, Manuela Di Russo wrote:
> I tried to handle technical replicates using the block argument of
> function duplicateCorrelation() as follows:

I don't think you can use duplicateCorrelation() here, as you don't have duplicates for all samples. I believe lmFit() with a cor argument will fit a block diagonal correlation matrix, which is clearly not applicable here. I may be in error, however, in which case Gordon Smyth will surely post a correction around 5-6 pm EDT or so.

With a mixture of duplicated and non-duplicated samples, you will likely have to do one of two less-than-ideal things.

First, you could simply ignore the duplication and analyze the data as if the duplicates were independent samples. This is less than ideal because the duplicates will be correlated with each other, which will tend to lower your estimate of intra-sample variation.

Second, you could compute the means of the duplicates and use those in lieu of the original data. Again, this is not ideal, as the means will have an intrinsically lower variance than the individual samples. All things being equal, this is probably the better way to go.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
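The second option above (averaging the duplicate arrays) can be sketched in base R as follows. This is a minimal illustration on simulated numbers, not the poster's data: `expr`, `expr_avg`, `tissue_avg` and `design_avg` are made-up names, and the random matrix only stands in for the real expression values. limma also provides avearrays(), which may be a more convenient route on a real ExpressionSet.

```r
set.seed(42)

# Block ids from the question: arrays 4-5 and 8-9 are duplicate pairs
biolrep <- c(1:4, 4, 5, 6, 7, 7, 8:12, 13:16)
tissue  <- rep(c("MPM", "PP", "PV"), times = c(7, 7, 4))

# Simulated stand-in for the expression matrix (5 probes x 18 arrays)
expr <- matrix(rnorm(5 * 18), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("array", 1:18)))

# Average the columns that share a block id; singletons pass through
expr_avg <- sapply(split(seq_along(biolrep), biolrep),
                   function(idx) rowMeans(expr[, idx, drop = FALSE]))

# One tissue label per averaged column, then the matching design matrix
tissue_avg <- factor(sapply(split(tissue, biolrep), `[`, 1))
design_avg <- model.matrix(~0 + tissue_avg)
colnames(design_avg) <- levels(tissue_avg)

dim(expr_avg)      # 5 probes x 16 averaged samples
table(tissue_avg)  # 6 MPM, 6 PP, 4 PV
```

After this step each patient contributes one column per tissue, so the design matrix has 16 rows instead of 18. The caveat from the post still applies: the two averaged columns are less variable than the single-array columns.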

ADD COMMENT • link 12.0 years ago James W. MacDonald 65k


Thank you James!

I had already applied both methods you suggested, but I wanted to see if there was a better way to handle this kind of experimental design.

I have another question: I computed the means of the duplicate arrays after preprocessing and filtering but before fitting the linear model. Is this correct?

Thank you!
Manuela

Manuela Di Russo, Ph.D. Student
Department of Experimental Pathology, MBIE
University of Pisa, Pisa, Italy

ADD REPLY • link 12.0 years ago Manuela Di Russo ▴ 70


Hi Manuela,

On 4/18/2012 10:11 AM, Manuela Di Russo wrote:
> I have another question: I computed means of the duplicates arrays after
> preprocessing and filtering but before fitting the linear model, is this
> correct?

I would argue that you should compute the means, then fit the model (through the eBayes() step), and only then filter. This is because the eBayes() step estimates a prior variance over all probes.

Here is an example. Say you filter your probes, selecting only those with an interquartile range greater than some threshold. In that case you are removing all probes with low variance. Now when you compute the prior in the eBayes() step, you are basing it on the subset of higher-variance probes. One could argue that you have biased that estimate upward, which will in turn decrease your power to detect differences.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
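The direction of the bias described above can be illustrated with a small base R simulation. This is a sketch under stated assumptions: the data are simulated, the names (`true_sd`, `probe_var`, `keep`) are invented for the example, and a plain average of sample variances is used as a crude stand-in for the prior variance. eBayes() estimates its prior differently, so only the direction of the shift is meant to carry over.

```r
set.seed(123)
n_probes <- 5000
n_arrays <- 16

# Each probe gets its own true standard deviation
true_sd <- sqrt(1 / rgamma(n_probes, shape = 4, rate = 1))

# Rows are probes; row i is drawn with standard deviation true_sd[i]
expr <- matrix(rnorm(n_probes * n_arrays, sd = rep(true_sd, n_arrays)),
               nrow = n_probes)

# Per-probe sample variance and interquartile range
probe_var <- apply(expr, 1, var)
probe_iqr <- apply(expr, 1, IQR)

# IQR filter: keep the more variable half of the probes
keep <- probe_iqr > median(probe_iqr)

# Average variance over all probes versus over the retained probes
mean_all      <- mean(probe_var)
mean_filtered <- mean(probe_var[keep])
cat("all probes:     ", mean_all, "\n")
cat("after filtering:", mean_filtered, "\n")
```

The retained probes have a larger average variance than the full set, which is the sense in which a variance-based filter applied before the fit shifts the pool of variances that the prior is estimated from.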

ADD REPLY • link 12.0 years ago James W. MacDonald 65k


Dear James,

If I have understood correctly, your suggestion is to fit the model to all probes on the array and then filter. But the approach usually adopted is to filter before fitting the linear model, in order to reduce the number of hypotheses to test and increase power.

May I filter after fitting the model, using the information about higher-variance/lower-intensity probesets obtained by applying genefilter() (for example) to the ExpressionSet object created by the gcrma/rma function?

Sorry, but you are the first person to tell me to filter the data after fitting the model so as not to bias the estimate of the prior variance.

Thank you very much for your help and your time.

Manuela

Manuela Di Russo, Ph.D. Student
Department of Experimental Pathology, MBIE
University of Pisa, Pisa, Italy
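The workflow asked about here (fit on all probes, then filter) could look roughly like the sketch below. It assumes the limma package is installed and uses simulated data with two made-up groups, A and B; the IQR cutoff at the median is an arbitrary choice for illustration, and a plain apply() call stands in for the genefilter-based filter mentioned in the post. It is not a reply from the thread.

```r
library(limma)

set.seed(1)
n_probes <- 200

# Simulated expression matrix: 200 probes x 6 arrays, two groups of 3
expr <- matrix(rnorm(n_probes * 6), nrow = n_probes,
               dimnames = list(paste0("probe", seq_len(n_probes)),
                               paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# 1. Fit the model and run eBayes() on ALL probes
fit  <- lmFit(expr, design)
cm   <- makeContrasts(BvsA = B - A, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))

# 2. Compute the filter from the expression matrix (here: IQR per probe)
probe_iqr <- apply(expr, 1, IQR)
keep <- probe_iqr > median(probe_iqr)

# 3. Subset the fitted object, then extract the results table
fit_kept <- fit2[keep, ]
tab <- topTable(fit_kept, coef = "BvsA", number = Inf)
head(tab)
```

With this ordering the prior variance is estimated from every probe, while the multiple-testing adjustment in topTable() is computed only over the probes that survive the filter.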

ADD REPLY • link 12.0 years ago Manuela Di Russo ▴ 70
