Positive correlation between dye-swap technical replicates
@micha-goralski-3888
Dear All,

I have some doubts concerning the linear model used in my data analysis. I searched the mailing list for an answer but did not find a similar case.

I am analysing tobacco roots treated with two types of stress: NaCl and CdCl2. I have a pooled common reference and 3 biological replicates of treated plants. I also did dye swaps as technical replicates. This is my targets file:

SlideNumber  Name  FileName      Cy3      Cy5
13244317     21    13244317.gpr  Control  NaCl
13244318     22    13244318.gpr  Control  NaCl
13244315     23    13244315.gpr  Control  NaCl
13244319     31    13244319.gpr  Control  CdCl2
13244337     32    13244337.gpr  Control  CdCl2
13244316     33    13244316.gpr  Control  CdCl2
13244330     21    13244330.gpr  NaCl     Control
13244329     22    13244329.gpr  NaCl     Control
13244331     23    13244331.gpr  NaCl     Control
13244333     31    13244333.gpr  CdCl2    Control
13244335     32    13244335.gpr  CdCl2    Control
13244336     33    13244336.gpr  CdCl2    Control

I did background subtraction with the "normexp" method and "printtiploess" normalization, without normalization between arrays.

Now I have the vector indicating biological and technical replicates:

> biolrep=c(1,2,3,4,5,6,1,2,3,4,5,6)

and create the model matrix:

> design=modelMatrix(targets, ref="Control")
> design
      CdCl2 NaCl
 [1,]     0    1
 [2,]     0    1
 [3,]     0    1
 [4,]     1    0
 [5,]     1    0
 [6,]     1    0
 [7,]     0   -1
 [8,]     0   -1
 [9,]     0   -1
[10,]    -1    0
[11,]    -1    0
[12,]    -1    0

I'm interested in these contrasts:

> cmatrix=makeContrasts(NaCl, CdCl2, NaCl-CdCl2,levels=design)
> cmatrix
       Contrasts
Levels  NaCl CdCl2 NaCl - CdCl2
  CdCl2    0     1           -1
  NaCl     1     0            1

Object for duplicate correlation with dye swaps:

> corfit=duplicateCorrelation(MA, design=design, ndups=1, block=biolrep)

and the first problem is:

> corfit$consensus
[1] 0.3926545

In the limma manual it is written that the correlation should be negative for dye swaps. Why is it positive? Is it a question of a wrong model matrix, or is something wrong with my samples?
But when I do simple hierarchical clustering of the log-ratios:

> dist.matrix=dist(t(MA$M))
> hc=hclust(dist.matrix)
> par(mfrow=c(1,1))
> plot(hc)

the plot divides my arrays into two groups that exactly reflect the dye swaps. So maybe the model is correct?

I was also thinking about checking the dye effect, so I tried this model:

> design2=cbind(Dye=1, design)
> design2
      Dye CdCl2 NaCl
 [1,]   1     0    1
 [2,]   1     0    1
 [3,]   1     0    1
 [4,]   1     1    0
 [5,]   1     1    0
 [6,]   1     1    0
 [7,]   1     0   -1
 [8,]   1     0   -1
 [9,]   1     0   -1
[10,]   1    -1    0
[11,]   1    -1    0
[12,]   1    -1    0

I'm not sure if I can use such a model. If I use it:

> corfit=duplicateCorrelation(MA, design=design2, ndups=1, block=biolrep)
> corfit$consensus
[1] -0.04530506

The second problem is that each probe on my array is duplicated, so in the final top table I have each gene twice. I read that it is not possible in limma to analyse both technical duplicates and gene replicates on the array. Could you give me any hint on how to solve this problem?

I will be glad for any help with these issues.

Best regards,

Michal Goralski, PhD student,
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland.

@juan-pedro-steibel-1533

Hello Michal,

Before anything... Have you checked your raw data? When we have dye swaps and a reference design, we usually do pairwise plots, then check the sign of the slope or correlation coefficient against the expected value. An opposite sign points to mislabeling or other technical errors. This should be obvious even before ANY pre-processing.

I will let the limma gurus tackle the corresponding questions.

Cheers!
JP

--
Juan Pedro Steibel
Assistant Professor, Statistical Genetics and Genomics
Department of Animal Science & Department of Fisheries and Wildlife
Michigan State University
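The pairwise check JP describes can be sketched in base R roughly as follows. This is only an illustration on simulated log-ratios, not the poster's data: the matrix `M` stands in for `MA$M`, and the effect sizes and noise levels are made-up assumptions. For a correctly labelled dye-swap pair the log-ratios of the two arrays should be negatively correlated.

```r
# Simulated stand-in for MA$M: rows are probes, columns are the 12 arrays
# in the order of the targets file (arrays 1-6 forward, 7-12 dye-swapped).
set.seed(1)
n_probes <- 2000
true_lfc <- rnorm(n_probes, sd = 1)      # treatment vs control log-ratio (made up)
orient   <- rep(c(1, -1), each = 6)      # +1: treatment in Cy5, -1: treatment in Cy3
M <- sapply(orient, function(s) s * true_lfc + rnorm(n_probes, sd = 0.3))

# Pair each forward array with its dye-swapped partner (same biological sample).
biolrep <- c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6)
pair_cor <- sapply(1:6, function(b) {
  idx <- which(biolrep == b)             # forward column, then swapped column
  cor(M[, idx[1]], M[, idx[2]])
})
print(round(pair_cor, 2))

# Optional visual check for one pair; the point cloud should slope downwards.
# plot(M[, 1], M[, 7], xlab = "forward M", ylab = "dye-swap M")
```

A pair whose correlation comes out positive on real data would be a candidate for a mislabelled targets row or a hybridisation problem.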
@james-w-macdonald-5106
Hi Michal,

Michał Góralski wrote:
> Object for duplicate correlation with dye-swaps:
>
> >corfit=duplicateCorrelation(MA, design=design, ndups=1, block=biolrep)
>
> and the first problem is:
> > corfit$consensus
> [1] 0.3926545
>
> In limma manual it is written that correlation should be negative for
> dye swaps- why is it positive?- is it a question of wrong model matrix
> or is it something wrong with my samples?

As you note below, this is probably due to a dye bias. Making some MA plots should help clarify the problem.

> I was thinking also about checking dye effect so I tried with such model:
>
> > design2=cbind(Dye=1, design)
>
> I'm not sure if I can use such model.
> if I use it:
>
> > corfit=duplicateCorrelation(MA, design=design2, ndups=1, block=biolrep)
> > corfit$consensus
> [1] -0.04530506
>
> The second problem is that each probe on my array is duplicated so in
> the final top table I have each gene doubled- I read it is not possible
> in Limma to analyse both technical duplicates and gene replicas on the
> array. Could you give me any hint how to solve this problem?

Well, that isn't much correlation between the dye swaps after controlling for the dye bias, so you might check the correlation between the technical replicates. It might be more reasonable to ignore the fact that the dye swaps are technical replicates and account for the intra-slide duplicate correlations instead. The other possibility is to average the within-slide duplicates and account for the fact that you have technical replication at the dye-swap level. But you will have to look at your data to make that call.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan Department of Human Genetics

@claus-mayer-1179

Dear Michal!

You should include the dye effect in your linear model (cf. the limma guide, 8.1.2 Dye Swaps). The normalization only removes an overall dye effect, but typically that effect differs slightly from gene to gene. Including the dye effect in the model should remove this remaining gene-specific bias. That effect is likely to be the reason for the positive correlation you observe.
Claus

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch On Behalf Of Michal Góralski
> Sent: 14 January 2010 11:59
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Positive correlation between dye-swap technical replicates
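Claus's explanation can be illustrated with a small base R simulation. This is a sketch under stated assumptions, not a reproduction of the poster's analysis: the log-ratios are simulated, all variances are made up, and `pair_correlation` is a hypothetical, crude pooled statistic rather than the REML estimate that `duplicateCorrelation` computes. The point is only that a gene-specific dye bias left in the residuals is shared by both arrays of a dye-swap pair and so pushes their correlation upwards.

```r
set.seed(42)
n_genes <- 3000

# Design matrices as printed in the thread (rows = the 12 arrays).
design  <- cbind(CdCl2 = c(0, 0, 0, 1, 1, 1, 0, 0, 0, -1, -1, -1),
                 NaCl  = c(1, 1, 1, 0, 0, 0, -1, -1, -1, 0, 0, 0))
design2 <- cbind(Dye = 1, design)
biolrep <- c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6)

# Simulated log-ratios, genes x arrays. All standard deviations are assumptions.
beta <- matrix(rnorm(2 * n_genes), n_genes, 2)            # true treatment effects
bio  <- matrix(rnorm(6 * n_genes, sd = 0.2), n_genes, 6)  # biological replicate effects
dye  <- rnorm(n_genes, sd = 0.4)                          # gene-specific dye bias
sgn  <- rowSums(design)                                   # +1 forward, -1 dye-swapped
M <- beta %*% t(design) +
  bio[, biolrep] * rep(sgn, each = n_genes) +             # sample effect flips sign on swap
  dye +                                                   # dye bias does not flip
  matrix(rnorm(12 * n_genes, sd = 0.2), n_genes, 12)      # measurement noise

# Least-squares residuals for all genes at once: M - M H, with H the hat matrix.
resid_for <- function(X) {
  H <- X %*% solve(crossprod(X), t(X))
  M - M %*% H
}

# Crude, uncentred correlation between paired residuals, pooled over genes.
pair_correlation <- function(R) {
  fwd <- R[, 1:6]
  swp <- R[, 7:12]                                        # column j pairs with column j + 6
  sum(fwd * swp) / sqrt(sum(fwd^2) * sum(swp^2))
}

rho_without_dye <- pair_correlation(resid_for(design))
rho_with_dye    <- pair_correlation(resid_for(design2))
print(c(without_dye = rho_without_dye, with_dye = rho_with_dye))
```

In this simulation the statistic is clearly positive without the `Dye` column and negative with it, which is the same direction of change the poster reports between `design` and `design2`.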
@micha-goralski-3888
Dear All,

Thank you for the advice. It was very helpful. Now I'm going to check everything.

Best regards,

Michal Goralski, PhD student,
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland.
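For the second question in the thread, Jim's option of averaging the within-slide duplicates before modelling the dye-swap pairs can be sketched in base R as below. The probe IDs and values are hypothetical, and the sketch assumes there are no missing log-ratios; limma also ships an `avedups` function that is presumably the more convenient route on a real `MAList`.

```r
set.seed(7)

# Hypothetical layout: 5 genes, each spotted twice, measured on 12 arrays.
gene_ids <- rep(paste0("gene", 1:5), each = 2)
M <- matrix(rnorm(10 * 12), nrow = 10, ncol = 12,
            dimnames = list(gene_ids, paste0("array", 1:12)))

# Sum the rows that share an ID, then divide by the number of spots per ID.
# reorder = FALSE keeps genes in order of first appearance.
spots_per_gene <- as.vector(table(gene_ids)[unique(gene_ids)])
M_avg <- rowsum(M, group = gene_ids, reorder = FALSE) / spots_per_gene

print(dim(M_avg))   # one row per gene, one column per array
```

The averaged matrix has one row per gene, so a model that blocks on the biological replicate (as in the `biolrep` vector above) would then yield a single top-table entry per gene.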