Question on limma single-channel analysis of dual-channel microarray
1
0
Entering edit mode
@deanne-taylor-2380
Last seen 7.4 years ago
Hi... I have a few complex analyses I am trying to perform on some older data. I've tried looking through the mailing lists and literature and perhaps what I have here is a difficulty in understanding. So, on to my question... I have 48 microarrays of two-channel Agilent data, no technical replicates or dye swaps, but 3 biological replicates of 16 groups. For instance, one one chip would be Disease_24h_1mg vs Normal_24h_1mg. And so forth... There need to be comparisons made between the groups of doses and time series, which I'm comfortable with (at least conceptually and with single-channel data). There is no common reference, and most of this data are time series comparing two conditions and doses of treatment on diseased and disease-free: Condition Time Treatment Disease 24h 1mg Normal 24h 1mg Disease 48h 1mg Normal 48h 1mg (...) Disease 48h 2mg Normal 48h 2mg Disease 24h 2mg Normal 24h 2mg (...) (and so forth) I would like to make contrasts between diseased at different time series (24, 48, 120hours, etc) as well as across doses (1mg, 2mg, etc) and compare all of it to some undosed normal controls. I think I'm almost there, but I find I'm getting unexpected results. I have taken the following steps which all seem to work. In this script, I have left in the control spots in MA because I found some odd behavior when I tried to take them out with MA$genes$ControlType ==0 library(limma) ID="Targets" targets<-readTargets(paste(ID, ".csv",sep=""), sep=",") files<-targets$FileName RG<-read.maimages(files, source="agilent") RG_norm<-normalizeWithinArrays(RG, bc.method="none", method="loess") #is this step necessary here? MA<-normalizeBetweenArrays(RG_norm, method="Aquantile") targets2 <- targetsA2C(targets) targets2 u <- unique(targets2$Target) f <- factor(targets2$Target, levels=u) design <- model.matrix(~0+f) colnames(design) <- u corfit <- intraspotCorrelation(MA, design) fit <- lmscFit(MA, design, correlation=corfit$consensus) #Now, I wanted to check out the first "test" of what I can get with a normal contrast between Cy3 and Cy5 on the chips themselves: cont.dif<-makeContrasts(Disease_24_1mg - Normal_24_1mg, levels=design) fit2<-contrasts.fit(fit, cont.dif) fit2<-eBayes(fit2) toptable(fit2) Now, what i've just done up there is compare two groups that are on the same arrays, but the p-values and gene IDs I get after lmscFit are very dissimilar to the ones I get if I just run a simple limma analysis across JUST the three two channel chips representing Disease_24_1mg/Normal_24_1mg. I can't understand why there is such a complete difference between what one gets with a fit across three two-color microarrays (Cy3 vs Cy5) and when comparing the same two channels that were once represented on three single chips, together after the single-channel fit. Is my model wrong? Can someone point me to an answer? Thank you so much for your help. --- Deanne Taylor PhD Department of Biostatistics Harvard School of Public Health 655 Huntington Avenue Boston, MA 02115 dtaylor at hsph.harvard.edu
• 1.3k views
0
Entering edit mode
Naomi Altman ★ 6.0k
@naomi-altman-380
Last seen 9 months ago
United States
Hi Deanne, Gordon can correct me here if I am wrong, but I think that when there is no intercept in the design matrix then intraspotCorrelation uses deviations from 0, rather than deviations from the mean, which is the wrong type of correlation to account for the array effect. Besides this, I have noticed (and I think others have noted on this list) that small changes to microarray data, such as the normalization method, exclusion of a few arrays, etc, can lead to large changes in the list of significant genes. On the other hand, the genes that jump on and off the list are often low expressing genes. For these genes, the F or t-test denominator is very sensitive to small changes in the data. Regularization methods such as limma and SAM damp this somewhat, but in small sample sizes you can still get sizeable effects. If you use all of the arrays, you are doing a much different denominator regularization than if you use only a few arrays. --Naomi At 08:21 PM 9/15/2007, Deanne Taylor wrote: >Hi... > >I have a few complex analyses I am trying to perform on some older >data. I've tried looking through the mailing lists and literature >and perhaps what I have here is a difficulty in understanding. So, >on to my question... > >I have 48 microarrays of two-channel Agilent data, no technical >replicates or dye swaps, but 3 biological replicates of 16 groups. > >For instance, one one chip would be Disease_24h_1mg vs >Normal_24h_1mg. And so forth... > >There need to be comparisons made between the groups of doses and >time series, which I'm comfortable with (at least conceptually and >with single-channel data). There is no common reference, and most of >this data are time series comparing two conditions and doses of >treatment on diseased and disease-free: > > >Condition Time Treatment >Disease 24h 1mg >Normal 24h 1mg >Disease 48h 1mg >Normal 48h 1mg > (...) > >Disease 48h 2mg >Normal 48h 2mg >Disease 24h 2mg >Normal 24h 2mg > (...) > >(and so forth) > >I would like to make contrasts between diseased at different time >series (24, 48, 120hours, etc) as well as across doses (1mg, 2mg, >etc) and compare all of it to some undosed normal controls. I think >I'm almost there, but I find I'm getting unexpected results. > > >I have taken the following steps which all seem to work. In this >script, I have left in the control spots in MA because I found some >odd behavior when I tried to take them out with MA$genes$ControlType ==0 > >library(limma) >ID="Targets" >targets<-readTargets(paste(ID, ".csv",sep=""), sep=",") >files<-targets$FileName >RG<-read.maimages(files, source="agilent") >RG_norm<-normalizeWithinArrays(RG, bc.method="none", method="loess") >#is this step necessary here? >MA<-normalizeBetweenArrays(RG_norm, method="Aquantile") >targets2 <- targetsA2C(targets) >targets2 >u <- unique(targets2$Target) >f <- factor(targets2$Target, levels=u) >design <- model.matrix(~0+f) >colnames(design) <- u >corfit <- intraspotCorrelation(MA, design) >fit <- lmscFit(MA, design, correlation=corfit$consensus) >#Now, I wanted to check out the first "test" of what I can get with >a normal contrast between Cy3 and Cy5 on the chips themselves: >cont.dif<-makeContrasts(Disease_24_1mg - Normal_24_1mg, levels=design) >fit2<-contrasts.fit(fit, cont.dif) >fit2<-eBayes(fit2) >toptable(fit2) > > >Now, what i've just done up there is compare two groups that are on >the same arrays, but the p-values and gene IDs I get after lmscFit >are very dissimilar to the ones I get if I just run a simple limma >analysis across JUST the three two channel chips representing >Disease_24_1mg/Normal_24_1mg. > >I can't understand why there is such a complete difference between >what one gets with a fit across three two-color microarrays (Cy3 vs >Cy5) and when comparing the same two channels that were once >represented on three single chips, together after the single-channel fit. > >Is my model wrong? Can someone point me to an answer? Thank you so >much for your help. > > > > > >--- >Deanne Taylor PhD >Department of Biostatistics >Harvard School of Public Health >655 Huntington Avenue >Boston, MA 02115 >dtaylor at hsph.harvard.edu > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111
0
Entering edit mode
Thanks, Naomi. When I looked at the simple limma analysis of the 3 biological replicates (disease vs normal) the results closely correlated to some immunohistochemistry results we had -- so I tend to believe (even tho a small sample size) the simple limma analysis results on the three biological replicates. The differences in the IHC-measured genes in the single-channel approach did not match the IHC results at all, so I have a more difficult time believing the single-channel approach is approximating the biology. However, I do want to look across disease and across dose, so the single channel method is my only option it seems. I appreciate your suggestion (changing the intercept to "1"?) to look for deviations from the mean (as in model.matrix(~1 + f) with the single channel approach. Deanne --- Deanne Taylor PhD Department of Biostatistics Harvard School of Public Health 655 Huntington Avenue Boston, MA 02115 dtaylor at hsph.harvard.edu >>> Naomi Altman <naomi at="" stat.psu.edu=""> 09/15/07 8:51 PM >>> Hi Deanne, Gordon can correct me here if I am wrong, but I think that when there is no intercept in the design matrix then intraspotCorrelation uses deviations from 0, rather than deviations from the mean, which is the wrong type of correlation to account for the array effect. Besides this, I have noticed (and I think others have noted on this list) that small changes to microarray data, such as the normalization method, exclusion of a few arrays, etc, can lead to large changes in the list of significant genes. On the other hand, the genes that jump on and off the list are often low expressing genes. For these genes, the F or t-test denominator is very sensitive to small changes in the data. Regularization methods such as limma and SAM damp this somewhat, but in small sample sizes you can still get sizeable effects. If you use all of the arrays, you are doing a much different denominator regularization than if you use only a few arrays. --Naomi At 08:21 PM 9/15/2007, Deanne Taylor wrote: >Hi... > >I have a few complex analyses I am trying to perform on some older >data. I've tried looking through the mailing lists and literature >and perhaps what I have here is a difficulty in understanding. So, >on to my question... > >I have 48 microarrays of two-channel Agilent data, no technical >replicates or dye swaps, but 3 biological replicates of 16 groups. > >For instance, one one chip would be Disease_24h_1mg vs >Normal_24h_1mg. And so forth... > >There need to be comparisons made between the groups of doses and >time series, which I'm comfortable with (at least conceptually and >with single-channel data). There is no common reference, and most of >this data are time series comparing two conditions and doses of >treatment on diseased and disease-free: > > >Condition Time Treatment >Disease 24h 1mg >Normal 24h 1mg >Disease 48h 1mg >Normal 48h 1mg > (...) > >Disease 48h 2mg >Normal 48h 2mg >Disease 24h 2mg >Normal 24h 2mg > (...) > >(and so forth) > >I would like to make contrasts between diseased at different time >series (24, 48, 120hours, etc) as well as across doses (1mg, 2mg, >etc) and compare all of it to some undosed normal controls. I think >I'm almost there, but I find I'm getting unexpected results. > > >I have taken the following steps which all seem to work. In this >script, I have left in the control spots in MA because I found some >odd behavior when I tried to take them out with MA$genes$ControlType ==0 > >library(limma) >ID="Targets" >targets<-readTargets(paste(ID, ".csv",sep=""), sep=",") >files<-targets$FileName >RG<-read.maimages(files, source="agilent") >RG_norm<-normalizeWithinArrays(RG, bc.method="none", method="loess") >#is this step necessary here? >MA<-normalizeBetweenArrays(RG_norm, method="Aquantile") >targets2 <- targetsA2C(targets) >targets2 >u <- unique(targets2$Target) >f <- factor(targets2$Target, levels=u) >design <- model.matrix(~0+f) >colnames(design) <- u >corfit <- intraspotCorrelation(MA, design) >fit <- lmscFit(MA, design, correlation=corfit$consensus) >#Now, I wanted to check out the first "test" of what I can get with >a normal contrast between Cy3 and Cy5 on the chips themselves: >cont.dif<-makeContrasts(Disease_24_1mg - Normal_24_1mg, levels=design) >fit2<-contrasts.fit(fit, cont.dif) >fit2<-eBayes(fit2) >toptable(fit2) > > >Now, what i've just done up there is compare two groups that are on >the same arrays, but the p-values and gene IDs I get after lmscFit >are very dissimilar to the ones I get if I just run a simple limma >analysis across JUST the three two channel chips representing >Disease_24_1mg/Normal_24_1mg. > >I can't understand why there is such a complete difference between >what one gets with a fit across three two-color microarrays (Cy3 vs >Cy5) and when comparing the same two channels that were once >represented on three single chips, together after the single-channel fit. > >Is my model wrong? Can someone point me to an answer? Thank you so >much for your help. > > > > > >--- >Deanne Taylor PhD >Department of Biostatistics >Harvard School of Public Health >655 Huntington Avenue >Boston, MA 02115 >dtaylor at hsph.harvard.edu > >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor Naomi S. Altman 814-865-3791 (voice) Associate Professor Dept. of Statistics 814-863-7114 (fax) Penn State University 814-865-1348 (Statistics) University Park, PA 16802-2111