Cross-comparison of independent intensities from different experiments (GenePix) (sorry I don't know how to describe the problem better)

Guest User ★ 13k

@guest-user-4897

Dear all,

Could anyone please help me with the following problem?

The experiments were performed on two-colour cDNA arrays (GenePix .gpr files). The setup consists of two independent time series, each with 4 time points (T1-T4 below). In the first series, wild-type (WT) cells were stressed at time zero with a certain drug, and samples were taken at 4 time points afterwards. These samples were compared with unstressed WT cells. In the second series, mutant (MU) cells were treated identically and compared with unstressed MU cells.

Here is the targets file:

    > targets
       FileName      Cy3           Cy5
    1  13754122.gpr  WT            WT_stress_T1
    2  13754112.gpr  WT_stress_T1  WT
    3  14039687.gpr  WT            WT_stress_T2
    4  13754123.gpr  WT            WT_stress_T2
    5  13754109.gpr  WT            WT_stress_T3
    6  14039055.gpr  WT_stress_T3  WT
    7  14004643.gpr  WT            WT_stress_T4
    8  14039058.gpr  WT_stress_T4  WT
    9  14039688.gpr  MU            MU_stress_T1
    10 13754114.gpr  MU_stress_T1  MU
    11 14039061.gpr  MU            MU_stress_T2
    12 14039059.gpr  MU_stress_T2  MU
    13 13754124.gpr  MU            MU_stress_T3
    14 13754115.gpr  MU_stress_T3  MU
    15 14039057.gpr  MU            MU_stress_T4
    16 14039056.gpr  MU_stress_T4  MU

I have worked a lot with these data and we obtained some very interesting results; however, I cannot solve the following problem. How can I compare

a) MU with WT, and
b) MU_stressed with WT?

I am not the experimenter, and it is not possible to repeat the experiment to produce a direct comparison. Still, I think there should be a way, even if not the most elegant one, to make this comparison with the existing data. I have thought about simply copying and pasting the single-channel intensities from the .gpr files into a new matrix, but I suspect this would cause many problems with the normalization steps. Perhaps the answer is very easy (sorry for bothering you if so), but I have read a lot of tutorials and I don't even know which keywords to search for.
What I do right now (after preprocessing) is:

    library(limma)

    # average the two replicate spots of each gene (adjacent, spacing = 1)
    Average <- avedups(genes, ndups = 2, spacing = 1)
    Average$A[is.na(Average$A)] <- 0.0
    Average$M[is.na(Average$M)] <- 0.0

    # separate reference designs for the two series
    designWT <- modelMatrix(targets, ref = "WT")
    designWT <- designWT[1:8, 1:4]    # WT series: arrays 1-8 only
    designWT
    designMU <- modelMatrix(targets, ref = "MU")
    designMU <- designMU[9:16, 6:9]   # MU series: arrays 9-16 only
    designMU

    AverageWT <- Average[, 1:8]
    AverageMU <- Average[, 9:16]

    # fit each series on its own
    fit_WT <- lmFit(AverageWT, designWT)
    fit_WT <- eBayes(fit_WT)
    topTable(fit_WT)
    fit_MU <- lmFit(AverageMU, designMU)
    fit_MU <- eBayes(fit_MU)
    topTable(fit_MU)

    # .... and further analysis and evaluation procedures

What would be the best way to make the comparisons

a) MU (T1-T4) with WT as reference, and
b) MU_stressed (T1-T4) with WT as reference?

Thanks a lot in advance for your help! I would be very grateful for an answer.

Best regards,
Susanne

-- output of sessionInfo():

    > sessionInfo()
    R version 2.13.2 (2011-09-30)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] C/en_US.UTF-8/C/C/C/C

    attached base packages:
    [1] splines tcltk stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] MASS_7.3-14 calibrate_1.7 Heatplus_1.22.0 XML_3.4-3 annaffy_1.24.0 KEGG.db_2.5.0
     [7] goProfiles_1.14.0 GO.db_2.5.0 annotate_1.30.1 yeast2.db_2.5.0 org.Sc.sgd.db_2.5.0 RSQLite_0.10.0
    [13] DBI_0.2-5 AnnotationDbi_1.14.1 statmod_1.4.14 vsn_3.20.0 arrayQuality_1.30.0 convert_1.28.0
    [19] affy_1.30.0 marray_1.30.0 limma_3.8.3 maSigPro_1.24.1 DynDoc_1.30.0 widgetTools_1.30.0
    [25] Biobase_2.12.2

    loaded via a namespace (and not attached):
     [1] Mfuzz_2.10.0 RColorBrewer_1.0-5 affyio_1.20.0 grid_2.13.2 gridBase_0.4-4 hexbin_1.26.0
     [7] lattice_0.19-33 preprocessCore_1.14.0 tkWidgets_1.30.0 tools_2.13.2 xtable_1.6-0

--
Sent via the guest posting facility at bioconductor.org.
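For reference, a minimal base-R sketch of the kind of reference design that limma's `modelMatrix(targets, ref = "WT")` builds for a two-colour targets frame: each array is +1 for its Cy5 sample and -1 for its Cy3 sample, and the reference column is dropped. The helper `refDesign()` is hypothetical (not a limma function) and the targets are re-typed from the table above. It also shows picking the per-series columns by name rather than by position (`[1:8, 1:4]`, `[9:16, 6:9]`), which keeps the subsetting correct however the levels happen to be ordered.

```r
# Targets re-typed from the post above (FileName column omitted).
targets <- data.frame(
  Cy3 = c("WT", "WT_stress_T1", "WT", "WT", "WT", "WT_stress_T3", "WT", "WT_stress_T4",
          "MU", "MU_stress_T1", "MU", "MU_stress_T2", "MU", "MU_stress_T3", "MU", "MU_stress_T4"),
  Cy5 = c("WT_stress_T1", "WT", "WT_stress_T2", "WT_stress_T2", "WT_stress_T3", "WT", "WT_stress_T4", "WT",
          "MU_stress_T1", "MU", "MU_stress_T2", "MU", "MU_stress_T3", "MU", "MU_stress_T4", "MU"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a limma function): log-ratio design for a
# reference parameterization. Each array is +1 in the column of its Cy5
# sample and -1 in the column of its Cy3 sample; the reference column is
# dropped because its effect is fixed at zero.
refDesign <- function(targets, ref) {
  lev <- sort(unique(c(targets$Cy3, targets$Cy5)))
  J <- matrix(0, nrow(targets), length(lev), dimnames = list(NULL, lev))
  J[cbind(seq_len(nrow(targets)), match(targets$Cy5, lev))] <- 1
  J[cbind(seq_len(nrow(targets)), match(targets$Cy3, lev))] <- -1
  J[, setdiff(lev, ref), drop = FALSE]
}

# Pick the per-series columns by name instead of by position.
wtCols <- paste("WT_stress_T", 1:4, sep = "")
muCols <- paste("MU_stress_T", 1:4, sep = "")
designWT <- refDesign(targets, ref = "WT")[1:8, wtCols]
designMU <- refDesign(targets, ref = "MU")[9:16, muCols]
designWT
```

Because limma's `modelMatrix()` names its columns after the non-reference targets, the same idea applies directly to its output, e.g. `modelMatrix(targets, ref = "WT")[1:8, wtCols]`.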

updated 13.2 years ago by James W. MacDonald 68k • written 13.2 years ago by Guest User ★ 13k

James W. MacDonald 68k

@james-w-macdonald-5106

United States

Hi Susanne,

On 2/3/2012 7:53 AM, Susanne Gerber [guest] wrote:
> I was working a lot with these data and we had some very interesting results, however, I am not able to solve the following problem:
>
> How can a make a comparison between
> a) MU and WT
> b) MU_stressed and WT

That's because this is an unsolvable problem with the data in hand. I assume that by 'two independent time series' you mean that these experiments were conducted at different times, perhaps in different labs, etc?

There are two problems here. First, depending on what you mean by 'independent time series', a batch effect may have been introduced, which you will not be able to account for statistically.
However, depending on the nature of the independence between these time series, you may be able to get away with assuming little or no batch effect. But you will have to make that assumption without really being able to test it.

The second problem is due to the fact that you never hybridized MU and WT samples on the same chip, which has introduced another untestable and unquantifiable 'chip' effect. You could hypothetically do a single channel analysis with these data, but any comparison between MU and WT would include both biological and technical variability, and you won't be able to say how much of either. Again, you can assume that the technical variability is small, but you won't really be able to say for sure if this assumption is true.

To a certain extent, both time series have to be independent, as MU and WT cells are different. So if 'independent time series' just means that the experimenter did the WT time series and then did the MU time series, that's a batch effect that people ignore all the time, and I don't see a need to repeat the experiment. But if the experimenter really wants to compare the MU and WT samples directly, they need to be hybridized to the same chips, preferably in one of these 'round-robin' type designs where you do things like

    MU1 vs WT1
    MU stressed1 vs WT2
    MU stressed2 vs WT stressed1
    MU2 vs WT stressed2

which tends to reduce variability for comparisons. There may be something about these types of design in the limma user's guide. The maanova package was designed specifically for this type of analysis, so you might look at that package as well; I assume there is a vignette that may have helpful insights. You could also look at some of Katie Kerr's papers (do a google scholar search for kerr anova microarray).

Best,

Jim
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
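The statement that MU vs WT cannot be obtained from these log-ratios can be checked directly: put all 16 arrays into one reference design with WT as the single reference and look at its rank. The sketch below is plain base R with a hypothetical `refDesign()` helper (Cy5 = +1, Cy3 = -1, reference column dropped). For contrast, it also builds a four-array loop in the spirit of the round-robin layout described above; which sample sits on Cy5 in that loop is an assumption, and the rank does not depend on it.

```r
# Hypothetical helper (not a limma function): +1 for the Cy5 sample,
# -1 for the Cy3 sample, reference column dropped.
refDesign <- function(Cy3, Cy5, ref) {
  lev <- sort(unique(c(Cy3, Cy5)))
  J <- matrix(0, length(Cy3), length(lev), dimnames = list(NULL, lev))
  J[cbind(seq_along(Cy5), match(Cy5, lev))] <- 1
  J[cbind(seq_along(Cy3), match(Cy3, lev))] <- -1
  J[, setdiff(lev, ref), drop = FALSE]
}

# All 16 arrays from the original post in ONE design, WT as the reference.
Cy3 <- c("WT", "WT_stress_T1", "WT", "WT", "WT", "WT_stress_T3", "WT", "WT_stress_T4",
         "MU", "MU_stress_T1", "MU", "MU_stress_T2", "MU", "MU_stress_T3", "MU", "MU_stress_T4")
Cy5 <- c("WT_stress_T1", "WT", "WT_stress_T2", "WT_stress_T2", "WT_stress_T3", "WT", "WT_stress_T4", "WT",
         "MU_stress_T1", "MU", "MU_stress_T2", "MU", "MU_stress_T3", "MU", "MU_stress_T4", "MU")
joint <- refDesign(Cy3, Cy5, ref = "WT")
ncol(joint)     # 9 coefficients: MU, MU_stress_T1-T4, WT_stress_T1-T4
qr(joint)$rank  # 8, so one coefficient is not estimable

# On every array the MU column equals minus the sum of the MU_stress
# columns, so the MU coefficient (MU vs WT) is aliased and cannot be
# estimated from these log-ratios.
all(joint[, "MU"] == -rowSums(joint[, paste("MU_stress_T", 1:4, sep = "")]))  # TRUE

# A four-array loop like the round-robin example
# (assumption: the first-named sample of each pair is on Cy5).
loopCy5 <- c("MU", "MU_stress", "MU_stress", "MU")
loopCy3 <- c("WT", "WT", "WT_stress", "WT_stress")
loop <- refDesign(loopCy3, loopCy5, ref = "WT")
qr(loop)$rank == ncol(loop)  # TRUE: MU, MU_stress and WT_stress all estimable vs WT
```

With only four arrays this loop leaves a single residual degree of freedom, so a real experiment would replicate it; the point is only that a connected design makes MU vs WT estimable, while the existing design does not.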

13.2 years ago by James W. MacDonald 68k

Dear James,

Thank you so much for the very fast and detailed response. I'll start by answering your questions:

> I assume that by 'two independent time series' you mean that these experiments were conducted at different times, perhaps in different labs, etc?

The first experiment was performed half a year earlier, but in the same lab and by the same experimenter.

> The second problem is due to the fact that you never hybridized MU and WT samples on the same chip,
> which has introduced another untestable and unquantifiable 'chip' effect.

Well, this is exactly the problem I am struggling with.

> You could hypothetically do a single channel analysis with these data,
> but any comparison between MU and WT would include both biological and technical variability,
> and you won't be able to say how much of either.
> Again, you can assume that the technical variability is small, but you won't really be able to say for sure if this assumption is true.

I think I have to do so, since these data are the only dataset I have. I am not an experimenter, and the lab the data originally came from cannot perform new experiments (the cells are no longer available, the project ended last year, no money, no staff...). That's what I meant by "it is also not possible to repeat the experiment and produce a direct comparison." Sorry for being so imprecise. :)

The maanova package is great and I have already used it; however, I still do not know how to perform the single-channel analysis you mentioned with my two-colour data. What would be the best way to treat the data and extract the information (or is there already a package for this)?

Thanks again so much for your help.

Best regards,
Susanne

--
Dr. Susanne Gerber
Computational Time Series Analysis
Institute of Computational Science
University of Lugano
Via Giuseppe Buffi 13
6904 Lugano
http://www.ics.inf.usi.ch/people/dr-susanne-gerber.html

13.2 years ago by Susanne Gerber ▴ 10
