batch effect on variances


Lana Schaffer ★ 1.3k

@lana-schaffer-1056

Last seen 9.7 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060925/1d12ccaa/attachment.pl




rgentleman ★ 5.5k

@rgentleman-7725

Last seen 9.0 years ago

United States

Lana Schaffer wrote:

> Hi,
> I want to find out if there is a batch effect (FEM or REM) on the variance for 2 sets of data which are discrete (different) treatments (time). The GeneMeta package is designed to combine batches which measure the same treatment effects. However, I have what corresponds to 2 different treatment effects. Is it valid to check homogeneity for the 2 batches?
>
> Lana Schaffer
> Biostatistics/Informatics
> The Scripps Research Institute
> DNA Array Core Facility
> La Jolla, CA 92037
> (858) 784-2263
> (858) 784-2994
> schaffer at scripps.edu
>
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Hi,

You can do some things, but I am not sure why you care. If the two experiments do not have the same treatments, then there is no sensible analysis that combines them, so whether or not the variances are the same seems like an odd question, at least to me. What would you want to say about it, and how might you try to use it?

You can fit an appropriate model to each gene in each experiment separately, say using limma or any of the multitude of packages in BioC that do this. Once that has been done, you can estimate per-gene variances, and their ratio, suitably normalized, will almost surely follow some form of F distribution (provided that the samples are not too small and that the models are reasonable). But I am still not sure what you would do with such information.

Best wishes,
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
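To make the variance-ratio idea concrete, here is a minimal Python sketch. It is not limma and not code from this thread. It assumes each experiment has two treatment groups for a gene, uses the pooled within-group variance (after removing each group's mean) as the per-gene variance, and returns the ratio together with its degrees of freedom. The function names and the toy numbers are hypothetical.

```python
from statistics import mean

def residual_variance(groups):
    """Pooled within-group variance for one gene, plus its degrees of freedom.

    `groups` is a list of lists of log-expression values, one list per
    treatment group. Subtracting each group's mean stands in for fitting
    a simple per-gene model.
    """
    ss = 0.0
    df = 0
    for values in groups:
        m = mean(values)
        ss += sum((v - m) ** 2 for v in values)
        df += len(values) - 1
    return ss / df, df

def variance_ratio(groups_a, groups_b):
    """Ratio of residual variances (experiment A over B) with both dfs."""
    var_a, df_a = residual_variance(groups_a)
    var_b, df_b = residual_variance(groups_b)
    return var_a / var_b, df_a, df_b

# Toy data for a single gene (made-up numbers, for illustration only).
experiment_a = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
experiment_b = [[1.0, 1.5, 2.0], [3.0, 3.5, 4.0]]

ratio, df_a, df_b = variance_ratio(experiment_a, experiment_b)
print(ratio, df_a, df_b)
```

Under roughly normal errors and equal true variances, such a ratio would be referred to an F distribution with (df_a, df_b) degrees of freedom, for example with `scipy.stats.f.sf(ratio, df_a, df_b)` if SciPy is available.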



Lana Schaffer ★ 1.3k

@lana-schaffer-1056

Last seen 9.7 years ago

Hi,

This is just a case of matching profiles during something like a time course. In packages like GeneSpring and Spotfire, the user is allowed to choose a profile and then find other "genes" with matching profiles at varying correlation levels. In this case I have p-values that are significant for differential expression at short time periods and not significant at long time periods. I want to get the list of genes with that profile.

Alternatively, the log expression ratio is high at the short time period and then levels off at long time periods for all the genes of interest. I thought that the high p-values were due to increased variance and therefore heterogeneity. I don't know how to think about the decreased expression along with this, since I am dealing with differential expression. I have done co-expression analysis for all these genes and find them to be co-expressed between 2 modules. So the goal is to show levels of heterogeneity between the time periods.

I am wondering if I used limma correctly: I divided the samples into 3 "time periods" and then fit the samples together. I then used contrasts to get adjusted p-values for the genes for the 3 "time periods". When I graphed the trends in p-values for each of the genes over time, I got profiles which increase and then flatten for one set of genes (I want to get that set of genes), and other profiles for the rest. I want to show that the variance (heterogeneity) increases with time for some of the genes.

I think that I could do a multivariate regression to indicate a regression in differential expression, but if the ratio is leveling off then regression won't tell me anything.

I hope you can understand where I am going.

Lana
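The suspicion that large p-values can come from increased variance rather than a smaller effect can be illustrated with a small Python sketch. It uses an ordinary Welch t-statistic, not limma's moderated statistic, and the replicate values, group names, and `welch_t` helper are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(treated, control):
    """Welch t-statistic for the difference in mean log expression."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se

# Made-up replicate values for one gene.
control = [0.0, 0.1, -0.1, 0.0]
early = [1.9, 2.0, 2.1, 2.0]   # tight replicates
late = [0.5, 3.5, 1.0, 3.0]    # same mean as `early`, much larger spread

t_early = welch_t(early, control)
t_late = welch_t(late, control)
print(round(t_early, 2), round(t_late, 2))
```

Both time periods have the same mean difference from control, but the late period gives a much smaller t-statistic (and hence a larger p-value) purely because its replicates are more variable.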



Lana Schaffer wrote:

> Hi,
> This is just a case of matching profiles during something like a time course. In packages like GeneSpring and Spotfire, the user is allowed to choose a profile and then find other "genes" with matching profiles at varying correlation levels. In this case I have p-values that are significant for differential expression at short time periods and not significant at long time periods. I want to get the list of genes with that profile.

genefinder in the genefilter package does something like this.

> Alternatively, the log expression ratio is high at the short time period and then levels off at long time periods for all the genes of interest. I thought that the high p-values were due to increased variance and therefore heterogeneity. I don't know how to think about the decreased expression along with this, since I am dealing with differential expression. I have done co-expression analysis for all these genes and find them to be co-expressed between 2 modules. So the goal is to show levels of heterogeneity between the time periods.
>
> I am wondering if I used limma correctly: I divided the samples into 3 "time periods" and then fit the samples together. I then used contrasts to get adjusted p-values for the genes for the 3 "time periods". When I graphed the trends in p-values for each of the genes over time, I got profiles which increase and then flatten for one set of genes (I want to get that set of genes), and other profiles for the rest. I want to show that the variance (heterogeneity) increases with time for some of the genes.

I do not understand your preoccupation with p-values. I think you should be interested in patterns of expression, not patterns in the p-values.

> I think that I could do a multivariate regression to indicate a regression in differential expression, but if the ratio is leveling off then regression won't tell me anything.

That is why you need to fully specify the profile of interest, and then measure distances from it.

> I hope you can understand where I am going.
> Lana

--
Robert Gentleman, PhD
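A minimal Python sketch of "specify the profile, then measure distances from it" follows. It is not the genefinder implementation from genefilter. It assumes one mean log ratio per time period for each gene and uses correlation distance (1 - r) to a template. The template, gene names, and values are all hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_by_template(profiles, template):
    """Return (gene, distance) pairs sorted by correlation distance to the template."""
    distances = {
        gene: 1.0 - pearson(values, template)
        for gene, values in profiles.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])

# Hypothetical template over 3 time periods: high early, then level.
template = [2.0, 0.5, 0.5]

# Made-up mean log ratios per time period.
profiles = {
    "geneA": [2.1, 0.4, 0.6],   # close to the template
    "geneB": [0.0, 1.0, 2.0],   # steadily increasing
    "geneC": [1.0, 1.1, 0.9],   # roughly flat
}

ranked = rank_by_template(profiles, template)
for gene, dist in ranked:
    print(gene, round(dist, 3))
```

Correlation distance only compares shape. If the magnitude of the log ratio also matters, a Euclidean distance to the template would be one alternative.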

ADD REPLY • link 17.7 years ago rgentleman ★ 5.5k


On Tuesday 26 September 2006 13:29, Lana Schaffer wrote:

> Hi,
> This is just a case of matching profiles during something like a time course. In packages like GeneSpring and Spotfire, the user is allowed to choose a profile and then find other "genes" with matching profiles at varying correlation levels. In this case I have p-values that are significant for differential expression at short time periods and not significant at long time periods. I want to get the list of genes with that profile.

Hi, Lana.

You might want to choose your genes not based on the p-value for a particular time point, but rather based on an F-statistic, of which there is only one per gene. Then do whatever type of clustering you like using those genes that show a high F-statistic. I think a lot of folks would use k-means or hierarchical clustering to then group genes that show a similar profile, but there are many ways to approach clustering.

Sean
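The filter-then-cluster suggestion can be sketched in plain Python as follows. This is an illustration under stated assumptions, not limma: it uses an ordinary one-way ANOVA F-statistic (limma reports a moderated F instead) and a bare-bones k-means. The cutoff of 10, the starting centers, and the data are hypothetical.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F-statistic for one gene; `groups` holds replicates per time period."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def kmeans(points, centers, iterations=10):
    """Plain k-means from the given starting centers; returns one label per point."""
    centers = [list(c) for c in centers]  # work on a copy
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))
            for point in points
        ]
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [mean(col) for col in zip(*members)]
    return labels

# Made-up data: 2 replicates at each of 3 time periods per gene.
data = {
    "g1": [[2.0, 2.2], [0.5, 0.7], [0.6, 0.4]],
    "g2": [[1.9, 2.1], [0.4, 0.6], [0.5, 0.5]],
    "g3": [[0.1, 0.3], [1.0, 1.2], [2.0, 2.2]],
    "g4": [[0.0, 0.2], [1.1, 0.9], [2.1, 1.9]],
    "g5": [[1.0, 1.2], [1.1, 0.9], [1.0, 1.2]],  # no time effect
}

f_cutoff = 10.0  # hypothetical threshold, chosen only for this toy example
selected = [gene for gene, groups in data.items() if anova_f(groups) > f_cutoff]

# Cluster the per-period mean profiles of the selected genes.
mean_profiles = [[mean(g) for g in data[gene]] for gene in selected]
labels = kmeans(mean_profiles, centers=[mean_profiles[0], mean_profiles[-1]])
print(dict(zip(selected, labels)))
```

Here the flat gene is dropped by the F filter, and the remaining genes split into an "early high" group and a "rising" group. In practice the cutoff would come from the F distribution or an adjusted p-value rather than a fixed number.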

ADD REPLY • link 17.7 years ago Sean Davis 21k


Lana Schaffer ★ 1.3k

@lana-schaffer-1056

Last seen 9.7 years ago

Robert,

The point is that the standard way of calculating p-values for significance of expression does not work, especially for the heterogeneous dataset. So I need to show why the standard analysis does not work, while showing that co-expression or patterns of expression do give meaningful results.

Lana

