Looking for strongly correlated gene expression data


Kim, K.I. ▴ 20

@kim-ki-2072

Last seen 11.4 years ago

Hi Bioconductor users,

I am looking for gene expression data sets with very strong correlations (positive or negative) among genes, that is, data sets for which the true null p-values cannot be expected to be independent and uniformly distributed.

If anyone knows of such data sets, please let me know.

Thank you.

Kyung In Kim
PhD student, Department of Mathematics and Computer Science, Technical University of Eindhoven

• 1.4k views

updated 18.9 years ago by Michal Okoniewski ▴ 120 • written 18.9 years ago by Kim, K.I. ▴ 20


Sean Davis 21k

@sean-davis-490

Last seen 4 weeks ago

United States

On Monday 12 March 2007 08:33, Kim, K.I. wrote:
> Hi BioConductor Users,
>
> I am looking for gene expression data sets with very strong correlation
> features. (positive or negative) So, I hope I can't expect independent
> uniform distributions for true null p-values of those data sets.
>
> If anyone knows such data sets, please let me know?

Kyung,

Could you simply test this in a bunch of datasets? In particular, could you download many (or all) of the datasets from NCBI GEO and test your hypothesis that such datasets exist, and in what proportion? I may be misunderstanding what you want to do, though.

Sean
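One way to screen many datasets, as suggested above, is to score each one by how strongly its genes are correlated with each other. The following is only a rough sketch under stated assumptions: it is written in plain Python rather than R, the helper names `pearson` and `mean_abs_correlation` are made up for illustration, and the expression matrix is assumed to be already loaded as one list of sample values per gene. Loading data from GEO is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mean_abs_correlation(expr):
    """Mean |r| over all gene pairs; expr holds one list of sample values per gene."""
    values = [abs(pearson(g1, g2)) for g1, g2 in combinations(expr, 2)]
    values = [v for v in values if not math.isnan(v)]  # skip constant genes
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 genes measured on 4 samples (not real data).
    toy = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],
        [4.0, 3.0, 2.0, 1.0],
    ]
    print(mean_abs_correlation(toy))
```

The number of gene pairs grows quadratically, so for a real array one would probably score a random subset of genes. Correlation computed across all samples also picks up group differences, so centering each gene within its group first may be a better match for the question about null p-values.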

written 18.9 years ago by Sean Davis 21k


I'd like to explain more. I am considering multiple testing with gene expression data.

In the usual two-group multiple testing setup, if we assume that the true null p-values are independent and, for example, 90% of the hypotheses are truly null, then around 90% of the p-values are uniformly distributed (for example, the "golub" dataset in the R multtest package). But if there are strong correlations among the p-values (or genes), we can't expect this. I guess that histograms in the dependent case are curved rather than flat, even for the large p-values.

Actually, I am looking for gene expression datasets whose p-value histograms look "very" different from those expected under independence, and I want to do multiple testing on such datasets.

I also thought of downloading some gene expression files from a large database and then doing multiple testing, but then I would need to preprocess the downloaded files, which would take some time. Instead, I hoped to get an "easy" dataset in Bioconductor (already preprocessed, like the "golub" dataset in the multtest package). If there is no more convenient way to do it, then I may need to try NCBI GEO.

Thank you for your advice.

Kyung In
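The contrast between independent and dependent null p-values can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of real data: it is written in plain Python rather than R, all hypotheses are true nulls, the test statistics are z-scores, and dependence comes from a single shared factor giving every pair of genes the same correlation `rho`. The function names `null_p_values`, `histogram`, and `flatness` are illustrative only.

```python
import math
import random


def null_p_values(n_genes, rho, rng):
    """Two-sided p-values for n_genes true-null z-scores with pairwise correlation rho.

    Each z-score is sqrt(rho) * shared + sqrt(1 - rho) * own noise, so every
    z-score is marginally N(0, 1) and any two have correlation rho.
    """
    shared = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    p_values = []
    for _ in range(n_genes):
        z = a * shared + b * rng.gauss(0.0, 1.0)
        # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values


def histogram(p_values, n_bins=10):
    """Counts of p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def flatness(counts):
    """Largest bin count divided by the count expected under uniformity."""
    expected = sum(counts) / len(counts)
    return max(counts) / expected


if __name__ == "__main__":
    rng = random.Random(2007)
    for rho in (0.0, 0.9):
        counts = histogram(null_p_values(5000, rho, rng))
        print(f"rho={rho}: bins={counts} max/expected={flatness(counts):.2f}")
```

With `rho = 0.0` the bin counts should stay close to the flat line. With `rho = 0.9` each p-value is still marginally uniform, but within a single realization the histogram piles up in a few bins, and which bins depends on the shared factor drawn for that run.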

written 18.9 years ago by Kim, K.I. ▴ 20


On Tuesday 13 March 2007 06:01, Kim, K.I. wrote:
> I also thought downloading some gene expression files from a large
> database and then doing multiple testing but then I need to do some
> preprocessing jobs on the downloaded files and they will take some time.
> Instead I hoped to get "easy" dataset (already preprocessed like "golub"
> dataset in multtest package) in bioconductor. If there is no other
> convenient way to do it, then I may need to try NCBI GEO.

Just sticking to the NCBI GEO idea (I have a not-so-hidden agenda as the author of GEOquery), you can simply use the GDSs (GEO DataSets) from GEO. They are already preprocessed and can easily be converted into Bioconductor objects like exprSets and used for t-testing. It would take only a few lines of code to do what you are suggesting for as many GDSs as you like. So, before writing off all the data in GEO, you might look at the GEOquery vignette to see if it might serve your needs.

Sean

written 18.9 years ago by Sean Davis 21k


Michal Okoniewski ▴ 120

@michal-okoniewski-1752

Last seen 11.4 years ago

Hi Kim,

I don't know if it is related to your research, but in the paper

http://www.biomedcentral.com/1471-2105/7/276

we described some cases of unusually high correlation caused by a specific property of Affymetrix arrays (multiple targeting). The paper's Additional Data includes a table of probesets that are likely to be highly correlated, along with some hints on the datasets used (two of them are from ArrayExpress).

Greetings, :)
Michal

written 18.9 years ago by Michal Okoniewski ▴ 120
