Looking for strongly correlated gene expression data
Kim, K.I. (@kim-ki-2072)
Hi Bioconductor users,

I am looking for gene expression data sets whose features are very strongly correlated (positively or negatively), so strongly that the p-values of the true null hypotheses cannot be expected to be independent and uniformly distributed.

If anyone knows of such data sets, please let me know.

Thank you.

Kyung In Kim
PhD student, Department of Mathematics and Computer Science, Technical University of Eindhoven
@sean-davis-490
On Monday 12 March 2007 08:33, Kim, K.I. wrote:
> Hi BioConductor Users,
>
> I am looking for gene expression data sets with very strong correlation
> features. (positive or negative) So, I hope I can't expect independent
> uniform distributions for true null p-values of those data sets.
>
> If anyone knows such data sets, please let me know?

Kyung,

Could you simply test this in a bunch of datasets? In particular, could you download many (or all) of the datasets from NCBI GEO and test your hypothesis that such datasets exist, and in what proportion? I may be misunderstanding what you want to do, though.

Sean
I'd like to explain more. I am considering multiple testing with gene expression data. In the usual two-group multiple testing set-up, if we assume that the true null p-values are independent and, for example, 90% of the hypotheses are truly null, then about 90% of the p-values are uniformly distributed (see, for example, the "golub" dataset in the R multtest package). But if there are strong correlations among the p-values (or genes), we can't expect this. I guess that under dependence the histogram is curved rather than flat, even for the large p-values.

What I am actually looking for is gene expression datasets whose p-value histograms look "very" different from the histogram expected under the usual independence assumption, and I want to do multiple testing on such datasets.

I also thought about downloading some gene expression files from a large database and doing multiple testing on them, but then I would need to preprocess the downloaded files, which would take some time. Instead, I hoped to find an "easy" dataset in Bioconductor (already preprocessed, like the "golub" dataset in the multtest package). If there is no more convenient way, I may need to try NCBI GEO.

Thank you for your advice.

Kyung In
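The effect described in this post can be illustrated with a small simulation. The sketch below is not from the thread: it assumes a hypothetical one-factor model in which every null z-statistic shares one latent component, so that any two statistics have correlation rho, and the names null_pvalues, histogram, and flatness are made up for illustration. It is written in plain Python (standard library only) rather than R so that it stays self-contained. Under this assumed model each p-value is still marginally uniform, yet within a single simulated dataset the histogram is far from flat when rho is large.

```python
import math
import random


def null_pvalues(n_genes, rho, rng):
    """Two-sided p-values for n_genes null z-statistics with pairwise correlation rho.

    Assumed model: z_i = sqrt(rho) * shared + sqrt(1 - rho) * own_i, so each z_i
    is marginally N(0, 1) and all genes share one latent factor.
    """
    shared = rng.gauss(0.0, 1.0)
    pvals = []
    for _ in range(n_genes):
        own = rng.gauss(0.0, 1.0)
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own
        # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return pvals


def histogram(pvals, n_bins=10):
    """Count p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def flatness(counts):
    """Largest relative deviation of any bin from a perfectly flat histogram."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) for c in counts) / expected


rng = random.Random(0)
independent = histogram(null_pvalues(5000, 0.0, rng))
correlated = histogram(null_pvalues(5000, 0.8, rng))

print("independent nulls:", independent, "deviation", round(flatness(independent), 3))
print("correlated nulls: ", correlated, "deviation", round(flatness(correlated), 3))
```

The shape of the correlated histogram depends on the single draw of the shared factor, so different seeds give different shapes: a draw near zero piles the p-values up near 1, a large draw pushes them toward 0.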
On Tuesday 13 March 2007 06:01, Kim, K.I. wrote:
> I also thought downloading some gene expression files from a large
> database and then doing multiple testing but then I need to do some
> preprocessing jobs on the downloaded files and they will take some time.
> Instead I hoped to get "easy" dataset (already preprocessed like "golub"
> dataset in multtest package) in bioconductor. If there is no other
> convenient way to do it, then I may need to try NCBI GEO.

Just sticking to the NCBI GEO idea (I have a not-so-hidden agenda as the author of GEOquery), you can simply use the GDSs from GEO. They are already preprocessed and can easily be transformed into Bioconductor objects like exprSets and used for t-testing. It would take only a few lines of code to do what you are suggesting for as many GDSs as you like. So, before writing off all the data in GEO, you might look at the GEOquery vignette to see if it might serve your needs.

Sean
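As a rough illustration of how little code the per-gene testing step needs once a dataset is already preprocessed, here is a sketch in plain Python. It does not use GEOquery and downloads nothing: the toy table stands in for a preprocessed GDS value matrix, and the names welch_t, t_statistics, and toy_A are hypothetical. Welch's t-statistic is an assumption made here; the post only says "t-testing".

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch two-sample t-statistic for two lists of expression values."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))


def t_statistics(matrix, labels):
    """Per-gene t-statistics; matrix has one row per gene, labels one 0/1 per sample."""
    stats = []
    for row in matrix:
        group_a = [x for x, g in zip(row, labels) if g == 0]
        group_b = [x for x, g in zip(row, labels) if g == 1]
        stats.append(welch_t(group_a, group_b))
    return stats


# Hypothetical toy data: two genes, six samples, three samples per group.
datasets = {
    "toy_A": (
        [
            [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
            [2.0, 2.5, 3.0, 2.0, 2.5, 3.0],
        ],
        [0, 0, 0, 1, 1, 1],
    ),
}

# The same loop would run unchanged over as many datasets as are loaded.
results = {name: t_statistics(matrix, labels) for name, (matrix, labels) in datasets.items()}
for name, stats in results.items():
    print(name, [round(t, 3) for t in stats])
```

Turning these statistics into p-values needs the t-distribution, which the Python standard library does not provide, so that step is left out of the sketch.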
@michal-okoniewski-1752
Hi Kim,

I don't know if it is related to your research, but in the paper http://www.biomedcentral.com/1471-2105/7/276 we described some cases of unusually high correlation caused by a specific property of Affymetrix arrays (multiple targeting). The paper's Additional Data includes a table of probesets that are likely to be highly correlated, and some hints on the datasets used (two of them are from ArrayExpress).

Greetings, :)
Michal