Question: Clustering of 30,000+ genes
6.2 years ago by
European Union
January Weiner230 wrote:
Hello,

I'm struggling with co-expression analysis, and for that I would like to try to cluster all the genes I have in my microarray set, including those which are not differentially expressed between the study groups. I am using CoXpress at the moment and will try my luck with GSCA as well, but both packages seem to have been laid out for 3000 rather than 30000 genes.

How do you do that in R? I get errors about R not being able to allocate enough memory. Clearly, the amount of memory required to calculate all correlations the simple way might be a bit on the large side, but I can think of one or two tricks to get this done; I wonder whether it has been implemented already.

Other than that, how should I reasonably limit the number of genes to study? I don't want to bias the outcome of the analysis by selecting only genes that are DE; actually, I would be very interested in genes that show differential co-expression, but no differences in expression.

Kind regards,
j.
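To put a number on "a bit on the large side", here is a rough base-R sketch. It estimates the size of a dense gene-by-gene correlation matrix and then shows one possible block-wise trick: correlate a block of genes against all genes at a time and keep only the strong pairs. The function names, the cutoff, and the block size are illustrative assumptions, not part of CoXpress, GSCA, or any other package.

```r
# Memory for one dense n x n matrix of doubles (8 bytes each), in GiB.
cor_matrix_gib <- function(n_genes, bytes_per_value = 8) {
  n_genes^2 * bytes_per_value / 2^30
}

cor_matrix_gib(30000)  # roughly 6.7 GiB for a single copy
cor_matrix_gib(3000)   # roughly 0.07 GiB

# Hypothetical helper: compute correlations block by block and keep only
# gene pairs with |r| >= cutoff, so the full matrix is never held in memory.
# expr: numeric matrix with genes in rows and samples in columns.
sparse_cor_pairs <- function(expr, cutoff = 0.8, block_size = 1000) {
  n <- nrow(expr)
  x <- t(expr)                       # cor() correlates columns
  starts <- seq(1, n, by = block_size)
  out <- vector("list", length(starts))
  for (b in seq_along(starts)) {
    idx <- starts[b]:min(starts[b] + block_size - 1, n)
    r <- cor(x[, idx, drop = FALSE], x)          # block x all genes
    hits <- which(abs(r) >= cutoff, arr.ind = TRUE)
    gene_i <- idx[hits[, "row"]]                 # map block row to gene index
    gene_j <- hits[, "col"]
    keep <- gene_i < gene_j                      # each pair once, no self-pairs
    out[[b]] <- data.frame(gene_i = gene_i[keep],
                           gene_j = gene_j[keep],
                           r = r[hits][keep])
  }
  do.call(rbind, out)
}
```

Peak memory is then about `block_size * nrow(expr)` doubles per block plus the retained pairs, at the cost of discarding weak correlations. Whether a hard cutoff is acceptable depends on the downstream clustering method.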
6.2 years ago by
Sean Davis21k
United States
Sean Davis21k wrote:
Hi, January.

One common way of reducing the number of features is to choose the top X% by variance or coefficient of variation. A large percentage of genes are not even expressed in a given tissue type and another large percentage do not vary across a sample set. You can use the genefilter package to perform such filtering.

Sean
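The "top X% by variance" idea can be sketched in base R as below. This is only an illustration under stated assumptions: the function name `filter_top_variable`, its arguments, and the 25% default are hypothetical, and the sketch does not use or reproduce the genefilter API.

```r
# Hypothetical helper: keep the top fraction of genes ranked by a
# per-gene variability statistic (variance by default).
# expr: numeric matrix with genes in rows and samples in columns.
filter_top_variable <- function(expr, top_fraction = 0.25, stat_fun = var) {
  score <- apply(expr, 1, stat_fun)              # one score per gene
  n_keep <- ceiling(top_fraction * nrow(expr))
  keep <- order(score, decreasing = TRUE)[seq_len(n_keep)]
  expr[sort(keep), , drop = FALSE]               # preserve original gene order
}

# Coefficient of variation instead of variance (assumes positive,
# unlogged intensities, since CV is not meaningful near a zero mean):
cv <- function(x) sd(x) / mean(x)
# filtered <- filter_top_variable(expr, top_fraction = 0.1, stat_fun = cv)
```

Because the filter looks only at overall variability and never at the group labels, it does not select for differential expression between the study groups.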
That said, there are a number of differential coexpression papers out there noting that, among the remaining transcripts, calculating (shrunken or unshrunken) estimates of the covariance matrices can be... interesting. 'corpcor', 'glasso', 'huge', and 'WGCNA' may come in handy for the latter task, with WGCNA explicitly designed for finding differential coexpression.

The authors of one such paper (a throwaway, with no implementation released) note that they crammed 128GB of physical RAM into the machine used for the analyses in the paper, but it's quite possible the authors did not realize that filtering could have saved them a lot of time and memory.
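As a rough illustration of what a differential coexpression comparison can look like after filtering, the base-R sketch below computes plain (unshrunken) correlation matrices in two sample groups and compares them with the standard Fisher z-transform. It is not the method of any paper or package named above; the function name `diff_cor` and its arguments are hypothetical.

```r
# Hypothetical helper: per-pair z-scores for the difference in correlation
# between two sample groups (Fisher z-transform, unshrunken estimates).
# expr: numeric matrix, genes in rows, samples in columns (already filtered).
# group: vector with exactly two distinct labels, one per sample.
diff_cor <- function(expr, group) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, length(group) == ncol(expr))
  in_first <- group == levels(group)[1]
  n1 <- sum(in_first)
  n2 <- sum(!in_first)
  stopifnot(n1 > 3, n2 > 3)                      # needed for the variance term
  r1 <- cor(t(expr[, in_first, drop = FALSE]))   # gene x gene, group 1
  r2 <- cor(t(expr[, !in_first, drop = FALSE]))  # gene x gene, group 2
  z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  diag(z) <- NA                                  # self-correlation is uninformative
  z
}
```

This holds three gene-by-gene matrices at once, so it is only practical on a filtered gene set. With few samples per group the unshrunken correlations are noisy, which is where shrinkage estimators become relevant.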
written 6.2 years ago by Tim Triche4.2k
6.2 years ago by
Naomi Altman6.0k
Naomi Altman6.0k wrote:
Try WGCNA, which is available from CRAN.

--Naomi