clustering to find out similarity in promoters
Guest User ★ 13k
@guest-user-4897
Last seen 11.2 years ago
I have a dataset in the following format:

ID_REF  GSM362180    GSM362181    GSM362188    GSM362189    GSM362192
244901  5.094871713  4.626623079  4.554272515  4.748604391  4.759221647
244902  5.194528083  4.985930299  4.817426064  5.151654407  4.838741605
244903  5.412329253  5.352970877  5.06250609   5.305709079  8.365082403
244904  5.529220594  5.28134657   5.467445095  5.62968933   5.458388909
244905  5.024052699  4.714631878  4.792865831  4.843975286  4.657188246
244906  5.786557533  5.242403911  5.060605782  5.458148567  5.890061836

I would like to cluster it column-wise to find the similar promoters (the columns represent the promoters and the rows the genes). I used pvclust for this earlier, but I would like a more detailed clustering (maybe a hierarchical type?) and am not sure how to do it.

-- output of sessionInfo():

R version 2.15
Linux operating system

--
Sent via the guest posting facility at bioconductor.org.
@sean-davis-490
Last seen 9 months ago
United States
On Thu, Nov 1, 2012 at 6:17 AM, priya [guest] <guest@bioconductor.org> wrote:

> I would like to cluster it column-wise to find out the similar promoters
> (columns represent the promoters and the rows the genes). I used pvclust
> to do it earlier but would like to have a more detailed clustering (maybe
> a hierarchical type?) but am not sure how to do it.

It sounds like you want to do unsupervised analysis of your data. There are MANY ways to do this, including MDS plots, PCA, NMF, various forms of hierarchical clustering, and many others. See the help for hclust if you just want to try hierarchical clustering.

I would suggest using a heatmap as a starting place, though. See the help for heatmap.2 in the gplots package, for example. Either way, you'll need to reduce the number of genes that go into the heatmap; a typical approach is to apply a variance filter and then keep the top N most variable (informative) genes.

Sean
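The variance filter and column-wise hierarchical clustering suggested in this answer could look roughly like the sketch below. It uses only base R and the six rows posted in the question. The cutoff `top_n`, the 1 - Pearson correlation distance, the average linkage, and the choice of two groups in `cutree` are all assumptions made for illustration, not recommendations from the thread; with a real data set you would read the full table and pick these values yourself.

```r
# Expression values copied from the question: rows are genes, columns are samples.
expr <- matrix(
  c(5.094871713, 4.626623079, 4.554272515, 4.748604391, 4.759221647,
    5.194528083, 4.985930299, 4.817426064, 5.151654407, 4.838741605,
    5.412329253, 5.352970877, 5.06250609,  5.305709079, 8.365082403,
    5.529220594, 5.28134657,  5.467445095, 5.62968933,  5.458388909,
    5.024052699, 4.714631878, 4.792865831, 4.843975286, 4.657188246,
    5.786557533, 5.242403911, 5.060605782, 5.458148567, 5.890061836),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    as.character(244901:244906),
    c("GSM362180", "GSM362181", "GSM362188", "GSM362189", "GSM362192")
  )
)

# Variance filter: keep the top_n most variable genes.
# top_n = 4 is an assumed cutoff chosen only because the example is tiny.
top_n <- 4
gene_var <- apply(expr, 1, var)
keep <- names(sort(gene_var, decreasing = TRUE))[seq_len(min(top_n, nrow(expr)))]
expr_top <- expr[keep, , drop = FALSE]

# Distance between columns: 1 - Pearson correlation (an assumed choice;
# dist(t(expr_top)) would give Euclidean distance instead).
col_dist <- as.dist(1 - cor(expr_top))

# Hierarchical clustering of the columns with average linkage (also assumed).
hc <- hclust(col_dist, method = "average")

# Cut the tree into two groups purely to show how to extract cluster labels.
groups <- cutree(hc, k = 2)

# Draw the dendrogram only in an interactive session.
if (interactive()) {
  plot(hc, main = "Column-wise hierarchical clustering")
}
```

If the gplots package is installed, passing `expr_top` to `gplots::heatmap.2` draws a heatmap with row and column dendrograms, which is the starting point suggested above.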