Clustering to find similarity in promoters


Guest User ★ 13k

@guest-user-4897

Last seen 11.4 years ago

I have a dataset in the following format:

ID_REF   GSM362180    GSM362181    GSM362188    GSM362189    GSM362192
244901   5.094871713  4.626623079  4.554272515  4.748604391  4.759221647
244902   5.194528083  4.985930299  4.817426064  5.151654407  4.838741605
244903   5.412329253  5.352970877  5.06250609   5.305709079  8.365082403
244904   5.529220594  5.28134657   5.467445095  5.62968933   5.458388909
244905   5.024052699  4.714631878  4.792865831  4.843975286  4.657188246
244906   5.786557533  5.242403911  5.060605782  5.458148567  5.890061836

I would like to cluster it column-wise to find the similar promoters (the columns represent the promoters and the rows the genes). I used pvclust for this earlier but would like a more detailed clustering (maybe of the hierarchical type?), and I am not sure how to do it.

-- output of sessionInfo():

R version 2.15
Linux operating system

--
Sent via the guest posting facility at bioconductor.org.

Clustering Clustering • 741 views

ADD COMMENT • link updated 13.2 years ago by Sean Davis 21k • written 13.2 years ago by Guest User ★ 13k


Sean Davis 21k

@sean-davis-490

Last seen 17 days ago

United States

On Thu, Nov 1, 2012 at 6:17 AM, priya [guest] <guest@bioconductor.org> wrote:

> I would like to cluster it column-wise to find out the similar promoters
> (columns represent the promoters and the rows the genes). I used pvclust
> to do it earlier but would like to have a more detailed clustering (maybe
> of the hierarchical type?) but am not sure how to do it.

It sounds like you want to do an unsupervised analysis of your data. There are MANY ways to do this, including MDS plots, PCA, NMF, various forms of hierarchical clustering, and many others. See the help for hclust if you just want to try hierarchical clustering.

I would suggest using a heatmap as a starting place, though. See the help for heatmap.2 in the gplots package, for example.

Either way, you'll need to reduce the number of genes that go into the heatmap. A typical approach is to apply a variance filter and then choose the top N most variable (informative) genes.

Sean
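For readers who want to try these suggestions, here is a minimal sketch using only base R. It rebuilds the six rows from the question as a toy matrix, applies a variance filter, and clusters the columns with hclust. The cutoff `top_n`, the Euclidean and correlation distances, and the average linkage are illustrative assumptions, not choices made in the answer. The base `heatmap` function stands in for `heatmap.2`, which could be substituted if gplots is installed.

```r
# Toy matrix from the six rows shown in the question
# (rows = genes/probes, columns = the GSM columns to be clustered).
expr <- matrix(
  c(5.094871713, 4.626623079, 4.554272515, 4.748604391, 4.759221647,
    5.194528083, 4.985930299, 4.817426064, 5.151654407, 4.838741605,
    5.412329253, 5.352970877, 5.06250609,  5.305709079, 8.365082403,
    5.529220594, 5.28134657,  5.467445095, 5.62968933,  5.458388909,
    5.024052699, 4.714631878, 4.792865831, 4.843975286, 4.657188246,
    5.786557533, 5.242403911, 5.060605782, 5.458148567, 5.890061836),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("244901", "244902", "244903", "244904", "244905", "244906"),
    c("GSM362180", "GSM362181", "GSM362188", "GSM362189", "GSM362192")
  )
)

# Variance filter: keep the top N most variable rows.
# top_n = 4 is an assumption for this toy example; a real data set
# would normally use a much larger N.
top_n <- 4
row_var <- apply(expr, 1, var)
keep <- names(sort(row_var, decreasing = TRUE))[seq_len(min(top_n, nrow(expr)))]
expr_top <- expr[keep, , drop = FALSE]

# Column-wise hierarchical clustering. dist() computes distances
# between rows, so the matrix is transposed first.
col_dist <- dist(t(expr_top), method = "euclidean")
col_hc <- hclust(col_dist, method = "average")
plot(col_hc, main = "Columns clustered on top-variance genes")

# Alternative (also an assumption): 1 - Pearson correlation as distance.
# cor() already correlates columns, so no transpose is needed here.
cor_hc <- hclust(as.dist(1 - cor(expr_top)), method = "average")

# Heatmap of the filtered matrix, with each row scaled.
heatmap(expr_top, scale = "row")
```

With a full expression matrix, the same steps apply after reading the file into a numeric matrix with the ID_REF column as row names. Which distance and linkage are appropriate depends on the data, so it may be worth comparing a few.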

ADD COMMENT • link 13.2 years ago Sean Davis 21k
