clustering in R
Guest User ★ 13k
@guest-user-4897
Last seen 9.7 years ago
I have an RMA-normalized gene expression dataset with 22810 rows and 9 columns (types of promoters). A subset of the data is as follows:

    ID_REF  GSM362180    GSM362181    GSM362188    GSM362189    GSM362192
    244901  5.094871713  4.626623079  4.554272515  4.748604391  4.759221647
    244902  5.194528083  4.985930299  4.817426064  5.151654407  4.838741605
    244903  5.412329253  5.352970877  5.06250609   5.305709079  8.365082403
    244904  5.529220594  5.28134657   5.467445095  5.62968933   5.458388909
    244905  5.024052699  4.714631878  4.792865831  4.843975286  4.657188246
    244906  5.786557533  5.242403911  5.060605782  5.458148567  5.890061836

I want to cluster the above and tried hierarchical clustering:

    d <- dist(as.matrix(deg), method = "euclidean")

where deg is a matrix of the differentially expressed genes (4300 in number). I get the following warning:

    Warning message:
    In dist(as.matrix(deg), method = "euclidean") : NAs introduced by coercion

Is it all right to proceed with the clustering in spite of the warning?

    hc <- hclust(d)
    plot(hc, hang = -0.01, cex = 0.7)

I get a dendrogram which is very dense, and the labels are not clear. I also do not know which of the 9 promoters are classified in the tree for the several genes. How would it be possible to label the tree with the promoters, and how can I visualize the genes in a clearer dendrogram? There are around 4300 genes, and I would like a dendrogram that is easier to read.

-- Sent via the guest posting facility at bioconductor.org.
Clustering • 1.4k views
@james-w-macdonald-5106
Last seen 2 days ago
United States
Hi Priya,

On 10/23/2012 3:34 AM, priya [guest] wrote:
> I want to do a clustering of the above and tried the hierarchical clustering:
>
> d <- dist(as.matrix(deg), method = "euclidean")
>
> where deg is a matrix of the differentially expressed genes (4300 in number). And I get the following warning:
>
> Warning message:
> In dist(as.matrix(deg), method = "euclidean") : NAs introduced by coercion
>
> Is it all right to proceed with the clustering in spite of the warning?

Well, you shouldn't get that warning if your matrix is all numeric. And if your matrix isn't all numeric, it will usually all be coerced to character, so I would want to check that out and see what is happening.

> hc <- hclust(d)
> plot(hc, hang = -0.01, cex = 0.7)
>
> I get a dendrogram which is very dense and the labels are not clear. Also I do not know which of the 9 promoters are classified in the tree for the several genes. How would it be possible to label the tree with the promoters and also how to visualize the genes into a clearer dendrogram? There are around 4300 genes and would like to get a better dendrogram so that I could visualize it better.

That is a lot of genes, so you will have to make the dendrogram really big if you actually want to see things.
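On the coercion warning, here is a minimal sketch of how one might check whether the matrix is really numeric before calling dist(). The toy data frame reuses a few values from the question; the "_at" probe suffix and the names deg_df, bad and num_cols are assumptions made for illustration, since the thread does not show how deg was built.

```r
# Toy data: three rows and three samples taken from the question.
# The "_at" suffix is a hypothetical stand-in for a non-numeric ID column.
deg_df <- data.frame(
  ID_REF    = c("244901_at", "244902_at", "244903_at"),
  GSM362180 = c(5.094871713, 5.194528083, 5.412329253),
  GSM362181 = c(4.626623079, 4.985930299, 5.352970877),
  GSM362188 = c(4.554272515, 4.817426064, 5.06250609),
  stringsAsFactors = FALSE
)

# A single character column makes as.matrix() return a character matrix
bad <- as.matrix(deg_df)
mode(bad)

# Find which columns are actually numeric
num_cols <- vapply(deg_df, is.numeric, logical(1))
print(num_cols)

# Keep only the numeric columns and move the IDs into the row names
deg <- as.matrix(deg_df[, num_cols])
rownames(deg) <- deg_df$ID_REF

# Sanity checks before computing distances
stopifnot(is.numeric(deg), !anyNA(deg))
d <- dist(deg, method = "euclidean")
```

If is.numeric() is FALSE for the real deg, looking at str() of the object it was built from is probably the quickest way to find the offending column.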
The best thing to do IMO is to put the dendrogram in a pdf of the correct size, and then you can zoom in and look at different regions. It would probably be easiest to make the pdf really wide, so something like

    pdf("dendrogram.pdf", width = 200, height = 8)
    plot(hc, hang = -0.01, cex = 0.7)
    dev.off()

As for the promoters being classified by the tree, I am not sure what you are asking. If it is simply a labeling issue, note that your 'hc' object is a list with a 'labels' member that contains whatever is going to be used in labeling the dendrogram. If you want to change what the labels are, then you can modify that.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
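The labeling suggestion can be sketched as follows. This is only an illustration on random stand-in data: the matrix contents, the "probe_" labels and the promoter names are all hypothetical. Because dist() computes distances between rows, clustering the transposed matrix gives a tree of the samples rather than the genes, which may or may not be what the question about promoters is after.

```r
set.seed(1)

# Hypothetical stand-in for the expression matrix: 20 genes x 5 samples
sample_ids <- c("GSM362180", "GSM362181", "GSM362188", "GSM362189", "GSM362192")
deg <- matrix(rnorm(100, mean = 5), nrow = 20,
              dimnames = list(paste0("gene", 1:20), sample_ids))

# Cluster the genes (rows); hc$labels holds the leaf labels in input row order
hc_genes <- hclust(dist(deg))
hc_genes$labels <- paste0("probe_", seq_len(nrow(deg)))  # hypothetical labels

# Cluster the samples (columns) by transposing first
hc_samples <- hclust(dist(t(deg)))
promoter_names <- paste0("promoter", LETTERS[1:5])       # hypothetical names
hc_samples$labels <- promoter_names

# Write both trees to a pdf in a temporary directory
out_file <- file.path(tempdir(), "dendrograms.pdf")
pdf(out_file, width = 12, height = 6)
plot(hc_genes, hang = -0.01, cex = 0.7)
plot(hc_samples, hang = -0.01)
dev.off()
```

The replacement labels must be supplied in the same order as the rows (or columns) that went into dist(), not in the order the leaves appear in the plot.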