Importing .cdt files generated by Cluster3.0 into R
Guest User ★ 13k
@guest-user-4897
Dear All,

I have used Cluster3.0 to cluster gene expression data. I would like to import the resulting files (.cdt, .gtr) into R to generate silhouette plots. Basically, I would like to check the robustness of the clusters. Can the files from Cluster3.0 be imported into R? I would appreciate any other suggestions.

Thanks.

-- output of sessionInfo(): none

-- Sent via the guest posting facility at bioconductor.org.
@valerie-obenchain-4275
Hello,

To import these files into R, see ?read.table and ?read.delim. Reading the .gtr file should be fairly straightforward with read.table. The .cdt file may be easier to read with read.delim, with fill=TRUE set.

Valerie

On 10/18/2011 07:58 AM, Sohail [guest] wrote:
> Dear All,
>
> I have used Cluster3.0 to cluster gene expression data. I would like to import the resulting files (.cdt, .gtr) into R to generate silhouette plots.
> Basically, I would like to check the robustness of the clusters. Can the files from Cluster3.0 be imported into R? I would appreciate any other suggestions.
> Thanks.
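As a rough illustration of the read.table / read.delim suggestion, here is a minimal sketch. The two tiny files it writes are hypothetical stand-ins for real Cluster3.0 output (the file names, gene names and values are made up so that the example runs on its own), and it assumes only the genes were clustered, so there is no AID row. Note that fill=TRUE is already the default for read.delim; it is spelled out here only to mirror the advice.

```r
# Hypothetical miniature .cdt/.gtr files, written to a temp directory
# purely so this sketch is self-contained.
tmp <- tempdir()
cdt_path <- file.path(tmp, "demo.cdt")
gtr_path <- file.path(tmp, "demo.gtr")

writeLines(c(
  "GID\tUNIQID\tNAME\tGWEIGHT\ts1\ts2",
  "EWEIGHT\t\t\t\t1\t1",
  "GENE1X\tg2\tgene two\t1\t0.5\t-0.2",
  "GENE0X\tg1\tgene one\t1\t0.4\t-0.1",
  "GENE2X\tg3\tgene three\t1\t-0.9\t0.8"
), cdt_path)

writeLines(c(
  "NODE1X\tGENE1X\tGENE0X\t0.98",
  "NODE2X\tGENE2X\tNODE1X\t-0.75"
), gtr_path)

# The .gtr file has no header: node id, two children, similarity.
gtr <- read.table(gtr_path, sep = "\t", header = FALSE, as.is = TRUE)

# The .cdt file has a header row; check.names = FALSE keeps the array
# names exactly as written in the file.
cdt <- read.delim(cdt_path, fill = TRUE, as.is = TRUE, check.names = FALSE)

# Keep only the gene rows and the array columns to get the data matrix.
is_gene <- grepl("^GENE", cdt$GID)
expr <- as.matrix(cdt[is_gene, c("s1", "s2")])
rownames(expr) <- cdt$NAME[is_gene]
```

With real files, the array columns are everything to the right of GWEIGHT, so the hard-coded c("s1", "s2") would need to be replaced accordingly.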
readCDT <- function(filename) {
  fname <- sub('\\.cdt$', '', filename) # get rid of the extension
  atr <- read.table(paste(fname, 'atr', sep='.'), sep='\t', header=FALSE, as.is=TRUE)
  gtr <- read.table(paste(fname, 'gtr', sep='.'), sep='\t', header=FALSE, as.is=TRUE)
  cdt <- read.table(paste(fname, 'cdt', sep='.'), sep='\t', header=TRUE, row.names=NULL)
  # we only need the first column of the CDT file, which contains
  # the order information, and the third column, which contains the
  # labels. The rest of the file contains the data matrix.
  rown <- as.character(cdt[,"GID"])
  coln <- colnames(cdt)
  firstRow <- 1 + which(rown=="EWEIGHT")
  firstCol <- 1 + which(coln=="GWEIGHT")
  gid <- as.character(cdt[,"GID"])[firstRow:nrow(cdt)]
  aid <- cdt[rown=="AID",][firstCol:ncol(cdt)]
  aid <- as.character(as.matrix(aid))
  # names all start with 'GENE' or 'NODE' (or 'ARRY') and end with 'X'
  gene.order <- 1 + as.numeric(substring(gid, 5, nchar(gid)-1))
  arry.order <- 1 + as.numeric(substring(aid, 5, nchar(aid)-1))
  # Because Cluster reorders things and because hclust and plclust want
  # to do the same, we have to reinvert the ordering during passage from
  # one to the other
  gene.labels <- as.character(cdt$NAME)[firstRow:nrow(cdt)][order(gene.order)]
  arry.labels <- coln[firstCol:ncol(cdt)][order(arry.order)]
  temp <- as.matrix(cdt[firstRow:nrow(cdt), firstCol:ncol(cdt)])
  temp <- temp[order(gene.order), order(arry.order)]
  data <- matrix(as.numeric(temp), ncol=ncol(temp))
  dimnames(data) <- list(gene.labels, arry.labels)
  # The gtr file contains the "distances" in column 4. Actually,
  # Eisen's Cluster program reports similarities instead of
  # distances. This fix assumes that some kind of correlation was
  # the measure of similarity....
  gene.height <- 1 - gtr$V4
  arry.height <- 1 - atr$V4
  # Columns 2 and 3 describe the two branches below each node.
  # Nodes are listed from bottom to top since clustering is
  # agglomerative.
  #
  foo <- function(alt) {
    # Again, we get the numeric part of the label
    base <- as.numeric(substring(alt, 5, nchar(alt)-1))
    # We also need to know whether the label is a leaf ("GENE" or
    # "ARRY") or a "NODE". The 'hclust' objects use negative integers
    # to indicate leaves (single observations) and positive integers
    # to indicate previously merged nodes.
    type1 <- rep(1, length(base))
    type1[substring(alt, 1, 4) %in% c('GENE', "ARRY")] <- -1
    base <- base*type1 # make leaves negative
    adder <- (type1-1)/2 # -1 for leaves, 0 for nodes: shifts leaf
                         # numbering from starting at 0 to starting at 1
    base + adder
  }
  gene.merge1 <- foo(gtr$V3)
  gene.merge2 <- foo(gtr$V2)
  arry.merge1 <- foo(atr$V3)
  arry.merge2 <- foo(atr$V2)
  # put everything together into a list and make it an hclust object
  gene <- list(merge=as.matrix(cbind(gene.merge1, gene.merge2)),
               height=gene.height, order=gene.order, labels=gene.labels,
               method='modified centroid', call=NULL,
               dist.method='Pearson correlation')
  class(gene) <- 'hclust'
  arry <- list(merge=as.matrix(cbind(arry.merge1, arry.merge2)),
               height=arry.height, order=arry.order, labels=arry.labels,
               method='modified centroid', call=NULL,
               dist.method='Pearson correlation')
  class(arry) <- 'hclust'
  list(gene=gene, arry=arry, data=data)
}

# Example usage (not run)
if (FALSE) {
  library(ClassDiscovery)
  filename <- "eacdata2.cdt"
  cdt <- readCDT(filename)
  g <- cdt$gene
  plot(g) # plclust() is defunct in current R; plot() uses the hclust method
  d <- cdt$data
  image(d, col=rg) # 'rg' (a vector of colors) is not defined in this post
  a <- cdt$arry
  classes3 <- cutree(a, k=3)
  colset <- c('red', 'orange', 'magenta')
  plotColoredClusters(a, lab=a$labels, col=colset[classes3])
}
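To get from an hclust object such as cdt$gene to the silhouette plot asked about in the question, something along these lines might work. This is only a sketch under stated assumptions: the expression matrix is simulated as a stand-in for cdt$data, hclust() with average linkage stands in for the tree imported from Cluster3.0, the distance is taken as 1 minus the Pearson correlation (the same assumption readCDT makes for the heights), and k = 2 is an arbitrary choice. silhouette() comes from the cluster package.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Simulated stand-in for cdt$data: 30 genes x 8 arrays, two groups of
# genes with opposite expression profiles plus noise.
profile <- seq(-1, 1, length.out = 8)
expr <- rbind(
  t(replicate(15,  profile + rnorm(8, sd = 0.3))),
  t(replicate(15, -profile + rnorm(8, sd = 0.3)))
)
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))

# Gene-by-gene distance: 1 - Pearson correlation between rows.
d <- as.dist(1 - cor(t(expr)))

# Stand-in for cdt$gene; with real data, use the imported tree instead.
hc <- hclust(d, method = "average")

# Cut the tree into k groups and compute silhouette widths.
cl <- cutree(hc, k = 2)
sil <- silhouette(cl, d)

avg_width <- summary(sil)$avg.width  # overall average silhouette width
plot(sil)                            # the silhouette plot itself
```

With real data the call would be cutree(cdt$gene, k = ...) and the distance would be computed from cdt$data. Repeating this for several values of k and comparing the average silhouette widths is one common way to judge how well separated the clusters are.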