colnames and get means for the columns with the "same" names
@sean-davis-490
Hi, Weiwei.

You probably want to look at a combination of merge() to combine your data with your conversion table followed by aggregate(). Read up on the help for those two functions and that should do it, if I understand what you want to do. However, keep in mind that "averaging" the probesets representing the same gene may not represent the best solution. Also, if you search the archive a bit, I know this question has come up before.

Sean

-----Original Message-----
From: Weiwei Shi [mailto:helprhelp@gmail.com]
Sent: Mon 11/6/2006 4:53 PM
To: r-help
Cc: bioconductor
Subject: [BioC] colnames and get means for the columns with the "same" names

hi,

I have a conversion table for colnames like this:

  Probe_ID          HUMAN_LLID
1 AF106325_PROBE1         7052
2 NM_019386_PROBE1        7052
3 NM_012907_PROBE1         339
4 AW917796_PROBE1        84196
5 L27651_PROBE1          10864

The Probe_ID contains a list of colnames for another data.frame, say x1. I need to convert such colnames to another ID's system, HUMAN_LLID by using the table. The colnames of x1 with the same names (in HUMAN_LLID) need to be averaged. Is there a good way to do it?

I also put this question in bioconductor since I believe it might be solved by some package.

thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
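For readers who want to try the merge() plus aggregate() route, here is a minimal sketch of one way it could look. It is not code from the thread: the expression values, the sample names s1 to s3, and the object names conv, probes, merged and by_gene are made up for illustration, and it assumes x1 has samples in rows and probes in columns, as the question describes.

```r
# Toy conversion table, copied from the question.
conv <- data.frame(
  Probe_ID   = c("AF106325_PROBE1", "NM_019386_PROBE1", "NM_012907_PROBE1",
                 "AW917796_PROBE1", "L27651_PROBE1"),
  HUMAN_LLID = c(7052, 7052, 339, 84196, 10864),
  stringsAsFactors = FALSE
)

# Hypothetical expression data: 3 samples (rows) x 5 probes (columns).
x1 <- data.frame(
  AF106325_PROBE1  = c(1, 2, 3),
  NM_019386_PROBE1 = c(3, 4, 5),
  NM_012907_PROBE1 = c(10, 20, 30),
  AW917796_PROBE1  = c(0.5, 0.5, 0.5),
  L27651_PROBE1    = c(7, 8, 9),
  row.names = c("s1", "s2", "s3")
)

# Transpose so each probe is a row, then attach the gene ID with merge().
probes <- data.frame(Probe_ID = colnames(x1), t(x1), stringsAsFactors = FALSE)
merged <- merge(conv, probes, by = "Probe_ID")

# Average all probe rows that share a HUMAN_LLID, one mean per sample.
sample_cols <- setdiff(colnames(merged), c("Probe_ID", "HUMAN_LLID"))
by_gene <- aggregate(merged[, sample_cols, drop = FALSE],
                     by = list(HUMAN_LLID = merged$HUMAN_LLID),
                     FUN = mean)

# Transpose back to samples x genes, with the gene IDs as column names.
result <- t(as.matrix(by_gene[, sample_cols, drop = FALSE]))
colnames(result) <- as.character(by_gene$HUMAN_LLID)
```

In this toy example the two probes mapped to 7052 collapse into a single column holding their per-sample means, and the other three genes pass through unchanged. Probes that are missing from the conversion table are silently dropped by merge() with its default arguments, which may or may not be what you want.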
Weiwei Shi ★ 1.2k
@weiwei-shi-1407
hi,

I played around with these two functions but did not get what I wanted. So I wrote a function that uses a loop instead, and it runs in a reasonable time:

> system.time(t3 <- iconix.convert(processed, 9, 7486, probes2llid.genego[,c(2,5)]))
[1] 12.356  4.494 16.836  0.000  0.000
> dim(t3)
[1]  129 4255

I am more interested in approaches other than "averaging". I will look into the archive, since this is a very common problem in microarray analysis.

I post my function here in case someone needs it in the future.

iconix.convert <- function(orig, st=9, ed=7486, c.table){
  # columns st:ed of orig hold the probe-level data
  t1 <- orig[, st:ed]
  # treat missing: NA values are replaced with 0 before averaging
  t1 <- sapply(t1, function(x){ x[is.na(x)] <- 0; x })
  x0 <- unique(c.table[,2])
  out <- matrix(0, dim(t1)[1], length(x0))
  j <- 1
  for (i in x0){
    # as.character() so that factor columns are not used as integer indices
    avg.col <- as.character(c.table[c.table[,2]==i, 1])
    if (length(avg.col) > 1){
      # has 1:multiple ids, so average the matching probe columns row by row
      t2 <- rowMeans(t1[, avg.col])
    } else {
      t2 <- t1[, avg.col]
    }
    out[,j] <- t2
    j <- j + 1
  }
  out <- as.data.frame(out)
  colnames(out) <- x0
  # put back the columns before st and after ed; the last column is the group label
  out2 <- cbind(orig[, c(1:(st-1))], out, orig[, c((ed+1):dim(orig)[2])])
  colnames(out2)[dim(out2)[2]] <- "Group"
  out2
}
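The same per-gene averaging can also be written without an explicit index counter by grouping the probe names with split() and taking rowMeans() within each group. The sketch below is only an illustration, not part of the original post: average_by_gene and the toy data are hypothetical names and values. It also differs from the function above in one respect that should be a deliberate choice: it skips NA values (na.rm = TRUE) rather than replacing them with 0.

```r
# Hypothetical helper: average the columns of x that map to the same gene ID.
# c.table is assumed to hold probe IDs in column 1 and gene IDs in column 2.
average_by_gene <- function(x, c.table) {
  probe <- as.character(c.table[, 1])
  gene  <- as.character(c.table[, 2])

  # Keep only probes that actually occur as column names of x.
  keep  <- probe %in% colnames(x)
  probe <- probe[keep]
  gene  <- gene[keep]

  m <- as.matrix(x[, probe, drop = FALSE])

  # One character vector of probe names per gene ID.
  groups <- split(probe, gene)

  # Row means within each group; cbind() keeps a matrix even for one sample.
  do.call(cbind, lapply(groups, function(cols) {
    rowMeans(m[, cols, drop = FALSE], na.rm = TRUE)
  }))
}

# Toy data (made-up values): 3 samples x 3 probes, two probes share gene 7052.
conv <- data.frame(
  Probe_ID   = c("AF106325_PROBE1", "NM_019386_PROBE1", "NM_012907_PROBE1"),
  HUMAN_LLID = c(7052, 7052, 339),
  stringsAsFactors = FALSE
)
x1 <- data.frame(
  AF106325_PROBE1  = c(1, 2, NA),
  NM_019386_PROBE1 = c(3, 4, 5),
  NM_012907_PROBE1 = c(10, 20, 30)
)

res <- average_by_gene(x1, conv)
```

The result is a samples x genes matrix whose column names are the gene IDs. In the third sample the 7052 value comes from the one non-missing probe only, whereas the zero-filling approach would have halved it.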