Question

function "uniqueframe"


cstrato ★ 3.9k

@cstrato-908


Austria

Dear Naima,

You are right, I must have missed this. Please replace "ds <- rbind(ds, tmp)" with:

    ds <- rbind(ds, tmp[setdiff(rownames(tmp),rownames(ds)),])

However, please note that it does not matter, since later on I intersect the row names with the row names of the expression data.

Furthermore, please note that this was only a trial to compare the three arrays, and there is no warranty that "script4bestmatch.R" is correct. Other people might have better solutions for comparing the three arrays based on the BestMatch.txt files from Affymetrix.

Best regards
Christian
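To make the effect of the suggested replacement concrete, here is a minimal sketch on a toy table. The data values and the `patched` switch are illustrative assumptions, not part of "script4bestmatch.R"; the function body otherwise follows the code posted in this thread. It relies on the documented behaviour of `rbind` for data frames, which makes duplicated row names unique by appending a number.

```r
# Toy stand-in for the BestMatch table (hypothetical values): column 1 is the
# HuGene probeset id, column 2 the percent match; row names are U133 Plus 2 ids.
ma <- data.frame(HuGene  = c(8076569, 8076569, 8074791, 8070000),
                 Percent = c(99.41, 80.00, 98.42, 95.00),
                 row.names = c("1552257_a_at", "1552300_at",
                               "1552264_a_at", "1552270_at"))

# uniqueframe as posted in the thread; `patched` (an assumption added here)
# switches between the original rbind step and the suggested replacement.
uniqueframe <- function(ma, patched = TRUE) {
  maxunique <- function(id, m) {
    m <- m[which(m[, 1] == id), ]
    m <- m[which(m[, 2] == max(m[, 2])), ]
    m[1, ]
  }
  dup <- duplicated(ma[, 1])
  uni <- unique(ma[dup, 1])
  ds  <- NULL
  for (i in uni) ds <- rbind(ds, maxunique(i, ma))
  tmp <- ma[!dup, ]
  if (patched) {
    # only append rows whose row names are not already in ds
    ds <- rbind(ds, tmp[setdiff(rownames(tmp), rownames(ds)), ])
  } else {
    # original step: rows already in ds are appended again
    ds <- rbind(ds, tmp)
  }
  ds[order(rownames(ds)), ]
}

old <- uniqueframe(ma, patched = FALSE)  # contains a row named "1552257_a_at1"
new <- uniqueframe(ma, patched = TRUE)   # one row per HuGene id in this toy case
```

Note that the replacement filters on row names (the U133 Plus 2 ids), not on the HuGene id in column 1. In this toy case the best-scoring row is also the first occurrence, so the result is unique; a HuGene id whose best-scoring row is not its first occurrence would still appear twice.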


updated 14.4 years ago by Naïma Oumouhou • written 14.4 years ago by cstrato

score 0 · Answer 1 · 2010-11-04

Dear Christian,

I read your vignette « Introduction to the xps Package: Comparison to Affymetrix Power Tools » and tried to compare two gene expression arrays: U133 Plus 2 and Human Gene ST 1.

I followed the R instructions in your script "script4bestmatch.R", but I noticed something strange in my output.

I downloaded "U133PlusVsHuGene_BestMatch.txt" from the Affymetrix website.

My instructions are:

    # Function "uniqueframe"
    uniqueframe <- function(ma) {
       maxunique <- function(id, m) {
          m <- m[which(m[,1] == id),];
          m <- m[which(m[,2] == max(m[,2])),];
          return(m[1,]);
       }
       dup <- duplicated(ma[,1])
       uni <- unique(ma[dup,1])
       ds <- NULL
       for (i in uni) {ds <- rbind(ds, maxunique(i,ma))}
       tmp <- ma[dup==F,]
       ds <- rbind(ds, tmp)
       ds <- ds[order(rownames(ds)),]
       return(ds)
    }

    # Import of "U133PlusVsHuGene_BestMatch.txt"
    up2hg<-read.delim("D:/Naima/CancerMoelleOsseuse_EFS/Analyse_Package_XPS/U133PlusVsHuGene_BestMatch.txt",row.names=3,comment.char="")
    dim(up2hg)
    [1] 29129    19
    up2hg<-up2hg[,5:6]
    up2hg_cor<-uniqueframe(up2hg)
    colnames(up2hg_cor)<-c("HuGene","PercentU2G")
    dim(up2hg_cor)
    [1] 25251     2
    write.csv2(up2hg_cor,"D:/Naima/CancerMoelleOsseuse_EFS/Outputs/Probesets_U133PlusVsHuGene.csv")

The initial data frame "up2hg" contains 29129 rows, and after applying "uniqueframe" the resulting data frame has 25251 rows. But the number of unique probesets for the Human Gene array is 17984.

When I look at the output (Probesets_U133PlusVsHuGene.csv), there is something strange. For example:

    U1332P          HuGene    PercentU2G
    1552257_a_at    8076569   99,41
    1552257_a_at1   8076569   99,41
    1552264_a_at    8074791   98,42
    1552264_a_at1   8074791   98,42

There are still duplicated probesets among the HuGene probesets, and new probesets such as "1552257_a_at1" are created in U1332P.

Have I done something wrong?

Thank you for your help.

Naïma
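The symptom described above can be checked directly on a result table. The following is a rough sketch on a toy data frame that mimics the four example rows; the object names `res`, `input_ids`, `n_dup`, `new_ids` and `best` are hypothetical. The last step is an alternative way to keep one row per HuGene id (sort by score, then drop repeats), offered only as an illustration and not as the method used in "script4bestmatch.R".

```r
# Hypothetical toy result resembling the reported output
res <- data.frame(HuGene     = c(8076569, 8076569, 8074791, 8074791),
                  PercentU2G = c(99.41, 99.41, 98.42, 98.42),
                  row.names  = c("1552257_a_at", "1552257_a_at1",
                                 "1552264_a_at", "1552264_a_at1"))

# Row names of the input table (assumed here for the toy case)
input_ids <- c("1552257_a_at", "1552264_a_at")

# Number of rows whose HuGene id has already been seen
n_dup <- sum(duplicated(res$HuGene))

# Row names present in the result but absent from the input,
# i.e. names that were generated when rows were bound together
new_ids <- setdiff(rownames(res), input_ids)

# Alternative sketch: sort by score (descending), keep the first row per id
best <- res[order(-res$PercentU2G), ]
best <- best[!duplicated(best$HuGene), ]
```

If `n_dup` is greater than zero or `new_ids` is non-empty on the real table, the same id was added more than once before the final sort.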