function "uniqueframe"
cstrato ★ 3.9k
@cstrato-908
Austria
Dear Naima,

You are right, I must have missed this. Please replace "ds <- rbind(ds, tmp)" with:

    ds <- rbind(ds, tmp[setdiff(rownames(tmp), rownames(ds)), ])

However, please note that it does not matter, since later on I intersect the rownames with the rownames of the expression data.

Furthermore, please note that this was only a trial to compare the three arrays, and there is no warranty that "script4bestmatch.R" is correct. Other people might have better solutions for comparing the three arrays based on the BestMatch.txt files from Affymetrix.

Best regards
Christian

On 11/4/10 11:28 AM, Naïma Oumouhou wrote:
> There is still duplicated probesets in HuGene probesets and new
> probesets in U1332P are created "1552257_a_at1".
>
> I've done something wrong?
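The effect of the replacement line in the reply can be tried on a small table. The following is only a sketch: the toy data frame, its IDs and percentages are invented, and the `by_id` variant is a hypothetical alternative that does not appear in "script4bestmatch.R".

```r
# Made-up toy data: row names stand in for U133 Plus 2 probesets,
# column 1 for HuGene IDs, column 2 for the match percentage.
ma <- data.frame(
  HuGene  = c(101, 101, 102, 103, 103),
  Percent = c(99.4, 80.0, 95.0, 70.0, 98.4),
  row.names = c("a_at", "b_at", "c_at", "d_at", "e_at")
)

# Best row per duplicated ID, as uniqueframe() builds it before the rbind step.
dup <- duplicated(ma[, 1])
uni <- unique(ma[dup, 1])
ds <- NULL
for (i in uni) {
  m <- ma[ma[, 1] == i, ]
  ds <- rbind(ds, m[which.max(m[, 2]), ])  # first row with the highest percentage
}
tmp <- ma[!dup, ]  # first occurrence of every ID

# Replacement line from the reply: drop rows of tmp whose row name is already in ds.
by_name <- rbind(ds, tmp[setdiff(rownames(tmp), rownames(ds)), ])

# Hypothetical alternative (an assumption, not from the script):
# drop rows of tmp whose ID was already handled in the loop.
by_id <- rbind(ds, tmp[!(tmp[, 1] %in% uni), ])

print(by_name[order(rownames(by_name)), ])
print(by_id[order(rownames(by_id)), ])
```

On this toy input the row-name filter no longer produces a renamed copy of "a_at", but ID 103 still appears twice because its best match ("e_at") is not its first row ("d_at"). Filtering by ID leaves one row per ID. Whether that matters depends on the later intersection step mentioned in the reply.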
@naima-oumouhou-4270
Dear Christian,

I read your vignette "Introduction to the xps Package: Comparison to Affymetrix Power Tools" and tried to compare two gene expression arrays: U133 Plus 2 and Human Gene ST 1.

I followed your R instructions in the script "script4bestmatch.R", but I noticed something strange in my output.

I downloaded "U133PlusVsHuGene_BestMatch.txt" from the Affymetrix website.

My instructions are:

#Function "uniqueframe"
uniqueframe <- function(ma) {
   maxunique <- function(id, m) {
      m <- m[which(m[,1] == id),];
      m <- m[which(m[,2] == max(m[,2])),];
      return(m[1,]);
   }
   dup <- duplicated(ma[,1])
   uni <- unique(ma[dup,1])
   ds <- NULL
   for (i in uni) {ds <- rbind(ds, maxunique(i,ma))}
   tmp <- ma[dup==F,]
   ds <- rbind(ds, tmp)
   ds <- ds[order(rownames(ds)),]
   return(ds)
}

# Import of "U133PlusVsHuGene_BestMatch.txt"
up2hg<-read.delim("D:/Naima/CancerMoelleOsseuse_EFS/Analyse_Package_XPS/U133PlusVsHuGene_BestMatch.txt",row.names=3,comment.char="")
dim(up2hg)
[1] 29129    19
up2hg<-up2hg[,5:6]
up2hg_cor<-uniqueframe(up2hg)
colnames(up2hg_cor)<-c("HuGene","PercentU2G")
dim(up2hg_cor)
[1] 25251     2
write.csv2(up2hg_cor,"D:/Naima/CancerMoelleOsseuse_EFS/Outputs/Probesets_U133PlusVsHuGene.csv")

The initial data frame "up2hg" contains 29129 lines, and after applying "uniqueframe" the resulting data frame has 25251 lines. But the number of unique probesets for the Human Gene array is 17984.

When I look at the output (Probesets_U133PlusVsHuGene.csv), there is something strange. For example:

U1332P           HuGene     PercentU2G
1552257_a_at     8076569    99,41
1552257_a_at1    8076569    99,41
1552264_a_at     8074791    98,42
1552264_a_at1    8074791    98,42

There are still duplicated probesets in the HuGene column, and new probeset names such as "1552257_a_at1" are created in the U1332P column.

Have I done something wrong?

Thank you for your help.

Naïma
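The symptom described in the question can be reproduced on a small table. The sketch below runs the function as posted on invented toy data (the row names, IDs and percentages are assumptions for illustration only). It relies on two facts: `duplicated()` flags only the second and later occurrences of an ID, so `ma[dup==F,]` still contains the first row of every duplicated ID, and `rbind()` on data frames makes repeated row names unique by appending a number.

```r
# Function as posted in the question (only the layout is changed).
uniqueframe <- function(ma) {
  maxunique <- function(id, m) {
    m <- m[which(m[, 1] == id), ]
    m <- m[which(m[, 2] == max(m[, 2])), ]
    return(m[1, ])
  }
  dup <- duplicated(ma[, 1])
  uni <- unique(ma[dup, 1])
  ds <- NULL
  for (i in uni) {ds <- rbind(ds, maxunique(i, ma))}
  tmp <- ma[dup == F, ]  # keeps the first row of every ID, duplicated or not
  ds <- rbind(ds, tmp)   # so a row already in ds can be bound a second time
  ds <- ds[order(rownames(ds)), ]
  return(ds)
}

# Invented toy data: row names stand in for U133 Plus 2 probesets,
# column 1 for HuGene IDs, column 2 for the match percentage.
ma <- data.frame(
  HuGene  = c(101, 101, 102, 103, 103),
  Percent = c(99.4, 80.0, 95.0, 70.0, 98.4),
  row.names = c("a_at", "b_at", "c_at", "d_at", "e_at")
)

res <- uniqueframe(ma)
print(res)
print(length(unique(res[, 1])))  # number of distinct IDs in the result
```

On this input the result has five rows for only three distinct IDs. The row "a_at" is bound twice and its second copy is renamed "a_at1", which matches the "1552257_a_at1" pattern in the question.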