Yes. A similar strategy has worked for me in the past...attached is
some old code that may be helpful.
The ann.file referenced in the following code is a simple tab
delimited file that holds the old identifier in the first column and
the new identifier in the second.
something like
VAI.ID SEQ.ID Refseq.ID Entrez.Gene Symbol Band
Description
300008 702770914 "XM_536923.1" 479795 "LOC479795" ""
"similar to eukaryotic translation initiation factor 3, subunit 8,
110kDa"
300009 702770914 "XM_536923.1" 479795 "LOC479795" ""
"similar to eukaryotic translation initiation factor 3, subunit 8,
110kDa"
300010 702770334 "XM_537565.1" 480447 "LOC480447" ""
"similar to zinc finger, FYVE domain containing 21"
300011 702770334 "XM_537565.1" 480447 "LOC480447" ""
"similar to zinc finger, FYVE domain containing 21"
300012 702771982 "XM_535904.1" 478737 "LOC478737" ""
"similar to TRAF and TNF receptor-associated protein"
300013 702771982 "XM_535904.1" 478737 "LOC478737" ""
"similar to TRAF and TNF receptor-associated protein"
-------
readGPRHeader <- function (file) {
con <- file(file, "r")
if (substring(readLines(con, n = 1), 1, 3) != "ATF")
stop("File is not in Axon Text File (ATF) format")
nfields <- as.numeric(strsplit(readLines(con, n = 1), split =
"\t")[[1]])
close(con)
nfields[1]
}
ann.file <- "./VAI.ann.rel17.txt"
if(length(ann.file) == 0)
stop("cannot find annotation file")
ann <- read.delim(ann.file,as.is=T)
path <- "."
files <- dir(path,pattern="_u.gpr")
for(f in files) {
cat("Reading ",f,"\n" );
skip <- readGPRHeader(f)+3
con <- file(f,"r")
header <- readLines(con,n=skip)
gpr <-
read.delim(f,skip=skip,as.is=T,header=F,colClasses="character")
close(con)
cat("Updating ",f,"\n")
vids <- gpr[,5]
ix <- match(vids,ann$VAI.ID)
acc <- ann$SEQ.ID[ix]
refseq <- ann$Refseq.ID[ix]
gene <- ann$Entrez.Gene[ix]
new.name <- paste(ann$Description[ix],paste('[',acc,",",refseq,",",g
ene,']',sep=''))
gpr[,4] <- new.name
new.filename <- file.path(path,gsub('\.gpr','\.rel17.gpr',f))
out <- file(new.filename,"at")
write.table(header,file=out,row.names=F,col.names=F,quote=F)
write.table(gpr,file=out,row.names=F,col.names=F,quote=c(4,5),sep="\
t")
close(out)
}
On 4/8/09 2:00 PM, "Jenny Drnevich" <drnevich at="" illinois.edu=""> wrote:
Hi everyone,
I was hoping someone had some quick suggestions for me: a client has
hundreds of gpr files of the same microarray from many experiments
over many years. Which clone is in which spot hasn't changed, only
what they call the clone in the ID and Name slot. I've got the
final(?) naming scheme, and they want me to update all the .gpr files
to have the same ID and Name columns. I don't have much time to
figure this out on my own, so I was hoping one of you could help me!
This is what I was thinking needed to be done, but I'm not sure how
to implement it off the top of my head.
1. Read in / open a connection to a .gpr file
2. Replace the Name and ID column, keeping all the header info the
same (could replace entire Block,Row,Column,Name,ID if it's easier)
3. Save with a new file name.
THANKS!!
Jenny
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
This email message, including any attachments, is for
th...{{dropped:6}}