changing gpr files?

Jenny Drnevich ★ 2.0k

Hi everyone,

I was hoping someone had some quick suggestions for me: a client has hundreds of gpr files of the same microarray from many experiments over many years. Which clone is in which spot hasn't changed, only what they call the clone in the ID and Name slot. I've got the final(?) naming scheme, and they want me to update all the .gpr files to have the same ID and Name columns. I don't have much time to figure this out on my own, so I was hoping one of you could help me! This is what I was thinking needed to be done, but I'm not sure how to implement it off the top of my head.

1. Read in / open a connection to a .gpr file
2. Replace the Name and ID column, keeping all the header info the same (could replace entire Block,Row,Column,Name,ID if it's easier)
3. Save with a new file name.

THANKS!!
Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
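The three steps listed in the question could be sketched in R roughly as follows. This is only a sketch under stated assumptions: the file follows the usual ATF layout (line 1 starts with `ATF`, line 2 gives the number of optional header records, then one column-name line, then the data), and the new names are supplied as vectors in the same spot order as the file. The function name `replace_gpr_names` and its arguments are hypothetical.

```r
# Sketch only: replace the Name and ID columns of one .gpr file and
# leave every header line exactly as it was.
# Assumes new_name and new_id are in the same spot order as the file.
replace_gpr_names <- function(infile, outfile, new_name, new_id) {
  lines <- readLines(infile)
  if (substring(lines[1], 1, 3) != "ATF")
    stop("File is not in Axon Text File (ATF) format")

  # Line 2: <number of optional header records> TAB <number of data columns>
  n_header <- as.numeric(strsplit(lines[2], "\t")[[1]][1])
  col_line <- n_header + 3          # 2 ATF lines + header records + column names
  header <- lines[seq_len(col_line)]

  # Find Name and ID by column name instead of assuming positions 4 and 5
  cols <- gsub('"', "", strsplit(lines[col_line], "\t")[[1]])
  name_col <- match("Name", cols)
  id_col <- match("ID", cols)
  if (is.na(name_col) || is.na(id_col)) stop("Name or ID column not found")

  # quote = "" keeps existing quote characters as part of each field,
  # so untouched columns are written back unchanged
  gpr <- read.delim(infile, skip = col_line, header = FALSE, quote = "",
                    colClasses = "character")
  if (nrow(gpr) != length(new_name) || nrow(gpr) != length(new_id))
    stop("Number of spots does not match the new annotation")

  gpr[[name_col]] <- paste0('"', new_name, '"')
  gpr[[id_col]] <- paste0('"', new_id, '"')

  con <- file(outfile, "wt")
  on.exit(close(con))
  writeLines(header, con)
  write.table(gpr, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(outfile)
}
```

Because the spot layout is said to be identical across files, the same pair of vectors could in principle be reused for every file; the row-count check is there to catch a file that does not fit that assumption.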

Microarray Microarray • 1.2k views

updated 16.8 years ago by Furge, Kyle ▴ 210 • written 16.8 years ago by Jenny Drnevich ★ 2.0k

Furge, Kyle ▴ 210

Yes. A similar strategy has worked for me in the past. Attached is some old code that may be helpful. The ann.file referenced in the following code is a simple tab-delimited file that holds the old identifier in the first column and the new identifier in the second, something like:

    VAI.ID  SEQ.ID     Refseq.ID      Entrez.Gene  Symbol       Band  Description
    300008  702770914  "XM_536923.1"  479795       "LOC479795"  ""    "similar to eukaryotic translation initiation factor 3, subunit 8, 110kDa"
    300009  702770914  "XM_536923.1"  479795       "LOC479795"  ""    "similar to eukaryotic translation initiation factor 3, subunit 8, 110kDa"
    300010  702770334  "XM_537565.1"  480447       "LOC480447"  ""    "similar to zinc finger, FYVE domain containing 21"
    300011  702770334  "XM_537565.1"  480447       "LOC480447"  ""    "similar to zinc finger, FYVE domain containing 21"
    300012  702771982  "XM_535904.1"  478737       "LOC478737"  ""    "similar to TRAF and TNF receptor-associated protein"
    300013  702771982  "XM_535904.1"  478737       "LOC478737"  ""    "similar to TRAF and TNF receptor-associated protein"

The code:

    # Return the number of optional header records declared on line 2 of an ATF file
    readGPRHeader <- function (file) {
      con <- file(file, "r")
      on.exit(close(con))
      if (substring(readLines(con, n = 1), 1, 3) != "ATF")
        stop("File is not in Axon Text File (ATF) format")
      nfields <- as.numeric(strsplit(readLines(con, n = 1), split = "\t")[[1]])
      nfields[1]
    }

    ann.file <- "./VAI.ann.rel17.txt"
    if (!file.exists(ann.file)) stop("cannot find annotation file")
    ann <- read.delim(ann.file, as.is = TRUE)

    path <- "."
    files <- dir(path, pattern = "_u\\.gpr$")

    for (f in files) {
      cat("Reading ", f, "\n")
      # 2 ATF lines + optional header records + 1 column-name line
      skip <- readGPRHeader(f) + 3
      header <- readLines(f, n = skip)
      gpr <- read.delim(f, skip = skip, as.is = TRUE, header = FALSE, colClasses = "character")

      cat("Updating ", f, "\n")
      vids <- gpr[, 5]                   # ID column holds the old identifier
      ix <- match(vids, ann$VAI.ID)      # NA where an ID is not in the annotation
      acc <- ann$SEQ.ID[ix]
      refseq <- ann$Refseq.ID[ix]
      gene <- ann$Entrez.Gene[ix]
      new.name <- paste(ann$Description[ix],
                        paste("[", acc, ",", refseq, ",", gene, "]", sep = ""))
      gpr[, 4] <- new.name               # Name column

      new.filename <- file.path(path, sub("\\.gpr$", ".rel17.gpr", f))
      # "wt" so that a rerun overwrites the output instead of appending to it
      out <- file(new.filename, "wt")
      write.table(header, file = out, row.names = FALSE, col.names = FALSE, quote = FALSE)
      write.table(gpr, file = out, row.names = FALSE, col.names = FALSE, quote = c(4, 5), sep = "\t")
      close(out)
    }
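One thing worth checking before rewriting hundreds of files: `match()` returns `NA` for any ID that is missing from the annotation table, and `paste()` then writes the literal text `NA` into the new name. A minimal sketch of a pre-flight check, assuming the annotation IDs live in a `VAI.ID` column as in the example table; the helper name `unmatched_ids` and the toy data are hypothetical.

```r
# Sketch only: list the IDs from a .gpr ID column that have no row in the
# annotation table, so they can be dealt with before any file is rewritten.
unmatched_ids <- function(ids, ann_ids) {
  ids <- unique(as.character(ids))
  ids[is.na(match(ids, as.character(ann_ids)))]
}

# Toy data standing in for ann$VAI.ID and gpr[, 5]
ann <- data.frame(VAI.ID = c(300008L, 300009L, 300010L))
gpr_ids <- c("300008", "300010", "300099", "300099")

missing <- unmatched_ids(gpr_ids, ann$VAI.ID)
if (length(missing) > 0)
  message("IDs without annotation: ", paste(missing, collapse = ", "))
```

Running this inside the loop, just after the ID column is read, would let a file with unknown IDs be skipped or reported rather than silently written with `NA` names.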

16.8 years ago • Furge, Kyle ▴ 210
