changing gpr files?
1
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 5 months ago
United States
Hi everyone, I was hoping someone had some quick suggestions for me: a client has hundreds of gpr files of the same microarray from many experiments over many years. Which clone is in which spot hasn't changed, only what they call the clone in the ID and Name slot. I've got the final(?) naming scheme, and they want me to update all the .gpr files to have the same ID and Name columns. I don't have much time to figure this out on my own, so I was hoping one of you could help me! This is what I was thinking needed to be done, but I'm not sure how to implement it off the top of my head. 1. Read in / open a connection to a .gpr file 2. Replace the Name and ID column, keeping all the header info the same (could replace entire Block,Row,Column,Name,ID if it's easier) 3. Save with a new file name. THANKS!! Jenny Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at illinois.edu
Microarray Microarray • 1.0k views
ADD COMMENT
0
Entering edit mode
Furge, Kyle ▴ 210
@furge-kyle-501
Last seen 10.2 years ago
Yes. A similar strategy has worked for me in the past...attached is some old code that may be helpful. The ann.file referenced in the following code is a simple tab delimited file that holds the old identifier in the first column and the new identifier in the second. something like VAI.ID SEQ.ID Refseq.ID Entrez.Gene Symbol Band Description 300008 702770914 "XM_536923.1" 479795 "LOC479795" "" "similar to eukaryotic translation initiation factor 3, subunit 8, 110kDa" 300009 702770914 "XM_536923.1" 479795 "LOC479795" "" "similar to eukaryotic translation initiation factor 3, subunit 8, 110kDa" 300010 702770334 "XM_537565.1" 480447 "LOC480447" "" "similar to zinc finger, FYVE domain containing 21" 300011 702770334 "XM_537565.1" 480447 "LOC480447" "" "similar to zinc finger, FYVE domain containing 21" 300012 702771982 "XM_535904.1" 478737 "LOC478737" "" "similar to TRAF and TNF receptor-associated protein" 300013 702771982 "XM_535904.1" 478737 "LOC478737" "" "similar to TRAF and TNF receptor-associated protein" ------- readGPRHeader <- function (file) { con <- file(file, "r") if (substring(readLines(con, n = 1), 1, 3) != "ATF") stop("File is not in Axon Text File (ATF) format") nfields <- as.numeric(strsplit(readLines(con, n = 1), split = "\t")[[1]]) close(con) nfields[1] } ann.file <- "./VAI.ann.rel17.txt" if(length(ann.file) == 0) stop("cannot find annotation file") ann <- read.delim(ann.file,as.is=T) path <- "." files <- dir(path,pattern="_u.gpr") for(f in files) { cat("Reading ",f,"\n" ); skip <- readGPRHeader(f)+3 con <- file(f,"r") header <- readLines(con,n=skip) gpr <- read.delim(f,skip=skip,as.is=T,header=F,colClasses="character") close(con) cat("Updating ",f,"\n") vids <- gpr[,5] ix <- match(vids,ann$VAI.ID) acc <- ann$SEQ.ID[ix] refseq <- ann$Refseq.ID[ix] gene <- ann$Entrez.Gene[ix] new.name <- paste(ann$Description[ix],paste('[',acc,",",refseq,",",g ene,']',sep='')) gpr[,4] <- new.name new.filename <- file.path(path,gsub('\.gpr','\.rel17.gpr',f)) out <- file(new.filename,"at") write.table(header,file=out,row.names=F,col.names=F,quote=F) write.table(gpr,file=out,row.names=F,col.names=F,quote=c(4,5),sep="\ t") close(out) } On 4/8/09 2:00 PM, "Jenny Drnevich" <drnevich at="" illinois.edu=""> wrote: Hi everyone, I was hoping someone had some quick suggestions for me: a client has hundreds of gpr files of the same microarray from many experiments over many years. Which clone is in which spot hasn't changed, only what they call the clone in the ID and Name slot. I've got the final(?) naming scheme, and they want me to update all the .gpr files to have the same ID and Name columns. I don't have much time to figure this out on my own, so I was hoping one of you could help me! This is what I was thinking needed to be done, but I'm not sure how to implement it off the top of my head. 1. Read in / open a connection to a .gpr file 2. Replace the Name and ID column, keeping all the header info the same (could replace entire Block,Row,Column,Name,ID if it's easier) 3. Save with a new file name. THANKS!! Jenny Jenny Drnevich, Ph.D. Functional Genomics Bioinformatics Specialist W.M. Keck Center for Comparative and Functional Genomics Roy J. Carver Biotechnology Center University of Illinois, Urbana-Champaign 330 ERML 1201 W. Gregory Dr. Urbana, IL 61801 USA ph: 217-244-7355 fax: 217-265-5066 e-mail: drnevich at illinois.edu _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor This email message, including any attachments, is for th...{{dropped:6}}
ADD COMMENT

Login before adding your answer.

Traffic: 493 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6