Question

pd.mapping 10Karray


Marianne Tuefferd ▴ 10

@marianne-tuefferd-2320

Last seen 11.4 years ago

Dear all,

I would like to analyze a 10K SNP array using the oligo package. I first tried the makePDpackage() function as detailed previously:
https://stat.ethz.ch/pipermail/bioconductor/2006-February/012078.html

    > library("makePlatformDesign")
    > cdfFile <- "Mapping10K_Xba142.CDF"
    > csvAnno <- "Mapping10K_Xba142.na23.annot.csv"
    > csvSeq <- "Mapping10K_probe_tab"
    > makePDpackage(designFile=cdfFile, file1=csvseq,file2=csvannot
    + ,manufacturer="affymetrix",type="SNP",textCDF = TRUE )
    Error in makePDpackage(designFile = cdfFile, file1 = csvseq, file2 = csvannot, :
      unused argument(s) (textCDF = TRUE)

Using the pdInfoBuilder package instead, I get the following error message. Does anyone have a clue?

Thank you very much for your help,
Marianne

    > library("pdInfoBuilder")
    > pkg <- new("AffySNPPDInfoPkgSeed",
    +     version = "0.0",
    +     email = "tuefferd at vjf.inserm.fr",
    +     biocViews = "AnnotationData",
    +     cdfFile = cdfFile,
    +     csvAnnoFile = csvAnno,
    +     csvSeqFile = csvSeq)
    > makePdInfoPackage(pkg, destDir = ".")
    Creating package in ./pd.mapping10k.xba142
    Error in gsub(pattern, replacement, x, ignore.case, extended, fixed, useBytes) :
      invalid argument
    > traceback()
    7: gsub(nm[i], symbolValues[[i]], res)
    6: subsFileName(tmp[length(tmp)])
    5: cpSubs(src, dest)
    4: copySubstitute(dir(originDir, full.names = TRUE), pkgdir, symbolValues,
           recursive = TRUE)
    3: createPackage(pkgname = pkgName, destinationDir = destDir, originDir = templateDir,
           symbolValues = syms, quiet = quiet)
    2: makePdInfoPackage(pkg, destDir = ".")
    1: makePdInfoPackage(pkg, destDir = ".")
    > sessionInfo()
    R version 2.5.1 (2007-06-27)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] "splines"   "tools"     "stats"     "graphics"  "grDevices" "utils"
    [7] "datasets"  "methods"   "base"

    other attached packages:
       makePlatformDesign
                  "1.0.0"
            pdInfoBuilder                 oligo BufferedMatrixMethods
                  "1.0.0"             "1.0.2-3"               "1.0.0"
           BufferedMatrix                affyio            affxparser
                  "1.0.0"               "1.4.0"               "1.8.0"
                  RSQLite                   DBI               Biobase
                  "0.5-6"               "0.2-3"              "1.14.0"

BiocViews cdf affyio biocViews oligo pdInfoBuilder BufferedMatrix • 2.0k views

ADD COMMENT • link updated 18.5 years ago by Seth Falcon ★ 7.4k • written 18.5 years ago by Marianne Tuefferd ▴ 10

score 0 · Answer 1 · 2007-08-14


Seth Falcon ★ 7.4k

@seth-falcon-992

Last seen 11.4 years ago

Hi Marianne,

I'm not yet sure what is going on with pdInfoBuilder, but perhaps we can sort it out...

"Marianne Tuefferd" <tuefferd at vjf.inserm.fr> writes:
> > library("pdInfoBuilder")
> > pkg <- new("AffySNPPDInfoPkgSeed",
> +     version = "0.0",
> +     email = "tuefferd at vjf.inserm.fr",
> +     biocViews = "AnnotationData",
> +     cdfFile = cdfFile,
> +     csvAnnoFile = csvAnno,
> +     csvSeqFile = csvSeq)
> > makePdInfoPackage(pkg, destDir = ".")
> Creating package in ./pd.mapping10k.xba142
> Error in gsub(pattern, replacement, x, ignore.case, extended, fixed, useBytes) :
>   invalid argument
> > traceback()
> 7: gsub(nm[i], symbolValues[[i]], res)
> 6: subsFileName(tmp[length(tmp)])
> 5: cpSubs(src, dest)
> 4: copySubstitute(dir(originDir, full.names = TRUE), pkgdir, symbolValues,
>        recursive = TRUE)
> 3: createPackage(pkgname = pkgName, destinationDir = destDir, originDir = templateDir,
>        symbolValues = syms, quiet = quiet)
> 2: makePdInfoPackage(pkg, destDir = ".")
> 1: makePdInfoPackage(pkg, destDir = ".")

This is useful output. Can you try two things?

1. Set options(error=recover) and then rerun the above example. When the error occurs you will be put into the debugger and can select a frame to enter (numbered like the stack trace above). Find the frame with the gsub call and print the values of nm[i], symbolValues[[i]] and res, since the error is telling us that one of these is somehow not valid.

2. I notice that your locale setting is not "C", and I wonder whether rerunning the example after setting Sys.setlocale(locale="C") changes anything.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
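A non-interactive variant of the first suggestion, as a minimal sketch: rather than entering the debugger, each substitution is wrapped in tryCatch so that the error message names the symbol being processed. The function substitute_symbols, the @NAME@ placeholder syntax and the symbol names are hypothetical stand-ins for illustration, not the code behind createPackage. The failing case assumes a zero-length replacement value, which is one way to make gsub reject its arguments.

```r
# Hypothetical stand-in for a template substitution loop (not package code).
substitute_symbols <- function(text, symbolValues) {
  nm <- names(symbolValues)
  for (i in seq_along(symbolValues)) {
    text <- tryCatch(
      gsub(paste0("@", nm[i], "@"), symbolValues[[i]], text, fixed = TRUE),
      error = function(e) {
        # Re-raise with the offending symbol name and the length of its value.
        stop(sprintf("substitution failed for '%s' (value of length %d): %s",
                     nm[i], length(symbolValues[[i]]), conditionMessage(e)),
             call. = FALSE)
      }
    )
  }
  text
}

# Every symbol has a value: the substitution succeeds.
ok <- substitute_symbols("Package: @PKGNAME@",
                         list(PKGNAME = "pd.mapping10k.xba142"))

# One symbol is empty (assumed failure mode): the error message names it.
msg <- tryCatch(
  substitute_symbols("Author: @AUTHOR@", list(AUTHOR = character(0))),
  error = function(e) conditionMessage(e)
)
```

Printing msg then shows which symbol to inspect, which is the same information the recover() session is meant to reveal.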

ADD COMMENT • link 18.5 years ago Seth Falcon ★ 7.4k


Hi Seth,

Thanks for your response. I had the time to look a bit further into this.

It appears the cause of the gsub error is that the author and genomebuild fields are empty. It might be a good idea to check for this or to enforce the presence of these fields.

However, along the way, we discovered other issues. For example, in the loadAffyCsv function (loaders.R), columns are selected by column number in a way that is not appropriate for the 10k files. This is the relevant snippet:

    wantedCols <- c(1,2,3,4,7,8,10,12,13,14,17)
    # added 10/14
    df <- read.table(con, sep=",", stringsAsFactors=FALSE, nrows=10,
                     na.strings="---", header=TRUE)[, wantedCols]

To match the needed columns for 10k files, the numbers 5, 6 and 15 are needed as well. It might, however, be a better idea to read in just the header and match it against a character vector of prespecified names to determine the wanted columns (before reading in the rest for real).

Once this problem is solved, the function runs fine. There is, however, another error, this time in the loadAffySeqCsv call:

    t <- ST(loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size=batch_size))
    Error in sqliteExecStatement(con, statement, bind.data) :
      RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
    Timing stopped at: 0.58 0.05 0.73 NA NA
    traceback()
    9: .Call("RS_SQLite_exec", conId, statement, bind.data, PACKAGE = .SQLitePkgName)
    8: sqliteExecStatement(con, statement, bind.data)
    7: sqliteQuickSQL(conn, statement, bind.data, ...)
    6: dbGetPreparedQuery(db, sql, bind.data = mmdf)
    5: dbGetPreparedQuery(db, sql, bind.data = mmdf)
    4: loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size = batch_size)
    3: eval(expr, envir, enclos)
    2: eval(expr, envir = loc.frame)
    1: ST(loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size = batch_size))

I will try to track this down as well, but if anyone recognizes this kind of problem, I would be most grateful for a pointer.

Kind regards,
Tobias

--
Tobias Verbeke - Consultant
Business & Decision Benelux
Rue de la Révolution 8
1000 Brussels - BELGIUM
+32 499 36 33 15
tobias.verbeke at businessdecision.com
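The header-matching idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the function select_by_header, the column names and the sample rows are made up, and the real annotation files may need extra handling (comment lines, quoting) that is not covered here.

```r
# Select columns by name instead of by position (hypothetical helper).
select_by_header <- function(path, wantedNames) {
  # First pass: read the header plus one row, only to learn the column names.
  header <- names(read.table(path, sep = ",", header = TRUE, nrows = 1,
                             check.names = FALSE, stringsAsFactors = FALSE))
  missing <- setdiff(wantedNames, header)
  if (length(missing) > 0)
    stop("missing columns: ", paste(missing, collapse = ", "))
  # Second pass: colClasses "NULL" skips unwanted columns, NA means "guess".
  classes <- ifelse(header %in% wantedNames, NA, "NULL")
  df <- read.table(path, sep = ",", header = TRUE, na.strings = "---",
                   stringsAsFactors = FALSE, check.names = FALSE,
                   colClasses = classes)
  df[, wantedNames, drop = FALSE]
}

# Made-up annotation file with one unwanted column ("Extra").
path <- tempfile(fileext = ".csv")
writeLines(c("Probe Set ID,dbSNP RS ID,Chromosome,Extra",
             "SNP_A-1,rs1,1,x",
             "SNP_A-2,---,2,y"), path)

wanted <- c("Probe Set ID", "dbSNP RS ID", "Chromosome")
anno <- select_by_header(path, wanted)
```

With this approach a file whose layout differs only in column order or in extra columns is read the same way, and a file that lacks a required column fails early with a message naming it.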

ADD REPLY • link 18.5 years ago Tobias Verbeke ▴ 10

score 0 · Answer 2 · 2007-08-18

Hi Tobias,

Tobias Verbeke <tobias.verbeke at telenet.be> writes:
> It appears the cause is that the author and genomebuild
> field are empty. It might be a good idea to check for
> this or enforce the presence of these fields.

I agree. The pdInfoBuilder code is a bit rough around the edges. We wanted to make a prototype available as soon as possible, and the interface is not as friendly as it could be.

> However, along the way, we discovered other issues.
> For example, in the loadAffyCsv function (loaders.R),
> there is a selection of columns based on column number
> that is not appropriate for the 10k files:
>
> This is the relevant snippet:
>
> wantedCols <- c(1,2,3,4,7,8,10,12,13,14,17)
> # added 10/14
> df <- read.table(con, sep=",", stringsAsFactors=FALSE, nrows=10,
>                  na.strings="---", header=TRUE)[, wantedCols]
>
> To match the needed columns for 10k files, the numbers 5, 6 and 15 are
> needed as well. It might however be a better idea to just read in
> the header and match on a character vector with prespecified names
> to determine the wanted columns (before reading in the rest for real).

Yes, I'm not sure why we are using the column numbers instead of names.

> Once this problem is solved, the function runs fine. There is however
> another error message in the loadAffySeqCsv
> file
>
> t <- ST(loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size=batch_size))
>
> Error in sqliteExecStatement(con, statement, bind.data) :
>   RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be
> unique)
> Timing stopped at: 0.58 0.05 0.73 NA NA

The error is telling you that you are trying to insert a record into the sequence table with a feature ID (fid) that is already in the table. Why that would be occurring, I'm not sure. There could be something different about how the 10k chips are organized, I suppose.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
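One way to narrow down a "PRIMARY KEY must be unique" failure is to look for repeated key values in the data frame before it is bound to the insert statement. The sketch below uses base R only; the helper find_duplicate_keys and the sample records are hypothetical, and the column name fid is taken from the explanation above rather than from the package's actual schema.

```r
# Return all rows whose key value occurs more than once (hypothetical helper).
find_duplicate_keys <- function(df, key = "fid") {
  dup <- df[[key]][duplicated(df[[key]])]
  df[df[[key]] %in% dup, , drop = FALSE]
}

# Made-up sequence records; fid 2 appears twice.
mmdf <- data.frame(fid = c(1L, 2L, 2L, 3L),
                   seq = c("ACGT", "CCGT", "CCGA", "TTGA"),
                   stringsAsFactors = FALSE)

dups <- find_duplicate_keys(mmdf)
clean <- find_duplicate_keys(mmdf[!duplicated(mmdf$fid), ])
```

Inspecting the rows in dups shows whether the repeats are exact duplicates or distinct records that share a feature ID, which would point to a difference in how the chip is organized.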