Pd info package affy 10K array



Henrik Bengtsson ★ 2.4k

@henrik-bengtsson-4333


United States

Hi,

FYI and related to this one, I've posted a 'Request for more consistent filenames for chip type files' to the "General" forum of the Affymetrix Developers Network, cf. http://www.affymetrix.com/community/forums/thread.jspa?threadID=6481.

/Henrik

On Mon, Jun 30, 2008 at 11:49 AM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> Hi,
>
> I can confirm that the probe sequence file for Mapping10K_Xba142
> [http://www.affymetrix.com/Auth/analysis/downloads/data/Mapping10Kv2_probe_tab.zip]
> linked to at the 'Mapping 10K 2.0 Array - Support Materials' page
> [http://www.affymetrix.com/support/technical/byproduct.affx?product=10k-20]
> does indeed look like it is for Mapping10K_Xba131, e.g. the available
> X and Y positions are in [1,710] and [1,707] which is clearly outside
> the dimension of the Mapping10K_Xba142 chip type 658x658.
>
> Did you post this in the Affymetrix Forum
>
> https://www.affymetrix.com/community/forums/index.jspa
>
> or directly to the support? Is there a thread where I can post a follow up?
>
> -Henrik
>
>
> On Thu, Jun 26, 2008 at 2:25 PM, Michael Gormley
> <michael.gormley at gmail.com> wrote:
>> This is the same source where I obtained the files originally. I have
>> brought this issue to the attention of affy technical support. Hoping they
>> can get me the correct probe sequence file.
>>
>> On Thu, Jun 26, 2008 at 2:26 PM, James W. MacDonald <jmacdon at med.umich.edu>
>> wrote:
>>>
>>> Interesting.
>>>
>>> To test the problems Michael was having, I simply went to Affy's product
>>> support page and downloaded the library file, annotation file, and sequence
>>> file. So it appears they have things mixed up on that page, and there isn't
>>> anything obvious about the sequence file that would inform anybody it is
>>> wrong:
>>>
>>> > dir(pattern = "^Mapping")
>>> [1] "Mapping10K_probe_tab" "Mapping10K_Xba142.CDF"
>>> [3] "Mapping10K_Xba142.na25.annot.csv"
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>> Henrik Bengtsson wrote:
>>>>
>>>> Note that there are two different Affymetrix 10K chip types, namely
>>>> Mapping10K_Xba131 (aka 'Mapping 10K Array') and Mapping10K_Xba142 (aka
>>>> 'Mapping 10K Array 2.0'). The probe sequence file you refer to seems
>>>> to be for the former, which is a larger chip. Details on the official
>>>> Affymetrix CDFs (converted to binary though):
>>>>
>>>> > library(aroma.affymetrix)
>>>> > cdf <- AffymetrixCdfFile$byChipType("Mapping10K_Xba142")
>>>> > cdf
>>>>
>>>> AffymetrixCdfFile:
>>>> Path: annotationData/chipTypes/Mapping10K_Xba142
>>>> Filename: Mapping10K_Xba142.cdf
>>>> Filesize: 9.53MB
>>>> Chip type: Mapping10K_Xba142
>>>> RAM: 0.00MB
>>>> File format: v4 (binary; XDA)
>>>> Dimension: 658x658
>>>> Number of cells: 432964
>>>> Number of units: 10208
>>>> Cells per unit: 42.41
>>>> Number of QC units: 9
>>>>
>>>> > cdf <- AffymetrixCdfFile$byChipType("Mapping10K_Xba131")
>>>> > cdf
>>>>
>>>> AffymetrixCdfFile:
>>>> Path: annotationData/chipTypes/Mapping10K_Xba131
>>>> Filename: Mapping10K_Xba131.cdf
>>>> Filesize: 10.79MB
>>>> Chip type: Mapping10K_Xba131
>>>> RAM: 0.00MB
>>>> File format: v4 (binary; XDA)
>>>> Dimension: 712x712
>>>> Number of cells: 506944
>>>> Number of units: 11564
>>>> Cells per unit: 43.84
>>>> Number of QC units: 9
>>>>
>>>> FYI: I try to collect information about various Affymetrix chip types at:
>>>>
>>>> http://groups.google.com/group/aroma-affymetrix/web/documentation-on-chip-types
>>>>
>>>> Final comment: I would like to emphasize the difference between 'chip
>>>> type' and 'CDF'; a chip type refers to a unique product coming out of
>>>> Affymetrix, whereas a CDF refers to an annotation of a chip type.
>>>> There can be many different CDFs for each chip type, but only one chip
>>>> type per CDF.
>>>>
>>>> Cheers
>>>>
>>>> Henrik
>>>>
>>>> On Thu, Jun 26, 2008 at 9:42 AM, James W. MacDonald
>>>> <jmacdon at med.umich.edu> wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Michael Gormley wrote:
>>>>>>
>>>>>> I get an error when running the makePdInfoPackage function to make a
>>>>>> PdInfo package for the 10K mapping array. The output from the function
>>>>>> reads:
>>>>>>
>>>>>> > makePdInfoPackage(pkg,destDir=".")
>>>>>>
>>>>>> Creating package in ./pd.mapping10k.xba142
>>>>>> loadUnitsByBatch took 22.86 sec
>>>>>> loadAffyCsv took 2.79 sec
>>>>>> Error in sqliteExecStatement(con, statement, bind.data) :
>>>>>> RS-DBI driver: (RS_SQLite_exec: could not execute: PRIMARY KEY must be unique)
>>>>>> In addition: Warning messages:
>>>>>> 1: In is.na(v) : is.na() applied to non-(list or vector) of type 'NULL'
>>>>>> 2: In is.na(v) : is.na() applied to non-(list or vector) of type 'NULL'
>>>>>> 3: In is.na(v) : is.na() applied to non-(list or vector) of type 'NULL'
>>>>>> Timing stopped at: 0.36 0.01 0.44
>>>>>
>>>>> I have spent some time looking at this, and it appears that the problem is
>>>>> due to inconsistencies between the cdf and probe sequence files. As far as
>>>>> I can tell there are many probe locations ((x, y) coordinates) in the cdf
>>>>> that don't exist in the probe sequence file, and vice versa.
>>>>>
>>>>> The function loadAffySeqCsv() reads in a chunk of data from the probe
>>>>> sequence file, then matches the indices (computed from the (x, y)
>>>>> coordinates) of these data with the indices that were generated using the
>>>>> cdf data. In the first chunk of 1000 probesets, there are only 8223
>>>>> probesets that match between the two data sources. I don't think this
>>>>> would normally be a problem, except for the fact that 1000 probesets from
>>>>> the sequence file should *exactly* line up with what we got from the cdf.
>>>>>
>>>>> But the real problem that arises is this:
>>>>>
>>>>> The computation of indices is based on the dimensions of the chip. If we
>>>>> query the cdf to find what the dimensions are we get this:
>>>>>
>>>>> readCdfHeader(cdfFile)
>>>>> $ncols
>>>>> [1] 658
>>>>>
>>>>> $nrows
>>>>> [1] 658
>>>>>
>>>>> So we compute the indices thus:
>>>>>
>>>>> index <- x + 1 + y * ncols
>>>>>
>>>>> This will give unique indices for all (x, y) coordinates on the chip,
>>>>> assuming we agree that the dimensions of the chip are 658 x 658. However,
>>>>> the sequence file doesn't agree:
>>>>>
>>>>> pmdf[pmdf$fid == 9264,]
>>>>>          fset.name   x  y offset                       seq tstrand type tallele  fid
>>>>> 7077 SNP_A-1507675 709 13      0 TGCCCTGAATGTTTCAGCACATCTA       r   PM       T 9264
>>>>>
>>>>> The above is one line from the first 1000 probesets. Note that the (x, y)
>>>>> coordinates are (709, 13)! When we calculate the index (fid) we get 9264.
>>>>> Unfortunately, if we use (51, 14) we also get 9264. Because the sequence
>>>>> file isn't playing by the rules, we end up with a total of 25 duplicate
>>>>> indices. Since the index values are the primary key for the table we are
>>>>> trying to populate we get an error because you can't have duplicated
>>>>> primary keys.
>>>>>
>>>>> So long story short, the sequence file for this chip is broken - the
>>>>> apparent maximum (x, y) coordinate is (710, 707) which is well beyond what
>>>>> the cdf claims. Or maybe the cdf is broken - I don't really know. The end
>>>>> result is that this will never work until Affy comes up with some
>>>>> consistent information for the chip.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>> > traceback()
>>>>>>
>>>>>> 12: .Call("RS_SQLite_exec", conId, statement, bind.data, PACKAGE = .SQLitePkgName)
>>>>>> 11: sqliteExecStatement(con, statement, bind.data)
>>>>>> 10: sqliteQuickSQL(conn, statement, bind.data, ...)
>>>>>> 9: dbGetPreparedQuery(db, sql, bind.data = mmdf)
>>>>>> 8: dbGetPreparedQuery(db, sql, bind.data = mmdf)
>>>>>> 7: loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size = batch_size)
>>>>>> 6: eval(expr, envir, enclos)
>>>>>> 5: eval(expr, envir = loc.frame)
>>>>>> 4: ST(loadAffySeqCsv(db, csvSeqFile, cdfFile, batch_size = batch_size))
>>>>>> 3: buildPdInfoDb(object@cdfFile, object@csvAnnoFile, object@csvSeqFile,
>>>>>>    dbFilePath, seqMatFile, batch_size = batch_size, verbose = !quiet)
>>>>>> 2: makePdInfoPackage(pkg, destDir = ".")
>>>>>> 1: makePdInfoPackage(pkg, destDir = ".")
>>>>>>
>>>>>> I noticed a prior post that suggested that this may be due to entering a
>>>>>> record into a table with a Feature ID that is already in the table. Is
>>>>>> this the case? Is there a work-around here?
>>>>>>
>>>>>> Thanks,
>>>>>> Mike Gormley
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> Affymetrix and cDNA Microarray Core
>>>>> University of Michigan Cancer Center
>>>>> 1500 E. Medical Center Drive
>>>>> 7410 CCGC
>>>>> Ann Arbor MI 48109
>>>>> 734-647-5623
>>>>>
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> Affymetrix and cDNA Microarray Core
>>> University of Michigan Cancer Center
>>> 1500 E. Medical Center Drive
>>> 7410 CCGC
>>> Ann Arbor MI 48109
>>> 734-647-5623
>>
>>
>
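
For anyone who wants to reproduce the collision Jim describes, here is a minimal base-R sketch. It uses the index formula and the 658 x 658 dimensions quoted in the thread. The helper names `xy_to_fid` and `out_of_bounds` and the toy probe table are made up for illustration and are not part of pdInfoBuilder or any other package. It also assumes the (x, y) coordinates are 0-based, which is what the `x + 1` term in the formula suggests.

```r
# Chip dimensions reported by readCdfHeader() for Mapping10K_Xba142 (from the thread)
ncols <- 658
nrows <- 658

# Index formula quoted in the thread: index <- x + 1 + y * ncols
# Hypothetical helper name; x and y are assumed to be 0-based.
xy_to_fid <- function(x, y, ncols) x + 1 + y * ncols

# Hypothetical helper: TRUE for coordinates that do not fit on the chip
out_of_bounds <- function(x, y, ncols, nrows) {
  x < 0 | x >= ncols | y < 0 | y >= nrows
}

# Toy probe table: the first two rows are the coordinates mentioned in the
# thread, the last two are the chip corners (illustrative only)
probes <- data.frame(
  x = c(709, 51, 0, 657),
  y = c(13, 14, 0, 657)
)

probes$fid <- xy_to_fid(probes$x, probes$y, ncols)
probes$oob <- out_of_bounds(probes$x, probes$y, ncols, nrows)

# Mark every row whose index is shared with another row; these are the rows
# that would violate a PRIMARY KEY constraint on fid
probes$dup <- duplicated(probes$fid) | duplicated(probes$fid, fromLast = TRUE)

print(probes)
```

Both (709, 13) and (51, 14) map to index 9264, and only the first of them is flagged as out of bounds. Running the same kind of range check on the x and y columns of a probe sequence file before calling makePdInfoPackage should reveal a chip-type mismatch like this one early.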

Microarray Annotation Network Cancer cdf probe affy • 741 views

15.8 years ago Henrik Bengtsson ★ 2.4k
