Annotation package generation for HTA-2_0 by pdInfoBuilder fails
@marton-papp-23822

Dear all,

I recently started working with microarray analysis and was given the task of analysing HTA 2.0 arrays of tumor xenografts. Before normalisation, we would like to rearrange the transcript cluster probe sets based on their similarity to the mouse genome, so that the resulting probe sets capture signal that most likely comes from the tumor cells.

As a first step, I tried to create a custom annotation package from the files I downloaded from the ThermoFisher/Affymetrix website (https://www.thermofisher.com/order/catalog/product/902233#/902233). When I tried to build the annotation package, I got several warnings and errors. The oligo package couldn't find the dbListFields method and pdInfoBuilder couldn't find the dbGetQuery method in the RSQLite package. After I called makePdInfoPackage to construct the package, it ran for a while and then stopped with the error message: "Creating index idx_pmfid on pmfeature... Error: UNIQUE constraint failed: pmfeature.fid".

Unfortunately, I couldn't find a solution in previous issues, and I don't know how to proceed or where to look for the cause of the error. I would be very grateful for any help! I have included the output of the whole R session below and hope it describes my problem well enough.

Thank you, Marton Papp

> # Loading the required packages 
> library(oligo)
Loading required package: BiocGenerics
Loading required package: parallel


Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: oligoClasses
Welcome to oligoClasses version 1.50.0
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

No methods found in package ‘RSQLite’ for request: ‘dbListFields’ when loading ‘oligo’
================================================================================
Welcome to oligo version 1.52.0
================================================================================
> library(Biobase)
> library(pdInfoBuilder)
Loading required package: RSQLite
Loading required package: affxparser
No methods found in package ‘RSQLite’ for request: ‘dbGetQuery’ when loading ‘pdInfoBuilder’
> 
> # Set the working directory
> setwd('...')
> getwd()
[1] "..."
> 
> # Path to required files
> path_to_files = '...'
> annot_probeset_name = 'HTA-2_0.na36.hg19.probeset.csv'
> annotat_tc_name = 'HTA-2_0.r3.na36.hg19.a1.transcript.csv'
> clf = list.files(path_to_files, pattern = ".clf", full.names = TRUE)
> pgf = list.files(path_to_files, pattern = ".pgf", full.names = TRUE)
> mps = list.files(path_to_files, pattern = ".mps", full.names = TRUE)
> 
> # Initialising the class and building the package
> my_hta_2.0 = new('AffyExonPDInfoPkgSeed', probeFile = annot_probeset_name, 
+                  transFile = annotat_tc_name, coreMps = mps, pgfFile = pgf, 
+                  clfFile = clf, extendedMps = mps, fullMps = mps, chipName = 'HTA-2_0')
> makePdInfoPackage(my_hta_2.0, destDir = ".")
================================================================================
Building annotation package for Affymetrix Exon ST Array
PGF.........: HTA-2_0.r3.pgf
CLF.........: HTA-2_0.r3.clf
Probeset....: HTA-2_0.na36.hg19.probeset.csv
Transcript..: HTA-2_0.r3.na36.hg19.a1.transcript.csv
Core MPS....: HTA-2_0.r3.Psrs.mps
Full MPS....: HTA-2_0.r3.Psrs.mps
Extended MPS: HTA-2_0.r3.Psrs.mps
================================================================================
Parsing file: HTA-2_0.r3.pgf... 
OK
Parsing file: HTA-2_0.r3.clf... OK
Creating initial table for probes... OK
Creating dictionaries... OK
Parsing file: HTA-2_0.na36.hg19.probeset.csv... OK
Parsing file: HTA-2_0.r3.Psrs.mps... OK
Parsing file: HTA-2_0.r3.Psrs.mps... OK
Parsing file: HTA-2_0.r3.Psrs.mps... OK
Creating package in ./pd.hta.2.0 
Inserting 850 rows into table chrom_dict... OK
Inserting 5 rows into table level_dict... OK
Inserting 9 rows into table type_dict... OK
Inserting 577432 rows into table core_mps... OK
Inserting 577432 rows into table full_mps... OK
Inserting 577432 rows into table extended_mps... OK
Inserting 1836622 rows into table featureSet... OK
Inserting 7576209 rows into table pmfeature... OK
Inserting 1121 rows into table mmfeature... OK
Counting rows in chrom_dict
Counting rows in core_mps
Counting rows in extended_mps
Counting rows in featureSet
Counting rows in full_mps
Counting rows in level_dict
Counting rows in mmfeature
Counting rows in pmfeature
Counting rows in type_dict
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... Error: UNIQUE constraint failed: pmfeature.fid
In addition: There were 12 warnings (use warnings() to see them)
> 
> warnings()
Warning messages:
1: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
2: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
3: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
4: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
5: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
6: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
7: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
8: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
9: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
10: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
11: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
12: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
> # Session Info
> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /.../tools/R-4.0.2/lib/libRblas.so
LAPACK: /.../tools/R-4.0.2/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] pdInfoBuilder_1.52.0 affxparser_1.60.0    RSQLite_2.2.0       
 [4] oligo_1.52.0         Biostrings_2.56.0    XVector_0.28.0      
 [7] IRanges_2.22.2       S4Vectors_0.26.1     Biobase_2.48.0      
[10] oligoClasses_1.50.0  BiocGenerics_0.34.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                  compiler_4.0.2             
 [3] BiocManager_1.30.10         GenomeInfoDb_1.24.2        
 [5] bitops_1.0-6                iterators_1.0.12           
 [7] tools_4.0.2                 zlibbioc_1.34.0            
 [9] digest_0.6.25               bit_1.1-15.2               
[11] memoise_1.1.0               preprocessCore_1.50.0      
[13] lattice_0.20-41             ff_2.2-14.2                
[15] pkgconfig_2.0.3             rlang_0.4.6                
[17] Matrix_1.2-18               foreach_1.5.0              
[19] DelayedArray_0.14.0         DBI_1.1.0                  
[21] GenomeInfoDbData_1.2.3      vctrs_0.3.1                
[23] bit64_0.9-7                 grid_4.0.2                 
[25] blob_1.2.1                  codetools_0.2-16           
[27] matrixStats_0.56.0          GenomicRanges_1.40.0       
[29] splines_4.0.2               SummarizedExperiment_1.18.1
[31] RCurl_1.98-1.2              crayon_1.3.4               
[33] affyio_1.58.0              
> 
> 

@james-w-macdonald-5106

The package seed for an HTA array is AffyHTAPDInfoPkgSeed, not AffyExonPDInfoPkgSeed. Also, the warnings you see aren't saying that a function couldn't be found, but that a different function should be used. That is true (and should get fixed before the deprecated versions are made defunct), but it doesn't affect anything.
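
For reference, the corrected call might look roughly like the sketch below. It reuses the file names and variables from the question and only swaps the seed class. The extendedMps and fullMps arguments are left out here as an assumption on my part (in the question all three pointed at the same .mps file, and this thread does not confirm whether the HTA seed needs them); the slotNames() call is there so you can check which slots the class in your installed pdInfoBuilder actually accepts. The path is left elided as in the original post.

```r
library(pdInfoBuilder)

# Path and file names as in the question (the path is elided there as well)
path_to_files <- '...'
annot_probeset_name <- 'HTA-2_0.na36.hg19.probeset.csv'
annotat_tc_name <- 'HTA-2_0.r3.na36.hg19.a1.transcript.csv'

# Anchor the patterns so only files ending in the extension are matched
clf <- list.files(path_to_files, pattern = "\\.clf$", full.names = TRUE)
pgf <- list.files(path_to_files, pattern = "\\.pgf$", full.names = TRUE)
mps <- list.files(path_to_files, pattern = "\\.mps$", full.names = TRUE)

# Check which slots the HTA seed class accepts in the installed version
slotNames('AffyHTAPDInfoPkgSeed')

# Same call as in the question, but with the HTA seed instead of the Exon seed.
# Assumption: probeFile, transFile, coreMps, pgfFile, clfFile and chipName
# carry over unchanged; compare against the slotNames() output above.
my_hta_2.0 <- new('AffyHTAPDInfoPkgSeed',
                  probeFile = annot_probeset_name,
                  transFile = annotat_tc_name,
                  coreMps = mps,
                  pgfFile = pgf,
                  clfFile = clf,
                  chipName = 'HTA-2_0')

makePdInfoPackage(my_hta_2.0, destDir = ".")
```

If list.files() returns more than one match for any of the three patterns, pick the intended file explicitly before constructing the seed.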


Dear James,

Thank you for your response. I tried this and everything worked perfectly. Thank you very much for your help, even though my problem was only a mistaken seed. Next time I'm going to be more careful to use the correct one.

Thank you again, Marton
