Dear all,
I recently started working with microarray analysis and have been asked to analyse HTA 2.0 arrays of tumor xenografts. Before normalisation, we would like to rearrange the transcript cluster probe sets based on their similarity to the mouse genome, so that we obtain probe sets whose signal most likely comes from the tumor cells.
As a first step, I tried to create a custom annotation package from the files I downloaded from the ThermoFisher/Affymetrix website (https://www.thermofisher.com/order/catalog/product/902233#/902233). While doing so I got several warnings and an error. On loading, the oligo package couldn't find the dbListFields method and pdInfoBuilder couldn't find the dbGetQuery method in the RSQLite package. When I then called makePdInfoPackage to build the package, it failed after a while with the error message: "Creating index idx_pmfid on pmfeature... Error: UNIQUE constraint failed: pmfeature.fid".
Unfortunately, I couldn't find a solution by browsing previous issues, and I don't know how to proceed or where to look for the cause of the error. I would be very grateful for any help! I have included the output of the whole R session below; I hope it describes my problem well enough.
Thank you, Marton Papp
> # Loading the required packages
> library(oligo)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min
Loading required package: oligoClasses
Welcome to oligoClasses version 1.50.0
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: XVector
Attaching package: ‘Biostrings’
The following object is masked from ‘package:base’:
strsplit
No methods found in package ‘RSQLite’ for request: ‘dbListFields’ when loading ‘oligo’
================================================================================
Welcome to oligo version 1.52.0
================================================================================
> library(Biobase)
> library(pdInfoBuilder)
Loading required package: RSQLite
Loading required package: affxparser
No methods found in package ‘RSQLite’ for request: ‘dbGetQuery’ when loading ‘pdInfoBuilder’
>
> # Set the working directory
> setwd('...')
> getwd()
[1] "..."
>
> # Path to required files
> path_to_files = '...'
> annot_probeset_name = 'HTA-2_0.na36.hg19.probeset.csv'
> annotat_tc_name = 'HTA-2_0.r3.na36.hg19.a1.transcript.csv'
> clf = list.files(path_to_files, pattern = ".clf", full.names = TRUE)
> pgf = list.files(path_to_files, pattern = ".pgf", full.names = TRUE)
> mps = list.files(path_to_files, pattern = ".mps", full.names = TRUE)
>
> # Initialising the class and building the package
> my_hta_2.0 = new('AffyExonPDInfoPkgSeed', probeFile = annot_probeset_name,
+ transFile = annotat_tc_name, coreMps = mps, pgfFile = pgf,
+ clfFile = clf, extendedMps = mps, fullMps = mps, chipName = 'HTA-2_0')
> makePdInfoPackage(my_hta_2.0, destDir = ".")
================================================================================
Building annotation package for Affymetrix Exon ST Array
PGF.........: HTA-2_0.r3.pgf
CLF.........: HTA-2_0.r3.clf
Probeset....: HTA-2_0.na36.hg19.probeset.csv
Transcript..: HTA-2_0.r3.na36.hg19.a1.transcript.csv
Core MPS....: HTA-2_0.r3.Psrs.mps
Full MPS....: HTA-2_0.r3.Psrs.mps
Extended MPS: HTA-2_0.r3.Psrs.mps
================================================================================
Parsing file: HTA-2_0.r3.pgf... OK
Parsing file: HTA-2_0.r3.clf... OK
Creating initial table for probes... OK
Creating dictionaries... OK
Parsing file: HTA-2_0.na36.hg19.probeset.csv... OK
Parsing file: HTA-2_0.r3.Psrs.mps... OK
Parsing file: HTA-2_0.r3.Psrs.mps... OK
Parsing file: HTA-2_0.r3.Psrs.mps... OK
Creating package in ./pd.hta.2.0
Inserting 850 rows into table chrom_dict... OK
Inserting 5 rows into table level_dict... OK
Inserting 9 rows into table type_dict... OK
Inserting 577432 rows into table core_mps... OK
Inserting 577432 rows into table full_mps... OK
Inserting 577432 rows into table extended_mps... OK
Inserting 1836622 rows into table featureSet... OK
Inserting 7576209 rows into table pmfeature... OK
Inserting 1121 rows into table mmfeature... OK
Counting rows in chrom_dict
Counting rows in core_mps
Counting rows in extended_mps
Counting rows in featureSet
Counting rows in full_mps
Counting rows in level_dict
Counting rows in mmfeature
Counting rows in pmfeature
Counting rows in type_dict
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... Error: UNIQUE constraint failed: pmfeature.fid
In addition: There were 12 warnings (use warnings() to see them)
>
> warnings()
Warning messages:
1: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
2: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
3: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
4: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
5: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
6: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
7: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
8: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
9: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
10: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
11: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
12: In result_fetch(res@ptr, n = n) :
SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
> # Session Info
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /.../tools/R-4.0.2/lib/libRblas.so
LAPACK: /.../tools/R-4.0.2/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] pdInfoBuilder_1.52.0 affxparser_1.60.0 RSQLite_2.2.0
[4] oligo_1.52.0 Biostrings_2.56.0 XVector_0.28.0
[7] IRanges_2.22.2 S4Vectors_0.26.1 Biobase_2.48.0
[10] oligoClasses_1.50.0 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 compiler_4.0.2
[3] BiocManager_1.30.10 GenomeInfoDb_1.24.2
[5] bitops_1.0-6 iterators_1.0.12
[7] tools_4.0.2 zlibbioc_1.34.0
[9] digest_0.6.25 bit_1.1-15.2
[11] memoise_1.1.0 preprocessCore_1.50.0
[13] lattice_0.20-41 ff_2.2-14.2
[15] pkgconfig_2.0.3 rlang_0.4.6
[17] Matrix_1.2-18 foreach_1.5.0
[19] DelayedArray_0.14.0 DBI_1.1.0
[21] GenomeInfoDbData_1.2.3 vctrs_0.3.1
[23] bit64_0.9-7 grid_4.0.2
[25] blob_1.2.1 codetools_0.2-16
[27] matrixStats_0.56.0 GenomicRanges_1.40.0
[29] splines_4.0.2 SummarizedExperiment_1.18.1
[31] RCurl_1.98-1.2 crayon_1.3.4
[33] affyio_1.58.0
>
>
Dear James,
Thank you for your response. I tried this method and everything worked perfectly. Thank you very much for your help, even though my problem was only a mistaken seed. Next time I'll be more careful to use the correct one.
Thank you again, Marton