Question: crlmm support for Illumina arrays with no annotation package?
21 days ago, deanpett wrote:

I'm having trouble using the function crlmmIllumina to preprocess and genotype the Illumina QC Array-24 chip with the CRLMM algorithm. According to the current crlmm documentation, if there is no Bioconductor annotation package for your array, you should still be able to import the data with a properly formatted anno data.frame.

From the help page for genotype.Illumina(): "In general, a chip specific annotation package is required to use the genotype.Illumina function. If this is not available (newer chip types or custom chips often don't have a chip-specific package available on Bioconductor), consider using cdfName='nopackage' and specifying anno and genome, which runs 'krlmm' on the samples available. Here anno is a data.frame read in from the relevant chip-specific manifest, which must have additional columns 'isSnp' which is a logical that indicates whether a probe is polymorphic or not, 'position', 'chromosome' and 'featureNames' that give the location on the chromosome and SNP name."

I've prepared my anno data.frame:

> head(manifest)
  chromosome  position    featureNames isSnp                           IlmnID            Name IlmnStrand   SNP AddressA_ID AlleleA_ProbeSeq AddressB_ID
1          1 159174749 1:159174749-C-T  TRUE 1:159174749-C-T-0_B_F_2304232049 1:159174749-C-T        BOT [T/C]    65600245               NA           0
2          1 159174749 1:159175193-A-G  TRUE 1:159175193-A-G-0_B_R_2304232052 1:159175193-A-G        BOT [T/C]    13658935               NA           0
3          1 159174749 1:159175211-C-T  TRUE 1:159175211-C-T-0_T_R_2304232054 1:159175211-C-T        TOP [A/G]    14755267               NA           0
4          1 159174749 1:159175253-G-A  TRUE 1:159175253-G-A-0_T_F_2304232055 1:159175253-G-A        TOP [A/G]    78702422               NA           0
5          1 159174749 1:159175495-G-A  TRUE 1:159175495-G-A-0_T_F_2304232061 1:159175495-G-A        TOP [A/G]    73657552               NA           0
6          1 159174749  1:159175540-TC  TRUE  1:159175540-TC-0_T_R_2299219123  1:159175540-TC        TOP [A/G]    19715188               NA           0
  AlleleB_ProbeSeq Chr   MapInfo  Ploidy      Species CustomerStrand IlmnStrand_1 IllumicodeSeq TopGenomicSeq
1               NA   1 159204959 diploid Homo sapiens            BOT          BOT            NA            NA
2               NA   1 159205403 diploid Homo sapiens            TOP          BOT            NA            NA
3               NA   1 159205421 diploid Homo sapiens            BOT          TOP            NA            NA
4               NA   1 159205463 diploid Homo sapiens            TOP          TOP            NA            NA
5               NA   1 159205705 diploid Homo sapiens            TOP          TOP            NA            NA
6               NA   1 159205750 diploid Homo sapiens            BOT          TOP            NA            NA
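
As a sanity check that a data.frame like this meets the documented requirements, something along the following lines can be run. This is only a minimal sketch: the helper name check_anno is hypothetical (it is not part of crlmm), and toy_manifest is a made-up stand-in that copies a few values from the output above; the column types in the real manifest may differ.

```r
# Hypothetical helper (not part of crlmm): check that an annotation
# data.frame has the four columns the genotype.Illumina() help page asks for.
check_anno <- function(anno) {
  required <- c("isSnp", "position", "chromosome", "featureNames")
  missing_cols <- setdiff(required, colnames(anno))
  list(
    is_data_frame = is.data.frame(anno),
    missing       = missing_cols,
    isSnp_logical = "isSnp" %in% colnames(anno) && is.logical(anno$isSnp)
  )
}

# Toy stand-in for the real manifest (assumed column types).
toy_manifest <- data.frame(
  chromosome   = c(1L, 1L),
  position     = c(159174749, 159174749),
  featureNames = c("1:159174749-C-T", "1:159175193-A-G"),
  isSnp        = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)

res <- check_anno(toy_manifest)
str(res)
```

If `res$missing` is empty and `res$isSnp_logical` is TRUE, the data.frame at least matches the column requirements quoted from the help page; it says nothing about whether genotype.Illumina() will accept it.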

and I call genotype.Illumina() as follows:

crlmmResult <- genotype.Illumina(path = "../path/to/idat_files",
                                 arrayNames = NULL,
                                 sep = "_",
                                 highDensity = FALSE,
                                 fileExt = list(green = "Grn.idat", red = "Red.idat"),
                                 cdfName = "nopackage",  # no annotation package for this chip
                                 call.method = "krlmm",
                                 anno = manifest,        # plain data.frame shown above
                                 genome = "hg19")

Despite the documentation stating that anno should be a "data.frame containing SNP annotation information from manifest and additional columns 'isSnp', 'position', 'chromosome' and 'featureNames'. For use when cdfName='nopackage'", the call still throws the following error:

Instantiate CNSet container.
Initializing container for genotyping and copy number estimation
Processing sample stratum 1 of 1
Error in colnames(anno@data) : 
  trying to get slot "data" from an object (class "data.frame") that is not an S4 object 
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.4.0         stringr_1.4.0         dplyr_0.8.3           purrr_0.3.3           readr_1.3.1           tidyr_1.0.0           tibble_2.1.3         
 [8] ggplot2_3.2.1         tidyverse_1.2.1       crlmm_1.43.0          preprocessCore_1.47.1 oligoClasses_1.47.0   ff_2.2-14             bit_1.1-14           
[15] Biobase_2.45.1        BiocGenerics_0.31.6  

loaded via a namespace (and not attached):
 [1] nlme_3.1-139                bitops_1.0-6                matrixStats_0.55.0          lubridate_1.7.4             httr_1.4.1                 
 [6] GenomeInfoDb_1.21.2         tools_3.6.0                 backports_1.1.5             utf8_1.1.4                  R6_2.4.0                   
[11] affyio_1.55.0               DBI_1.0.0                   lazyeval_0.2.2              colorspace_1.4-1            withr_2.1.2                
[16] tidyselect_0.2.5            base64_2.0                  compiler_3.6.0              cli_1.1.0                   rvest_0.3.4                
[21] xml2_1.2.2                  DelayedArray_0.11.8         scales_1.0.0                mvtnorm_1.0-11              askpass_1.1                
[26] illuminaio_0.27.1           XVector_0.25.0              pkgconfig_2.0.3             limma_3.41.18               rlang_0.4.0                
[31] readxl_1.3.1                rstudioapi_0.10             VGAM_1.1-1                  generics_0.0.2              jsonlite_1.6               
[36] BiocParallel_1.19.4         RCurl_1.95-4.12             magrittr_1.5                GenomeInfoDbData_1.2.1      Matrix_1.2-17              
[41] Rcpp_1.0.2                  munsell_0.5.0               S4Vectors_0.23.25           fansi_0.4.0                 lifecycle_0.1.0            
[46] stringi_1.4.3               SummarizedExperiment_1.15.9 zlibbioc_1.31.0             grid_3.6.0                  crayon_1.3.4               
[51] lattice_0.20-38             Biostrings_2.53.2           haven_2.1.1                 splines_3.6.0               hms_0.5.1                  
[56] zeallot_0.1.0               knitr_1.25                  beanplot_1.2                pillar_1.4.2                GenomicRanges_1.37.17      
[61] codetools_0.2-16            stats4_3.6.0                glue_1.3.1                  BiocManager_1.30.8          modelr_0.1.5               
[66] vctrs_0.2.0                 foreach_1.4.7               cellranger_1.1.0            gtable_0.3.0                openssl_1.4.1              
[71] assertthat_0.2.1            xfun_0.10                   broom_0.5.2                 RcppEigen_0.         iterators_1.0.12           
[76] IRanges_2.19.17             ellipse_0.4.1      

This all suggests to me that anno cannot actually be a plain data.frame, and that genotype.Illumina() in fact requires an S4 annotation object, even for custom and/or currently unsupported arrays, rather than accepting a data.frame as the documentation suggests.
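
A quick way to check this reading without rerunning the whole pipeline is to test whether the object has a `data` slot at all. The sketch below uses base R plus, optionally, Biobase; toy_anno is a made-up stand-in for the manifest, and wrapping it in an AnnotatedDataFrame is an untested guess at what the failing code path may expect, not a confirmed fix for crlmm.

```r
library(methods)

# Made-up stand-in for the manifest data.frame.
toy_anno <- data.frame(
  chromosome   = 1L,
  position     = 159174749,
  featureNames = "1:159174749-C-T",
  isSnp        = TRUE,
  stringsAsFactors = FALSE
)

# A plain data.frame is an S3 object, so `anno@data` cannot work on it.
is_s4 <- isS4(toy_anno)  # FALSE
slot_access <- tryCatch(
  colnames(toy_anno@data),
  error = function(e) "error"  # the exact message differs between R versions
)

# Assumption: the internal code may expect something like Biobase's
# AnnotatedDataFrame, which does carry a `data` slot. Untested with crlmm.
if (requireNamespace("Biobase", quietly = TRUE)) {
  adf <- Biobase::AnnotatedDataFrame(data = toy_anno)
  print(colnames(adf@data))
}
```

If `slot_access` comes back as "error" for the real manifest too, the failure is reproducible outside genotype.Illumina() and comes purely from the object class, not from the contents of the manifest.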

I'd love some help getting to the bottom of this, as I REALLY want to avoid the clunky GenomeStudio software so that I can fully automate my genotyping process.

Thanks in advance, Dean
