error when using bumphunter package with illumina 450k data
1
0
Entering edit mode
fleur_p90 • 0
@fleur_p90-10888
Last seen 3.1 years ago
Netherlands

Hello all,

I've done 450k illumina arrays of 128 samples and now I would like to perform the bumphunter analysis to find differentially methylated regions between the different groups that I've divided my samples in (group 1-6). I created all the necessary files I think but when I run the command I get an error. The code looks as follows:

bumps<-bumphunterEngine(beta_sorted,design=characteristics,m$chr,m$pos,mcluster1,coef=2,cutoff=0.5,nullMethod="bootstrap",smooth=TRUE,B=1000,verbose=TRUE,smoothFunction=loessByCluster)

I'll shortly try to explain what everyhing is

beta_sorted = my data file with all beta values for each sample and CpG - it is a matrix with rownames = cpg names as character and columnames = samples names as character

my design file is named characteristics with sample names (characters) and group in which each sample is in and some other general patient information like age and gender

m=annotation file containing the Cpg names (characters) with corresponding chromosomes (factor) and positions (int) of these sites

mcluster1=made with clustermaker with following code: 

mcluster1<-clusterMaker(m$chr, m$pos,assumeSorted=FALSE,maxGap=300)

Coef=the 2nd column in the design file contains the group numbers each patient belongs to

nullMethod=bootstrap because I have more covariates in my design file 

 

The error that I keep getting is:

[bumphunterEngine] Parallelizing using 2 workers/cores (backend: doParallelSNOW, version: 1.0.10).
[bumphunterEngine] Computing coefficients.
Error in qr.default(A) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In qr.default(A) : NAs introduced by coercion

 

I do not understand where I go wrong and what this error tells me, who can help me with this?

Thank you in advance for helping me out!!

Kind regards,

Fleur Peters

 

traceback()

4: qr.default(A)
3: qr(A)
2: .getEstimate(mat = mat, design = design, coef = coef, full = FALSE)
1: bumphunterEngine(beta_sorted, design = characteristics, mGRange$chr, 
       mGRange$pos, mcluster1, coef = 4, cutoff = 0.5, smooth = TRUE, 
       verbose = TRUE, smoothFunction = loessByCluster)

and sessionInfo()

R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.10          minfi_1.16.1               plyr_1.8.4                 BiocInstaller_1.20.3      
 [5] Biostrings_2.38.4          XVector_0.10.0             SummarizedExperiment_1.0.2 lattice_0.20-33           
 [9] Biobase_2.30.0             bumphunter_1.10.0          locfit_1.5-9.1             iterators_1.0.8           
[13] foreach_1.4.3              GenomicRanges_1.22.4       GenomeInfoDb_1.6.3         IRanges_2.4.8             
[17] S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] nor1mix_1.2-1           splines_3.2.5           ellipse_0.3-8           doRNG_1.6               Rsamtools_1.22.0       
 [6] RSQLite_1.0.0           limma_3.26.9            quadprog_1.5-5          digest_0.6.9            RColorBrewer_1.1-2     
[11] colorspace_1.2-6        preprocessCore_1.32.0   Matrix_1.2-6            GEOquery_2.36.0         siggenes_1.44.0        
[16] XML_3.98-1.4            mixOmics_5.2.0          biomaRt_2.26.1          genefilter_1.52.1       zlibbioc_1.16.0        
[21] xtable_1.8-2            corpcor_1.6.8           scales_0.4.0            BiocParallel_1.4.3      openssl_0.9.3          
[26] annotate_1.48.0         beanplot_1.2            pkgmaker_0.22           ggplot2_2.1.0           GenomicFeatures_1.22.13
[31] survival_2.39-4         magrittr_1.5            mclust_5.2              nlme_3.1-128            MASS_7.3-45            
[36] tools_3.2.5             registry_0.3            matrixStats_0.50.2      stringr_1.0.0           munsell_0.4.3          
[41] rngtools_1.2.4          AnnotationDbi_1.32.3    lambda.r_1.1.7          base64_2.0              futile.logger_1.4.1    
[46] grid_3.2.5              RCurl_1.95-4.8          igraph_1.0.1            bitops_1.0-6            gtable_0.2.0           
[51] codetools_0.2-14        multtest_2.26.0         DBI_0.4-1               reshape_0.8.5           illuminaio_0.12.0      
[56] GenomicAlignments_1.6.3 rtracklayer_1.30.4      futile.options_1.0.0    stringi_1.0-1           Rcpp_0.12.5            
[61] rgl_0.95.1441          

minfi bumphunter DNA methylation illumina450k • 1.3k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

If you have Illumina 450K data, the best way to proceed is to process using the minfi package and just pass your GenomicRatioSet to bumphunter, and let it handle all the grunt work.

You probably don't want to use beta values for this analysis. Edit With that number of samples, you could probably assume that the central limit theorem will kick in and it will all be OK, but why do that when it's easy to convert the beta values to M-values, which are more amenable to modeling based on a null t-distribution. Uh, never mind that part - you are bootstrapping, not comparing to a conventional null distribution. However, I would still recommend using M-values, because they have a continuous distribution, whereas beta values are truncated at 0 and 1, which makes it difficult to detect differences at the extremes.

Your error is coming from .getEstimate, in particular where it is doing the QR decomposition on your design matrix after dropping the coefficient of interest. Which indicates that there is something weird going on with your design matrix, and since you get the error 'NAs introduced by coersion', this indicates that R is coercing non-numeric things to numbers, and some are converted to NA because R doesn't know what to do with them. As an example:

> baddesign <- cbind(rep(1,10), rep(0:1, each = 5), c(rnorm(9), "z"))
> baddesign
      [,1] [,2] [,3]                
 [1,] "1"  "0"  "-1.14742411456328"
 [2,] "1"  "0"  "0.191649535518432"
 [3,] "1"  "0"  "0.314704009938549"
 [4,] "1"  "0"  "-0.186322289297979"
 [5,] "1"  "0"  "0.902489976019952"
 [6,] "1"  "1"  "1.12718766673182"  
 [7,] "1"  "1"  "0.662080973489132"
 [8,] "1"  "1"  "-0.645562919258646"
 [9,] "1"  "1"  "-0.302092270728068"
[10,] "1"  "1"  "z"                

> qr(baddesign)
Error in qr.default(baddesign) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In qr.default(baddesign) : NAs introduced by coercion

So you should inspect your design matrix to see what the problem is.

ADD COMMENT

Login before adding your answer.

Traffic: 426 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6