bug in cpg.annotate() and DMR.plot: bad defaults for what and arraytype arguments
3
0
Entering edit mode
Jenny Drnevich ★ 2.0k
@jenny-drnevich-2812
Last seen 14 days ago
United States

Hi all,

I've been following the F1000 tutorial on analyzing methylation array data (thanks to all involved in the paper and the packages - it's an amazing resource as I start my foray into methylation arrays!!) and their code now produces an error when running cpg.annotate(). I was able to track it down, and looks like at some point from R 3.3.0 to 3.3.3, the default values for arguments "what" and "arraytype" have changed. Perhaps the defaults used to be what = "M",  arraytype = "450K", and the cpg.annotate() function works when adding these (codes below). I've added these as a comment to the bottom of the F1000 paper, and also tagged minfi on this post to hopefully flag one of the authors down so they can update the paper's code.

However, in looking at the help for ?cpg.annotate, it doesn't list a default value for either "what" or "arraytype", but just lists the possible values what=c("Beta", "M") and arraytype=c("EPIC", "450K"). Usually, the first one listed is taken as the default, but it appears cpg.annotate() is passing on the full character vector for both instead of selecting one. I don't know if this counts a a bug or not, but definitely is unintended behavior. It also appears in DMR.plot() leading to a warning. Luckily, it looks like the arraytype doesnt affect the plot because I get apparently the exact same plot with the example in the F1000 paper. I don't know how many other DMRcate functions may be affected, but I wanted to point it out.

Thanks,

Jenny

 

#Following the codes from https://f1000research.com/articles/5-1281/v2, everything runs fine up to this point:

> myAnnotation <- cpg.annotate(mVals, datatype = "array", #arraytype="450K", what = "M",
+                              analysis.type="differential", design=design, 
+                              contrasts = TRUE, cont.matrix = contMatrix, 
+                              coef="naive - rTreg")
Loading required package: IlluminaHumanMethylationEPICanno.ilm10b2.hg19

Attaching package: 'IlluminaHumanMethylationEPICanno.ilm10b2.hg19'

The following objects are masked from 'package:IlluminaHumanMethylation450kanno.ilmn12.hg19':

    Islands.UCSC, Locations, Manifest, Other, SNPs.132CommonSingle,
    SNPs.135CommonSingle, SNPs.137CommonSingle, SNPs.138CommonSingle,
    SNPs.141CommonSingle, SNPs.142CommonSingle, SNPs.144CommonSingle,
    SNPs.146CommonSingle, SNPs.147CommonSingle, SNPs.Illumina

Error in if (nsig == 0) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In if (arraytype == "450K") { :
  the condition has length > 1 and only the first element will be used
2: In if (arraytype == "EPIC") { :
  the condition has length > 1 and only the first element will be used
3: In logit2(assay(object, "Beta")) : NaNs produced
4: In logit2(assay(object, "Beta")) : NaNs produced
5: Partial NA coefficients for 27744 probe(s) 
> myAnnotation <- cpg.annotate(mVals, datatype = "array", arraytype="450K", what = "M",
+                              analysis.type="differential", design=design, 
+                              contrasts = TRUE, cont.matrix = contMatrix, 
+                              coef="naive - rTreg")
Your contrast returned 3021 individually significant probes. We recommend the default setting of pcutoff in dmrcate().
> DMRs <- dmrcate(myAnnotation, lambda=1000, C=2)
Fitting chr1...
Fitting chr10...
Fitting chr11...
Fitting chr12...
Fitting chr13...
Fitting chr14...
Fitting chr15...
Fitting chr16...
Fitting chr17...
Fitting chr18...
Fitting chr19...
Fitting chr2...
Fitting chr20...
Fitting chr21...
Fitting chr22...
Fitting chr3...
Fitting chr4...
Fitting chr5...
Fitting chr6...
Fitting chr7...
Fitting chr8...
Fitting chr9...
Fitting chrX...
Fitting chrY...
Demarcating regions...
Done!
> data(dmrcatedata)
> results.ranges <- extractRanges(DMRs, genome = "hg19")
> groups <- pal[1:length(unique(targets$Sample_Group))]
> names(groups) <- levels(factor(targets$Sample_Group))
> cols <- groups[as.character(factor(targets$Sample_Group))]
> samps <- 1:nrow(targets)
> x11(10,10)
> par(mfrow=c(1,1))
> DMR.plot(ranges=results.ranges, dmr=1, CpGs=bVals, phen.col=cols,
+          pch=16, toscale=TRUE, plotmedians=TRUE, genome="hg19", samps=samps)
Warning messages:
1: In if (arraytype == "450K") { :
  the condition has length > 1 and only the first element will be used
2: In if (arraytype == "EPIC") { :
  the condition has length > 1 and only the first element will be used
> DMR.plot(ranges=results.ranges, dmr=1, CpGs=bVals, phen.col=cols, what = "Beta", arraytype = "450K",
+          pch=16, toscale=TRUE, plotmedians=TRUE, genome="hg19", samps=samps)
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] splines   grid      stats4    parallel  stats     graphics  grDevices utils    
 [9] datasets  methods   base     

other attached packages:
 [1] IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0
 [2] stringr_1.2.0                                      
 [3] DMRcate_1.10.8                                     
 [4] DMRcatedata_1.10.1                                 
 [5] DSS_2.14.0                                         
 [6] bsseq_1.10.0                                       
 [7] Gviz_1.18.2                                        
 [8] minfiData_0.20.0                                   
 [9] matrixStats_0.51.0                                 
[10] missMethyl_1.8.0                                   
[11] RColorBrewer_1.1-2                                 
[12] IlluminaHumanMethylation450kmanifest_0.4.0         
[13] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 
[14] minfi_1.20.2                                       
[15] bumphunter_1.14.0                                  
[16] locfit_1.5-9.1                                     
[17] iterators_1.0.8                                    
[18] foreach_1.4.3                                      
[19] Biostrings_2.42.1                                  
[20] XVector_0.14.1                                     
[21] SummarizedExperiment_1.4.0                         
[22] GenomicRanges_1.26.4                               
[23] GenomeInfoDb_1.10.3                                
[24] IRanges_2.8.2                                      
[25] S4Vectors_0.12.2                                   
[26] Biobase_2.34.0                                     
[27] BiocGenerics_0.20.0                                
[28] limma_3.30.13                                      

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2              siggenes_1.48.0              
 [3] mclust_5.2.3                  biovizBase_1.22.0            
 [5] htmlTable_1.9                 base64enc_0.1-3              
 [7] dichromat_2.0-0               base64_2.0                   
 [9] interactiveDisplayBase_1.12.0 AnnotationDbi_1.36.2         
[11] codetools_0.2-15              R.methodsS3_1.7.1            
[13] methylumi_2.20.0              knitr_1.15.1                 
[15] Formula_1.2-1                 Rsamtools_1.26.1             
[17] annotate_1.52.1               cluster_2.0.5                
[19] GO.db_3.4.0                   R.oo_1.21.0                  
[21] shiny_1.0.0                   httr_1.2.1                   
[23] backports_1.0.5               assertthat_0.1               
[25] Matrix_1.2-8                  lazyeval_0.2.0               
[27] acepack_1.4.1                 htmltools_0.3.5              
[29] tools_3.3.3                   gtable_0.2.0                 
[31] doRNG_1.6                     Rcpp_0.12.10                 
[33] multtest_2.30.0               preprocessCore_1.36.0        
[35] nlme_3.1-131                  rtracklayer_1.34.2           
[37] mime_0.5                      ensembldb_1.6.2              
[39] rngtools_1.2.4                gtools_3.5.0                 
[41] statmod_1.4.29                XML_3.98-1.5                 
[43] beanplot_1.2                  org.Hs.eg.db_3.4.0           
[45] AnnotationHub_2.6.5           zlibbioc_1.20.0              
[47] MASS_7.3-45                   scales_0.4.1                 
[49] BSgenome_1.42.0               VariantAnnotation_1.20.3     
[51] BiocInstaller_1.24.0          GEOquery_2.40.0              
[53] yaml_2.1.14                   memoise_1.0.0                
[55] gridExtra_2.2.1               ggplot2_2.2.1                
[57] pkgmaker_0.22                 biomaRt_2.30.0               
[59] rpart_4.1-10                  reshape_0.8.6                
[61] latticeExtra_0.6-28           stringi_1.1.3                
[63] RSQLite_1.1-2                 genefilter_1.56.0            
[65] permute_0.9-4                 checkmate_1.8.2              
[67] GenomicFeatures_1.26.3        BiocParallel_1.8.1           
[69] bitops_1.0-6                  nor1mix_1.2-2                
[71] lattice_0.20-34               ruv_0.9.6                    
[73] GenomicAlignments_1.10.1      htmlwidgets_0.8              
[75] plyr_1.8.4                    magrittr_1.5                 
[77] R6_2.2.0                      Hmisc_4.0-2                  
[79] DBI_0.6                       foreign_0.8-67               
[81] survival_2.41-2               RCurl_1.95-4.8               
[83] nnet_7.3-12                   tibble_1.2                   
[85] data.table_1.10.4             digest_0.6.12                
[87] xtable_1.8-2                  httpuv_1.3.3                 
[89] illuminaio_0.16.0             R.utils_2.5.0                
[91] openssl_0.9.6                 munsell_0.4.3                
[93] registry_0.3                  BiasedUrn_1.07               
[95] quadprog_1.5-5   
DMRcate minfi • 4.6k views
ADD COMMENT
0
Entering edit mode
Tim Peters ▴ 200
@tim-peters-7579
Last seen 14 days ago
Australia

Hi Jenny, 

Thanks for spotting this. Looks like I didn't put all my match.arg()s in! I have updated the git and svn, and the fix should propagate to the master within 48 hours.

Thanks again,

Tim

ADD COMMENT
0
Entering edit mode

Dear Tim Peters

I have the same error. While I have the updated package. Can you please help m?

myAnnotation <- cpg.annotate(datatype="array", mVals, what="M", arraytype="450K", analysis.type="differential", design=design, contrasts = TRUE, cont.matrix = contMatrix, coef=1)

Error in if (nsig == 0) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In logit2(assay(object, "Beta")) : NaNs produced
2: In logit2(assay(object, "Beta")) : NaNs produced
3: Partial NA coefficients for 61723 probe(s)

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

attached packages:

 [1] IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0
 [2] DMRcate_1.10.10                                    
 [3] DMRcatedata_1.10.1                                 
 [4] DSS_2.14.0                                         
 [5] bsseq_1.10.0                                       
 [6] Gviz_1.18.2                                        
  [8] BiocInstaller_1.24.0                               
 [9] doParallel_1.0.10                                  
[10] missMethyl_1.8.0                                   
[11] shinyMethyl_1.10.0                                 
[12] shiny_1.0.0                                        
[13] TCGAMethylation450k_1.10.0                         
[14] wateRmelon_1.18.0                                                                           
[16] lumi_2.26.4                                        
[17] GOstats_2.40.0                                     
[24] GenomicAlignments_1.10.0                           
[25] Rsamtools_1.26.1                                   
[26] sva_3.22.0                                         
[31] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 
[32] IlluminaHumanMethylation450kmanifest_0.4.0         
[33] genomation_1.6.0                                   
[34] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0            
[35] rtracklayer_1.34.2                                 
[36] RnBeads.hg38_1.6.0                                 
[37] RnBeads.hg19_1.6.0                                 
[38] RnBeads_1.6.1                                      
[40] methylumi_2.20.0                                  

Many thanks in advance,

Rahel

ADD REPLY
0
Entering edit mode
Tim Peters ▴ 200
@tim-peters-7579
Last seen 14 days ago
Australia

Dear Rahel,

At this point it's likely you have NAs in your mVals matrix giving you your partial NA coefficients. Please check the values in your matrix to make sure there aren't any. If this isn't the case, the only other problem I can forsee is that the mVals are enormous in both directions, which would give 0s and 1s in the GenomicRatioSet under the hood of cpg.annotate(), which would then be transformed to NAs. So as always, it is recommended to sanity-check your data before performing a downstream analysis like DMRcate.

Also, I would highly recommend naming the columns in your contrast matrix instead of using the index (i.e. coef=1), as per the limma manual.

Best,

Tim

ADD COMMENT
0
Entering edit mode

Come to think of it, it is vital that you name your column(s) in contMatrix, and then pass a column name to coef in your call to cpg.annotate(). Otherwise it will simply take the first column of design as your coefficient of interest, which I assume is not what you want. Look at what Jenny has done: she has specified 

coef="naive - rTreg"

in her call, which is correct for contrast setups.

Cheers,

Tim

ADD REPLY
0
Entering edit mode

Dear Tim Peters,

I checked the mVals and there is no NA in the matrix and if the mVals are enormous in both direction, is there any solution in this case?

BTW, I have done cpg.annotate with a name from my contMatrix and I get the same error ...

myAnnotation <- cpg.annotate(datatype = "array", mVals, what="M", arraytype="450K",  analysis.type="differential", design=design, contrasts = TRUE, cont.matrix = contMatrix, coef="PSC")
Error in if (nsig == 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
Partial NA coefficients for 1 probe(s) 

Many thanks,

Rahel

ADD REPLY
0
Entering edit mode
Tim Peters ▴ 200
@tim-peters-7579
Last seen 14 days ago
Australia

Hi Rahel,

Well, it says "Partial NA coefficients for 1 probe(s) " now instead of the 61,723 you had before, so that's progress. This means there is only one row in the matrix for which the model is inestimable. 

I would check a few more things about mVals, you'll need to remove rows that have any of these properties:

1) Whether any of the values are "Inf" or "-Inf" - is.infinite() will obviously help you here

2) Whether any of the rows all have exactly the same value - zero variances may throw limma as well, though I am unsure about the exact behaviour.

3) Look at range(mVals), if it's outside about (-25, 25) then that will transform to (0, 1) and then going back again you'll get NA/Infs.

I consider these standard QC measures that are assumed on the data before you attempt to call DMRs. You would have picked all of these up previously if you followed the fantastic pipeline that has been mentioned earlier in the thread: https://f1000research.com/articles/5-1281/v3.

Rather than jump straight into DMRs it pays to get a feel for your data first and understand what you are working with - don't run before you can walk!

Cheers,

Tim

 

ADD COMMENT
0
Entering edit mode

Dear Tim Peters,

Many thanks for your help and explanation. I am actually walking slowly. I did read the paper you linked here before and also I went through details of minfi, RnBeads, Methylumi and so on to choose the fitted analysis and normalization methods for my data. I have cancer samples from two different stage that cluster very close to each other and Controls. With RnBeads and BMIQ normalisation, I do not get such an error. As it is explained in minfi, It is better to use Funnorm normalization on the samples with Global differences ... I am doing the analysis with minfi, now I solve the error via changing the normalization command from:

>MSet.funnorm <- preprocessFunnorm(RGSet)

to 

>MSet.funnorm <- preprocessFunnorm(RGSet,nPCs = 0,bgCorr=TRUE,sex = NULL, dyeCorr = TRUE, verbose = TRUE)

Now, I have no error in cpg.annotate step and here is the outcome:

> myAnnotation <- cpg.annotate(object = mVals, datatype = "array", what = "M",
                                analysis.type= "differential", design = design,
                                contrasts = TRUE, cont.matrix = contMatrix,
                                coef = "PSC", arraytype = "450K")

Your contrast returned 317471 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

I hope this is correct now and I can go forward with the analysis ...

Many thanks again,

Rahel

ADD REPLY
0
Entering edit mode
You probably do not want to use preprocessFunnorm with nPCs = 0. Best, Kasper On Tue, Apr 11, 2017 at 8:56 AM, rahel14350 [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User rahel14350 <https: support.bioconductor.org="" u="" 8472=""/> wrote Comment: > bug in cpg.annotate() and DMR.plot: bad defaults for what and arraytype > arguments <https: support.bioconductor.org="" p="" 94211="" #94782="">: > > Dear Tim Peters <https: support.bioconductor.org="" u="" 7579=""/>, > > Many thanks for your help and explanation. I am actually walking slowly. I > did read the paper you linked here before and also I went through details > of minfi, RnBeads, Methylumi and so on to choose the fitted analysis and > normalization methods for my data. I have cancer samples from two different > stage that cluster very close to each other and Controls. With RnBeads and > BMIQ normalisation, I do not get such an error. As it is explained in > minfi, It is better to use Funnorm normalization on the samples with Global > differences ... I am doing the analysis with minfi, now I solve the error > via changing the normalization command from: > > >MSet.funnorm <- preprocessFunnorm(RGSet) > > to > > >MSet.funnorm <- preprocessFunnorm(RGSet,nPCs = 0,bgCorr=TRUE,sex = NULL, > dyeCorr = TRUE, verbose = TRUE) > > Now, I have no error in cpg.annotate step and here is the outcome: > > > myAnnotation <- cpg.annotate(object = mVals, datatype = "array", what = > "M", > analysis.type= "differential", design = > design, > contrasts = TRUE, cont.matrix = contMatrix, > coef = "PSC", arraytype = "450K") > > Your contrast returned 317471 individually significant probes. We > recommend the default setting of pcutoff in dmrcate(). > > I hope this is correct now and I can go forward with the analysis ... > > Many thanks again, > > Rahel > > ------------------------------ > > Post tags: DMRcate, minfi > > You may reply via email or visit https://support.bioconductor. > org/p/94211/#94782 >
ADD REPLY
0
Entering edit mode

Dear Kasper Daniel Hansen and Dear Tim Peters,

Many thanks for your help. I did repeat the Funnorm with nPcs=2 and I did find the probe with NA value using: pval = fit2$p.value and fitcoef= fit2$coef

Then I went back to the filtration step and remove it by

>probes=c("cg06545389")
>keep <- !(featureNames(MSetfunnormFlt) %in% probes)
>MSetfunnormFlt<- MSetfunnormFlt[keep,]

and Now I do not have any error in DMR step. Hope this was the correct approach ...

Kind Regards,

Rahel

 

ADD REPLY
0
Entering edit mode

Dear all,

I also face the same errors when doing the cpg.annotate:

make the GenomicRatioSetFromMatrix manually due to using of EPICv2

GRset <- makeGenomicRatioSetFromMatrix( mat = mVals, array = "IlluminaHumanMethylationEPICv2", annotation = "20a1.hg38" )

setting some annotation

myAnnotation <- cpg.annotate(datatype = "array", object=GRset,
what="M", analysis.type ="differential", design = design, contrasts = TRUE, cont.matrix = contMatrix, coef = "TNBC - nonTNBC", arraytype = "EPIC", array = "IlluminaHumanMethylationEPICv2", annotation = "20a1.hg38")

and I got:

Error in if (nsig == 0) { : missing value where TRUE/FALSE needed

In addition: Warning messages:

1: In logit2(assay(object, "Beta")) : NaNs produced

2: In logit2(assay(object, "Beta")) : NaNs produced

3: Partial NA coefficients for 46419 probe(s)

I've checked my mVals as suggested, but I didn't find Inf or -Inf values, and the values are also in (-25, 25) range.

I've also checked the "Beta" from GRset@assays@data@listData$Beta, I didn't find the NaN.

I already consider the standard QC measures that has been mentioned earlier in the thread: https://f1000research.com/articles/5-1281/v3.

is there something that may I've missed?

Many thanks, Fika

ADD REPLY
0
Entering edit mode

Hi Fika,

Thanks for flagging this. DMRcate for EPICv2 is currently undergoing beta testing and should be available within the next few weeks. It won't work with the current version of cpg.annotate(). Once it's available I'll bump this thread. Many thanks for your patience.

Best, Tim

ADD REPLY

Login before adding your answer.

Traffic: 676 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6