450k annotation packages compatible version and analysis with blockFinder (minfi)

0

Entering edit mode

Florence Cavalli ▴ 50

@florence-cavalli-4588

Last seen 8.5 years ago

Hello, I am working with 450k methylation data. - Is there a way to modify the annotation linked to an MethylSet object, (or is this something we are not supposed to do)? I created my large MethylSet object under the pervious version of R and Bioc and at the time the annotation package IlluminaHumanMethylation450kannotation.ilmn.v1.2 was used. However this package is not available in BioC 2.13 but IlluminaHumanMethylation450kanno.ilmn12.hg19 replaced it (as far as I understood). I would like to be able to do this since I'd like to use the new cpgCollapse() and blockFinder() functions from minfi that are under BioC 2.13. Indeed cpgCollapse() calls getAnnotation() and since I cannot load the annotation package in R-3.0.2/BioC 2.13 an error is returned (and the new functions are not available under the previous version of BioC). Is there a way around this problem ? I tried the annotation() function but without success. Or do I have to reconstruct my MethylSet object from the very beginning reading the raw data using the library IlluminaHumanMethylation450kanno.ilmn12.hg19, normalysing (and re- running my clustering to be sure to have consistent results) ? It is just that I would save quite some time if I could actually convert my object. - Also I am planning to use the the following workflow to run blockFinder() and find differentially methylated regions. To be able to ultimately run blockFinder(), using your the minfi test data I created a GenomicMethylSet with mapToGenome() function from the MethylSet object then I ran cpgCollapse() to obtain an GenomicRatioSet object and finally used this object and a design matrix to run blockFinder(). See example: As in the vignette: library(IlluminaHumanMethylation450kmanifest) library(IlluminaHumanMethylation450kanno.ilmn12.hg19) library(minfi) require(minfiData) baseDir <- system.file("extdata", package = "minfiData") targets <- read.450k.sheet(baseDir) targets sub(baseDir, "", targets$Basename) RGset <- read.450k.exp(base = baseDir, targets = targets) RGset pd <- pData(RGset) pd[,1:4] Then MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = "controls") Mset.swan <- preprocessSWAN(RGset, MSet.norm) mset <- Mset.swan > mset MethylSet (storageMode: lockedEnvironment) assayData: 485512 features, 6 samples element names: Meth, Unmeth phenoData sampleNames: 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R06C02 (6 total) varLabels: Sample_Name Sample_Well ... filenames (13 total) varMetadata: labelDescription Annotation array: IlluminaHumanMethylation450k annotation: ilmn12.hg19 Preprocessing Method: SWAN (based on a MethylSet preprocesses as 'Illumina, bg.correct = TRUE, normalize = controls, reference = 1' minfi version: 1.8.1 Manifest version: 0.4.0 > > class(mset) [1] "MethylSet" attr(,"package") [1] "minfi" > gmSet = mapToGenome(mset) > class(gmSet) [1] "GenomicMethylSet" attr(,"package") [1] "minfi" > gmSet_cpgCollapse=cpgCollapse(gmSet,what="M",returnBlockInfo=FALSE) [cpgCollapse] Creating annotation. Error in cpgCollapseAnnotation(gr, relationToIsland, islandName, maxGap = maxGap, : object has to be ordered. Not sure about the following !? But I seems that sorting the object works > gmSet_s=sort(gmSet) > gmSet_cpgCollapse=cpgCollapse(gmSet_s,what="M",returnBlockInfo=FALSE) [cpgCollapse] Creating annotation. [cpgCollapseAnnotation] Clustering islands and clusters of probes. [cpgCollapseAnnotation] Computing new annotation. [cpgCollapseAnnotation] Defining blocks. [cpgCollapse] Collapsing data...... > class(gmSet_cpgCollapse) [1] "GenomicRatioSet" attr(,"package") [1] "minfi" ## Sample groups ## > targets[,"Sample_Group"] ## [1] "GroupA" "GroupA" "GroupB" "GroupB" "GroupA" "GroupB" > design <- model.matrix(~ 0+factor(c(1,1,2,2,1,2))) > colnames(design) <- c("groupA", "groupB") > head(design) groupA groupB 1 1 0 2 1 0 3 0 1 4 0 1 5 1 0 6 0 1 > Block_g2=blockFinder(gmSet_cpgCollapse, design=design, coef = 2, what = "M",pickCutoff=TRUE,smooth=FALSE) [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.1). [bumphunterEngine] Computing coefficients. [bumphunterEngine] Performing 1000 permutations. [bumphunterEngine] Computing marginal permutation p-values. [bumphunterEngine] cutoff: 12033998108438140 [bumphunterEngine] Finding regions. [bumphunterEngine] No bumps found! Error in res$table$indexStart : $ operator is invalid for atomic vectors I assume that if some bumps are found it would not return an error!? Would you advice to smooth the signal or not? Is that the proper way of running this analysis ? (I wonder since I did not find any example for such analysis) - Finally I noticed that in the cpgCollapse function the attribute Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name from a data frame return by getAnnotation object are called. However looking at the result of the getAnnotation() function called there isn't any Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name columns attributes (see below)? Are we supposed to use another annotation package? > cpgCollapse function (object, what = c("Beta", "M"), maxGap = 500, blockMaxGap = 2.5 * 10^5, maxClusterWidth = 1500, dataSummary = colMeans, na.rm = FALSE, returnBlockInfo = TRUE, verbose = TRUE, ...) { what <- match.arg(what) gr <- granges(object) ann <- getAnnotation(object) relationToIsland <- ann$Relation_to_UCSC_CpG_Island islandName <- ann$UCSC_CpG_Islands_Name .... .... } class(gmSet) gmSet > class(gmSet) [1] "GenomicMethylSet" attr(,"package") [1] "minfi" > gmSet class: GenomicMethylSet dim: 485512 6 exptData(0): assays(2): Meth Unmeth rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923 rowData metadata column names(0): colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02 5723646053_R06C02 colData names(13): Sample_Name Sample_Well ... Basename filenames Annotation array: IlluminaHumanMethylation450k annotation: ilmn12.hg19 Preprocessing Method: SWAN (based on a MethylSet preprocesses as 'Illumina, bg.correct = TRUE, normalize = controls, reference = 1' minfi version: 1.8.1 Manifest version: 0.4.0 A=getAnnotation(gmSet) colnames(A) > A=getAnnotation(gmSet) > colnames(A) [1] "chr" "pos" [3] "strand" "Name" [5] "AddressA" "AddressB" [7] "ProbeSeqA" "ProbeSeqB" [9] "Type" "NextBase" [11] "Color" "Probe_rs" [13] "Probe_maf" "CpG_rs" [15] "CpG_maf" "SBE_rs" [17] "SBE_maf" "Islands_Name" [19] "Relation_to_Island" "Forward_Sequence" [21] "SourceSeq" "Random_Loci" [23] "Methyl27_Loci" "UCSC_RefGene_Name" [25] "UCSC_RefGene_Accession" "UCSC_RefGene_Group" [27] "Phantom" "DMR" [29] "Enhancer" "HMM_Island" [31] "Regulatory_Feature_Name" "Regulatory_Feature_Group" [33] "DHS" > A$Relation_to_UCSC_CpG_Island NULL > A$UCSC_CpG_Islands_Name NULL sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] minfiData_0.4.2 [2] BiocInstaller_1.12.0 [3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.0 [4] IlluminaHumanMethylation450kmanifest_0.4.0 [5] minfi_1.8.1 [6] bumphunter_1.2.0 [7] locfit_1.5-9.1 [8] iterators_1.0.6 [9] foreach_1.4.1 [10] Biostrings_2.30.0 [11] GenomicRanges_1.14.1 [12] XVector_0.2.0 [13] IRanges_1.20.0 [14] reshape_0.8.4 [15] plyr_1.8 [16] lattice_0.20-24 [17] Biobase_2.22.0 [18] BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] annotate_1.40.0 AnnotationDbi_1.24.0 base64_1.1 [4] beanplot_1.1 codetools_0.2-8 DBI_0.2-7 [7] digest_0.6.3 doRNG_1.5.5 genefilter_1.44.0 [10] grid_3.0.2 illuminaio_0.4.0 itertools_0.1-1 [13] limma_3.18.0 MASS_7.3-29 matrixStats_0.8.12 [16] mclust_4.2 multtest_2.18.0 nlme_3.1-111 [19] nor1mix_1.1-4 pkgmaker_0.17.4 preprocessCore_1.24.0 [22] R.methodsS3_1.5.2 RColorBrewer_1.0-5 registry_0.2 [25] rngtools_1.2.3 RSQLite_0.11.4 siggenes_1.36.0 [28] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 [31] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 [34] xtable_1.7-1 Thank you very much for developing these very useful packages! Best, Florence -- Florence Cavalli, PhD. The Hospital for Sick Children Brain Tumour Research Centre 101 College Street, TMDT-11-701 Toronto, ON M5G1L7 Canada [[alternative HTML version deleted]]

Annotation Clustering convert minfi Annotation Clustering convert minfi • 3.0k views

ADD COMMENT • link updated 10.5 years ago by Kasper Daniel Hansen ★ 6.5k • written 10.5 years ago by Florence Cavalli ▴ 50

0

Entering edit mode

Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 10 months ago

United States

1) You can update your objects by running OBJECT = updateObject(OBJECT) This fixes everything and there should be no problems. Unless you have been using the hg18 coordinates from the old annotation package, in which case you're out of luck. The reason for this is that I realized that a substantial amount of the annotation material provided by illumina of course depends on the genome build, for example relation to CpG island. And Illumina only provides this for hg19, not hg18. 2) A lot of the other errors you have seems to be real errors probably caused by some late re-organization of the code. Thanks for providing an extremely nice example where they fail, which should make it very easy to track down. I suspect some late code cleaning errors to be behind this. I will investigate and report back very soon. 3) If you want to find DMRs, which we define as smaller regions like 100bp-5kb or so, you want to use bumphunter() The function blockFinder is for finding "blocks" which are large scale DMRs, on the megabase scale, like in our 'recent' Nat Genet paper. Not sure if that is what you want? Best, Kasper On Tue, Oct 22, 2013 at 4:04 PM, Florence Cavalli <florence@ebi.ac.uk>wrote: > Hello, > > I am working with 450k methylation data. > - Is there a way to modify the annotation linked to an MethylSet object, > (or is this something we are not supposed to do)? > I created my large MethylSet object under the pervious version of R and > Bioc and at the time the annotation package > IlluminaHumanMethylation450kannotation.ilmn.v1.2 was used. However this > package is not available in BioC 2.13 > but IlluminaHumanMethylation450kanno.ilmn12.hg19 replaced it (as far as I > understood). > > I would like to be able to do this since I'd like to use the > new cpgCollapse() and blockFinder() functions from minfi that are under > BioC 2.13. Indeed cpgCollapse() calls getAnnotation() and since I cannot > load the annotation package in R-3.0.2/BioC 2.13 an error is returned (and > the new functions are not available under the previous version of BioC). > > Is there a way around this problem ? I tried the annotation() function but > without success. Or do I have to reconstruct my MethylSet object from the > very beginning reading the raw data using the library > IlluminaHumanMethylation450kanno.ilmn12.hg19, normalysing (and re- running > my clustering to be sure to have consistent results) ? It is just that I > would save quite some time if I could actually convert my object. > > - Also I am planning to use the the following workflow to run blockFinder() > and find differentially methylated regions. > > To be able to ultimately run blockFinder(), using your the minfi test data > I created a GenomicMethylSet with mapToGenome() function from > the MethylSet object then I ran cpgCollapse() to obtain an > GenomicRatioSet object and finally used this object and a design matrix to > run blockFinder(). > See example: > > As in the vignette: > library(IlluminaHumanMethylation450kmanifest) > library(IlluminaHumanMethylation450kanno.ilmn12.hg19) > library(minfi) > require(minfiData) > > baseDir <- system.file("extdata", package = "minfiData") > targets <- read.450k.sheet(baseDir) > targets > > sub(baseDir, "", targets$Basename) > RGset <- read.450k.exp(base = baseDir, targets = targets) > RGset > pd <- pData(RGset) > pd[,1:4] > > Then > > MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = > "controls") > Mset.swan <- preprocessSWAN(RGset, MSet.norm) > > mset <- Mset.swan > > > > mset > MethylSet (storageMode: lockedEnvironment) > assayData: 485512 features, 6 samples > element names: Meth, Unmeth > phenoData > sampleNames: 5723646052_R02C02 5723646052_R04C01 ... > 5723646053_R06C02 (6 total) > varLabels: Sample_Name Sample_Well ... filenames (13 total) > varMetadata: labelDescription > Annotation > array: IlluminaHumanMethylation450k > annotation: ilmn12.hg19 > Preprocessing > Method: SWAN (based on a MethylSet preprocesses as 'Illumina, bg.correct > = TRUE, normalize = controls, reference = 1' > minfi version: 1.8.1 > Manifest version: 0.4.0 > > > > > class(mset) > [1] "MethylSet" > attr(,"package") > [1] "minfi" > > gmSet = mapToGenome(mset) > > class(gmSet) > [1] "GenomicMethylSet" > attr(,"package") > [1] "minfi" > > gmSet_cpgCollapse=cpgCollapse(gmSet,what="M",returnBlockInfo=FALSE) > [cpgCollapse] Creating annotation. > Error in cpgCollapseAnnotation(gr, relationToIsland, islandName, maxGap = > maxGap, : > object has to be ordered. > > Not sure about the following !? But I seems that sorting the object works > > gmSet_s=sort(gmSet) > > gmSet_cpgCollapse=cpgCollapse(gmSet_s,what="M",returnBlockInfo=FALSE) > [cpgCollapse] Creating annotation. > [cpgCollapseAnnotation] Clustering islands and clusters of probes. > [cpgCollapseAnnotation] Computing new annotation. > [cpgCollapseAnnotation] Defining blocks. > [cpgCollapse] Collapsing data...... > > class(gmSet_cpgCollapse) > [1] "GenomicRatioSet" > attr(,"package") > [1] "minfi" > > ## Sample groups > ## > targets[,"Sample_Group"] > ## [1] "GroupA" "GroupA" "GroupB" "GroupB" "GroupA" "GroupB" > > design <- model.matrix(~ 0+factor(c(1,1,2,2,1,2))) > > colnames(design) <- c("groupA", "groupB") > > head(design) > groupA groupB > 1 1 0 > 2 1 0 > 3 0 1 > 4 0 1 > 5 1 0 > 6 0 1 > > > Block_g2=blockFinder(gmSet_cpgCollapse, design=design, coef = 2, what = > "M",pickCutoff=TRUE,smooth=FALSE) > [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.1). > [bumphunterEngine] Computing coefficients. > [bumphunterEngine] Performing 1000 permutations. > [bumphunterEngine] Computing marginal permutation p-values. > [bumphunterEngine] cutoff: 12033998108438140 > [bumphunterEngine] Finding regions. > [bumphunterEngine] No bumps found! > Error in res$table$indexStart : $ operator is invalid for atomic vectors > > I assume that if some bumps are found it would not return an error!? > Would you advice to smooth the signal or not? > > Is that the proper way of running this analysis ? (I wonder since I did not > find any example for such analysis) > > - Finally I noticed that in the cpgCollapse function the attribute > Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name from a data frame > return by getAnnotation object are called. > However looking at the result of the getAnnotation() function called there > isn't any Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name columns > attributes (see below)? Are we supposed to use another annotation package? > > > > cpgCollapse > function (object, what = c("Beta", "M"), maxGap = 500, blockMaxGap = 2.5 * > 10^5, maxClusterWidth = 1500, dataSummary = colMeans, na.rm = FALSE, > returnBlockInfo = TRUE, verbose = TRUE, ...) > { > what <- match.arg(what) > gr <- granges(object) > ann <- getAnnotation(object) > relationToIsland <- ann$Relation_to_UCSC_CpG_Island > islandName <- ann$UCSC_CpG_Islands_Name > .... > .... > } > > class(gmSet) > gmSet > > class(gmSet) > [1] "GenomicMethylSet" > attr(,"package") > [1] "minfi" > > gmSet > class: GenomicMethylSet > dim: 485512 6 > exptData(0): > assays(2): Meth Unmeth > rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923 > rowData metadata column names(0): > colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02 > 5723646053_R06C02 > colData names(13): Sample_Name Sample_Well ... Basename filenames > Annotation > array: IlluminaHumanMethylation450k > annotation: ilmn12.hg19 > Preprocessing > Method: SWAN (based on a MethylSet preprocesses as 'Illumina, bg.correct > = TRUE, normalize = controls, reference = 1' > minfi version: 1.8.1 > Manifest version: 0.4.0 > > > A=getAnnotation(gmSet) > colnames(A) > > > A=getAnnotation(gmSet) > > colnames(A) > [1] "chr" "pos" > [3] "strand" "Name" > [5] "AddressA" "AddressB" > [7] "ProbeSeqA" "ProbeSeqB" > [9] "Type" "NextBase" > [11] "Color" "Probe_rs" > [13] "Probe_maf" "CpG_rs" > [15] "CpG_maf" "SBE_rs" > [17] "SBE_maf" "Islands_Name" > [19] "Relation_to_Island" "Forward_Sequence" > [21] "SourceSeq" "Random_Loci" > [23] "Methyl27_Loci" "UCSC_RefGene_Name" > [25] "UCSC_RefGene_Accession" "UCSC_RefGene_Group" > [27] "Phantom" "DMR" > [29] "Enhancer" "HMM_Island" > [31] "Regulatory_Feature_Name" "Regulatory_Feature_Group" > [33] "DHS" > > A$Relation_to_UCSC_CpG_Island > NULL > > A$UCSC_CpG_Islands_Name > NULL > > sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] minfiData_0.4.2 > [2] BiocInstaller_1.12.0 > [3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.0 > [4] IlluminaHumanMethylation450kmanifest_0.4.0 > [5] minfi_1.8.1 > [6] bumphunter_1.2.0 > [7] locfit_1.5-9.1 > [8] iterators_1.0.6 > [9] foreach_1.4.1 > [10] Biostrings_2.30.0 > [11] GenomicRanges_1.14.1 > [12] XVector_0.2.0 > [13] IRanges_1.20.0 > [14] reshape_0.8.4 > [15] plyr_1.8 > [16] lattice_0.20-24 > [17] Biobase_2.22.0 > [18] BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 base64_1.1 > [4] beanplot_1.1 codetools_0.2-8 DBI_0.2-7 > [7] digest_0.6.3 doRNG_1.5.5 genefilter_1.44.0 > [10] grid_3.0.2 illuminaio_0.4.0 itertools_0.1-1 > [13] limma_3.18.0 MASS_7.3-29 matrixStats_0.8.12 > [16] mclust_4.2 multtest_2.18.0 nlme_3.1-111 > [19] nor1mix_1.1-4 pkgmaker_0.17.4 preprocessCore_1.24.0 > [22] R.methodsS3_1.5.2 RColorBrewer_1.0-5 registry_0.2 > [25] rngtools_1.2.3 RSQLite_0.11.4 siggenes_1.36.0 > [28] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 > [31] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 > [34] xtable_1.7-1 > > > Thank you very much for developing these very useful packages! > Best, > Florence > > -- > Florence Cavalli, PhD. > The Hospital for Sick Children > Brain Tumour Research Centre > 101 College Street, TMDT-11-701 > Toronto, ON M5G1L7 > Canada > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 10.5 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Here is a list of the reported issues 1) The object returned by mapToGenome has is.unsorted return TRUE. This is now fixed, see NEWS below. 2) When bumphunter() returns zero bumps, it throws an error. On the todo list. 3) Is there an issue with the renaming of Relation_to_UCSC_CpG_Island to Relation_to_Island in cpgCollapse and helper function? The answer is yes, it was a bug and it is fixed now. Re. your example, your design matrix for the block finding is wrong. You are testing whether samples in group B has a methylation of 0.5, which is obviously not true. This is why your inferred cutoff is 12033998108438140 - you're easily off by a factor 10^15 (!!!!). You need to remove the 0+ in the model matrix line, and just have design = model.matrix(~+factor()). In fact, even better is design = model.matrix(~ gmSet$Sample_Group) Finally, we are working on a more detailed example for the vignette, but it is not done yet. Fixes are in minfi 1.8.3 (1.9.3 devel) which should be available from the build system in around 36h. * The function mapToGenome would return something that looked like an unordered GenomicMethylSet. Actually, loci were correctly ordered within chromosomes, the issue had to do with whether the chromosomes were ordered as chr1, chr2, chr3 (used in minfi) or chr1, chr10, chr11 (lexigraphically). Reported by Florence Cavalli <florence@ebi.ac.uk>. * Bug in cpgCollapse led to incorrect results. Your output is affected if table(granges(output[[1]])$type) is all 'OpenSea'. Reported by Florence Cavalli <florence@ebi.ac.uk>. Best, Kasper On Tue, Oct 22, 2013 at 8:27 PM, Kasper Daniel Hansen < kasperdanielhansen@gmail.com> wrote: > 1) You can update your objects by running > OBJECT = updateObject(OBJECT) > This fixes everything and there should be no problems. Unless you have > been using the hg18 coordinates from the old annotation package, in which > case you're out of luck. > > The reason for this is that I realized that a substantial amount of the > annotation material provided by illumina of course depends on the genome > build, for example relation to CpG island. And Illumina only provides this > for hg19, not hg18. > > 2) A lot of the other errors you have seems to be real errors probably > caused by some late re-organization of the code. Thanks for providing an > extremely nice example where they fail, which should make it very easy to > track down. I suspect some late code cleaning errors to be behind this. I > will investigate and report back very soon. > > 3) If you want to find DMRs, which we define as smaller regions like > 100bp-5kb or so, you want to use > bumphunter() > The function blockFinder is for finding "blocks" which are large scale > DMRs, on the megabase scale, like in our 'recent' Nat Genet paper. Not > sure if that is what you want? > > Best, > Kasper > > > On Tue, Oct 22, 2013 at 4:04 PM, Florence Cavalli <florence@ebi.ac.uk>wrote: > >> Hello, >> >> I am working with 450k methylation data. >> - Is there a way to modify the annotation linked to an MethylSet object, >> (or is this something we are not supposed to do)? >> I created my large MethylSet object under the pervious version of R and >> Bioc and at the time the annotation package >> IlluminaHumanMethylation450kannotation.ilmn.v1.2 was used. However this >> package is not available in BioC 2.13 >> but IlluminaHumanMethylation450kanno.ilmn12.hg19 replaced it (as far as I >> understood). >> >> I would like to be able to do this since I'd like to use the >> new cpgCollapse() and blockFinder() functions from minfi that are under >> BioC 2.13. Indeed cpgCollapse() calls getAnnotation() and since I cannot >> load the annotation package in R-3.0.2/BioC 2.13 an error is returned (and >> the new functions are not available under the previous version of BioC). >> >> Is there a way around this problem ? I tried the annotation() function but >> without success. Or do I have to reconstruct my MethylSet object from the >> very beginning reading the raw data using the library >> IlluminaHumanMethylation450kanno.ilmn12.hg19, normalysing (and re- running >> my clustering to be sure to have consistent results) ? It is just that I >> would save quite some time if I could actually convert my object. >> >> - Also I am planning to use the the following workflow to run >> blockFinder() >> and find differentially methylated regions. >> >> To be able to ultimately run blockFinder(), using your the minfi test data >> I created a GenomicMethylSet with mapToGenome() function from >> the MethylSet object then I ran cpgCollapse() to obtain an >> GenomicRatioSet object and finally used this object and a design matrix >> to >> run blockFinder(). >> See example: >> >> As in the vignette: >> library(IlluminaHumanMethylation450kmanifest) >> library(IlluminaHumanMethylation450kanno.ilmn12.hg19) >> library(minfi) >> require(minfiData) >> >> baseDir <- system.file("extdata", package = "minfiData") >> targets <- read.450k.sheet(baseDir) >> targets >> >> sub(baseDir, "", targets$Basename) >> RGset <- read.450k.exp(base = baseDir, targets = targets) >> RGset >> pd <- pData(RGset) >> pd[,1:4] >> >> Then >> >> MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = >> "controls") >> Mset.swan <- preprocessSWAN(RGset, MSet.norm) >> >> mset <- Mset.swan >> >> >> > mset >> MethylSet (storageMode: lockedEnvironment) >> assayData: 485512 features, 6 samples >> element names: Meth, Unmeth >> phenoData >> sampleNames: 5723646052_R02C02 5723646052_R04C01 ... >> 5723646053_R06C02 (6 total) >> varLabels: Sample_Name Sample_Well ... filenames (13 total) >> varMetadata: labelDescription >> Annotation >> array: IlluminaHumanMethylation450k >> annotation: ilmn12.hg19 >> Preprocessing >> Method: SWAN (based on a MethylSet preprocesses as 'Illumina, bg.correct >> = TRUE, normalize = controls, reference = 1' >> minfi version: 1.8.1 >> Manifest version: 0.4.0 >> > >> >> > class(mset) >> [1] "MethylSet" >> attr(,"package") >> [1] "minfi" >> > gmSet = mapToGenome(mset) >> > class(gmSet) >> [1] "GenomicMethylSet" >> attr(,"package") >> [1] "minfi" >> > gmSet_cpgCollapse=cpgCollapse(gmSet,what="M",returnBlockInfo=FALSE) >> [cpgCollapse] Creating annotation. >> Error in cpgCollapseAnnotation(gr, relationToIsland, islandName, maxGap = >> maxGap, : >> object has to be ordered. >> >> Not sure about the following !? But I seems that sorting the object works >> > gmSet_s=sort(gmSet) >> > gmSet_cpgCollapse=cpgCollapse(gmSet_s,what="M",returnBlockInfo=FALSE) >> [cpgCollapse] Creating annotation. >> [cpgCollapseAnnotation] Clustering islands and clusters of probes. >> [cpgCollapseAnnotation] Computing new annotation. >> [cpgCollapseAnnotation] Defining blocks. >> [cpgCollapse] Collapsing data...... >> > class(gmSet_cpgCollapse) >> [1] "GenomicRatioSet" >> attr(,"package") >> [1] "minfi" >> >> ## Sample groups >> ## > targets[,"Sample_Group"] >> ## [1] "GroupA" "GroupA" "GroupB" "GroupB" "GroupA" "GroupB" >> > design <- model.matrix(~ 0+factor(c(1,1,2,2,1,2))) >> > colnames(design) <- c("groupA", "groupB") >> > head(design) >> groupA groupB >> 1 1 0 >> 2 1 0 >> 3 0 1 >> 4 0 1 >> 5 1 0 >> 6 0 1 >> >> > Block_g2=blockFinder(gmSet_cpgCollapse, design=design, coef = 2, what = >> "M",pickCutoff=TRUE,smooth=FALSE) >> [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.1). >> [bumphunterEngine] Computing coefficients. >> [bumphunterEngine] Performing 1000 permutations. >> [bumphunterEngine] Computing marginal permutation p-values. >> [bumphunterEngine] cutoff: 12033998108438140 >> [bumphunterEngine] Finding regions. >> [bumphunterEngine] No bumps found! >> Error in res$table$indexStart : $ operator is invalid for atomic vectors >> >> I assume that if some bumps are found it would not return an error!? >> Would you advice to smooth the signal or not? >> >> Is that the proper way of running this analysis ? (I wonder since I did >> not >> find any example for such analysis) >> >> - Finally I noticed that in the cpgCollapse function the attribute >> Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name from a data frame >> return by getAnnotation object are called. >> However looking at the result of the getAnnotation() function called there >> isn't any Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name columns >> attributes (see below)? Are we supposed to use another annotation package? >> >> >> > cpgCollapse >> function (object, what = c("Beta", "M"), maxGap = 500, blockMaxGap = 2.5 * >> 10^5, maxClusterWidth = 1500, dataSummary = colMeans, na.rm = FALSE, >> returnBlockInfo = TRUE, verbose = TRUE, ...) >> { >> what <- match.arg(what) >> gr <- granges(object) >> ann <- getAnnotation(object) >> relationToIsland <- ann$Relation_to_UCSC_CpG_Island >> islandName <- ann$UCSC_CpG_Islands_Name >> .... >> .... >> } >> >> class(gmSet) >> gmSet >> > class(gmSet) >> [1] "GenomicMethylSet" >> attr(,"package") >> [1] "minfi" >> > gmSet >> class: GenomicMethylSet >> dim: 485512 6 >> exptData(0): >> assays(2): Meth Unmeth >> rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923 >> rowData metadata column names(0): >> colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02 >> 5723646053_R06C02 >> colData names(13): Sample_Name Sample_Well ... Basename filenames >> Annotation >> array: IlluminaHumanMethylation450k >> annotation: ilmn12.hg19 >> Preprocessing >> Method: SWAN (based on a MethylSet preprocesses as 'Illumina, bg.correct >> = TRUE, normalize = controls, reference = 1' >> minfi version: 1.8.1 >> Manifest version: 0.4.0 >> >> >> A=getAnnotation(gmSet) >> colnames(A) >> >> > A=getAnnotation(gmSet) >> > colnames(A) >> [1] "chr" "pos" >> [3] "strand" "Name" >> [5] "AddressA" "AddressB" >> [7] "ProbeSeqA" "ProbeSeqB" >> [9] "Type" "NextBase" >> [11] "Color" "Probe_rs" >> [13] "Probe_maf" "CpG_rs" >> [15] "CpG_maf" "SBE_rs" >> [17] "SBE_maf" "Islands_Name" >> [19] "Relation_to_Island" "Forward_Sequence" >> [21] "SourceSeq" "Random_Loci" >> [23] "Methyl27_Loci" "UCSC_RefGene_Name" >> [25] "UCSC_RefGene_Accession" "UCSC_RefGene_Group" >> [27] "Phantom" "DMR" >> [29] "Enhancer" "HMM_Island" >> [31] "Regulatory_Feature_Name" "Regulatory_Feature_Group" >> [33] "DHS" >> > A$Relation_to_UCSC_CpG_Island >> NULL >> > A$UCSC_CpG_Islands_Name >> NULL >> >> sessionInfo() >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] minfiData_0.4.2 >> [2] BiocInstaller_1.12.0 >> [3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.0 >> [4] IlluminaHumanMethylation450kmanifest_0.4.0 >> [5] minfi_1.8.1 >> [6] bumphunter_1.2.0 >> [7] locfit_1.5-9.1 >> [8] iterators_1.0.6 >> [9] foreach_1.4.1 >> [10] Biostrings_2.30.0 >> [11] GenomicRanges_1.14.1 >> [12] XVector_0.2.0 >> [13] IRanges_1.20.0 >> [14] reshape_0.8.4 >> [15] plyr_1.8 >> [16] lattice_0.20-24 >> [17] Biobase_2.22.0 >> [18] BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.40.0 AnnotationDbi_1.24.0 base64_1.1 >> [4] beanplot_1.1 codetools_0.2-8 DBI_0.2-7 >> [7] digest_0.6.3 doRNG_1.5.5 genefilter_1.44.0 >> [10] grid_3.0.2 illuminaio_0.4.0 itertools_0.1-1 >> [13] limma_3.18.0 MASS_7.3-29 matrixStats_0.8.12 >> [16] mclust_4.2 multtest_2.18.0 nlme_3.1-111 >> [19] nor1mix_1.1-4 pkgmaker_0.17.4 preprocessCore_1.24.0 >> [22] R.methodsS3_1.5.2 RColorBrewer_1.0-5 registry_0.2 >> [25] rngtools_1.2.3 RSQLite_0.11.4 siggenes_1.36.0 >> [28] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 >> [31] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 >> [34] xtable_1.7-1 >> >> >> Thank you very much for developing these very useful packages! >> Best, >> Florence >> >> -- >> Florence Cavalli, PhD. >> The Hospital for Sick Children >> Brain Tumour Research Centre >> 101 College Street, TMDT-11-701 >> Toronto, ON M5G1L7 >> Canada >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]

ADD REPLY • link 10.5 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Btw, thank you for an extremely useful example/bug report. Best, Kasper On Wed, Oct 23, 2013 at 11:59 PM, Kasper Daniel Hansen < kasperdanielhansen@gmail.com> wrote: > Here is a list of the reported issues > 1) The object returned by mapToGenome has is.unsorted return TRUE. This > is now fixed, see NEWS below. > 2) When bumphunter() returns zero bumps, it throws an error. On the todo > list. > 3) Is there an issue with the renaming of Relation_to_UCSC_CpG_Island to > Relation_to_Island in cpgCollapse and helper function? The answer is yes, > it was a bug and it is fixed now. > > Re. your example, your design matrix for the block finding is wrong. You > are testing whether samples in group B has a methylation of 0.5, which is > obviously not true. This is why your inferred cutoff is 12033998108438140 > - you're easily off by a factor 10^15 (!!!!). You need to remove the 0+ in > the model matrix line, and just have design = model.matrix(~+factor()). In > fact, even better is > design = model.matrix(~ gmSet$Sample_Group) > > Finally, we are working on a more detailed example for the vignette, but > it is not done yet. > > Fixes are in minfi 1.8.3 (1.9.3 devel) which should be available from the > build system in around 36h. > > * The function mapToGenome would return something that looked > like an unordered GenomicMethylSet. Actually, loci were > correctly ordered within chromosomes, the issue had to do > with whether the chromosomes were ordered as chr1, chr2, chr3 > (used in minfi) or chr1, chr10, chr11 (lexigraphically). > Reported by Florence Cavalli <florence@ebi.ac.uk>. > > * Bug in cpgCollapse led to incorrect results. Your output is > affected if table(granges(output[[1]])$type) is all > 'OpenSea'. Reported by Florence Cavalli > <florence@ebi.ac.uk>. > > Best, > Kasper > > > > > > On Tue, Oct 22, 2013 at 8:27 PM, Kasper Daniel Hansen < > kasperdanielhansen@gmail.com> wrote: > >> 1) You can update your objects by running >> OBJECT = updateObject(OBJECT) >> This fixes everything and there should be no problems. Unless you have >> been using the hg18 coordinates from the old annotation package, in which >> case you're out of luck. >> >> The reason for this is that I realized that a substantial amount of the >> annotation material provided by illumina of course depends on the genome >> build, for example relation to CpG island. And Illumina only provides this >> for hg19, not hg18. >> >> 2) A lot of the other errors you have seems to be real errors probably >> caused by some late re-organization of the code. Thanks for providing an >> extremely nice example where they fail, which should make it very easy to >> track down. I suspect some late code cleaning errors to be behind this. I >> will investigate and report back very soon. >> >> 3) If you want to find DMRs, which we define as smaller regions like >> 100bp-5kb or so, you want to use >> bumphunter() >> The function blockFinder is for finding "blocks" which are large scale >> DMRs, on the megabase scale, like in our 'recent' Nat Genet paper. Not >> sure if that is what you want? >> >> Best, >> Kasper >> >> >> On Tue, Oct 22, 2013 at 4:04 PM, Florence Cavalli <florence@ebi.ac.uk>wrote: >> >>> Hello, >>> >>> I am working with 450k methylation data. >>> - Is there a way to modify the annotation linked to an MethylSet object, >>> (or is this something we are not supposed to do)? >>> I created my large MethylSet object under the pervious version of R and >>> Bioc and at the time the annotation package >>> IlluminaHumanMethylation450kannotation.ilmn.v1.2 was used. However this >>> package is not available in BioC 2.13 >>> but IlluminaHumanMethylation450kanno.ilmn12.hg19 replaced it (as far as I >>> understood). >>> >>> I would like to be able to do this since I'd like to use the >>> new cpgCollapse() and blockFinder() functions from minfi that are under >>> BioC 2.13. Indeed cpgCollapse() calls getAnnotation() and since I cannot >>> load the annotation package in R-3.0.2/BioC 2.13 an error is returned >>> (and >>> the new functions are not available under the previous version of BioC). >>> >>> Is there a way around this problem ? I tried the annotation() function >>> but >>> without success. Or do I have to reconstruct my MethylSet object from >>> the >>> very beginning reading the raw data using the library >>> IlluminaHumanMethylation450kanno.ilmn12.hg19, normalysing (and re- running >>> my clustering to be sure to have consistent results) ? It is just that I >>> would save quite some time if I could actually convert my object. >>> >>> - Also I am planning to use the the following workflow to run >>> blockFinder() >>> and find differentially methylated regions. >>> >>> To be able to ultimately run blockFinder(), using your the minfi test >>> data >>> I created a GenomicMethylSet with mapToGenome() function from >>> the MethylSet object then I ran cpgCollapse() to obtain an >>> GenomicRatioSet object and finally used this object and a design matrix >>> to >>> run blockFinder(). >>> See example: >>> >>> As in the vignette: >>> library(IlluminaHumanMethylation450kmanifest) >>> library(IlluminaHumanMethylation450kanno.ilmn12.hg19) >>> library(minfi) >>> require(minfiData) >>> >>> baseDir <- system.file("extdata", package = "minfiData") >>> targets <- read.450k.sheet(baseDir) >>> targets >>> >>> sub(baseDir, "", targets$Basename) >>> RGset <- read.450k.exp(base = baseDir, targets = targets) >>> RGset >>> pd <- pData(RGset) >>> pd[,1:4] >>> >>> Then >>> >>> MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = >>> "controls") >>> Mset.swan <- preprocessSWAN(RGset, MSet.norm) >>> >>> mset <- Mset.swan >>> >>> >>> > mset >>> MethylSet (storageMode: lockedEnvironment) >>> assayData: 485512 features, 6 samples >>> element names: Meth, Unmeth >>> phenoData >>> sampleNames: 5723646052_R02C02 5723646052_R04C01 ... >>> 5723646053_R06C02 (6 total) >>> varLabels: Sample_Name Sample_Well ... filenames (13 total) >>> varMetadata: labelDescription >>> Annotation >>> array: IlluminaHumanMethylation450k >>> annotation: ilmn12.hg19 >>> Preprocessing >>> Method: SWAN (based on a MethylSet preprocesses as 'Illumina, >>> bg.correct >>> = TRUE, normalize = controls, reference = 1' >>> minfi version: 1.8.1 >>> Manifest version: 0.4.0 >>> > >>> >>> > class(mset) >>> [1] "MethylSet" >>> attr(,"package") >>> [1] "minfi" >>> > gmSet = mapToGenome(mset) >>> > class(gmSet) >>> [1] "GenomicMethylSet" >>> attr(,"package") >>> [1] "minfi" >>> > gmSet_cpgCollapse=cpgCollapse(gmSet,what="M",returnBlockInfo=FALSE) >>> [cpgCollapse] Creating annotation. >>> Error in cpgCollapseAnnotation(gr, relationToIsland, islandName, maxGap = >>> maxGap, : >>> object has to be ordered. >>> >>> Not sure about the following !? But I seems that sorting the object works >>> > gmSet_s=sort(gmSet) >>> > gmSet_cpgCollapse=cpgCollapse(gmSet_s,what="M",returnBlockInfo=FALSE) >>> [cpgCollapse] Creating annotation. >>> [cpgCollapseAnnotation] Clustering islands and clusters of probes. >>> [cpgCollapseAnnotation] Computing new annotation. >>> [cpgCollapseAnnotation] Defining blocks. >>> [cpgCollapse] Collapsing data...... >>> > class(gmSet_cpgCollapse) >>> [1] "GenomicRatioSet" >>> attr(,"package") >>> [1] "minfi" >>> >>> ## Sample groups >>> ## > targets[,"Sample_Group"] >>> ## [1] "GroupA" "GroupA" "GroupB" "GroupB" "GroupA" "GroupB" >>> > design <- model.matrix(~ 0+factor(c(1,1,2,2,1,2))) >>> > colnames(design) <- c("groupA", "groupB") >>> > head(design) >>> groupA groupB >>> 1 1 0 >>> 2 1 0 >>> 3 0 1 >>> 4 0 1 >>> 5 1 0 >>> 6 0 1 >>> >>> > Block_g2=blockFinder(gmSet_cpgCollapse, design=design, coef = 2, what = >>> "M",pickCutoff=TRUE,smooth=FALSE) >>> [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.1). >>> [bumphunterEngine] Computing coefficients. >>> [bumphunterEngine] Performing 1000 permutations. >>> [bumphunterEngine] Computing marginal permutation p-values. >>> [bumphunterEngine] cutoff: 12033998108438140 >>> [bumphunterEngine] Finding regions. >>> [bumphunterEngine] No bumps found! >>> Error in res$table$indexStart : $ operator is invalid for atomic vectors >>> >>> I assume that if some bumps are found it would not return an error!? >>> Would you advice to smooth the signal or not? >>> >>> Is that the proper way of running this analysis ? (I wonder since I did >>> not >>> find any example for such analysis) >>> >>> - Finally I noticed that in the cpgCollapse function the attribute >>> Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name from a data frame >>> return by getAnnotation object are called. >>> However looking at the result of the getAnnotation() function called >>> there >>> isn't any Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name columns >>> attributes (see below)? Are we supposed to use another annotation >>> package? >>> >>> >>> > cpgCollapse >>> function (object, what = c("Beta", "M"), maxGap = 500, blockMaxGap = 2.5 >>> * >>> 10^5, maxClusterWidth = 1500, dataSummary = colMeans, na.rm = FALSE, >>> returnBlockInfo = TRUE, verbose = TRUE, ...) >>> { >>> what <- match.arg(what) >>> gr <- granges(object) >>> ann <- getAnnotation(object) >>> relationToIsland <- ann$Relation_to_UCSC_CpG_Island >>> islandName <- ann$UCSC_CpG_Islands_Name >>> .... >>> .... >>> } >>> >>> class(gmSet) >>> gmSet >>> > class(gmSet) >>> [1] "GenomicMethylSet" >>> attr(,"package") >>> [1] "minfi" >>> > gmSet >>> class: GenomicMethylSet >>> dim: 485512 6 >>> exptData(0): >>> assays(2): Meth Unmeth >>> rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923 >>> rowData metadata column names(0): >>> colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02 >>> 5723646053_R06C02 >>> colData names(13): Sample_Name Sample_Well ... Basename filenames >>> Annotation >>> array: IlluminaHumanMethylation450k >>> annotation: ilmn12.hg19 >>> Preprocessing >>> Method: SWAN (based on a MethylSet preprocesses as 'Illumina, >>> bg.correct >>> = TRUE, normalize = controls, reference = 1' >>> minfi version: 1.8.1 >>> Manifest version: 0.4.0 >>> >>> >>> A=getAnnotation(gmSet) >>> colnames(A) >>> >>> > A=getAnnotation(gmSet) >>> > colnames(A) >>> [1] "chr" "pos" >>> [3] "strand" "Name" >>> [5] "AddressA" "AddressB" >>> [7] "ProbeSeqA" "ProbeSeqB" >>> [9] "Type" "NextBase" >>> [11] "Color" "Probe_rs" >>> [13] "Probe_maf" "CpG_rs" >>> [15] "CpG_maf" "SBE_rs" >>> [17] "SBE_maf" "Islands_Name" >>> [19] "Relation_to_Island" "Forward_Sequence" >>> [21] "SourceSeq" "Random_Loci" >>> [23] "Methyl27_Loci" "UCSC_RefGene_Name" >>> [25] "UCSC_RefGene_Accession" "UCSC_RefGene_Group" >>> [27] "Phantom" "DMR" >>> [29] "Enhancer" "HMM_Island" >>> [31] "Regulatory_Feature_Name" "Regulatory_Feature_Group" >>> [33] "DHS" >>> > A$Relation_to_UCSC_CpG_Island >>> NULL >>> > A$UCSC_CpG_Islands_Name >>> NULL >>> >>> sessionInfo() >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] minfiData_0.4.2 >>> [2] BiocInstaller_1.12.0 >>> [3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.0 >>> [4] IlluminaHumanMethylation450kmanifest_0.4.0 >>> [5] minfi_1.8.1 >>> [6] bumphunter_1.2.0 >>> [7] locfit_1.5-9.1 >>> [8] iterators_1.0.6 >>> [9] foreach_1.4.1 >>> [10] Biostrings_2.30.0 >>> [11] GenomicRanges_1.14.1 >>> [12] XVector_0.2.0 >>> [13] IRanges_1.20.0 >>> [14] reshape_0.8.4 >>> [15] plyr_1.8 >>> [16] lattice_0.20-24 >>> [17] Biobase_2.22.0 >>> [18] BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.40.0 AnnotationDbi_1.24.0 base64_1.1 >>> [4] beanplot_1.1 codetools_0.2-8 DBI_0.2-7 >>> [7] digest_0.6.3 doRNG_1.5.5 genefilter_1.44.0 >>> [10] grid_3.0.2 illuminaio_0.4.0 itertools_0.1-1 >>> [13] limma_3.18.0 MASS_7.3-29 matrixStats_0.8.12 >>> [16] mclust_4.2 multtest_2.18.0 nlme_3.1-111 >>> [19] nor1mix_1.1-4 pkgmaker_0.17.4 preprocessCore_1.24.0 >>> [22] R.methodsS3_1.5.2 RColorBrewer_1.0-5 registry_0.2 >>> [25] rngtools_1.2.3 RSQLite_0.11.4 siggenes_1.36.0 >>> [28] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 >>> [31] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 >>> [34] xtable_1.7-1 >>> >>> >>> Thank you very much for developing these very useful packages! >>> Best, >>> Florence >>> >>> -- >>> Florence Cavalli, PhD. >>> The Hospital for Sick Children >>> Brain Tumour Research Centre >>> 101 College Street, TMDT-11-701 >>> Toronto, ON M5G1L7 >>> Canada >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > [[alternative HTML version deleted]]

ADD REPLY • link 10.5 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Thank you very much for your quick answers and looking in detail into this as well as updating the package. The updateObject(OBJECT) function works perfectly ! :-) Yes the corrected design matrix makes sense. I was really exploring and did not fully understand how blockfinder worked yet. Will look at the bumphunter function as well. Looking forward to the more detailed example :-) Thanks again, Florence 2013/10/24 Kasper Daniel Hansen <kasperdanielhansen@gmail.com> > Btw, thank you for an extremely useful example/bug report. > > Best, > Kasper > > > On Wed, Oct 23, 2013 at 11:59 PM, Kasper Daniel Hansen < > kasperdanielhansen@gmail.com> wrote: > >> Here is a list of the reported issues >> 1) The object returned by mapToGenome has is.unsorted return TRUE. This >> is now fixed, see NEWS below. >> 2) When bumphunter() returns zero bumps, it throws an error. On the todo >> list. >> 3) Is there an issue with the renaming of Relation_to_UCSC_CpG_Island to >> Relation_to_Island in cpgCollapse and helper function? The answer is yes, >> it was a bug and it is fixed now. >> >> Re. your example, your design matrix for the block finding is wrong. You >> are testing whether samples in group B has a methylation of 0.5, which is >> obviously not true. This is why your inferred cutoff is 12033998108438140 >> - you're easily off by a factor 10^15 (!!!!). You need to remove the 0+ in >> the model matrix line, and just have design = model.matrix(~+factor()). In >> fact, even better is >> design = model.matrix(~ gmSet$Sample_Group) >> >> Finally, we are working on a more detailed example for the vignette, >> but it is not done yet. >> >> Fixes are in minfi 1.8.3 (1.9.3 devel) which should be available from the >> build system in around 36h. >> >> * The function mapToGenome would return something that looked >> like an unordered GenomicMethylSet. Actually, loci were >> correctly ordered within chromosomes, the issue had to do >> with whether the chromosomes were ordered as chr1, chr2, chr3 >> (used in minfi) or chr1, chr10, chr11 (lexigraphically). >> Reported by Florence Cavalli <florence@ebi.ac.uk>. >> >> * Bug in cpgCollapse led to incorrect results. Your output is >> affected if table(granges(output[[1]])$type) is all >> 'OpenSea'. Reported by Florence Cavalli >> <florence@ebi.ac.uk>. >> >> Best, >> Kasper >> >> >> >> >> >> On Tue, Oct 22, 2013 at 8:27 PM, Kasper Daniel Hansen < >> kasperdanielhansen@gmail.com> wrote: >> >>> 1) You can update your objects by running >>> OBJECT = updateObject(OBJECT) >>> This fixes everything and there should be no problems. Unless you have >>> been using the hg18 coordinates from the old annotation package, in which >>> case you're out of luck. >>> >>> The reason for this is that I realized that a substantial amount of the >>> annotation material provided by illumina of course depends on the genome >>> build, for example relation to CpG island. And Illumina only provides this >>> for hg19, not hg18. >>> >>> 2) A lot of the other errors you have seems to be real errors probably >>> caused by some late re-organization of the code. Thanks for providing an >>> extremely nice example where they fail, which should make it very easy to >>> track down. I suspect some late code cleaning errors to be behind this. I >>> will investigate and report back very soon. >>> >>> 3) If you want to find DMRs, which we define as smaller regions like >>> 100bp-5kb or so, you want to use >>> bumphunter() >>> The function blockFinder is for finding "blocks" which are large scale >>> DMRs, on the megabase scale, like in our 'recent' Nat Genet paper. Not >>> sure if that is what you want? >>> >>> Best, >>> Kasper >>> >>> >>> On Tue, Oct 22, 2013 at 4:04 PM, Florence Cavalli <florence@ebi.ac.uk>wrote: >>> >>>> Hello, >>>> >>>> I am working with 450k methylation data. >>>> - Is there a way to modify the annotation linked to an MethylSet object, >>>> (or is this something we are not supposed to do)? >>>> I created my large MethylSet object under the pervious version of R and >>>> Bioc and at the time the annotation package >>>> IlluminaHumanMethylation450kannotation.ilmn.v1.2 was used. However this >>>> package is not available in BioC 2.13 >>>> but IlluminaHumanMethylation450kanno.ilmn12.hg19 replaced it (as far as >>>> I >>>> understood). >>>> >>>> I would like to be able to do this since I'd like to use the >>>> new cpgCollapse() and blockFinder() functions from minfi that are under >>>> BioC 2.13. Indeed cpgCollapse() calls getAnnotation() and since I cannot >>>> load the annotation package in R-3.0.2/BioC 2.13 an error is returned >>>> (and >>>> the new functions are not available under the previous version of BioC). >>>> >>>> Is there a way around this problem ? I tried the annotation() function >>>> but >>>> without success. Or do I have to reconstruct my MethylSet object from >>>> the >>>> very beginning reading the raw data using the library >>>> IlluminaHumanMethylation450kanno.ilmn12.hg19, normalysing (and >>>> re-running >>>> my clustering to be sure to have consistent results) ? It is just that I >>>> would save quite some time if I could actually convert my object. >>>> >>>> - Also I am planning to use the the following workflow to run >>>> blockFinder() >>>> and find differentially methylated regions. >>>> >>>> To be able to ultimately run blockFinder(), using your the minfi test >>>> data >>>> I created a GenomicMethylSet with mapToGenome() function from >>>> the MethylSet object then I ran cpgCollapse() to obtain an >>>> GenomicRatioSet object and finally used this object and a design >>>> matrix to >>>> run blockFinder(). >>>> See example: >>>> >>>> As in the vignette: >>>> library(IlluminaHumanMethylation450kmanifest) >>>> library(IlluminaHumanMethylation450kanno.ilmn12.hg19) >>>> library(minfi) >>>> require(minfiData) >>>> >>>> baseDir <- system.file("extdata", package = "minfiData") >>>> targets <- read.450k.sheet(baseDir) >>>> targets >>>> >>>> sub(baseDir, "", targets$Basename) >>>> RGset <- read.450k.exp(base = baseDir, targets = targets) >>>> RGset >>>> pd <- pData(RGset) >>>> pd[,1:4] >>>> >>>> Then >>>> >>>> MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = >>>> "controls") >>>> Mset.swan <- preprocessSWAN(RGset, MSet.norm) >>>> >>>> mset <- Mset.swan >>>> >>>> >>>> > mset >>>> MethylSet (storageMode: lockedEnvironment) >>>> assayData: 485512 features, 6 samples >>>> element names: Meth, Unmeth >>>> phenoData >>>> sampleNames: 5723646052_R02C02 5723646052_R04C01 ... >>>> 5723646053_R06C02 (6 total) >>>> varLabels: Sample_Name Sample_Well ... filenames (13 total) >>>> varMetadata: labelDescription >>>> Annotation >>>> array: IlluminaHumanMethylation450k >>>> annotation: ilmn12.hg19 >>>> Preprocessing >>>> Method: SWAN (based on a MethylSet preprocesses as 'Illumina, >>>> bg.correct >>>> = TRUE, normalize = controls, reference = 1' >>>> minfi version: 1.8.1 >>>> Manifest version: 0.4.0 >>>> > >>>> >>>> > class(mset) >>>> [1] "MethylSet" >>>> attr(,"package") >>>> [1] "minfi" >>>> > gmSet = mapToGenome(mset) >>>> > class(gmSet) >>>> [1] "GenomicMethylSet" >>>> attr(,"package") >>>> [1] "minfi" >>>> > gmSet_cpgCollapse=cpgCollapse(gmSet,what="M",returnBlockInfo=FALSE) >>>> [cpgCollapse] Creating annotation. >>>> Error in cpgCollapseAnnotation(gr, relationToIsland, islandName, maxGap >>>> = >>>> maxGap, : >>>> object has to be ordered. >>>> >>>> Not sure about the following !? But I seems that sorting the object >>>> works >>>> > gmSet_s=sort(gmSet) >>>> > gmSet_cpgCollapse=cpgCollapse(gmSet_s,what="M",returnBlockInfo=FALSE) >>>> [cpgCollapse] Creating annotation. >>>> [cpgCollapseAnnotation] Clustering islands and clusters of probes. >>>> [cpgCollapseAnnotation] Computing new annotation. >>>> [cpgCollapseAnnotation] Defining blocks. >>>> [cpgCollapse] Collapsing data...... >>>> > class(gmSet_cpgCollapse) >>>> [1] "GenomicRatioSet" >>>> attr(,"package") >>>> [1] "minfi" >>>> >>>> ## Sample groups >>>> ## > targets[,"Sample_Group"] >>>> ## [1] "GroupA" "GroupA" "GroupB" "GroupB" "GroupA" "GroupB" >>>> > design <- model.matrix(~ 0+factor(c(1,1,2,2,1,2))) >>>> > colnames(design) <- c("groupA", "groupB") >>>> > head(design) >>>> groupA groupB >>>> 1 1 0 >>>> 2 1 0 >>>> 3 0 1 >>>> 4 0 1 >>>> 5 1 0 >>>> 6 0 1 >>>> >>>> > Block_g2=blockFinder(gmSet_cpgCollapse, design=design, coef = 2, what >>>> = >>>> "M",pickCutoff=TRUE,smooth=FALSE) >>>> [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.1). >>>> [bumphunterEngine] Computing coefficients. >>>> [bumphunterEngine] Performing 1000 permutations. >>>> [bumphunterEngine] Computing marginal permutation p-values. >>>> [bumphunterEngine] cutoff: 12033998108438140 >>>> [bumphunterEngine] Finding regions. >>>> [bumphunterEngine] No bumps found! >>>> Error in res$table$indexStart : $ operator is invalid for atomic vectors >>>> >>>> I assume that if some bumps are found it would not return an error!? >>>> Would you advice to smooth the signal or not? >>>> >>>> Is that the proper way of running this analysis ? (I wonder since I did >>>> not >>>> find any example for such analysis) >>>> >>>> - Finally I noticed that in the cpgCollapse function the attribute >>>> Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name from a data frame >>>> return by getAnnotation object are called. >>>> However looking at the result of the getAnnotation() function called >>>> there >>>> isn't any Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name columns >>>> attributes (see below)? Are we supposed to use another annotation >>>> package? >>>> >>>> >>>> > cpgCollapse >>>> function (object, what = c("Beta", "M"), maxGap = 500, blockMaxGap = >>>> 2.5 * >>>> 10^5, maxClusterWidth = 1500, dataSummary = colMeans, na.rm = FALSE, >>>> returnBlockInfo = TRUE, verbose = TRUE, ...) >>>> { >>>> what <- match.arg(what) >>>> gr <- granges(object) >>>> ann <- getAnnotation(object) >>>> relationToIsland <- ann$Relation_to_UCSC_CpG_Island >>>> islandName <- ann$UCSC_CpG_Islands_Name >>>> .... >>>> .... >>>> } >>>> >>>> class(gmSet) >>>> gmSet >>>> > class(gmSet) >>>> [1] "GenomicMethylSet" >>>> attr(,"package") >>>> [1] "minfi" >>>> > gmSet >>>> class: GenomicMethylSet >>>> dim: 485512 6 >>>> exptData(0): >>>> assays(2): Meth Unmeth >>>> rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923 >>>> rowData metadata column names(0): >>>> colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02 >>>> 5723646053_R06C02 >>>> colData names(13): Sample_Name Sample_Well ... Basename filenames >>>> Annotation >>>> array: IlluminaHumanMethylation450k >>>> annotation: ilmn12.hg19 >>>> Preprocessing >>>> Method: SWAN (based on a MethylSet preprocesses as 'Illumina, >>>> bg.correct >>>> = TRUE, normalize = controls, reference = 1' >>>> minfi version: 1.8.1 >>>> Manifest version: 0.4.0 >>>> >>>> >>>> A=getAnnotation(gmSet) >>>> colnames(A) >>>> >>>> > A=getAnnotation(gmSet) >>>> > colnames(A) >>>> [1] "chr" "pos" >>>> [3] "strand" "Name" >>>> [5] "AddressA" "AddressB" >>>> [7] "ProbeSeqA" "ProbeSeqB" >>>> [9] "Type" "NextBase" >>>> [11] "Color" "Probe_rs" >>>> [13] "Probe_maf" "CpG_rs" >>>> [15] "CpG_maf" "SBE_rs" >>>> [17] "SBE_maf" "Islands_Name" >>>> [19] "Relation_to_Island" "Forward_Sequence" >>>> [21] "SourceSeq" "Random_Loci" >>>> [23] "Methyl27_Loci" "UCSC_RefGene_Name" >>>> [25] "UCSC_RefGene_Accession" "UCSC_RefGene_Group" >>>> [27] "Phantom" "DMR" >>>> [29] "Enhancer" "HMM_Island" >>>> [31] "Regulatory_Feature_Name" "Regulatory_Feature_Group" >>>> [33] "DHS" >>>> > A$Relation_to_UCSC_CpG_Island >>>> NULL >>>> > A$UCSC_CpG_Islands_Name >>>> NULL >>>> >>>> sessionInfo() >>>> R version 3.0.2 (2013-09-25) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets methods >>>> [8] base >>>> >>>> other attached packages: >>>> [1] minfiData_0.4.2 >>>> [2] BiocInstaller_1.12.0 >>>> [3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.0 >>>> [4] IlluminaHumanMethylation450kmanifest_0.4.0 >>>> [5] minfi_1.8.1 >>>> [6] bumphunter_1.2.0 >>>> [7] locfit_1.5-9.1 >>>> [8] iterators_1.0.6 >>>> [9] foreach_1.4.1 >>>> [10] Biostrings_2.30.0 >>>> [11] GenomicRanges_1.14.1 >>>> [12] XVector_0.2.0 >>>> [13] IRanges_1.20.0 >>>> [14] reshape_0.8.4 >>>> [15] plyr_1.8 >>>> [16] lattice_0.20-24 >>>> [17] Biobase_2.22.0 >>>> [18] BiocGenerics_0.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] annotate_1.40.0 AnnotationDbi_1.24.0 base64_1.1 >>>> [4] beanplot_1.1 codetools_0.2-8 DBI_0.2-7 >>>> [7] digest_0.6.3 doRNG_1.5.5 genefilter_1.44.0 >>>> [10] grid_3.0.2 illuminaio_0.4.0 itertools_0.1-1 >>>> [13] limma_3.18.0 MASS_7.3-29 matrixStats_0.8.12 >>>> [16] mclust_4.2 multtest_2.18.0 nlme_3.1-111 >>>> [19] nor1mix_1.1-4 pkgmaker_0.17.4 preprocessCore_1.24.0 >>>> [22] R.methodsS3_1.5.2 RColorBrewer_1.0-5 registry_0.2 >>>> [25] rngtools_1.2.3 RSQLite_0.11.4 siggenes_1.36.0 >>>> [28] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 >>>> [31] survival_2.37-4 tools_3.0.2 XML_3.98-1.1 >>>> [34] xtable_1.7-1 >>>> >>>> >>>> Thank you very much for developing these very useful packages! >>>> Best, >>>> Florence >>>> >>>> -- >>>> Florence Cavalli, PhD. >>>> The Hospital for Sick Children >>>> Brain Tumour Research Centre >>>> 101 College Street, TMDT-11-701 >>>> Toronto, ON M5G1L7 >>>> Canada >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> >> > [[alternative HTML version deleted]]

ADD REPLY • link 10.5 years ago Florence Cavalli ▴ 50

Login before adding your answer.