Question: Getting wrong rownames of MSet object when using pfilter from wateRmelon
24 days ago by
anne-kristin.stavrum0 wrote:

I use minfi to import my EPIC data, and use pfilter from wateRmelon as the next step to remove probes and samples with high detection p-values and low bead counts etc.

I submit the RGset object to pfilter and get an Mset object back.

Mset.pf <- pfilter(RGset, perCount = 5, pnthresh = 0.01, perc = 1, pthresh = 1)

When I do this on my computer, the rownames of the Mset.pf object are of the form cg05575921, but if I do the same on another computer I get e.g. 1600101, which I think may be Illumina address keys.

Does anyone know why I get these other numbers, or how I can convert them to cg-numbers?
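A quick way to tell the two kinds of row names apart can be sketched in base R. The helper name `classify_probe_ids` is hypothetical, and the rule it applies (probe names start with `cg`, `ch` or `rs`, while address-style IDs consist of digits only) is an assumption made for illustration, not something provided by minfi or wateRmelon.

```r
# Hypothetical helper: label row names as Illumina probe names or numeric IDs.
# Assumption: probe names begin with "cg", "ch" or "rs"; address-style IDs are digits only.
classify_probe_ids <- function(ids) {
  ifelse(grepl("^(cg|ch|rs)", ids), "probe_name",
         ifelse(grepl("^[0-9]+$", ids), "numeric_id", "other"))
}

# The two example row names quoted in this post
classify_probe_ids(c("cg05575921", "1600101"))

# On a real object one might tabulate the result, for example:
# table(classify_probe_ids(rownames(Mset.pf)))
```

Tabulating the labels on each computer would show at a glance whether the whole object carries numeric IDs or only some rows do.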

I can see that the Illumina Manifest file gets loaded...

Kind regards, Anne-Kristin

P.S. My computer is not powerful enough to do the preprocessing on the full dataset, which is why I need it to work on another computer.

modified 24 days ago by lschal0 • written 24 days ago by anne-kristin.stavrum0

Thank you so much for your help. The session info for the system where I get the unexpected numbers is below. I have close to 1300 samples, and will definitely try bigmelon.

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=nb_NO.UTF-8 LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=nb_NO.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] IlluminaHumanMethylationEPICmanifest_0.3.0
[2] wateRmelon_1.30.0
[3] illuminaio_0.28.0
[4] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[5] ROC_1.62.0
[6] lumi_2.38.0
[7] methylumi_2.32.0
[8] FDb.InfiniumMethylation.hg19_2.2.0
[9] org.Hs.eg.db_3.10.0
[10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[11] GenomicFeatures_1.38.0
[12] AnnotationDbi_1.48.0
[13] ggplot2_3.2.1
[14] reshape2_1.4.3
[15] scales_1.1.0
[16] limma_3.42.0
[17] minfi_1.32.0
[18] bumphunter_1.28.0
[19] locfit_1.5-9.1
[20] iterators_1.0.12
[21] foreach_1.4.7
[22] Biostrings_2.54.0
[23] XVector_0.26.0
[24] SummarizedExperiment_1.16.0
[25] DelayedArray_0.12.0
[26] BiocParallel_1.20.0
[27] matrixStats_0.55.0
[28] Biobase_2.46.0
[29] GenomicRanges_1.38.0
[30] GenomeInfoDb_1.22.0
[31] IRanges_2.20.0
[32] S4Vectors_0.24.0
[33] BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 siggenes_1.60.0 mclust_5.4.5
[4] base64_2.0 affyio_1.56.0 bit64_0.9-7
[7] xml2_1.2.2 codetools_0.2-16 splines_3.6.1
[10] scrime_1.3.5 knitr_1.26 zeallot_0.1.0
[13] Rsamtools_2.2.1 annotate_1.64.0 dbplyr_1.4.2
[16] HDF5Array_1.14.0 BiocManager_1.30.10 readr_1.3.1
[19] compiler_3.6.1 httr_1.4.1 backports_1.1.5
[22] assertthat_0.2.1 Matrix_1.2-17 lazyeval_0.2.2
[25] prettyunits_1.0.2 tools_3.6.1 affy_1.64.0
[28] gtable_0.3.0 glue_1.3.1 GenomeInfoDbData_1.2.2
[31] dplyr_0.8.3 rappdirs_0.3.1 doRNG_1.7.1
[34] Rcpp_1.0.3 vctrs_0.2.0 multtest_2.42.0
[37] preprocessCore_1.48.0 nlme_3.1-142 rtracklayer_1.46.0
[40] DelayedMatrixStats_1.8.0 xfun_0.11 stringr_1.4.0
[43] lifecycle_0.1.0 rngtools_1.4 XML_3.98-1.20
[46] beanplot_1.2 nleqslv_3.3.2 zlibbioc_1.32.0
[49] MASS_7.3-51.4 hms_0.5.2 rhdf5_2.30.0
[52] GEOquery_2.54.0 RColorBrewer_1.1-2 curl_4.2
[55] memoise_1.1.0 pkgmaker_0.27 biomaRt_2.42.0
[58] reshape_0.8.8 stringi_1.4.3 RSQLite_2.1.2
[61] genefilter_1.68.0 bibtex_0.4.2 rlang_0.4.1
[64] pkgconfig_2.0.3 bitops_1.0-6 nor1mix_1.3-0
[67] lattice_0.20-38 purrr_0.3.3 Rhdf5lib_1.8.0
[70] GenomicAlignments_1.22.1 bit_1.1-14 tidyselect_0.2.5
[73] plyr_1.8.4 magrittr_1.5 R6_2.4.1
[76] DBI_1.0.0 pillar_1.4.2 withr_2.1.2
[79] mgcv_1.8-31 survival_3.1-7 RCurl_1.95-4.12
[82] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-16
[85] BiocFileCache_1.10.2 progress_1.2.2 grid_3.6.1
[88] data.table_1.12.6 blob_1.2.0 digest_0.6.22
[91] xtable_1.8-4 tidyr_1.0.0 openssl_1.4.1
[94] munsell_0.5.0 registry_0.5-1 askpass_1.1
[97] quadprog_1.5-7

written 23 days ago by anne-kristin.stavrum0

Session info for my computer where I get the expected cg-numbers:

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.6 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=nb_NO.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=nb_NO.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] IlluminaHumanMethylationEPICmanifest_0.3.0 wateRmelon_1.28.0 illuminaio_0.26.0
[4] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 ROC_1.60.0 lumi_2.36.0
[7] methylumi_2.30.0 FDb.InfiniumMethylation.hg19_2.2.0 org.Hs.eg.db_3.8.2
[10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.36.1 AnnotationDbi_1.46.1
[13] ggplot2_3.2.1 reshape2_1.4.3 scales_1.0.0
[16] limma_3.40.6 minfi_1.30.0 bumphunter_1.26.0
[19] locfit_1.5-9.1 iterators_1.0.10 foreach_1.4.4
[22] Biostrings_2.52.0 XVector_0.24.0 SummarizedExperiment_1.14.0
[25] DelayedArray_0.10.0 BiocParallel_1.18.0 matrixStats_0.54.0
[28] Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0
[31] IRanges_2.18.2 S4Vectors_0.22.1 BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 siggenes_1.58.0 mclust_5.4.3 base64_2.0 rstudioapi_0.10 affyio_1.54.0 bit64_0.9-7
[8] xml2_1.2.0 codetools_0.2-16 splines_3.6.0 scrime_1.3.5 zeallot_0.1.0 Rsamtools_2.0.0 annotate_1.62.0
[15] HDF5Array_1.12.1 BiocManager_1.30.4 readr_1.3.1 compiler_3.6.0 httr_1.4.0 backports_1.1.4 assertthat_0.2.1
[22] Matrix_1.2-17 lazyeval_0.2.2 prettyunits_1.0.2 tools_3.6.0 affy_1.62.0 gtable_0.3.0 glue_1.3.1
[29] GenomeInfoDbData_1.2.1 dplyr_0.8.3 doRNG_1.7.1 Rcpp_1.0.2 vctrs_0.2.0 multtest_2.40.0 preprocessCore_1.46.0
[36] nlme_3.1-140 rtracklayer_1.44.0 DelayedMatrixStats_1.6.0 stringr_1.4.0 lifecycle_0.1.0 rngtools_1.3.1.1 XML_3.98-1.20
[43] beanplot_1.2 nleqslv_3.3.2 zlibbioc_1.30.0 MASS_7.3-51.4 hms_0.5.1 rhdf5_2.28.0 GEOquery_2.52.0
[50] RColorBrewer_1.1-2 memoise_1.1.0 pkgmaker_0.27 biomaRt_2.40.0 reshape_0.8.8 stringi_1.4.3 RSQLite_2.1.1
[57] genefilter_1.66.0 bibtex_0.4.2 rlang_0.4.0 pkgconfig_2.0.3 bitops_1.0-6 nor1mix_1.2-3 lattice_0.20-38
[64] purrr_0.3.2 Rhdf5lib_1.6.0 GenomicAlignments_1.20.0 bit_1.1-14 tidyselect_0.2.5 plyr_1.8.4 magrittr_1.5
[71] R6_2.4.0 DBI_1.0.0 pillar_1.4.2 withr_2.1.2 mgcv_1.8-28 survival_2.44-1.1 RCurl_1.95-4.12
[78] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-15 progress_1.2.2 grid_3.6.0 data.table_1.12.2 blob_1.1.1
[85] digest_0.6.19 xtable_1.8-4 tidyr_1.0.0 openssl_1.4 munsell_0.5.0 registry_0.5-1 askpass_1.1
[92] quadprog_1.5-7
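The two listings are long, so a small base R comparison may help spot which packages differ between the computers. This is only a sketch: the version strings are copied by hand from the two sessionInfo() outputs in this thread (a handful of packages, not the full list), and `differing_packages` is a hypothetical helper name.

```r
# Versions copied by hand from the two sessionInfo() listings in this thread
other_computer <- c(wateRmelon = "1.30.0", minfi = "1.32.0", methylumi = "2.32.0",
                    IlluminaHumanMethylationEPICmanifest = "0.3.0")
my_computer    <- c(wateRmelon = "1.28.0", minfi = "1.30.0", methylumi = "2.30.0",
                    IlluminaHumanMethylationEPICmanifest = "0.3.0")

# Hypothetical helper: names of the shared packages whose versions differ
differing_packages <- function(a, b) {
  common <- intersect(names(a), names(b))
  differs <- vapply(common,
                    function(p) utils::compareVersion(a[[p]], b[[p]]) != 0,
                    logical(1))
  common[differs]
}

differing_packages(other_computer, my_computer)
```

For these four packages the manifest is the same on both computers, while wateRmelon, minfi and methylumi differ.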

written 23 days ago by anne-kristin.stavrum0

Dear Anne-Kristin:

There were a few changes in the most recent version of wateRmelon that we didn't communicate adequately to users (or to each other!). pfilter now returns a filtered RGChannelSet instead of a MethylSet. The MethylSet is what you get from preprocessing (its rows then have Illumina probe names rather than just row numbers). To get the same result as you had with the older versions, the object needs to go through preprocessRaw or a normaliser. Naturally, we recommend dasen (Pidsley 2013).

With your number of arrays, that's clearly a use case for bigmelon. Get in touch if you have any trouble using it.
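As a rough sketch of what that extra step could look like, the wrapper below returns a MethylSet whichever wateRmelon version produced the filtered object. `ensure_methylset` is a hypothetical name, not a wateRmelon or minfi function; it assumes minfi is installed and uses minfi's `preprocessRaw`, which could be swapped for a normaliser such as dasen.

```r
# Hypothetical helper: return a MethylSet regardless of which pfilter version produced x.
# Assumes minfi is installed; preprocessRaw() is only called when x is an RGChannelSet.
ensure_methylset <- function(x) {
  if (inherits(x, "RGChannelSet")) {
    minfi::preprocessRaw(x)  # red/green intensities -> Meth/Unmeth signals per probe
  } else {
    x                        # older behaviour: pfilter already returned a MethylSet
  }
}

# Usage, with RGset as in the question:
# filtered <- pfilter(RGset, perCount = 5, pnthresh = 0.01, perc = 1, pthresh = 1)
# Mset.pf  <- ensure_methylset(filtered)
# head(rownames(Mset.pf))  # per the explanation above, these should be probe names
```

Branching on the class keeps the same script usable on both computers until the package versions are aligned.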

best wishes

Leo

written 23 days ago by lschal0

Thank you so much for clarifying! I want to remove the probes on the X and Y chromosomes before normalising the data, so I will use preprocessRaw before proceeding.
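A minimal sketch of that filtering step follows. It assumes an annotation table with probe IDs as row names and a `chr` column holding values such as "chrX", which is the layout one would expect from `minfi::getAnnotation()` when the matching annotation package is installed. `autosomal_probe_ids` is a hypothetical helper, and the probe IDs in the toy table are made up.

```r
# Hypothetical helper: probe IDs that are not on chrX or chrY.
# `annotation` is assumed to have probe IDs as row names and a `chr` column.
autosomal_probe_ids <- function(annotation) {
  rownames(annotation)[!annotation$chr %in% c("chrX", "chrY")]
}

# Toy annotation table (made-up probe IDs) to show the behaviour
toy <- data.frame(chr = c("chr1", "chrX", "chrY", "chr7"),
                  row.names = c("cg00000001", "cg00000002", "cg00000003", "cg00000004"))
autosomal_probe_ids(toy)

# With real data, something along these lines (Mset being the MethylSet from preprocessRaw):
# keep      <- autosomal_probe_ids(minfi::getAnnotation(Mset))
# Mset.auto <- Mset[rownames(Mset) %in% keep, ]
```

Subsetting by row name only works once the object carries probe names, which is why the preprocessRaw step has to come first.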

Thanks again! Anne-Kristin

written 22 days ago by anne-kristin.stavrum0
Answer: Getting wrong rownames of MSet object when using pfilter from wateRmelon
24 days ago by
lschal0
University of Essex
lschal0 wrote:

Dear Anne-Kristin:

We need a bit more information to understand what's going on here. Could you send me the output of sessionInfo() from both computers (with the packages you are using loaded)? Also, how many EPIC arrays are you working with? Possibly you could do everything on your own computer (if you wish to) using the bigmelon package, which doesn't store everything in RAM.

Leo

written 24 days ago by lschal0
