Question: estimateCellCounts Visualisations - Adjusting Beta Values for Plotting
0
4.3 years ago by
United Kingdom
andrew.j.skelton73320 wrote:

Hi,

I've got some blood 450K methylation data and I've used the estimateCellCounts function in Minfi to include as terms in my model for a differential methylation test. I just wanted to be sure that what I'm doing is correct - I want to get adjust beta values for visualisations.

Code:

foo <- estimateCellCounts(rgSet            = raw.idat,
meanPlot         = T,
returnAll        = T,
fixOutliers      = T,
mergeManifest    = T)

treatments           <- unique(pData(lumi.norm.filtered)$SampleType) treatment_arrays <- as.factor(pData(lumi.norm.filtered)$SampleType)
cell_counts          <- as.data.frame(foo$counts) design <- model.matrix(~0 + treatment_arrays + cell_counts$CD8T
+ cell_counts$CD4T + cell_counts$NK
+ cell_counts$Bcell + cell_counts$Mono
+ cell_counts$Gran) colnames(design) <- c(treatments, colnames(cell_counts)) fitM <- lmFit(getM(lumi.norm.filtered), design) #Adjust M Values mAdj.in <- getM(lumi.norm.filtered) mAdj.fit <- fitM$coefficients[,-c(1,2)]
#Use Lumi's M value to Beta Value conversion
#or

Does that seem logical/ sane?

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] FlowSorted.Blood.450k_1.6.0                        RPMM_1.20                                          cluster_2.0.3
[4] dynamicTreeCut_1.62                                impute_1.42.0                                      IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1
[7] IlluminaHumanMethylation450kmanifest_0.4.0         biomaRt_2.24.0                                     GenomicFeatures_1.20.1
[10] AnnotationDbi_1.30.1                               biovizBase_1.16.0                                  ggbio_1.16.1
[13] cowplot_0.5.0                                      gridExtra_2.0.0                                    doParallel_1.0.8
[16] scales_0.2.5                                       reshape2_1.4.1                                     ggplot2_1.0.1
[19] limma_3.24.14                                      lumi_2.20.2                                        minfi_1.14.0
[22] bumphunter_1.8.0                                   locfit_1.5-9.1                                     iterators_1.0.7
[25] foreach_1.4.2                                      Biostrings_2.36.1                                  XVector_0.8.0
[28] GenomicRanges_1.20.5                               GenomeInfoDb_1.4.1                                 IRanges_2.2.5
[31] S4Vectors_0.6.2                                    lattice_0.20-33                                    Biobase_2.28.0
[34] BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] nlme_3.1-121             bitops_1.0-6             matrixStats_0.14.2       RColorBrewer_1.1-2       tools_3.2.1              doRNG_1.6
[7] nor1mix_1.2-0            affyio_1.36.0            rpart_4.1-10             KernSmooth_2.23-15       Hmisc_3.16-0             DBI_0.3.1
[13] mgcv_1.8-7               colorspace_1.2-6         nnet_7.3-10              methylumi_2.14.0         GGally_0.5.0             base64_1.1
[19] preprocessCore_1.30.0    graph_1.46.0             pkgmaker_0.22            labeling_0.3             rtracklayer_1.28.6       genefilter_1.50.0
[25] quadprog_1.5-5           affy_1.46.1              RBGL_1.44.0              stringr_1.0.0            digest_0.6.8             Rsamtools_1.20.4
[31] foreign_0.8-65           illuminaio_0.10.0        siggenes_1.42.0          GEOquery_2.34.0          dichromat_2.0-0          BSgenome_1.36.2
[37] RSQLite_1.0.0            BiocInstaller_1.18.4     mclust_5.0.2             BiocParallel_1.2.9       acepack_1.3-3.3          VariantAnnotation_1.14.6
[43] RCurl_1.95-4.7           magrittr_1.5             Formula_1.2-1            futile.logger_1.4.1      Matrix_1.2-2             Rcpp_0.11.6
[49] munsell_0.4.2            proto_0.3-10             stringi_0.5-5            nleqslv_2.8              MASS_7.3-43              zlibbioc_1.14.0
[55] plyr_1.8.3               splines_3.2.1            multtest_2.24.0          annotate_1.46.1          beanplot_1.2             rngtools_1.2.4
[61] codetools_0.2-14         futile.options_1.0.0     XML_3.98-1.3             latticeExtra_0.6-26      lambda.r_1.1.7           gtable_0.1.2
[67] reshape_0.8.5            xtable_1.7-4             survival_2.38-3          OrganismDbi_1.10.0       GenomicAlignments_1.4.1  registry_0.3
limma minfi • 1.5k views
modified 3.8 years ago by Smeeta Shrestha20 • written 4.3 years ago by andrew.j.skelton73320
2
4.3 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

I can't speak for the rest of the code, but there's a couple of things to point out with using raw cell counts as covariates. The first is that you're fitting to M-values, which is a log-transformed ratio of counts. If, for example, there's an increase in the cell count from 10 to 100 between two samples, you're effectively asking for a change in the M-value of 10*B between samples (where B is the estimated coefficient for the cell count covariate). This is equivalent to a 2^(10*B) = 1024^B change in the relevant counts (assuming the original log-transformation was base-2), which is pretty extreme for any non-zero value of B.

Possibly a better solution would be to log-transform the cell counts before using them as covariates. In the example above, the logged counts would have a change of log2(10) between samples. This equates to a change in the M-value of log2(10)*B, which is equivalent to a 10^B change in the relevant counts. This is less extreme and may provide a better fit to the data. For example, if the cell type was such that all bases were methylated, a 10-fold increase in the cell count for this type would result in a 10-fold increase in the methylated counts. This would be modelled by B=1 (ignoring proportionality effects for simplicity).

Another thing to keep in mind is that you're putting a lot of covariates into your design. This may confound the interpretation of your treatment effects in treatment_arrays. For example, if one sample type is associated with greater numbers of one cell type, any sample-specific differences (e.g., the treatment effect of interest) may be absorbed into the estimate of the coefficient for the count covariate of that cell type. As a result, you may end up with few significant features during differential testing for sample effects.

1
In addition to Aaron's comments I would add that the first thing I would do is to visualize the distributions of cell types between cases and controls. I believe we have great evidence that this estimation procedure is good enough for uncovering systematic differences between two groups (with a decent number of samples). However, assuming you uncover a difference between cases and controls, I am less convinced that we know the best way to address this. It is not clear to me that just adding terms in a linear model is the way to go. You could also try the various methods for reference free correction. Best, Kasper On Wed, Jul 29, 2015 at 8:14 PM, Aaron Lun [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Aaron Lun <https: support.bioconductor.org="" u="" 6732=""/> wrote Answer: > estimateCellCounts Visualisations - Adjusting Beta Values for Plotting > <https: support.bioconductor.org="" p="" 70559="" #70587="">: > > I can't speak for the rest of the code, but there's a couple of things to > point out with using raw cell counts as covariates. The first is that > you're fitting to M-values, which is a log-transformed ratio of counts. If, > for example, there's an increase in the cell count from 10 to 100 between > two samples, you're effectively asking for a change in the M-value of 10*B > between samples (where B is the estimated coefficient for the cell count > covariate). This is equivalent to a 2^(10*B) change in the relevant > counts (assuming the original transformation was base-2), which is pretty > extreme for any non-zero value of B. > > Possibly a better solution would be to log-transform the cell counts > before using them as covariates. In the example above, the logged counts > would have a change of log2(10) between samples. This equates to a change > in the M-value of log2(10)*B, which is equivalent to a 10^B change in the > relevant counts. This is less extreme and may provide a better fit to the > data. For example, if the cell type was such that all bases were > methylated, a 10-fold increase in the cell count for this type would result > in a 10-fold increase in the methylated counts. This would be modelled by > B=1 (ignoring proportionality effects for simplicity). > > Another thing to keep in mind is that you're putting a lot of covariates > in. This may confound the interpretation of your treatment effects. For > example, if one sample type is associated with greater numbers of one cell > type, any sample-specific differences (e.g., the treatment effect of > interest) may be absorbed into the estimate of the coefficient for the > count covariate of that cell type. As a result, you may end up with few > significant genes during testing for sample effects. > > ------------------------------ > > Post tags: minfi, limma > > You may reply via email or visit > A: estimateCellCounts Visualisations - Adjusting Beta Values for Plotting >

Hi Kasper:

Just wanted to point out that you might want to update the estimateCellCounts.Rd file to provide the correct citation/reference info for the "Accounting for cellular heterogeneity is critical in epigenome-wide association studies" paper. Currently it says that the paper is "Under Review"

0
3.8 years ago by
Smeeta Shrestha20 wrote:

Hi  Andrew ,

1. mAdj.fit <- fitM$coefficients[,-c(1,2)] 2. mAdj <- as.matrixmAdj.in) - mAdj.fit %*% t(design[,-c(1,2)]) what the [,-c(1,2)] means in the syntax 1 and 2 above. If i understand correctly does it mean removing columns 1 and 2 from the data matrix ? Thank you smeeta ADD COMMENTlink written 3.8 years ago by Smeeta Shrestha20 It means you remove the first and second column, correct. What you need to do is remove the columns which contain your treatments. So if you have 3 treatments, and they're the first three columns in your design matrix, then you'd remove 1:3. Check the columns of your design matrix to be sure. On Thursday, 21 January 2016, Smeeta Shrestha [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Smeeta Shrestha <https: support.bioconductor.org="" u="" 6393=""/> wrote Answer: > estimateCellCounts Visualisations - Adjusting Beta Values for Plotting > <https: support.bioconductor.org="" p="" 70559="" #77248="">: > > Hi Andrew , > > Could you please elaborate > > 1. mAdj.fit <- fitM$coefficients[,-c(1,2)] > > 2. mAdj <- as.matrixmAdj.in) - mAdj.fit %*% t(design[,-c(1,2)]) > > what the [,-c(1,2)] means in the syntax 1 and 2 above. > > > > If i understand correctly does it mean removing columns 1 and 2 from the > data matrix ? > > > > Thank you > > smeeta > > ------------------------------ > > Post tags: minfi, limma > > You may reply via email or visit > A: estimateCellCounts Visualisations - Adjusting Beta Values for Plotting >
0
3.8 years ago by
Smeeta Shrestha20 wrote:

Hi Andrew,

I have a question. Why are you dropping the intercept in your model.

you design is showing

design <- model.matrix (~0+treatment_arrays+ ..........)  and not

design <- model.matrix(~treatment_arrays + ..........)

Since I tried to adjust for blood count in similiar manner, but on using two designs I get different "adjusted beta values".

Thank you

smeeta