Question

QC/normalization standards for cancer studies with 450k array data

metamaden ▴ 10

@madensean-8348

United States

I work in cancer research with 450k array methylation data. I have been reading about normalization of 450k arrays, with a view to working out which study-design factors should determine my upstream workflow. A lot of the literature focuses on technical and analytical reproducibility, and some of it includes studies of cancer cohorts.

My current workflow, starting from IDATs, is: Illumina normalization -> SWAN normalization -> filtering on intensity p-value -> Gset and filtering on probe type -> ComBat -> Analysis.

SWAN seems to be pretty commonly used, but I am seeing more recent studies using BMIQ instead. Given current concern about cell type heterogeneity (and recommendations *not* to use quantile filtering in cancer/instances where global differential methylation is expected), I am interested in the appropriate normalization method(s) to use.
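The concern about quantile-based methods when global methylation differences are expected can be illustrated with a toy sketch. It is plain Python, not minfi code, and it reads "quantile filtering" above as quantile normalization, which is an assumption about the intended meaning. The beta values and the `quantile_normalize` helper are hypothetical and chosen only for illustration.

```python
from statistics import mean


def quantile_normalize(samples):
    """Give every sample the same distribution: the per-rank mean across samples."""
    n = len(samples[0])
    assert all(len(s) == n for s in samples), "samples must have equal length"
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the values sharing each rank.
    reference = [mean(s[rank] for s in sorted_samples) for rank in range(n)]
    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest value in this sample.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical beta values at six probes (illustration only).
normal = [0.10, 0.20, 0.80, 0.85, 0.90, 0.95]
tumor = [0.05, 0.10, 0.40, 0.45, 0.50, 0.55]  # globally hypomethylated

norm_normal, norm_tumor = quantile_normalize([normal, tumor])

gap_before = mean(normal) - mean(tumor)
gap_after = mean(norm_normal) - mean(norm_tumor)
print(f"mean difference before: {gap_before:.3f}")
print(f"mean difference after:  {gap_after:.3f}")
```

After normalization both samples hold the same set of values, so the global shift between tumor and normal is no longer visible. That is the behaviour that makes a naive between-sample quantile approach questionable when a genuine global difference is the biology of interest.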

I realize it is important to validate array findings wherever possible with technical replicates and higher-fidelity approaches like sequencing. It is also vital to tailor your approach to the particular conditions of your study rather than go by the book or follow some blanket one-size-fits-all approach. However, results also need to be replicable, and one important step towards that is knowing to what extent there is a standard for upstream data processing. Maybe there isn't a particular "right" approach, but there may be trends in how labs process their data that should be known in order to assess independent findings side by side.

A further consideration: why isn't it more common to chain together normalizations that would seem to complement one another? (i.e., Noob -> SWAN -> BMIQ seems logical for background correction -> within-array normalization -> between-array normalization).

Thanks Bioconductor community!

Sean

smaden@fredhutch.org

minfi watermelon cancer 450k normalization • 1.9k views

Updated 8.8 years ago by Kasper Daniel Hansen ★ 6.5k • written 8.8 years ago by metamaden ▴ 10

score 1 · Accepted Answer · 2015-07-16

Hi Sean,

We developed preprocessFunnorm in the minfi package for exactly this use case (cancer studies). It is described and extensively evaluated in:

JP Fortin, A Labbe, M Lemire, BW Zanke, TJ Hudson, EJ Fertig, CMT Greenwood and KD Hansen. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biology 2014, 15:503. http://dx.doi.org/10.1186/s13059-014-0503-2

In this paper we show, among other things, that the methods which do best at reducing, say, variability between technical replicates do not necessarily do best when the aim is to replicate findings across studies. This is worth keeping in mind when you evaluate different methods.

A close competitor to preprocessFunnorm is the NOOB method (preprocessNoob in minfi), which is pretty amazing yet not used much in the literature, I think.

As you already do, I would recommend combining normalization with one of the many methods for removing batch effects.

Finally, it is not clear that chaining different normalization methods gives you a better result than the methods by themselves (it might; it just needs to be investigated).

Best,
Kasper D. Hansen