Hello all,
I am interested in the details of the random probe selection at both the intra- and inter-array levels in minfi's preprocessSWAN().
In the original paper I don't see any explanation of an explicit inter-array normalization step, only the statement that the intra-array normalization reduces technical variability between arrays. However, I find the following in the documentation for minfi's preprocessSWAN function:
"SWAN uses a random subset of probes to do the between array normalization. In order to achive reproducible results, the seed needs to be set using set.seed."
So I am wondering:
1. Why does the seed need to be set only for the inter-array normalization and not for the intra-array normalization as well? Is this because the run-to-run differences are negligible? Should I be worried that the random intra-array probe selection hinders reproducibility?
2. What is actually happening in the inter-array normalization? I understand that SWAN (Subset-Quantile Within Array Normalization) selects subsets of probes with varying numbers of underlying CpGs in order to define an intensity distribution for each assay type, to which the remaining probes on the array are then normalized (my rough mental model is sketched below). So which aspect of this process is used for the inter-array normalization?
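For what it's worth, the way I am currently calling it is simply the following (rgSet here is just my RGChannelSet, already read in from the IDATs; the seed value is arbitrary):

library(minfi)

set.seed(123)                      # fix the RNG so the random probe subset is reproducible
mSetSwan <- preprocessSWAN(rgSet)  # re-running with the same seed should give identical values

And to check that I have the intra-array step right, here is the rough mental model I have of it in code. This is purely illustrative and simplified, not the minfi implementation, and all of the function and argument names are my own:

# typeI, typeII: Type I and Type II probe intensities for ONE array
# subI, subII:   indices of the matched random subsets (equal numbers of
#                probes with 1, 2 and 3 underlying CpGs from each design)
swan_one_array <- function(typeI, typeII, subI, subII) {
    # average the empirical quantiles of the two subsets to get a target distribution
    target <- rowMeans(cbind(sort(typeI[subI]), sort(typeII[subII])))
    adjust <- function(x, sub) {
        out <- x
        # subset probes are mapped directly onto the target quantiles
        out[sub] <- target[rank(x[sub], ties.method = "first")]
        # remaining probes are adjusted by interpolating between the subset probes
        rest <- setdiff(seq_along(x), sub)
        out[rest] <- approx(x = sort(x[sub]), y = target, xout = x[rest], rule = 2)$y
        out
    }
    list(typeI = adjust(typeI, subI), typeII = adjust(typeII, subII))
}

If that is roughly right, I still don't see where anything inter-array comes in.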
Thanks as always.
best,
Sean
Thank you very much for your reply.
I noticed that an error is thrown when I attempt to normalize a single array with preprocessSWAN(), which seems like it shouldn't happen if the normalization is only at the intra-array level (please see below).
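For reference, the call that triggers it is essentially this (rgSet is my RGChannelSet; keeping only one column is deliberate here):

library(minfi)

single <- rgSet[, 1]                  # keep just one array
mSetSwan <- preprocessSWAN(single)    # this is where the error appears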
Am I correct that you are referring to the functional normalization paper by Fortin et al. 2014 (PMID: 25599564)?
My only concern with preprocessFunnorm is that, as a newer method, relatively few published papers appear to use it yet, based on a search of PubMed and PMC. In the interest of comparability between 450k results, it struck me as helpful to keep normalization strategies consistent across studies, although I realize there is currently no consensus on which strategy to use.
thanks,
Sean
### preprocessSWAN, single-array error ###
When you see an error like that one, it is an indication that the function expected you to pass in a matrix or data.frame, but it got a vector instead. This is not what I would call a bug per se, because why would anybody ever do a methylation experiment with a single sample?
In other words, there isn't code in preprocessSWAN() to ensure that data from a single sample doesn't get reduced to a vector. (The canonical thing is when you subset using '[': if you do matrix[,someindicator] and 'someindicator' is of length 1, that will return a vector, whereas matrix[,someindicator,drop=FALSE] won't lose the dimensions of your return object.)
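A quick toy example of the difference (just an arbitrary 3 x 2 matrix):

m <- matrix(1:6, nrow = 3)   # a 3 x 2 matrix
m[, 1]                       # returns a length-3 vector; the dimensions are dropped
m[, 1, drop = FALSE]         # still a 3 x 1 matrix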
But this doesn't have anything to do with whether preprocessSWAN() is an intra- or inter-array normalization. It's just an artifact of the fact that it never occurred to Kasper that somebody might try to analyze a single array, so there isn't anything in the code to catch that. In fact, the error occurs here:

bgIntensitySwan <- function(rgSet){
grnMed <- matrixStats::colMedians(getGreen(rgSet)[getControlAddress(rgSet, controlType = "NEGATIVE"), ])
redMed <- matrixStats::colMedians(getRed(rgSet)[getControlAddress(rgSet, controlType = "NEGATIVE"), ])
return(rowMeans(cbind(grnMed, redMed)))
}
which is just the background estimation step.
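You can see the same failure mode with a toy matrix (made-up numbers, with negctl standing in for the negative control subset; the point is just what happens to the dimensions when there is a single column):

green <- matrix(rnorm(20), ncol = 1)    # intensities for a single array: one column
negctl <- green[1:5, ]                  # subsetting drops this down to a plain vector
matrixStats::colMedians(negctl)         # errors, because colMedians() expects a matrix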
I am not sure why you are trying to normalize a single array, because that doesn't really make sense to me. But perhaps you were trying to convince yourself that preprocessSWAN() is in fact a within-array normalization. If so, why not just look at the code? The entire function is only 30 lines or so, and the relevant portion, which normalizes each array one at a time, makes that pretty clear, I think.
This makes sense, thank you.
best,
Sean