We are doing the QC and normalization of methylation data using minfi. We have decided to apply functional normalization to get the beta values. We ran the function with the default settings (nPCs = 2) and, when we plot a PCA of the resulting beta values, we see that the top components are still associated with some technical variables. To solve this, we are considering rerunning functional normalization with a larger number of PCs. However, the documentation is very cryptic:
"The number k of principal components can be set by the argument nPCs. By default nPCs is set to 2, and have been shown to perform consistently well across different datasets. This parameter should only be modified by expert users."
In the paper describing the method, the authors applied it to three datasets and compared different values of this parameter:
"As described above, we recommend using functional normalization with the number of principal components set to m=2. Additional file 1: Figure S8 shows the impact of varying the number of principal components on various performance measures we have used throughout, and shows that m=2 is a good choice for the data sets we have analyzed. It is outperformed by m=6 in the analysis of the KIRC data and by m=3 in the analysis of the AML data, but these choices perform worse in the analysis of the Ontario-EBV data. While m=2 is a good choice across data sets, we leave m to be a user-settable parameter in the implementation of the algorithm."
However, neither the documentation nor the paper explains when increasing the number of PCs might be advisable.
Could anyone give me some guidance on how to choose a value for this parameter? Our dataset consists of 1,500 samples from children in the general population. The DNA was obtained from blood.
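To make the question concrete, here is a rough sketch of the kind of check we have in mind for comparing different nPCs values. It is only an illustration, not a recommended procedure: `pc_tech_r2` is a hypothetical helper written in base R (it is not part of minfi), and `rgSet` (an RGChannelSet) and `tech` (a data.frame with one column per technical variable, e.g. slide or plate) are assumed object names. For each technical variable, it reports the R^2 obtained when regressing each of the top PCs of the beta matrix on that variable.

```r
# Hypothetical helper (not part of minfi): for each of the top principal
# components of a beta matrix (probes x samples), compute the R^2 of a linear
# model of the PC scores on each technical covariate.
pc_tech_r2 <- function(beta, covars, n_top = 5) {
  # Keep only probes without missing values, since prcomp() cannot handle NAs
  keep <- stats::complete.cases(beta)
  # Samples must be rows for prcomp(), hence the transpose
  pca <- stats::prcomp(t(beta[keep, , drop = FALSE]), center = TRUE, scale. = FALSE)
  n_top <- min(n_top, ncol(pca$x))
  scores <- pca$x[, seq_len(n_top), drop = FALSE]
  # One column of R^2 values per covariate, one row per PC
  r2 <- vapply(covars, function(v) {
    apply(scores, 2, function(pc) summary(stats::lm(pc ~ v))$r.squared)
  }, numeric(n_top))
  matrix(r2, nrow = n_top, dimnames = list(colnames(scores), names(covars)))
}

# Assumed usage with minfi (object names `rgSet` and `tech` are placeholders):
# for (k in 2:6) {
#   beta_k <- minfi::getBeta(minfi::preprocessFunnorm(rgSet, nPCs = k))
#   cat("nPCs =", k, "\n")
#   print(round(pc_tech_r2(beta_k, tech), 2))
# }
```

With 1,500 samples, running the PCA on all probes may be slow, so restricting the matrix to a subset of probes before calling the helper might be necessary. A lower R^2 for the technical variables would suggest less residual technical signal, but this check alone cannot tell whether biological signal has also been removed, which is the part we are unsure how to assess.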