This is a fairly theoretical question, because I am not comfortable with the mathematical details in the papers describing both SVA and WGCNA.
My main goal is a gene-gene correlation analysis with WGCNA (though I am not limited to this R package) across five existing RNA-seq datasets generated by different groups, so this is also a meta-analysis. A simple PCA plot shows an obvious batch effect, and I am thinking of removing it with ComBat-seq from the sva package.
My question is whether removing batch effects with ComBat-seq (or by any other means) will artificially introduce correlation between genes. In other words, do these batch-correction algorithms adjust expression values in a "correlation-related" way, similar to imputation, that would make every gene appear correlated with every other gene?
I do expect that removing batch effects will give more significant correlation p-values for some truly co-expressed genes, but I am also afraid that it will generate many artifactual correlations. This worries me because my goal is not to find differentially expressed genes but to run an analysis based on gene-gene correlation.
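To make the concern concrete, here is a minimal Python simulation of what I have in mind. It is an illustrative sketch, not ComBat-seq: it assumes two genes whose within-batch values are independent noise on a log-like scale, plus a hypothetical per-batch offset shared by both genes. The function names, offsets, and sample sizes are all hypothetical. The correction used here is a simple location-only adjustment (subtracting each gene's mean within each batch), standing in for the negative-binomial model that ComBat-seq actually fits.

```python
import numpy as np


def simulate_two_genes(n_per_batch=50, seed=0):
    """Two independent genes that share a hypothetical per-batch offset."""
    rng = np.random.default_rng(seed)
    # Hypothetical offsets on a log-expression scale, one per dataset (batch).
    offsets = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
    batch = np.repeat(np.arange(offsets.size), n_per_batch)
    # Within a batch, the two genes are independent N(0, 1) noise.
    gene_a = rng.normal(0.0, 1.0, size=batch.size) + offsets[batch]
    gene_b = rng.normal(0.0, 1.0, size=batch.size) + offsets[batch]
    return batch, gene_a, gene_b


def center_within_batch(values, batch):
    """Location-only batch adjustment: subtract each batch's mean (not ComBat-seq)."""
    adjusted = np.array(values, dtype=float)  # work on a float copy
    for b in np.unique(batch):
        in_batch = batch == b
        adjusted[in_batch] -= adjusted[in_batch].mean()
    return adjusted


batch, gene_a, gene_b = simulate_two_genes()
r_raw = np.corrcoef(gene_a, gene_b)[0, 1]
r_adjusted = np.corrcoef(
    center_within_batch(gene_a, batch), center_within_batch(gene_b, batch)
)[0, 1]
print(f"pooled correlation before adjustment: {r_raw:.2f}")
print(f"pooled correlation after adjustment:  {r_adjusted:.2f}")
```

In this toy setup the shared batch offset alone makes the two genes look strongly correlated when the datasets are pooled, and the per-batch centering removes that spurious correlation. Because the adjustment only subtracts a constant per gene and batch, each centered gene depends only on its own noise, so it cannot create dependence between them; whether ComBat-seq's model behaves the same way is exactly what I am asking.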
I may simply be confusing missing-value imputation with batch-effect correction. An explanation of the differences between these two "data cleaning" approaches would be greatly appreciated. Thanks!