Question

ComBat function (sva) generates very low adjusted p.values for a big number of genes

0

Entering edit mode

Alexandra Popa ▴ 10

@alexandra-popa-5996

Last seen 9.6 years ago

Dear list, I have recently come along the ComBat function from the sva package. I am currently analyzing Agilent one-color microarray data and I had strong batch effects. No normalization method could get rid of this effect and the data clustering showed it clearly. The reasons for this batch effects are multiple, but the bottom line is that I want to control them. Data experiment: 6 types of samples named: pp4, pp2, p4, p2, p0 and pn, with 3, 2, 3, 3, 2, and 1 replica respectively. There are a total of 5 batches. So I did the following pipeline: - normalize the data using quantile - apply ComBat on the data For ComBat, I use the following file, hereafter called "phenoBatches.txt" that contains the info on batches and covariates: "phenoBatches.txt" sample outcome batch cancer 1 pp4_1 14430 pp4 2 pp2_1 14430 pp2 3 p4_1 14407 p4 4 p2_1 14407 p2 5 pp4_2 10957 pp4 6 p4_2 10957 p4 7 p2_2 10957 p2 8 pp4_3 12957 pp4 9 p4_3 12957 p4 10 pp2_2 12959 pp2 11 p2_3 12957 p2 12 p0_1 12959 p0 13 p0_2 12959 p0 14 pn_1 12959 pn As explained in the documentation, I build a model matrix: mod=model.matrix(~as.factor(cancer),data=batchInfo) (Intercept) as.factor(cancer)p2 as.factor(cancer)p4 as.factor(cancer)pn as.factor(cancer)pp2 as.factor(cancer)pp4 pp4_1 1 0 0 0 0 1 pp2_1 1 0 0 0 1 0 p4_1 1 0 1 0 0 0 p2_1 1 1 0 0 0 0 pp4_2 1 0 0 0 0 1 p4_2 1 0 1 0 0 0 p2_2 1 1 0 0 0 0 pp4_3 1 0 0 0 0 1 p4_3 1 0 1 0 0 0 pp2_2 1 0 0 0 1 0 p2_3 1 1 0 0 0 0 p0_1 1 0 0 0 0 0 p0_2 1 0 0 0 0 0 pn_1 1 0 0 1 0 0 And then I apply ComBat as follows: >EafterComBat = ComBat(dat=E_norm$E, batch=batchInfo$batch, mod=mod, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE) Then, I put the new ComBat-adjusted values in the E_norm object: E_norm$E=EafterComBat Now the clustering is super and the batch effect has clearly been controlled for. I continue with my pipeline to get the differentially expressed genes with limma. The model matrix hereafter is: >design <- model.matrix(~-1+factor(targets$Subject)) pp4 pp2 p4 p2 p0 pn 1 1 0 0 0 0 0 2 0 1 0 0 0 0 3 0 0 1 0 0 0 4 0 0 0 1 0 0 5 1 0 0 0 0 0 6 0 0 1 0 0 0 7 0 0 0 1 0 0 8 1 0 0 0 0 0 9 0 0 1 0 0 0 10 0 1 0 0 0 0 11 0 0 0 1 0 0 12 0 0 0 0 1 0 13 0 0 0 0 1 0 14 0 0 0 0 0 1 The analysis of differentially expressed genes: fit <- lmFit(E_norm,design=design) fit <- eBayes(fit) with the contrasts of interest: > contrastMatrix Contrasts Levels pp4-pp2 pp2-p0 pp4-p0 pp2-p2 pp4-p4 pn-p0 p2-p0 p2-p4 pp4 1 0 1 0 1 0 0 0 pp2 -1 1 0 1 0 0 0 0 p4 0 0 0 0 -1 0 0 -1 p2 0 0 0 -1 0 0 1 1 p0 0 -1 -1 0 0 -1 -1 0 pn 0 0 0 0 0 1 0 0 >fit2 <- contrasts.fit(fit, contrastMatrix) >fit2 <- eBayes(fit2) And then the topTable function. However, the number of genes with the ajusted.p.values <0.05 is huge. More than 70% of my genes are coming out as differentially expressed, while before ComBat, none was. Moreover, the differences in intensities are not particularly high between classes, and the logFC values are inferieur to 1 in absolute values ... So my question is, why do I get such "good" adjusted p-values? Is there something I do wrong, maybe in what concerns the way I use ComBat (the model matrix, the normalization ...)? Is my design experiment not fit for the use of ComBat? I should mention that I applied ComBat on raw data and on normalized data and the result is the same. Any suggestion is welcome and I thank you in advance for your time. Sincerely yours, Alexandra [[alternative HTML version deleted]]

Microarray Normalization Clustering limma sva Microarray Normalization Clustering limma • 1.1k views

ADD COMMENT • link 10.9 years ago Alexandra Popa ▴ 10