Hello,
I am working with RNA sequencing data and want to adjust for covariates such as age, BMI and sex. I have created a colData table with these covariates, converting the binary ones to factors and scaling the continuous ones. My concern is that the data are normalised first and the adjustment is applied afterwards. If I use both voom and removeBatchEffect, will the data be adjusted twice, or is this the right way to do it? The functions I used are shown below.
design_matrix <- model.matrix(~ Age + Gender + BMI + Batch + Condition, data = colData)
v <- voom(raw_counts, design_matrix, plot = TRUE)
fit <- lmFit(v, design_matrix)
adjusted_counts <- removeBatchEffect(v$E, covariates = phenodata[, c("Age", "BMI", "Batch")])
I am doing this because I want to use the adjusted matrix for correlation studies rather than for differential gene expression analysis.
Is there any other way to adjust for covariates?
Thank you,
Thank you so much for the response and for the development of such crucial functions: voom() and removeBatchEffect().
I have factorized the covariates so that they remain in numeric form, and then used them in the design matrix, something like the following:
factorize variables to keep it numeric
I am not sure if age and BMI should be scaled or factorized.
Ah, no, that's not right. You have misinterpreted my comment about passing a non-numeric matrix to the covariates argument. You don't need to recode or scale any of the variables. Just use the code exactly as in my answer, with the variables as they already are in your phenodata or colData data.frames. The model.matrix function will convert the factors into a numeric matrix in the correct way. You do not need to recode the factors to numeric yourself, nor should you do so.
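To illustrate the point about model.matrix, here is a minimal sketch on made-up data. It is not the code from the earlier answer. The `pheno` table, its values, the toy `logexpr` matrix and the choice of which terms go to `batch`, `covariates` and `design` are all assumptions for illustration only. The first part uses base R alone. The second part runs only if limma is installed and shows one plausible way to hand the resulting numeric columns to removeBatchEffect while protecting Condition through `design`.

```r
# Hypothetical phenotype table (illustrative values, not real data).
# Factors are left as factors; Age and BMI are left unscaled.
pheno <- data.frame(
  Age       = c(34, 51, 47, 29, 62, 45, 38, 55, 41, 33, 58, 49),
  BMI       = c(22.1, 27.8, 31.0, 24.5, 26.3, 29.9,
                23.4, 28.6, 25.2, 21.7, 30.4, 27.1),
  Gender    = factor(rep(c("F", "M"), times = 6)),
  Batch     = factor(rep(c("B1", "B2", "B3"), each = 4)),
  Condition = factor(rep(c("ctrl", "ctrl", "case", "case"), times = 3),
                     levels = c("ctrl", "case"))
)

# model.matrix() expands each factor into 0/1 indicator columns itself,
# so the result is already a numeric matrix.
covar_matrix <- model.matrix(~ Age + BMI + Gender + Batch, data = pheno)
print(colnames(covar_matrix))
print(is.numeric(covar_matrix))

# Optional: only runs when limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  # Toy log-expression matrix: 10 genes x 12 samples (assumption).
  logexpr <- matrix(rnorm(10 * 12), nrow = 10)

  # Condition goes into `design` so its effect is preserved.
  design <- model.matrix(~ Condition, data = pheno)

  adjusted <- limma::removeBatchEffect(
    logexpr,
    batch      = pheno$Batch,                                  # factor
    covariates = covar_matrix[, c("Age", "BMI", "GenderM")],   # numeric
    design     = design
  )
  print(dim(adjusted))
}
```

In this sketch the factor Gender becomes the single indicator column GenderM and Batch becomes BatchB2 and BatchB3, with no manual recoding. The adjusted matrix has the same dimensions as the input and would be the object to carry into a correlation analysis.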
Sure, thank you so much, I will implement the code as you suggested.