I have a dataset built by combining several microarray datasets, containing tumoral and non-tumoral samples, and batch-correcting them with RUV normalization. The data are log2 RUV-normalized expression values. I want to perform differential expression analysis. Is the limma package in R suitable for this?
From what I've read, limma expects log2 expression data without normalization, but some tutorials I've found also use normalized data.
Thank you very much to all!
Hello Gordon. Thank you for your answer. Yes, my dataset is large, so I think I'm fine, but I will have to take into account the characteristics you mentioned.
My next step was to perform GSEA. However, I only get significant p-values and no significant adjusted p-values. Do you think this may be because of the artificial homogeneity you referred to?
No, that is not the reason. Homogeneity from batch correction tends to produce too many DE results rather than too few.
I used the clusterProfiler package to perform the GSEA as follows. Do you see anything wrong with it? I have a panel of 630 genes and I want to run GSEA against the ImmuneSigDB gene set collection.
My batch-corrected data looks like this:
Hi Gordon,
What is the difference between "batch-corrected data" and "surrogate variables" (generated by RUV)? Could you elaborate on a design that includes those surrogate variables, or point to an example, please?
Best, Samuel
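For readers with the same question, here is a minimal sketch of the distinction, not a definitive recipe. "Batch-corrected data" is the expression matrix after the estimated unwanted variation has been subtracted from it. The "surrogate variables" are the estimated unwanted factors themselves, one value per sample per factor, and they can be added as extra columns of the limma design matrix while the expression values are left uncorrected. Everything below is simulated and hypothetical: `expr`, `group` and `W` are stand-ins, and in a real analysis `W` would be taken from the RUV fit rather than drawn at random.

```r
# Simulated stand-in data: 100 genes x 12 samples of log2 expression
# (NOT batch-corrected; the unwanted factors go into the design instead).
set.seed(1)
n_genes <- 100
n_samples <- 12
expr <- matrix(rnorm(n_genes * n_samples, mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", seq_len(n_samples))))

# Biological factor of interest, with "normal" as the reference level.
group <- factor(rep(c("normal", "tumor"), each = 6),
                levels = c("normal", "tumor"))

# Hypothetical surrogate variables (k = 2), one row per sample.
# Assumption: in practice these come from the RUV fit, not from rnorm().
W <- matrix(rnorm(n_samples * 2), nrow = n_samples,
            dimnames = list(colnames(expr), c("W1", "W2")))

# Design: intercept, tumor-vs-normal effect, then the surrogate variables.
# The resulting columns are "(Intercept)", "grouptumor", "WW1", "WW2".
design <- model.matrix(~ group + W)
print(colnames(design))

# Fit with limma if it is installed; the surrogate variables are adjusted
# for, and only the group coefficient is tested.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "grouptumor", number = 5))
}
```

With this layout the linear model knows that degrees of freedom were spent on the unwanted factors, which is not the case when the correction is applied to the data beforehand.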