Hi all,
I am working on a large dataset consisting of multiple RNAseq and microarray studies from different labs and times. While we have a (functional) pipeline set up for this, I recently saw a post which mentioned using voomLmFit to counter the issues which might stem from having excess zeroes in the data.
Since we are combining RNAseq data and microarray data for a combined analysis, we see some of these data-sparsity issues; a number of the genes in the RNAseq dataset are simply not found in the microarray datasets, and not all of our microarray datasets share a full geneset either. Ideally, we would like not to simply remove the partially sparse genes from the dataset, since doing that would drastically reduce the number of genes available for further analyses.
My question is therefore whether a voomLmFit pipeline could be used for both the RNAseq and microarray data? I.e. is voom transformation of microarray data harmful, and if so is there another way to account for these data-sparsity issues without having to cut down our genesets drastically?
Thanks, Adam
edit: For some more context, we are not combining samples across studies into any single groups. Rather, we want to perform a group-wise comparison (with the original groups from each study), contrasting the changes between conditions. We have no repeats of condition comparisons across studies.
Hi again,
Thank you for the response! Can you expound on why it is not a good idea? I realize that the expression levels/differential expression levels are not directly comparable, but would this also hold true if we are performing a pathway analysis (ORA/GSEA)?
And which part of the question does your last comment refer to: the voom transformation, or whether there is another way?
Maybe also for some more context, we are not combining samples across studies into any single groups. Rather, we want to perform a group-wise comparison (with the original groups from each study), contrasting the changes between conditions. We have no repeats of condition comparisons across studies.
If you're saying that (as an example) you have treated and control in both assay types, and you want to combine and make comparisons between the two, then that's what I'm talking about. That's a bad idea IMO, and it sounds like a meta-analysis is better.
It's far better to make the comparisons within assay type, and then combine results after. You can combine using effect size (GeneMeta) or p-values (metapod). I would probably use p-values, and would probably use Stouffer's method, which is why I mentioned that about the one-tailed p-values.
And it doesn't matter if you're doing gene set analyses after, if the statistics are questionable to begin with.
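To make the one-tailed point concrete, here is a minimal sketch of Stouffer's method in plain Python (standard library only). It is not the metapod implementation; the function names `one_tailed_p` and `stouffer`, and the example p-values, are made up for illustration. The assumption is that each study gives you a two-sided p-value and the sign of the effect (e.g. the sign of the logFC) for the same gene and the same comparison.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def one_tailed_p(two_sided_p, effect_sign):
    """Turn a two-sided p-value into a one-tailed p-value for the 'up' direction.

    effect_sign > 0 means the study saw the gene go up; anything else is
    treated as down. (Hypothetical helper, not part of any package.)
    """
    half = two_sided_p / 2.0
    return half if effect_sign > 0 else 1.0 - half


def stouffer(one_tailed_ps, weights=None):
    """Combine one-tailed p-values with (optionally weighted) Stouffer's method.

    Returns (combined z-score, combined one-tailed p-value).
    Each p-value must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    if weights is None:
        weights = [1.0] * len(one_tailed_ps)
    # Small one-tailed p-values map to large positive z-scores.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in one_tailed_ps]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return z, 1.0 - _STD_NORMAL.cdf(z)


# Illustrative numbers only: same gene, two-sided p = 0.04 in both studies.
agree = [one_tailed_p(0.04, +1), one_tailed_p(0.04, +1)]
disagree = [one_tailed_p(0.04, +1), one_tailed_p(0.04, -1)]

z_agree, p_agree = stouffer(agree)
z_disagree, p_disagree = stouffer(disagree)
```

With one-tailed p-values, two studies that agree in direction reinforce each other, while two studies that disagree cancel out. If you combined the two-sided p-values directly, the direction would be lost and conflicting results would look like supporting evidence. The optional `weights` argument is there because Stouffer's method is often weighted, for example by sample size, but that choice is up to you.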