Combining RNAseq and microarray data using edgeR/limma-voom
@5028264b
United States

Hi all,

I am working on a large dataset consisting of multiple different RNAseq and microarray studies from different labs and times. While we have a (functional) pipeline set up for this, I recently saw a post that mentioned using voomLmFit to counter the issues that might stem from having excess zeroes in the data.

Since we are combining RNAseq data and microarray data for a combined analysis, we see some of these data-sparsity issues: a number of the genes in the RNAseq dataset are simply not found in the microarray datasets, and not all of our microarray datasets share a full geneset either. Ideally, we would like not to simply remove the partially sparse genes from the dataset, since doing that would drastically reduce the number of genes available for further analyses.

My question is therefore whether a voomLmFit pipeline could be used for both the RNAseq and microarray data? I.e. is voom transformation of microarray data harmful, and if so is there another way to account for these data-sparsity issues without having to cut down our genesets drastically?

Thanks, Adam

edit: For some more context, we are not combining samples across studies into any single groups. Rather, we want to perform a group-wise comparison (with the original groups from each study), contrasting the changes between conditions. We have no repeats of condition comparisons across studies.

Tags: limma, voom, RNASeq, edgeR, Microarray
@gordon-smyth
WEHI, Melbourne, Australia

I agree with James, although I tend to combine analyses using geneset tests rather than by meta analysis tools like GeneMeta.

> My question is therefore whether a voomLmFit pipeline could be used for both the RNAseq and microarray data?

No, voomLmFit() is only for count data such as RNA-seq. If you want to undertake a voom-like analysis of microarray data, then limma provides the function voomaLmFit() for microarrays. Generally, however, observation level weights are not so important for microarray as for RNA-seq, and you would usually just use the standard limma pipelines for microarray data.

You should be processing each technology (RNA-seq and each different type of microarray) according to its own characteristics. You cannot combine different technologies into a combined limma analysis, because the different technologies have different probe sets, different mean-variance relationships and different biases.

> is voom transformation of microarray data harmful

Yes it is. A voom analysis of microarray data is basically nonsense because it is assuming characteristics that don't exist for microarrays. Just to mention one thing, voom is assuming that counts need to be normalized by library size, but microarrays don't have library sizes.
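To make the library-size point concrete, here is a minimal Python sketch (not limma code) of the log2 counts-per-million transform that voom applies to counts before it models the mean-variance trend. The function name `log2_cpm` and the toy numbers are illustrative assumptions; the 0.5 and 1 offsets follow the formula in the voom documentation, and the library size is taken here as the plain column total.

```python
import math

def log2_cpm(counts, lib_sizes):
    """Return log2 counts-per-million for a genes-by-samples count table.

    counts:    list of rows (one per gene), each a list of per-sample counts
    lib_sizes: one total read count per sample
    """
    return [
        [math.log2((c + 0.5) / (n + 1.0) * 1e6) for c, n in zip(row, lib_sizes)]
        for row in counts
    ]

# Toy data (hypothetical): 3 genes, 2 samples; sample 2 is sequenced about twice as deeply.
counts = [[10, 20], [0, 5], [990, 1975]]

# The library size is the total count per sample (column sum).
lib_sizes = [sum(col) for col in zip(*counts)]

logcpm = log2_cpm(counts, lib_sizes)
```

The first gene has twice the raw count in the second sample, yet its log-CPM values are nearly equal once each is divided by its library size. A microarray intensity matrix has no read total per sample, so this step has nothing meaningful to divide by.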

> is there another way to account for these data-sparsity issues without having to cut down our genesets drastically?

Even if you could run voomLmFit on microarray data, it would do nothing to change the fact that different technologies measure different genes and isoforms. Software normalization cannot make a platform appear to measure something it doesn't.

My practice has been to analyse each technology dataset separately, keeping all possible probes in the analysis, and then correlate results between datasets using geneset tests or similar.


Hi Gordon,

Thanks for the answer! This makes a lot of sense. Your suggested pipeline (separate processing of datasets -> gene set analyses -> comparison) actually seems to fit quite well with what we have discussed internally, so it's good to know we were not that far off originally.

@james-w-macdonald-5106
United States

No, that's not a good idea at all. You should be doing a meta-analysis with these data rather than trying to combine like that. GeneMeta is one package you could use, but there are others that you can search for on the BioC website, like metapod.

The latter is a bit tricky if you use Stouffer's method, because the method is supposed to be based on one-tailed p-values and there's no obvious way to implement that in a package. You have to know that and use your own code to accommodate it. In other words, you have to convert all your p-values to one-tailed, do the test, and then convert back.
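The convert, combine, convert-back procedure might look roughly like the following Python sketch. It is not metapod's API: the function names, the optional weights, and the use of the sign of the log fold change to pick the tail are all assumptions made for illustration. It also assumes every two-sided p-value lies strictly between 0 and 1 or equals 1.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution

def to_one_tailed(p_two_sided, effect_sign):
    """One-tailed p-value for the alternative 'effect > 0'."""
    half = p_two_sided / 2.0
    return half if effect_sign > 0 else 1.0 - half

def stouffer_two_sided(p_two_sided, effect_signs, weights=None):
    """Combine per-study two-sided p-values for one gene.

    p_two_sided:  two-sided p-values, one per study
    effect_signs: sign of the effect (e.g. of the logFC) in each study
    weights:      optional per-study weights; equal weights if omitted
    """
    if weights is None:
        weights = [1.0] * len(p_two_sided)
    # Step 1: convert to one-tailed p-values, then to z-scores.
    z = [_norm.inv_cdf(1.0 - to_one_tailed(p, s))
         for p, s in zip(p_two_sided, effect_signs)]
    # Step 2: Stouffer's weighted sum of z-scores.
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / sqrt(sum(w * w for w in weights))
    # Step 3: convert back to a two-sided p-value.
    p_comb = 2.0 * _norm.cdf(-abs(z_comb))
    return z_comb, p_comb

# Two studies agreeing in direction reinforce each other ...
z_same, p_same = stouffer_two_sided([0.05, 0.05], [+1, +1])
# ... while the same p-values with opposite directions cancel out.
z_opp, p_opp = stouffer_two_sided([0.05, 0.05], [+1, -1])
```

Feeding two-sided p-values straight into the z-score step would lose the direction of each effect, so studies that disagree in sign would look like they agree.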


Hi again,

Thank you for the response! Can you expound on why it is not a good idea? I realize that the expression levels/differential expression levels are not directly comparable, but would this also hold true if we are performing a pathway analysis (ORA/GSEA)?

And which part of the question are you referring to with the last comment: the voom transform, or whether there is another way?

Maybe also for some more context, we are not combining samples across studies into any single groups. Rather, we want to perform a group-wise comparison (with the original groups from each study), contrasting the changes between conditions. We have no repeats of condition comparisons across studies.


If you're saying that (as an example) you have treated and control in both assay types, and you want to combine and make comparisons between the two, then that's what I'm talking about. That's a bad idea IMO, and it sounds like a meta-analysis would be better.

It's far better to make the comparisons within assay type, and then combine results after. You can combine using effect size (GeneMeta) or p-values (metapod). I would probably use p-values, and would probably use Stouffer's method, which is why I mentioned that about the one-tailed p-values.
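For the effect-size route, the simplest textbook form is a fixed-effect, inverse-variance weighted average of the per-study estimates. The Python sketch below shows that form only; it is not GeneMeta's implementation, and the function name and inputs (a per-study effect estimate and its standard error for one gene) are assumptions made for illustration.

```python
from math import sqrt

def fixed_effect_combine(effects, std_errors):
    """Inverse-variance weighted combination of per-study effect estimates.

    effects:    per-study effect estimates for one gene
    std_errors: standard error of each estimate
    Returns (pooled effect, pooled standard error, z-statistic).
    """
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total
    pooled_se = sqrt(1.0 / total)
    return pooled, pooled_se, pooled / pooled_se
```

Because each study only contributes an estimate and a standard error computed within its own assay type, the microarray and RNA-seq data are never placed in the same model.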


And it doesn't matter if you're doing gene set analyses after, if the statistics are questionable to begin with.


Thank you!
