Question

How to compare a single RNA-Seq sample with microarray samples?


Asked by Sam

I am comparing a single RNA-Seq sample with many microarray samples (I only need the clustering, not differential expression). Since there is only one sample, I cannot use voom for the conversion. Would computing log2-CPM on the RNA-Seq counts be sufficient to put them on the same scale?
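For reference, log2-CPM for a single library needs nothing but its own counts, so it can be computed without replicates. The sketch below assumes the same offsets voom applies when it computes log-CPM (0.5 added to each count, 1 added to the library size); other tools, such as edgeR's `cpm()`, use different prior counts. The function name and the counts are hypothetical.

```python
import numpy as np

def log2_cpm(counts, pseudocount=0.5):
    """log2 counts-per-million for one RNA-Seq sample.

    Assumes voom-style offsets: `pseudocount` is added to every count and
    1 is added to the library size so that zero counts stay finite.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum()
    return np.log2((counts + pseudocount) / (lib_size + 1.0) * 1e6)

# Hypothetical counts for four genes in a single library
sample = np.array([0, 15, 250, 1200])
print(log2_cpm(sample))
```

This puts the sample on a log2 scale, but as the answer explains, being on a similar numeric scale does not make these values comparable to array intensities.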

Tags: voom, rna-seq, microarray

Answer · score 1 · 2020-06-15 (thread last updated by James W. MacDonald)

It's not really an issue of scale. You could convert everything to z-scores, in which case every sample would have mean 0 and SD 1, and any clustering would still almost certainly put the RNA-Seq sample in a cluster by itself.
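One way to see why rescaling cannot help: per-sample z-scoring is an affine transform of each column, so it leaves the Pearson correlation between any two samples unchanged, and that correlation is what correlation-based clustering works from. The sketch below uses an invented genes-by-samples matrix; the column labels ("array", "RNA-Seq") are purely illustrative.

```python
import numpy as np

def zscore_samples(x):
    """Standardize each column (sample) to mean 0 and SD 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Invented genes-by-samples matrix: column 0 plays an array sample,
# column 1 an RNA-Seq sample. Values are for illustration only.
expr = np.array([
    [6.1, 2.0],
    [7.4, 9.5],
    [9.8, 4.1],
    [11.2, 12.3],
    [8.0, 0.5],
])
z = zscore_samples(expr)

print(z.mean(axis=0))          # both columns now have mean ~0
print(z.std(axis=0, ddof=1))   # and SD 1
# The sample-sample correlation is identical before and after z-scoring.
print(np.corrcoef(expr, rowvar=False)[0, 1])
print(np.corrcoef(z, rowvar=False)[0, 1])
```

For z-scored columns the squared Euclidean distance between two samples equals 2(n - 1)(1 - r), where n is the number of genes and r their correlation, so Euclidean clustering on z-scores is also driven entirely by that unchanged correlation.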

The underlying problem is that the measures of gene expression you get from microarrays and RNA-Seq are at best correlated with the true gene expression. Neither is a direct measure of it, so they can't be compared directly. I wouldn't even try to combine microarray data from different experiments, let alone data from two completely different ways of measuring gene expression.

As an example, the more recent Affy arrays include a set of anti-genomic probes that are designed to have no complementary sequence in any organism, and hence are not expected to bind to anything in a biological sample. These anti-genomic probes range from almost pure AT to almost pure GC content, and as the GC content goes up, the binding goes up until the signal saturates. So an Affy probe with very high GC content will bind to almost anything. That signal has nothing to do with gene expression, because these probes aren't designed to measure any gene expression! For Affy probes, then, the signal you get is some combination of the underlying transcript abundance and essentially random binding that increases with GC content.
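The behaviour described above can be mimicked with a toy model: observed signal = specific binding + a GC-dependent background that rises and then saturates. The logistic curve and every parameter below (`floor`, `ceiling`, `midpoint`, `slope`) are hypothetical choices made only to reproduce the qualitative shape; they are not fitted to any real array.

```python
import numpy as np

def background_log2(gc_fraction, floor=4.0, ceiling=14.0, midpoint=0.55, slope=15.0):
    """Hypothetical log2 non-specific signal as a function of probe GC fraction.

    Logistic curve: rises with GC content and saturates near `ceiling`.
    All parameters are invented for illustration.
    """
    gc_fraction = np.asarray(gc_fraction, dtype=float)
    return floor + (ceiling - floor) / (1.0 + np.exp(-slope * (gc_fraction - midpoint)))

def observed_log2_signal(abundance, gc_fraction):
    """Toy probe signal: specific binding plus GC-driven background, on linear scale."""
    return np.log2(abundance + 2.0 ** background_log2(gc_fraction))

gc = np.linspace(0.1, 0.9, 5)
# An anti-genomic probe has no target (abundance 0), so its signal is pure background.
print(observed_log2_signal(0.0, gc))
```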

If you assume that the GC-dependent binding is fairly consistent between samples, then it all comes out in the wash when you compare groups. Well, not exactly: as the GC-dependent binding increases, your apparent fold change decreases, but algebraically the background gets subtracted out. RNA-Seq has its own biases, which are different from microarray biases, and simply scaling the data to have the same distribution won't correct for those biases.
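A small worked example of that last point, assuming the background is a constant additive term `b` shared by both groups: on the linear scale the difference between groups is unaffected because `b` cancels, but the log2 fold change shrinks toward 0 as `b` grows. The signal values are hypothetical.

```python
import numpy as np

def apparent_log2_fc(a1, a2, background):
    """log2 fold change observed when both groups carry the same additive background."""
    return np.log2((a1 + background) / (a2 + background))

true_a1, true_a2 = 800.0, 200.0  # hypothetical true signals: 4-fold change, log2 FC = 2
for b in [0.0, 100.0, 1000.0, 10000.0]:
    diff = (true_a1 + b) - (true_a2 + b)  # linear-scale difference: b cancels exactly
    fc = apparent_log2_fc(true_a1, true_a2, b)
    print(f"background={b:>7.0f}  linear diff={diff:.0f}  log2 FC={fc:.3f}")
```

With these numbers the log2 fold change drops from 2 (no background) to about 1.585, 0.585 and 0.082, while the linear difference stays at 600 throughout.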