How to compare a single RNA-Seq sample with microarray samples?
Sam ▴ 10
@sam-21502
Last seen 6 weeks ago
Jerusalem

I am comparing a single RNA-Seq sample with many microarray samples (only clustering is needed, not differential expression). Since there is only one sample, I cannot use voom for the conversion. Would computing log2-cpm on the RNA-Seq counts be sufficient to put them on the same scale?

voom rna-seq microarray
@james-w-macdonald-5106
Last seen 2 hours ago
United States

It's not an issue of scale, really. You could convert to z-scores, in which case all the data would be N(0,1) and any clustering would almost certainly put the RNA-Seq sample out by itself.

In other words, the measures of gene expression that you get from microarrays and RNA-Seq are at best correlated with the underlying gene expression, and aren't a direct measure of the gene expression and can't be compared directly. I wouldn't even try to combine microarray data from different experiments, let alone completely different ways of measuring the gene expression.

As an example, in the more recent Affy arrays there is a set of anti-genomic probes that are designed to have no complementary sequences in any organism, and hence are not expected to bind to anything in a biological sample. These anti-genomic probes vary from almost pure AT to almost pure GC content. And as the GC content goes up, the binding goes up, to a saturated signal. So if you have an Affy probe that has super high GC content, it will bind to, like, anything. And that signal has nothing to do with a measurement of gene expression because these probes aren't designed to measure any gene expression! So for Affy probes, the signal you get is some combination of underlying transcript abundance and random binding that goes up as the GC content increases.

If you assume that the GC-specific binding is pretty consistent between samples, then that all comes out in the wash when you compare groups (well not exactly - as the GC-specific binding increases, your apparent fold change decreases - but algebraically it gets subtracted out). RNA-Seq has its own biases, that are different from microarray biases, and simply scaling the data to have the same distribution won't correct for those biases.
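A toy illustration of the parenthetical above, assuming a purely additive model in which the observed probe intensity is true signal plus a GC-dependent background that is identical in both groups. The additive model, the function name `apparent_log2_fc`, and all the numbers are assumptions made for illustration, not values measured from Affy data.

```python
import math

def apparent_log2_fc(signal_a, signal_b, background):
    """log2 fold change of observed intensities under an assumed additive background."""
    return math.log2((signal_a + background) / (signal_b + background))

# Hypothetical true signals for one probe in two groups: a 4-fold difference.
signal_a, signal_b = 800.0, 200.0

for background in (0.0, 100.0, 1000.0):
    # On the linear scale the shared background cancels in the difference...
    difference = (signal_a + background) - (signal_b + background)
    # ...but the ratio, and hence the log2 fold change, is pulled toward zero.
    log2_fc = apparent_log2_fc(signal_a, signal_b, background)
    print(f"background={background:7.1f}  difference={difference:6.1f}  log2FC={log2_fc:.3f}")
```

Under this model the linear-scale difference stays constant while the apparent log2 fold change shrinks as the background grows, which is the sense in which the background both "subtracts out" and compresses fold changes.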

0

RNA-Seq has its own biases, that are different from microarray biases, and simply scaling the data to have the same distribution won't correct for those biases.

This was explained; nevertheless, the request is to see how similar the RNA-Seq sample is to one of the microarray conditions. If similarity to one of them is apparent, that would be a cue to check the similarity to that condition with other, more suitable tools.

You could convert to z-scores, in which case all the data would be N(0,1)

Because there is only one RNA-Seq sample, the RNA-Seq signals cannot be converted to z-scores.

Is it reasonable to convert to log2-tpm and then quantile-normalize together with the microarray samples?

0

Also, perhaps it is possible to use voom with one sample only?

0

It's possible to run voom with one sample, but you will find it's the same as running cpm with a prior count of 0.5 and log = TRUE.
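For readers who want to see the arithmetic, here is a minimal sketch of that transform in plain Python rather than R. It assumes the formula given in the limma documentation for voom, log2((count + 0.5) / (library size + 1) × 10^6), with the library size taken as the column sum and no normalization factors. The function name `log2_cpm` and the counts are hypothetical.

```python
import math

def log2_cpm(counts, prior_count=0.5):
    """log2 counts-per-million for ONE sample, with voom-style offsets.

    Assumes library size = sum of counts (no normalization factors applied).
    """
    lib_size = sum(counts)
    return [
        math.log2((c + prior_count) / (lib_size + 2 * prior_count) * 1e6)
        for c in counts
    ]

# Hypothetical gene counts for the single RNA-Seq sample.
counts = [0, 10, 90, 900]
print(log2_cpm(counts))
```

The offset keeps zero counts finite on the log scale. In R, the corresponding call would be edgeR's `cpm()` with `log = TRUE` and `prior.count = 0.5`, as stated above.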

And I have already said that there are biases that are confounded with technology, so any comparisons are some combination of the underlying gene expression and technical differences. And these differences shouldn't be expected to be monotonic, so a quantile normalization is probably futile.
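To make the monotonicity point concrete, here is a small sketch of quantile normalization in plain Python. The function `quantile_normalize` is a simplified, hypothetical implementation (ties are broken by position rather than averaged) and the two four-gene samples are invented values.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Simplification: ties are broken by position instead of being averaged.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value within this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized

array_sample = [5.0, 7.0, 9.0, 11.0]   # hypothetical log2 intensities, genes g1..g4
rnaseq_sample = [1.0, 6.0, 2.0, 3.0]   # hypothetical log2-cpm for the same genes
norm_array, norm_rnaseq = quantile_normalize([array_sample, rnaseq_sample])
print(norm_array)
print(norm_rnaseq)
```

After normalization both samples contain exactly the same set of values, but each gene keeps its rank within its own sample. A gene that ranks differently on the two platforms for technical reasons therefore still differs afterwards, because quantile normalization can only undo a monotonic distortion.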

But you seem bound and determined to do something, so why ask? Just do.