Possible approaches to compare/interpret RNA-Seq results with microarray gene expression data describing the same phenotype
svlachavas ▴ 780
@svlachavas-7225
Germany/Heidelberg/German Cancer Resear…

Dear Community,

Based on the results of a previous post (C: Possible ways of performing differential gene expression and analysis of RNA-Seq) regarding the analysis of a TCGA RNA-Seq dataset, I ended up with a list of DE genes (the analysis was performed on log2(estimated counts + 1) values with the microarray pipeline: limma-eBayes). Then, with a simple Venn diagram, I compared the ~5000 DE gene symbols from the RNA-Seq data with a small gene signature (94 DE genes) that I obtained from an independent microarray experiment with a similar experimental condition (same cancer, very similar comparison in limma, etc.). Based on the Venn diagram, 89 gene symbols are common to both lists, and they also change in the same direction (based on the log2FC).
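The overlap and direction check described above can be sketched in base R. The two small tables and their values are hypothetical stand-ins for the RNA-Seq and microarray DE results, not the actual data:

```r
# Hypothetical DE tables: one row per gene symbol with its log2 fold change
rnaseq <- data.frame(symbol = c("AARS", "ABCD3", "ACADM", "TP53", "MYC"),
                     logFC  = c(0.9, -1.1, -0.8, 1.5, 2.0),
                     stringsAsFactors = FALSE)
marray <- data.frame(symbol = c("AARS", "ABCD3", "ACADM", "AHCY"),
                     logFC  = c(0.79, -0.89, -1.03, 1.15),
                     stringsAsFactors = FALSE)

# Symbols reported as DE in both lists (the Venn diagram intersection)
common <- intersect(rnaseq$symbol, marray$symbol)

# Line up the fold changes of the shared symbols and compare their signs
fc.rnaseq <- rnaseq$logFC[match(common, rnaseq$symbol)]
fc.marray <- marray$logFC[match(common, marray$symbol)]
same.direction <- sign(fc.rnaseq) == sign(fc.marray)

print(common)
print(table(same.direction))
```

Matching on symbols only works if both tables use the same symbol version, which is where the annotation differences mentioned below could matter.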

Thus, would the most appropriate/unbiased interpretation be that these 89 gene symbols are more genuinely DE, since they are found in two independent datasets? Or could an even more "advanced" approach be used that also takes the log2FCs into account, like a small meta-analysis, despite the different high-throughput technologies? Another possible drawback concerns the annotation process: the microarrays (Affymetrix) were analyzed with a customCDF, whereas the RNA-Seq data, loaded from a specific R package, had already been annotated to unique gene symbols.

[*I understand that this question might be a little too general for the purpose of this group, but there might be R packages or approaches for this purpose that I'm not familiar with.]

Any opinion or feedback is welcome!

microarray rnaseq DEgenes meta-analysis limma • 824 views
Aaron Lun ★ 27k
@alun
The city by the bay

Your 89 genes are significantly DE in both contrasts - that's the obvious interpretation. If the biological contexts of the contrasts are sufficiently similar, you could say something about the DE being robust to the technology used. In this respect, they are likely to be "more genuine DE", as they are detected by orthogonal techniques. However, a formal statement of "more genuine DE" would require a reduction in the error rate, and this is not guaranteed when you intersect genes detected at a particular FDR threshold in separate contrasts.
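For illustration only, one conventional way to attach a formal error rate to "DE in both contrasts" is an intersection-union test: take the larger of the two p-values for each gene and adjust that. This technique is not part of the answer above; it is a sketch that assumes you have p-values for the same genes from both analyses (the full tables, not only the 94-gene signature), and the numbers are made up:

```r
# Hypothetical per-gene p-values from the two analyses, named by gene
p.rnaseq <- c(G1 = 0.001, G2 = 0.200, G3 = 0.004, G4 = 0.03)
p.marray <- c(G1 = 0.002, G2 = 0.001, G3 = 0.600, G4 = 0.01)

# Intersection-union: a gene counts as DE in both only if the LARGER
# of its two p-values is small
p.both <- pmax(p.rnaseq, p.marray)

# Benjamini-Hochberg adjustment of the combined p-values
fdr.both <- p.adjust(p.both, method = "BH")

print(round(fdr.both, 4))
print(names(which(fdr.both < 0.05)))
```

Here G4 would pass a 5% threshold in each analysis taken alone if left unadjusted, but does not pass once the combined p-value is adjusted, which is the kind of difference between an intersection and a formal test that the paragraph above refers to.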

I should point out that any differences in technology are largely irrelevant because your contrasts are performed within each technology. Any biases should thus cancel out. There might be differences in power between technologies, but this too is irrelevant as long as you are not making claims about genes that were not detected.

If you want a more sophisticated approach, you could use the microarray gene signature as a gene set in roast (with the log-fold changes as directional weights). You can then test for DE across genes in the RNA-seq data set.
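A minimal sketch of that idea on simulated data, assuming the RNA-seq matrix has gene symbols as row names and the design has the cancer-vs-control effect in its second column. All object names and values here are hypothetical, and the roast call is skipped if limma is not installed:

```r
set.seed(1)

# Simulated log-expression matrix: 200 genes x 10 samples (hypothetical)
ngenes <- 200
y <- matrix(rnorm(ngenes * 10), nrow = ngenes,
            dimnames = list(paste0("G", seq_len(ngenes)), NULL))
group <- factor(rep(c("control", "cancer"), each = 5),
                levels = c("control", "cancer"))
design <- model.matrix(~ group)  # column 2 = cancer vs control

# Hypothetical microarray signature: symbols with signed log2 fold changes
signature <- data.frame(symbol = paste0("G", 1:10),
                        logFC  = rep(c(1, -1), each = 5),
                        stringsAsFactors = FALSE)

# Spike matching changes into the simulated data so the set is truly DE
cancer <- group == "cancer"
y[1:5, cancer]  <- y[1:5, cancer] + 2
y[6:10, cancer] <- y[6:10, cancer] - 2

# Keep only signature genes present in the RNA-seq data, then convert
# symbols to row indices and carry the log2FCs along as directional weights
keep    <- signature$symbol %in% rownames(y)
index   <- match(signature$symbol[keep], rownames(y))
weights <- signature$logFC[keep]

if (requireNamespace("limma", quietly = TRUE)) {
  res <- limma::roast(y, index = index, design = design, contrast = 2,
                      gene.weights = weights, nrot = 999)
  print(res)
}
```

Using integer row indices from match() avoids relying on how a particular limma version resolves character IDs, and dropping unmatched symbols first keeps the index and the weights the same length.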

0
Entering edit mode

Dear Aaron, thank you for your answer and the rationale behind your explanation! To be honest, one initial thought I had was whether it would be appropriate to construct a forest plot showing the log2FCs of the 89 DE genes for both studies, along with their confidence intervals. However, I was a bit reluctant, as I have never used such an approach. The other thing I tried was a heatmap of the common gene symbols in the second dataset, which also showed similar expression patterns and a separation of cancer and control samples.

Regarding your approach, which sounds very interesting and non-trivial:

1) Do you mean that I should first create, for instance, two vectors of gene symbols: one with the up-regulated genes from the signature and the other with the down-regulated genes, along with their respective log2FCs? I mention gene symbols because the RNA-Seq data already has unique gene symbols as row names.

2) And then run roast twice, once for the up and once for the down genes, with the following call:
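The numbers behind such a forest plot are each study's log2FC with a confidence interval. A minimal base R sketch, assuming a standard error and degrees of freedom are available per gene and study; all values below are made up. limma's topTable can also report intervals directly with confint=TRUE:

```r
# Hypothetical per-gene, per-study estimates
est <- data.frame(
  symbol = rep(c("AARS", "ABCD3"), each = 2),
  study  = rep(c("microarray", "RNA-seq"), times = 2),
  logFC  = c(0.79, 0.90, -0.89, -1.10),
  se     = c(0.15, 0.10, 0.20, 0.12),
  df     = c(20, 100, 20, 100),
  stringsAsFactors = FALSE
)

# 95% confidence interval: estimate +/- t quantile * standard error
half.width  <- qt(0.975, df = est$df) * est$se
est$ci.low  <- est$logFC - half.width
est$ci.high <- est$logFC + half.width

# Flag intervals that do not cover zero
est$excludes.zero <- est$ci.low > 0 | est$ci.high < 0

print(est)
```

Plotting ci.low, logFC and ci.high per gene, with one row per study, gives the forest plot layout.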

roast(TCGA.eset, index=vec1, design, contrast=, gene.weights=genes.fc.vec1)

# for instance, where vec1 is a character vector of the upregulated gene symbols and genes.fc.vec1 is a numeric vector of their respective log2FCs?

1
Entering edit mode

Gene weights can be positive and negative in the same vector; you don't have to separate up- and down-regulated genes.

0
Entering edit mode

Oh, I think I got it: provide a single vector of all the gene symbols in the index argument and all of their respective log2FCs in the gene.weights argument, correct? And then, from the proportions of up and down genes, I should interpret the consistency of this signature in the tested dataset in the same way, right?

0
Entering edit mode

Dear Aaron,

Sorry to return after some days; here is a small update, and I would appreciate your comment on the final interpretation. Briefly, based on your comments:

head(vec1)  # the vector of common DE gene symbols
[1] "AARS"   "ABCD3"  "ACADM"  "ACADS"  "ACADVL" "AHCY"

head(genes.fc.vec1)  # the relative log2FCs of the above common DE symbols in the microarray dataset
[1]  0.7871676 -0.8855676 -1.0332955 -1.2122512 -0.7056447  1.1534264

roast(y=y, index=vec1, design=design.2, contrast=2, gene.weights=genes.fc.vec1)  # where y the TCGA RNA-Seq

         Active.Prop      P.Value
Down               0 1.0000000000
Up                 1 0.0005002501
UpOrDown           1 0.0010000000
Mixed              1 0.0010000000

My quick questions are the following:

1) As the Up proportion is 1, this essentially "validates" the direction of my microarray signature in the RNA-Seq dataset, correct? The same holds, of course, for the Mixed proportion (as I have both up and down genes).

2) By implementing roast in the context of my original question, I essentially provide an alternative, more "sophisticated" validation of my gene signature, in addition to the initial DE analysis of the RNA-Seq dataset?

3) Despite the two different technologies (although the RNA-Seq data were processed in a similar way to the microarrays) and the different annotation procedures, roast is still valid, correct? Since I essentially use the final gene symbols, right?

theobroma22 ▴ 10
@theobroma22-11920

I would think that you have to convert the microarray data so that they are on the same scale as the RNA-seq data (or vice versa) to compare the two. Another way is to use PCR to validate the expression of those 89 genes that you observed in both sets, or at least a subset of them. As such, right now you have insight into the datasets, but you may need some sort of validation for that comparison, such as conversion or PCR validation.