I have a question about correlating diseases with drugs using the DVD data.
Content of the disease data set:
Matrix containing ranks of genes (rows) for a set of disease
profiles (columns). The ranks are in decreasing order of differential expression.
Data created using the GEO database.
Drug data set:
So this is an entirely different data set.
A matrix containing the ranked lists of expression profiles for
the 1309 drug compounds in the Connectivity Map (version 2)
screening. Rows are the genes and the columns contain ranked lists
for different drugs. The profiles are in rank decreasing order.
So the help pages for both data sets mention that they contain ranked genes.
Information about the drugRL:
Drug Network and Communities.
We quantified the degree of similarity in the transcriptional responses among
drugs. To this end, we exploited a repository of transcriptional responses to
compounds: the Connectivity Map (cMap) (11, 12) containing 6,100 genome-wide
expression profiles obtained by treatment of five different human cell lines at
different dosages with a set of 1,309 different molecules. We represented the
similarity between two drugs as a “distance” and computed it as summarized in
Fig. 1A: For each compound, we considered all the transcriptional responses
following treatments, across different cell lines and/or at different
concentrations. Each transcriptional response was represented as a list of
genes ranked according to their differential expression. We then computed a
single “synthetic” ranked list of genes, the Prototype Ranked List (PRL), by
merging all the ranked lists referring to the same compound. In order to
equally weight the contribution of each of the cell lines to the drug PRL, rank
merging was achieved with a procedure (detailed in SI Methods) based on a
hierarchical majority-voting scheme, where genes consistently
overexpressed/down-regulated across the ranked lists are moved at the top/bottom
of the PRL (18). The rank-merging procedure first compares, pairwise, the ranked
lists obtained with the same drug using the Spearman’s Footrule similarity
measure (20). Then, it merges the two lists that are the most similar to each
other, following the Borda Merging Method (21), thus obtaining a single ranked
list. This new ranked list replaces the two lists, and then the procedure is
repeated until only one ranked list remains (the PRL of the drug). The PRL thus
captures the consensus transcriptional response of a compound across different
experimental settings, consistently reducing nonrelevant effects due to toxicity,
dosage, and cell line (SI Methods).
(other source: Lamb J et al. (2006) The Connectivity Map: Using Gene-Expression
Signatures to Connect Small Molecules, Genes, and Disease.
Science, 313(5795), 1929-1935.)
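
To check my own understanding of the rank-merging step quoted above, here is a minimal sketch in Python (used only to keep it self-contained). It is not the authors' code: it leaves out the hierarchical, per-cell-line weighting described in SI Methods, and the function names and the alphabetical tie-breaking rule are my own assumptions. Each ranked list is a list of gene names, most up-regulated first, and every list is assumed to contain the same genes.

```python
from itertools import combinations


def positions(ranked):
    """Map each gene to its 0-based position in a ranked list."""
    return {gene: i for i, gene in enumerate(ranked)}


def footrule(a, b):
    """Spearman's footrule distance: sum of absolute position differences."""
    pa, pb = positions(a), positions(b)
    return sum(abs(pa[g] - pb[g]) for g in pa)


def borda_merge(a, b):
    """Merge two ranked lists by summed position (smaller sum = nearer the top)."""
    pa, pb = positions(a), positions(b)
    # Ties are broken alphabetically; this tie rule is an assumption of the sketch.
    return sorted(pa, key=lambda g: (pa[g] + pb[g], g))


def prototype_ranked_list(lists):
    """Repeatedly merge the two closest lists until only one remains."""
    lists = [list(l) for l in lists]
    while len(lists) > 1:
        # Find the pair of lists with the smallest footrule distance.
        i, j = min(combinations(range(len(lists)), 2),
                   key=lambda ij: footrule(lists[ij[0]], lists[ij[1]]))
        merged = borda_merge(lists[i], lists[j])
        # The merged list replaces the two lists it was built from.
        lists = [l for k, l in enumerate(lists) if k not in (i, j)] + [merged]
    return lists[0]


# Toy example with made-up gene names.
profiles = [["A", "B", "C", "D"],
            ["A", "B", "D", "C"],
            ["B", "A", "C", "D"]]
prl = prototype_ranked_list(profiles)
```

So, as far as I can tell, the drugRL columns are consensus lists of this kind, one per compound, rather than single experiments.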
My question: If I take, for example, the top 100 genes from the drugRL and
compare them with the diseases that show up-regulation of those genes, does
that give any insight into the correlation with diseases from the diseaseRL data set?
If not, can it be done some other way, and if so, how?
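
To make the comparison I have in mind concrete, here is a rough sketch, again in Python and with made-up toy data. The names `top_genes` and `mean_rank_per_disease` are hypothetical, and I am assuming that rank 1 means most up-regulated in both matrices. This is only my naive idea, not necessarily how the package itself scores drug-disease pairs.

```python
def top_genes(drug_ranks, n):
    """Return the n genes with the smallest rank (rank 1 assumed most up-regulated)."""
    return sorted(drug_ranks, key=drug_ranks.get)[:n]


def mean_rank_per_disease(genes, disease_ranks):
    """For each disease profile, average the ranks of the given genes."""
    return {disease: sum(ranks[g] for g in genes) / len(genes)
            for disease, ranks in disease_ranks.items()}


# Toy stand-ins for one drugRL column and two diseaseRL columns (made-up values).
drug = {"G1": 1, "G2": 2, "G3": 3, "G4": 4}
diseases = {
    "diseaseA": {"G1": 1, "G2": 2, "G3": 3, "G4": 4},
    "diseaseB": {"G1": 4, "G2": 3, "G3": 2, "G4": 1},
}

top = top_genes(drug, 2)                      # would be 100 on the real data
scores = mean_rank_per_disease(top, diseases)
```

A low mean rank would mean the drug's top genes are also near the top of the disease profile. What I am unsure about is whether such a number is meaningful, given that the two rankings come from different data sets.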