Question: Correlate drugs to diseases using DrugVsDisease data
8 months ago by
bioinformatics10 wrote:

Hello !


I have a question about correlating diseases with drugs using the DrugVsDisease (DVD) data.


Content of the disease data set (from the help page):

Matrix containing ranks of genes (rows) for a set of disease

profiles (columns). The ranks are in decreasing order of differential expression.


Data created using the GEO database.

Example:


> head(as.matrix(diseaseRL[,'wilms-tumor']))


LINC00115  0.016834807

GOT2P1    -0.001120510

TP73-AS1  -0.001258255


Drug data set:


> head(as.matrix(drugRL[,'vorinostat']))


ZNF702P  5840

SAMD4A   6881

VN1R1    6154

ZNF419   4075



So this is an entirely different data-set.



     A matrix containing the ranked lists of expression profiles for

     the 1309 drug compounds in the Connectivity Map (version 2)

     screening. Rows are the genes and the columns contain ranked lists

     for different drugs. The profiles are in rank decreasing order.



So the help pages for both data sets say that they contain ranked genes.




Information about drugRL:


Drug Network and Communities.


We quantified the degree of similarity in the transcriptional responses among drugs. To this end, we exploited a repository of transcriptional responses to compounds: the Connectivity Map (cMap) (11, 12) containing 6,100 genome-wide expression profiles obtained by treatment of five different human cell lines at different dosages with a set of 1,309 different molecules. We represented the similarity between two drugs as a “distance” and computed it as summarized in Fig. 1A: For each compound, we considered all the transcriptional responses following treatments, across different cell lines and/or at different concentrations. Each transcriptional response was represented as a list of genes ranked according to their differential expression. We then computed a single “synthetic” ranked list of genes, the Prototype Ranked List (PRL), by merging all the ranked lists referring to the same compound. In order to equally weight the contribution of each of the cell lines to the drug PRL, rank merging was achieved with a procedure (detailed in SI Methods) based on a hierarchical majority-voting scheme, where genes consistently overexpressed/down-regulated across the ranked lists are moved at the top/bottom of the PRL (18). The rank-merging procedure first compares, pairwise, the ranked lists obtained with the same drug using the Spearman’s Footrule similarity measure (20). Then, it merges the two lists that are the most similar to each other, following the Borda Merging Method (21), thus obtaining a single ranked list. This new ranked list replaces the two lists, and then the procedure is repeated until only one ranked list remains (the PRL of the drug). The PRL thus captures the consensus transcriptional response of a compound across different experimental settings, consistently reducing nonrelevant effects due to toxicity, dosage, and cell line (SI Methods).
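
To make the rank-merging idea in the passage above concrete, here is a rough sketch in R. It is a simplification under stated assumptions and not the procedure from the paper's SI Methods: the footrule distance is left unnormalized, the Borda merge is approximated by summing two rank vectors and re-ranking, and no cell-line weighting is done. The gene names, rank values, and the helper names `footrule` and `borda_merge` are made up for illustration.

```r
# Spearman's footrule distance between two rank vectors over the same genes:
# the sum of absolute rank differences (unnormalized in this sketch).
footrule <- function(a, b) sum(abs(a - b))

# Simplified Borda-style merge (assumption): add the two rank vectors and
# re-rank the sums, breaking ties by position.
borda_merge <- function(a, b) rank(a + b, ties.method = "first")

# Toy ranked lists for one drug from three experiments (made-up values).
lists <- list(
  exp1 = c(g1 = 1, g2 = 2, g3 = 3, g4 = 4),
  exp2 = c(g1 = 2, g2 = 1, g3 = 3, g4 = 4),
  exp3 = c(g1 = 4, g2 = 3, g3 = 1, g4 = 2)
)

# Repeatedly merge the two most similar lists until one list remains.
while (length(lists) > 1) {
  pairs  <- combn(length(lists), 2)
  d      <- apply(pairs, 2, function(p) footrule(lists[[p[1]]], lists[[p[2]]]))
  best   <- pairs[, which.min(d)]
  merged <- borda_merge(lists[[best[1]]], lists[[best[2]]])
  lists  <- c(lists[-best], list(merged))
}

prl <- lists[[1]]  # consensus ranked list for this toy drug
prl
```

In this toy run, `exp1` and `exp2` are merged first because they are the closest pair, and the result is then merged with `exp3`.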


source :


(other source: Lamb J et al. (2006) The Connectivity Map: Using Gene-Expression
Signatures to Connect Small Molecules, Genes, and Disease.
Science, 313(5795), 1929-1935.)



My question: if I take, for example, the top 100 genes from a drugRL profile and look for the diseases in diseaseRL in which those genes are up-regulated, does that give any insight into how the drug correlates with those diseases?

If not, can it be done some other way, and if so, how?




8 months ago by
saezrodriguez0 wrote:

Dear Ric,

Thanks for your interest and apologies for the slow reply.

The diseaseRL data set contains the log fold-change (log FC) coefficients for the change in expression between disease and control samples. These coefficients are then ranked internally by DrugVsDisease so that the top-ranked genes of the disease and drug profiles can be compared.

If you would like to do your own comparison, we recommend that you rank the diseaseRL profiles (using R’s rank function) to make them comparable to the drug rank profiles.
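
A minimal sketch of that ranking step follows, using small made-up matrices in place of the real diseaseRL and drugRL objects. Two things here are assumptions rather than anything confirmed in this thread: that rank 1 should mean "most up-regulated" (hence the negated log FC values), and that a Spearman correlation over the shared genes is a sensible comparison. It is only one possible choice and is not necessarily what DrugVsDisease does internally.

```r
# Toy stand-ins for diseaseRL (log FC values) and one drugRL column (ranks).
# Gene names and numbers are made up for illustration.
disease_logfc <- matrix(
  c( 0.8, -1.2, 0.1, -0.3, 0.5,
    -0.4,  0.9, 0.2, -0.7, 0.0),
  ncol = 2,
  dimnames = list(paste0("GENE_", 1:5), c("disease_x", "disease_y"))
)
drug_rank <- c(GENE_1 = 2, GENE_2 = 5, GENE_3 = 3, GENE_4 = 4, GENE_5 = 1)

# Assumption: rank 1 = most up-regulated, so rank the negated log FC values,
# one disease (column) at a time.
disease_rank <- apply(-disease_logfc, 2, rank, ties.method = "first")
rownames(disease_rank) <- rownames(disease_logfc)

# Compare only genes that are present in both data sets.
common <- intersect(rownames(disease_rank), names(drug_rank))

# One possible comparison: Spearman correlation between the drug ranks and
# each disease's ranks.
rho <- apply(disease_rank[common, , drop = FALSE], 2, cor,
             y = drug_rank[common], method = "spearman")
rho
```

On the real data, it would be worth checking the rank direction of drugRL first, since ranking the disease profiles the wrong way round flips the sign of every correlation.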

Also note that if you take only the top 100 up-regulated genes, you would miss under-expressed genes that may also be important. We would advise you to take the top-ranked up- and down-regulated genes instead.
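
As an illustration of taking both ends of a profile, the sketch below pulls the top up- and down-regulated genes from a toy drug rank vector and looks them up in a toy disease log FC vector. The data are made up, `n_top` would be something like 100 on real profiles, and the assumption is again that rank 1 is the most up-regulated gene. The final score is a crude, hypothetical summary for illustration only, not the statistic used by DrugVsDisease.

```r
# Toy drug rank profile (made up); assumption: rank 1 = most up-regulated,
# highest rank = most down-regulated.
drug_rank <- c(GENE_1 = 3, GENE_2 = 8, GENE_3 = 1, GENE_4 = 6,
               GENE_5 = 2, GENE_6 = 7, GENE_7 = 4, GENE_8 = 5)
n_top <- 2  # e.g. 100 on real data

# Order genes from most up-regulated to most down-regulated.
ordered_genes <- names(sort(drug_rank))
up_genes   <- head(ordered_genes, n_top)
down_genes <- tail(ordered_genes, n_top)

# Toy disease log FC values for the same genes (made up).
disease_logfc <- c(GENE_1 = 0.2, GENE_2 = 1.1, GENE_3 = -0.9, GENE_4 = 0.1,
                   GENE_5 = -0.6, GENE_6 = 0.7, GENE_7 = 0.0, GENE_8 = -0.1)

# Crude illustrative score: mean disease log FC of the drug's up genes minus
# that of the drug's down genes.
score <- mean(disease_logfc[up_genes]) - mean(disease_logfc[down_genes])
score
```

With this score, a negative value means the genes the drug pushes up tend to be down in the disease and vice versa, which is information that a signature built only from up-regulated genes would capture only in part.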
