I have two (or more) microarray gene expression datasets, one for SARS (http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE1739) and one for Parkinson's disease (http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE7621). I found the dysregulated genes in each set by applying the criteria log fold change greater than 1.5 and p-value < 0.01. My question is: how do I find the common dysregulated genes in these two sets? Which statistical tests should be applied? Which packages are available in R for this kind of analysis? I am new to bioinformatics, so kindly bear with me if the question is very basic. Thanks in advance.
By common dysregulated genes, do you mean the ones that are significantly differentially expressed in both sets?
For that I usually use Venn diagrams. In the limma manual you can find examples of how to make Venn diagrams, or you can make them yourself with the gplots package.
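The counts behind such a Venn diagram are plain set operations. Here is a minimal sketch in Python, keeping up- and down-regulated genes separate. The gene lists are made-up placeholders rather than results from GSE1739 or GSE7621, and `venn_counts` is a hypothetical helper name. It also assumes both lists already use the same gene identifiers, which needs checking if the two arrays come from different platforms.

```python
# Hypothetical gene lists for illustration only; real lists would come
# from the differential expression results of each dataset.
sars_up = {"IL1B", "CXCL8", "TNF", "FOS"}
sars_down = {"CD3E", "CCR7"}
pd_up = {"FOS", "TNF", "HSPA1A"}
pd_down = {"CCR7", "TH"}


def venn_counts(a, b):
    """Return the sizes of the three regions of a two-set Venn diagram."""
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}


# Compare like with like: up vs. up and down vs. down.
up = venn_counts(sars_up, pd_up)
down = venn_counts(sars_down, pd_down)

print("up-regulated:", up, sorted(sars_up & pd_up))
print("down-regulated:", down, sorted(sars_down & pd_down))
```

Treating the two directions separately avoids counting a gene as "common" when it goes up in one disease and down in the other.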
Dear B.Nota, thanks for your reply. Actually, representation is not my problem. I want to know which statistical approach to use to identify the common dysregulated genes, given that the two sets contain different numbers of control and infected samples. Which criteria should be imposed to get the number of common significantly up-regulated and down-regulated genes in the two diseases?
You could make a contingency table and use Fisher's exact test, or you could use the hypergeometric distribution (see ?phyper in R). Given a universe of genes measured in both experiments, if you identify a set of genes in experiment 1 and another set of genes in experiment 2, these tests tell you how likely an overlap at least as large as the one you observed would be by chance. As b.nota mentioned, I usually make a Venn diagram and then evaluate it with either of those tests. There's a package in R which does this for you, called GeneOverlap.
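To make the hypergeometric calculation concrete, here is a minimal sketch in Python using only the standard library. The function name `overlap_pvalue` and the numbers in the example are assumptions for illustration, not values from either dataset. In R, the same upper tail should be given by `phyper(n_both - 1, n_a, universe - n_a, n_b, lower.tail = FALSE)`, which matches a one-sided Fisher's exact test on the 2x2 table.

```python
from math import comb


def overlap_pvalue(universe, n_a, n_b, n_both):
    """Probability of an overlap of at least n_both genes by chance.

    universe: number of genes measured in both experiments
    n_a, n_b: sizes of the two dysregulated gene lists
    n_both:   observed number of genes present in both lists
    """
    total = comb(universe, n_b)  # all ways to pick list B from the universe
    largest = min(n_a, n_b)      # the overlap cannot exceed the smaller list
    tail = sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)
        for k in range(n_both, largest + 1)
    )
    return tail / total


# Hypothetical numbers: 8000 shared genes, lists of 300 and 250, 30 in common.
p = overlap_pvalue(8000, 300, 250, 30)
print(f"overlap p-value: {p:.3g}")
```

The choice of universe matters: it should contain only genes that could have been detected in both experiments, otherwise the overlap looks more surprising than it is.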
Thanks Chris Seidel. So effectively this means: find the dysregulated genes in list 1 and in list 2, then apply GeneOverlap to the two lists (using the function newGeneOverlap in the GeneOverlap package)? Sorry for my late reply and my poor understanding of bioinformatics.
For each gene, you can use the maximum of the two p-values from the SARS and Parkinson datasets to test whether the gene is dysregulated in both diseases.
In other words, a gene is a common significant gene if it is significant in both diseases. It is as simple as that.
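As a rough sketch of the maximum p-value rule, here it is in Python with made-up per-gene p-values. The gene names, the values, and the helper `max_p` are placeholders, and the 0.01 cutoff simply reuses the threshold from the original question.

```python
# Hypothetical per-gene p-values from each dataset (not real results).
sars_p = {"geneA": 0.001, "geneB": 0.20, "geneC": 0.004}
pd_p = {"geneA": 0.008, "geneB": 0.0005, "geneD": 0.01}


def max_p(p1, p2):
    """For genes present in both datasets, keep the larger of the two p-values."""
    shared = p1.keys() & p2.keys()
    return {gene: max(p1[gene], p2[gene]) for gene in shared}


combined = max_p(sars_p, pd_p)

# A gene is called common only if its larger p-value passes the cutoff,
# i.e. it is significant in both diseases.
common = sorted(gene for gene, p in combined.items() if p < 0.01)
print(common)
```

Genes measured in only one dataset (geneC and geneD here) drop out, since nothing can be said about them in the other disease.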
However, the method you have used to assess significance in each individual dataset does not seem the best. It would be better to apply an analysis method that controls the false discovery rate across the whole genome.
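One common way to control the false discovery rate is the Benjamini-Hochberg adjustment. The reply above does not name a specific method, so treat this as an illustration of the idea rather than the recommended procedure. A minimal Python sketch follows, with `bh_adjust` as a hypothetical helper name. In R, `p.adjust(p, method = "BH")` does the same job, and limma's `topTable` reports BH-adjusted values in its `adj.P.Val` column by default.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
print(adj)
```

The adjusted values would then replace the raw p-values when applying the significance cutoff in each dataset, before looking for common genes.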