Differential Gene Expression using limma
Sonia SHAH ▴ 30
@sonia-shah-1585
Hi,

It would be greatly appreciated if I could get some advice on how to go about looking at differential expression in my data.

I have Affy data from 3 different cell types (type1, type2, type3) and 3 biological reps for each type.

I want to get 2 gene lists using limma:
1. genes that are expressed in type1 and type3 but not in type2
2. genes that are expressed in type2 and type3 but not in type1

There seem to be lots of different ways of doing this. I tried 2 design matrices:

DESIGN1

            type1  type2  type3
type1rep1     1      0      0
type1rep2     1      0      0
type1rep3     1      0      0
type2rep1     0      1      0
type2rep2     0      1      0
type2rep3     0      1      0
type3rep1     0      0      1
type3rep2     0      0      1
type3rep3     0      0      1

contrasts: (type1+type3)-type2
           (type2+type3)-type1

DESIGN2

I would use 2 design matrices, one for each gene list.

The first matrix below will give genes that are in type1+3 but not in type2:

            A  B
type1rep1   1  0
type1rep2   1  0
type1rep3   1  0
type2rep1   0  1
type2rep2   0  1
type2rep3   0  1
type3rep1   1  0
type3rep2   1  0
type3rep3   1  0

contrast: A-B

The second matrix below will give genes that are in type2+3 but not in type1:

            A  B
type1rep1   0  1
type1rep2   0  1
type1rep3   0  1
type2rep1   1  0
type2rep2   1  0
type2rep3   1  0
type3rep1   1  0
type3rep2   1  0
type3rep3   1  0

contrast: A-B

I would have thought that the two approaches would give me the same number of differentially expressed genes, but they don't. They give me very different numbers.

Are the two approaches the same, or am I doing something completely wrong?

Thanks
Sonia
@james-w-macdonald-5106
United States
Hi Sonia,

Sonia SHAH wrote:
> Hi,
>
> It would be greatly appreciated if I could get some advice on how to go
> about looking at differential expression on my data.
>
> I have Affy data from 3 different cell types: type1, type2, type3
> and 3 biological reps for each type
>
> I want to get 2 gene lists using limma:
> 1. genes that are expressed in type1 and type3 but not in type2
> 2. genes that are expressed in type2 and type3 but not in type1

Just a technical point here: you cannot find genes that are 'expressed' in one sample and not in another. The best you can do is find genes that are expressed at a different level between samples.

> There seem to be lots of different ways of doing this. I tried 2 design
> matrices:
>
> DESIGN1
>             type1  type2  type3
> type1rep1     1      0      0
> type1rep2     1      0      0
> type1rep3     1      0      0
> type2rep1     0      1      0
> type2rep2     0      1      0
> type2rep3     0      1      0
> type3rep1     0      0      1
> type3rep2     0      0      1
> type3rep3     0      0      1
>
> contrasts: (type1+type3)-type2
>            (type2+type3)-type1

These are not contrasts. To be a contrast, the coefficients have to sum to zero, so you would need

(type1 + type3)/2 - type2
(type2 + type3)/2 - type1

> DESIGN2
> I would use 2 design matrices to get each gene list
>
> The first matrix below will give genes that are in type1+3 but not in
> type2:
>
>             A  B
> type1rep1   1  0
> type1rep2   1  0
> type1rep3   1  0
> type2rep1   0  1
> type2rep2   0  1
> type2rep3   0  1
> type3rep1   1  0
> type3rep2   1  0
> type3rep3   1  0
>
> contrast A-B
>
> The second matrix below will give genes that are in type2+3 but not in
> type1:
>
>             A  B
> type1rep1   0  1
> type1rep2   0  1
> type1rep3   0  1
> type2rep1   1  0
> type2rep2   1  0
> type2rep3   1  0
> type3rep1   1  0
> type3rep2   1  0
> type3rep3   1  0
>
> contrast A-B
>
> I would have thought that the two different approaches would give me the
> same number of differentially expressed genes. But it doesn't. It gives
> me very different numbers.
>
> Are the two approaches the same or am I doing something completely
> wrong?
Well, if you used the contrasts as I outline above they will be very similar but still not the same. The difference is a technical point about how the contrasts are computed. Note: To make this explanation easier to understand, I am omitting the empirical Bayes moderation step.

In the first case, the contrast you are using is very similar to a t-statistic, in which you are computing the difference in mean expression in the numerator, and an estimate of how accurately you are computing those means in the denominator. Since you have three groups, the denominator tells you how well you are estimating the mean of those three groups (based on the variance within each group - this is the important point).

In the second case, the contrast is identical to a t-statistic because you have two groups you are comparing and the denominator estimates how well you are estimating the means of those two groups.

To illustrate this difference, here is an example. Let's say that the expression values for a particular gene look like this:

Type1 = 5.6, 5.8, 5.4
Type2 = 8.5, 8.6, 8.3
Type3 = 14.1, 14.2, 14.5

Now in the first case, if you compute the contrast (type2 + type3)/2 - type1 you will get a difference of ~5.8 and a very significant p-value because the variability *within* each sample type is very small.

On the other hand, if you did the comparisons as in your second case, this would probably not be significant because the variability within the pooled Type2 and Type3 samples would now be quite high. This will result in a much larger denominator for your t-statistic (but with the same numerator), so the resulting p-value will be much larger.

So how you do things depends on what exactly you are looking to show. If you want to find those genes where e.g., Type1 is different from the mean expression of Type2 and Type3 then you want to use your first method.
If you want to find those genes where the expression values for Type1 are different from Type2 and Type3 _and_ there is very little difference between Type2 and Type3, then you should use your second method.

HTH,

Jim

--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
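
As a rough check of the numbers in the example above, here is a minimal Python sketch (plain Python, not limma). It assumes ordinary least-squares t-statistics with a pooled residual variance and, like the explanation, leaves out the empirical Bayes moderation. The helper names `sums_to_zero` and `contrast_t` are made up for illustration and are not part of any library.

```python
from math import sqrt
from statistics import mean

# Expression values for the single gene used in the example above.
type1 = [5.6, 5.8, 5.4]
type2 = [8.5, 8.6, 8.3]
type3 = [14.1, 14.2, 14.5]


def sums_to_zero(coefs, tol=1e-12):
    """Return True if the coefficients sum to zero, i.e. form a contrast."""
    return abs(sum(coefs)) < tol


def contrast_t(groups, coefs):
    """Ordinary (unmoderated) t-statistic for a contrast of group means.

    groups: list of lists of observations, one list per design column.
    coefs:  one contrast coefficient per group.
    Returns (estimate, standard_error, t_statistic).
    """
    means = [mean(g) for g in groups]
    estimate = sum(c * m for c, m in zip(coefs, means))
    # Residual sum of squares: variability *within* each group.
    rss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df = sum(len(g) for g in groups) - len(groups)
    s2 = rss / df  # pooled residual variance
    se = sqrt(s2 * sum(c ** 2 / len(g) for c, g in zip(coefs, groups)))
    return estimate, se, estimate / se


# The contrasts as originally written versus the rescaled versions.
print(sums_to_zero([1, -1, 1]))       # (type1 + type3) - type2
print(sums_to_zero([0.5, -1, 0.5]))   # (type1 + type3)/2 - type2

# First method: three groups, contrast (type2 + type3)/2 - type1.
est1, se1, t1 = contrast_t([type1, type2, type3], [-1.0, 0.5, 0.5])

# Second method: Type2 and Type3 pooled into group A, Type1 is group B, A - B.
est2, se2, t2 = contrast_t([type2 + type3, type1], [1.0, -1.0])

print(f"three groups: estimate={est1:.3f} se={se1:.3f} t={t1:.2f}")
print(f"two groups:   estimate={est2:.3f} se={se2:.3f} t={t2:.2f}")
```

Both designs give the same numerator (about 5.8), but the pooled two-group design folds the Type2 versus Type3 difference into the residual variance, so its standard error is much larger and its t-statistic much smaller (roughly 3 instead of roughly 43 for these values).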