Entering edit mode
                    Miguel Gallach
        
    
        ▴
    
    40
        @miguel-gallach-5016
        Last seen 11.2 years ago
        
    Dear list,
I am using edgeR to analyze my RNA-Seq data. My data consist in two
factors
(Adaptation and Temperature) and two levels per factor, and I want fit
full
and reduced glm like the next ones:
Reduced model
fit0 = counts ~ Adaptation
Aditive model
fit1 = counts ~ Adaptation + Temperature
Interaction model
fit2 = counts ~ block + treatment + block:treatment
That is what I did until now:
d = raw.data[,1:8]
head (d)
            R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold
R10.Cold
FBgn0000003      0      0      0       0       2       8       3
1
FBgn0000008     92    123     80      41      76     135     188
150
FBgn0000014      0      1      2       4       5       8      10
7
FBgn0000015      2      1      0       0       2       3       9
5
FBgn0000017   2000   2393   1799    1290    1029     973    2441
1665
FBgn0000018     36     43     10      17       6      16      20
30
Adaptation =
factor(c("HotAdapted","HotAdapted","ColdAdapted","ColdAdapted","HotAda
pted","HotAdapted","ColdAdapted","ColdAdapted"))
Temperature =
factor(c("HotCage","HotCage","HotCage","HotCage","ColdCage","ColdCage"
,"ColdCage","ColdCage"))
design = model.matrix(~Adaptation + Temperature +
Adaptation:Temperature)
design
  (Intercept) AdaptationHotAdapted TemperatureHot
1           1                    1              1
2           1                    1              1
3           1                    0              1
4           1                    0              1
5           1                    1              0
6           1                    1              0
7           1                    0              0
8           1                    0              0
  AdaptationHotAdapted:TemperatureHot
1                                   1
2                                   1
3                                   0
4                                   0
5                                   0
6                                   0
7                                   0
8                                   0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Adaptation
[1] "contr.treatment"
attr(,"contrasts")$Temperature
[1] "contr.treatment"
d.GLM = DGEList(d, group = c("HotAdaptedHotCage", "HotAdaptedHotCage",
"ColdAdaptedHotCage", "ColdAdaptedHotCage","HotAdaptedColdCage",
"HotAdaptedColdCage", "ColdAdaptedColdCage", "ColdAdaptedColdCage"))
Calculating library sizes from column totals.
d.GLM = calcNormFactors(d.GLM)
d.GLM
An object of class "DGEList"
$samples
                   group lib.size norm.factors
R4.Hot     HotAdaptedHot 17409289    0.9881635
R5.Hot     HotAdaptedHot 17642552    1.0818144
R9.Hot    ColdAdaptedHot 20010974    0.8621807
R10.Hot   ColdAdaptedHot 14064143    0.8932791
R4.Cold   HotAdaptedCold 11968317    1.0061084
R5.Cold   HotAdaptedCold 11072832    1.0523857
R9.Cold  ColdAdaptedCold 22386103    1.0520949
R10.Cold ColdAdaptedCold 17408532    1.0903311
$counts
            R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold
R10.Cold
FBgn0000003      0      0      0       0       2       8       3
1
FBgn0000008     92    123     80      41      76     135     188
150
FBgn0000014      0      1      2       4       5       8      10
7
FBgn0000015      2      1      0       0       2       3       9
5
FBgn0000017   2000   2393   1799    1290    1029     973    2441
1665
14864 more rows ...
$all.zeros
FBgn0000003 FBgn0000008 FBgn0000014 FBgn0000015 FBgn0000017
      FALSE       FALSE       FALSE       FALSE       FALSE
14864 more elements ...
#According to the manual, DGEList is now ready to be passed to the
functions that do the calculations to determine differential
expression
levels for the genes. However is not possible to achieve statistical
significance with fewer than ten counts in total for a tag/gene and we
also
do not want to waste effort finding spurious DE (such as when a gene
is
only expressed in one library), so we filter out tags/genes with fewer
than
1 count per million in SIX or more libraries.
cpm.d.GLM = cpm(d.GLM)
d.GLM = d.GLM[rowSums(cpm.d.GLM > 1) >= 2, ]
nrow(d.GLM)
[1] 9418
#Now the dataset is ready to be analysed for differential expression
#The design matrix
design
  (Intercept) AdaptationHotAdapted TemperatureHotCage
1           1                    1                  1
2           1                    1                  1
3           1                    0                  1
4           1                    0                  1
5           1                    1                  0
6           1                    1                  0
7           1                    0                  0
8           1                    0                  0
  AdaptationHotAdapted:TemperatureHotCage
1                                       1
2                                       1
3                                       0
4                                       0
5                                       0
6                                       0
7                                       0
8                                       0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Adaptation
[1] "contr.treatment"
attr(,"contrasts")$Temperature
[1] "contr.treatment"
#Estimate CR common/trended/tagwise dispersion
d.GLM = estimateGLMCommonDisp(d.GLM, design)
names(d.GLM)
[1] "samples"           "counts"            "all.zeros"
[4] "common.dispersion"
d.GLM$common.dispersion
[1] 0.04634741
sqrt(d.GLM$common.dispersion)
[1] 0.2152845
d.GLM = estimateGLMTrendedDisp(d.GLM, design)
Loading required package: splines
summary(d.GLM$trended.dispersion)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.03672 0.03843 0.04326 0.05267 0.06134 0.11870
d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
d.GLM$prior.n
NULL
d$prior.n
NULL
ls()
[1] "Adaptation"  "cpm.d"       "cpm.d.GLM"   "d"           "d.GLM"
[6] "design"      "raw.data"    "Temperature"
d.GLM
An object of class "DGEList"
$samples
                       group lib.size norm.factors
R4.Hot     HotAdaptedHotCage 17409289    0.9881635
R5.Hot     HotAdaptedHotCage 17642552    1.0818144
R9.Hot    ColdAdaptedHotCage 20010974    0.8621807
R10.Hot   ColdAdaptedHotCage 14064143    0.8932791
R4.Cold   HotAdaptedColdCage 11968317    1.0061084
R5.Cold   HotAdaptedColdCage 11072832    1.0523857
R9.Cold  ColdAdaptedColdCage 22386103    1.0520949
R10.Cold ColdAdaptedColdCage 17408532    1.0903311
$counts
            R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold
R10.Cold
FBgn0000008     92    123     80      41      76     135     188
150
FBgn0000017   2000   2393   1799    1290    1029     973    2441
1665
FBgn0000018     36     43     10      17       6      16      20
30
FBgn0000024     51     68     73      42     118     118     242
129
FBgn0000032   1640   1942   2328    1342     755     674    2110
1307
9413 more rows ...
$all.zeros
FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
      FALSE       FALSE       FALSE       FALSE       FALSE
9413 more elements ...
$common.dispersion
[1] 0.04634741
$trended.dispersion
[1] 0.06147752 0.03682037 0.09393240 0.06250704 0.03701067
9413 more elements ...
$abundance
FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
 -11.915246   -9.183744  -13.519050  -11.966242   -9.300038
9413 more elements ...
$bin.dispersion
 [1] 0.10242476 0.08298402 0.07844827 0.06269934 0.05173652 0.04872388
 [7] 0.04284990 0.04557811 0.04009431 0.03637836 0.03769710 0.03569138
[13] 0.03724316 0.03980808 0.04726670
$bin.abundance
 [1] -13.904034 -13.147003 -12.527444 -12.012132 -11.522218 -11.127957
 [7] -10.769579 -10.430591 -10.119638  -9.821485  -9.532052  -9.228034
[13]  -8.853866  -8.376654  -7.322181
$tagwise.dispersion
[1] 0.05760539 0.03015829 0.10190732 0.05315469 0.03199193
9413 more elements ...
getPriorN(d.GLM, design=design)
[1] 5
#Fit model with tagwise dispersion
glmfit.tgw = glmFit(d.GLM, design,
dispersion=d.GLM$tagwise.dispersion)
lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
topTags(lrt.tgw)
Coefficient:  AdaptationHotAdapted:TemperatureHotCage
               logConc     logFC       LR      P.Value          FDR
FBgn0032285  -8.215481 -5.121465 60.90457 5.990958e-15 5.642284e-11
FBgn0026089  -8.406297 -3.019672 40.81214 1.675890e-10 7.891766e-07
FBgn0039475 -10.039331 -2.757261 39.04361 4.144435e-10 1.301076e-06
FBgn0051641 -10.291549  2.976952 37.46208 9.320768e-10 2.194575e-06
FBgn0058042  -9.665534  2.267317 33.71276 6.388032e-09 1.203250e-05
FBgn0010333  -8.364017 -2.431286 31.28393 2.229167e-08 3.163945e-05
FBgn0030094  -9.930521 -4.244210 31.18012 2.351626e-08 3.163945e-05
FBgn0053653 -11.223515 -2.506126 30.73483 2.958070e-08 3.482388e-05
FBgn0028863  -9.954041 -2.419844 30.13375 4.032520e-08 4.219808e-05
FBgn0037979 -11.777803 -3.211380 29.61620 5.266308e-08 4.959809e-05
If I understand, glmfit.tgw would be the interaction model (f2) I was
talking about, wright? My problem is that I do not know now how to
define
the additive and reduced models to test Additive vs. Reduced and
Interaction vs. Additive.
Any suggestion?
Many thanks,
Miguel Gallach
        [[alternative HTML version deleted]]
                    
                
                