Entering edit mode
Christy
•
0
@christy-24194
Last seen 3.4 years ago
Hi there.
I have an experiment comprising 25 samples from 12 different tissue types, 2 different ages and 2 different species (as outlined in the dataframe below):
> data.frame(Sample=colnames(y),Tissue,Age,Species)
Sample Tissue Age Species
1 S1 T1 adult Species1
2 S2 T2 fetal Species2
3 S3 T2 fetal Species2
4 S4 T2 fetal Species1
5 S5 T3 fetal Species1
6 S6 T4 adult Species2
7 S7 T4 adult Species2
8 S8 T5 adult Species1
9 S9 T6 adult Species2
10 S10 T6 adult Species2
11 S11 T2 adult Species2
12 S12 T2 adult Species2
13 S13 T2 adult Species1
14 S14 T3 adult Species1
15 S15 T7 adult Species2
16 S16 T7 adult Species2
17 S17 T7 adult Species2
18 S18 T7 adult Species2
19 S19 T8 adult Species1
20 S20 T9 adult Species1
21 S21 T10 adult Species2
22 S22 T10 adult Species2
23 S23 T10 adult Species1
24 S24 T11 adult Species1
25 S25 T12 adult Species1
I am wanting to find significantly deferentially expressed genes for each tissue type in comparison to the average expression of each gene across all tissues, while accounting for effects due to age and species.
This is what I have come up with. I'm not convinced I have done this right. I am hoping for feedback on whether I have gone about this the right way and if not, how can I do this?
Many thanks in advance!
>BiocManager::install("edgeR")
>library(edgeR)
#read in count data
>countTable=read.table(file="count_matrix_TPM_metadata.txt",header=TRUE, row.names=1)
>#define tissues
>Tissue=c("Tissue1","Tissue2","Tissue2","Tissue2","Tissue3","Tissue4","Tissue4","Tissue5","Tissue6","Tissue6","Tissue2","Tissue2","Tissue2","Tissue3","Tissue7","Tissue7","Tissue7","Tissue7","Tissue8","Tissue9","Tissue10","Tissue10","Tissue10","Tissue11","Tissue12")
>#create edgeR object using tissue type as grouping factor
>y=DGEList(counts=countTable[,18:42],genes=countTable[,10:17], group=Tissue)
>#filter out lowly expressed
>keep <- filterByExpr(y,min.count=1)
>y <- y[keep, , keep.lib.sizes=FALSE]
>#define other effects for model
>Age<-factor(c("adult","fetal","fetal","fetal","fetal",rep("adult",20)))
>Species<-factor(c("Species1","Species2","Species2","Species1","Species1","Species2","Species2","Species1",rep("Species2",4),"Species1","Species1",rep("Species2",4),"Species1","Species1","Species2","Species2",rep("Species1",3)))
>#Create design matix for which the intercept is the overall mean
>contrasts(y$samples$group)<-contr.sum(levels(y$samples$group))
>design<-model.matrix(~Tissue+Age+Species,data=y$samples)
>design
(Intercept) TissueT10 TissueT11 TissueT12 TissueT2 TissueT3 TissueT4 TissueT5 TissueT6 TissueT7 TissueT8 TissueT9 Agefetal
S1 1 0 0 0 0 0 0 0 0 0 0 0 0
S2 1 0 0 0 1 0 0 0 0 0 0 0 1
S3 1 0 0 0 1 0 0 0 0 0 0 0 1
S4 1 0 0 0 1 0 0 0 0 0 0 0 1
S5 1 0 0 0 0 1 0 0 0 0 0 0 1
S6 1 0 0 0 0 0 1 0 0 0 0 0 0
S7 1 0 0 0 0 0 1 0 0 0 0 0 0
S8 1 0 0 0 0 0 0 1 0 0 0 0 0
S9 1 0 0 0 0 0 0 0 1 0 0 0 0
S10 1 0 0 0 0 0 0 0 1 0 0 0 0
S11 1 0 0 0 1 0 0 0 0 0 0 0 0
S12 1 0 0 0 1 0 0 0 0 0 0 0 0
S13 1 0 0 0 1 0 0 0 0 0 0 0 0
S14 1 0 0 0 0 1 0 0 0 0 0 0 0
S15 1 0 0 0 0 0 0 0 0 1 0 0 0
S16 1 0 0 0 0 0 0 0 0 1 0 0 0
S17 1 0 0 0 0 0 0 0 0 1 0 0 0
S18 1 0 0 0 0 0 0 0 0 1 0 0 0
S19 1 0 0 0 0 0 0 0 0 0 1 0 0
S20 1 0 0 0 0 0 0 0 0 0 0 1 0
S21 1 1 0 0 0 0 0 0 0 0 0 0 0
S22 1 1 0 0 0 0 0 0 0 0 0 0 0
S23 1 1 0 0 0 0 0 0 0 0 0 0 0
S24 1 0 1 0 0 0 0 0 0 0 0 0 0
S25 1 0 0 1 0 0 0 0 0 0 0 0 0
SpeciesSpecies2
S1 0
S2 1
S3 1
S4 0
S5 0
S6 1
S7 1
S8 0
S9 1
S10 1
S11 1
S12 1
S13 0
S14 0
S15 1
S16 1
S17 1
S18 1
S19 0
S20 0
S21 1
S22 1
S23 0
S24 0
S25 0
attr(,"assign")
[1] 0 1 1 1 1 1 1 1 1 1 1 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Tissue
[1] "contr.treatment"
attr(,"contrasts")$Age
[1] "contr.treatment"
attr(,"contrasts")$Species
[1] "contr.treatment"
>#estimate trended dispersions
>y <- estimateGLMCommonDisp(y, design)
>y <- estimateGLMTrendedDisp(y, design)
>#fit liner model
>fit <- glmQLFit(y, design)
Warning message:
Zero sample variances detected, have been offset away from zero
#do I need to be worried about this message?
>#genes specifically up or down in tissue 10 as compared to the average of all the other tissues
>Tissue10 <- glmQLFTest(fit, coef=2)
>topTissue10<- topTags(Tissue10,sort.by="PValue",p.value=0.05)
>#genes specifically up or down in tissue 2 as compared to the average of all the other tissues
>Tissue2 <- glmQLFTest(fit, coef=5)
>topTissue2<- topTags(Tissue2,sort.by="PValue",p.value=0.05)