Entering edit mode
Guest User
★
13k
@guest-user-4897
Last seen 10.3 years ago
I have a miRNA data from an experiment, which contain repeated
measurements. The experiment set up is as follows: 12 indviduals were
assigned into two groups, Good and Low based on previous observations;
where Good means those produce good quality and Low means those
produce low quality. Samples were taken from each individual in a week
interval for a month. The miRNA-seq data has three of these time
points (0-day, 14-day and 28-day) for the eleven individuals,
unfortunetly we lost one of the samples from Good-0-day group. Our aim
is to check whether miRNA are involved in determining the quality and
whether the time of sampling affect the quality through miRNAs.
Hypotheses of test are:
1.Mean miRNA expression is the same regardless of quality at any time
point
2.Mean miRNA expression is the same at different time point within the
same quality group
Below is the detail of the analysis, which I used in edgeR.
My question is, am I in the write track? How can I check whether the
assumption of repeated measurments met or not?
As you can see from the head of the data table individual S4 looks an
outlier, which is also clear after normalization in a plot (plot is
not shown). My question is that should I remove it? Is there any
effect in case of removing or missing samples? Does sample imbalance
affect the model?
R version 3.1.0 beta (2014-03-28 r65330)
Platform: x86_64-pc-linux-gnu (64-bit)
> y <- read.table(file="salmon",row.names="miRNA", header=TRUE)
> head(y)
S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53
S59 S60 S44
let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59
6 27
let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22
0 8
let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6
0 1
let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3
0 2
let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8
0 7
let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2
2 0
S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89
S95 S83 S90
let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26
6 18
let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1
0 3
let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0
0 0
let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2
0 11
let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4
2 1
let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0
0 3
> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day",
"0day","0day","0day","0day","14day", "14day", "14day", "14day",
"14day", "14day", "14day", "14day", "14day", "14day", "14day",
"14day", "28day", "28day", "28day", "28day", "28day", "28day",
"28day", "28day", "28day", "28day", "28day", "28day"))
> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low",
"Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low
","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","G
ood","Low","Low","Low","Low","Low","Low"))
> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F
434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F
429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F
445","F446","F429","F431","F434","F440","F447","F448"))
> dim(y)
[1] 140 35
> dge <- DGEList(counts=y)
> dge <- calcNormFactors(dge)
> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size)
> keep <- rowSums(m > 1) >= 3
> dge <- dge[keep,]
> dim(dge)
[1] 135 35
> indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2
","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1",
"2","3","4","5","6"))
> targets <- data.frame(quality,indv,time)
> targets
quality indv time
1 Good 1 0day
2 Good 2 0day
3 Good 3 0day
4 Good 4 0day
5 Good 5 0day
6 Low 1 0day
7 Low 2 0day
8 Low 3 0day
9 Low 4 0day
10 Low 5 0day
11 Low 6 0day
12 Good 1 14day
13 Good 2 14day
14 Good 3 14day
15 Good 4 14day
16 Good 5 14day
17 Good 6 14day
18 Low 1 14day
19 Low 2 14day
20 Low 3 14day
21 Low 4 14day
22 Low 5 14day
23 Low 6 14day
24 Good 1 28day
25 Good 2 28day
26 Good 3 28day
27 Good 4 28day
28 Good 5 28day
29 Good 6 28day
30 Low 1 28day
31 Low 2 28day
32 Low 3 28day
33 Low 4 28day
34 Low 5 28day
35 Low 6 28day
> quality <- factor(targets$quality, levels=c("Good","Low"))
> time <- factor(targets$time, levels=c("0day","14day","28day"))
> design <- model.matrix(~quality+quality:indv+quality:time)
> dge <- estimateGLMCommonDisp(dge,design)
> dge <- estimateGLMTrendedDisp(dge,design)
> dge <- estimateGLMTagwiseDisp(dge,design)
> fit <- glmFit(dge, design)
> lrt <- glmLRT(fit, coef="qualityGood:time14day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_14dayvs0day.csv")
> lrt <- glmLRT(fit, coef="qualityGood:time28day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_14dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="28day_LowvsGood.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="14day_LowvsGood.csv")
> lrt <- glmLRT(fit, coef="qualityLow")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="0day_LowvsGood.csv")
I have a miRNA data from an experiment, which contain repeated
measurements. The experiment set up is as follows: 12 indviduals were
assigned into two groups, Good and Low based on previous observations;
where Good means those produce good quality and Low means those
produce low quality. Samples were taken from each individual in a week
interval for a month. The miRNA-seq data has three of these time
points (0-day, 14-day and 28-day) for the eleven individuals,
unfortunetly we lost one of the samples from Good-0-day group. Our aim
is to check whether miRNA are involved in determining the quality and
whether the time of sampling affect the quality through miRNAs.
Hypotheses of test are:
1.Mean miRNA expression is the same regardless of quality at any time
point
2.Mean miRNA expression is the same at different time point within the
same quality group
Below is the detail of the analysis, which I used in edgeR.
My question is, am I in the write track? How can I check whether the
assumption of repeated measurments met or not?
As you can see from the head of the data table individual S4 looks an
outlier, which is also clear after normalization in a plot (plot is
not shown). My question is that should I remove it? Is there any
effect in case of removing or missing samples? Does sample imbalance
affect the model?
R version 3.1.0 beta (2014-03-28 r65330)
Platform: x86_64-pc-linux-gnu (64-bit)
> y <- read.table(file="salmon",row.names="miRNA", header=TRUE)
> head(y)
S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53
S59 S60 S44
let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59
6 27
let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22
0 8
let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6
0 1
let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3
0 2
let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8
0 7
let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2
2 0
S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89
S95 S83 S90
let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26
6 18
let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1
0 3
let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0
0 0
let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2
0 11
let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4
2 1
let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0
0 3
> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day",
"0day","0day","0day","0day","14day", "14day", "14day", "14day",
"14day", "14day", "14day", "14day", "14day", "14day", "14day",
"14day", "28day", "28day", "28day", "28day", "28day", "28day",
"28day", "28day", "28day", "28day", "28day", "28day"))
> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low",
"Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low
","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","G
ood","Low","Low","Low","Low","Low","Low"))
> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F
434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F
429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F
445","F446","F429","F431","F434","F440","F447","F448"))
> dim(y)
[1] 140 35
> dge <- DGEList(counts=y)
> dge <- calcNormFactors(dge)
> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size)
> keep <- rowSums(m > 1) >= 3
> dge <- dge[keep,]
> dim(dge)
[1] 135 35
> indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2
","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1",
"2","3","4","5","6"))
> targets <- data.frame(quality,indv,time)
> targets
quality indv time
1 Good 1 0day
2 Good 2 0day
3 Good 3 0day
4 Good 4 0day
5 Good 5 0day
6 Low 1 0day
7 Low 2 0day
8 Low 3 0day
9 Low 4 0day
10 Low 5 0day
11 Low 6 0day
12 Good 1 14day
13 Good 2 14day
14 Good 3 14day
15 Good 4 14day
16 Good 5 14day
17 Good 6 14day
18 Low 1 14day
19 Low 2 14day
20 Low 3 14day
21 Low 4 14day
22 Low 5 14day
23 Low 6 14day
24 Good 1 28day
25 Good 2 28day
26 Good 3 28day
27 Good 4 28day
28 Good 5 28day
29 Good 6 28day
30 Low 1 28day
31 Low 2 28day
32 Low 3 28day
33 Low 4 28day
34 Low 5 28day
35 Low 6 28day
> quality <- factor(targets$quality, levels=c("Good","Low"))
> time <- factor(targets$time, levels=c("0day","14day","28day"))
> design <- model.matrix(~quality+quality:indv+quality:time)
> dge <- estimateGLMCommonDisp(dge,design)
> dge <- estimateGLMTrendedDisp(dge,design)
> dge <- estimateGLMTagwiseDisp(dge,design)
> fit <- glmFit(dge, design)
> lrt <- glmLRT(fit, coef="qualityGood:time14day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_14dayvs0day.csv")
> lrt <- glmLRT(fit, coef="qualityGood:time28day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_14dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="28day_LowvsGood.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="14day_LowvsGood.csv")
> lrt <- glmLRT(fit, coef="qualityLow")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="0day_LowvsGood.csv")
-- output of sessionInfo():
R version 3.1.0 beta (2014-03-28 r65330)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.4.2 limma_3.18.13
loaded via a namespace (and not attached):
[1] tools_3.1.0
--
Sent via the guest posting facility at bioconductor.org.