EdgeR repeated measurment
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs. Hypotheses of test are: 1.Mean miRNA expression is the same regardless of quality at any time point 2.Mean miRNA expression is the same at different time point within the same quality group Below is the detail of the analysis, which I used in edgeR. My question is, am I in the write track? How can I check whether the assumption of repeated measurments met or not? As you can see from the head of the data table individual S4 looks an outlier, which is also clear after normalization in a plot (plot is not shown). My question is that should I remove it? Is there any effect in case of removing or missing samples? Does sample imbalance affect the model? R version 3.1.0 beta (2014-03-28 r65330) Platform: x86_64-pc-linux-gnu (64-bit) > y <- read.table(file="salmon",row.names="miRNA", header=TRUE) > head(y) S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44 let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59 6 27 let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22 0 8 let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6 0 1 let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3 0 2 let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8 0 7 let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2 2 0 S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90 let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26 6 18 let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1 0 3 let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0 0 0 let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2 0 11 let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4 2 1 let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0 0 3 > time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day", "0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day")) > quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low", "Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low ","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","G ood","Low","Low","Low","Low","Low","Low")) > indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F 434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F 429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F 445","F446","F429","F431","F434","F440","F447","F448")) > dim(y) [1] 140 35 > dge <- DGEList(counts=y) > dge <- calcNormFactors(dge) > m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size) > keep <- rowSums(m > 1) >= 3 > dge <- dge[keep,] > dim(dge) [1] 135 35 > indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2 ","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1", "2","3","4","5","6")) > targets <- data.frame(quality,indv,time) > targets quality indv time 1 Good 1 0day 2 Good 2 0day 3 Good 3 0day 4 Good 4 0day 5 Good 5 0day 6 Low 1 0day 7 Low 2 0day 8 Low 3 0day 9 Low 4 0day 10 Low 5 0day 11 Low 6 0day 12 Good 1 14day 13 Good 2 14day 14 Good 3 14day 15 Good 4 14day 16 Good 5 14day 17 Good 6 14day 18 Low 1 14day 19 Low 2 14day 20 Low 3 14day 21 Low 4 14day 22 Low 5 14day 23 Low 6 14day 24 Good 1 28day 25 Good 2 28day 26 Good 3 28day 27 Good 4 28day 28 Good 5 28day 29 Good 6 28day 30 Low 1 28day 31 Low 2 28day 32 Low 3 28day 33 Low 4 28day 34 Low 5 28day 35 Low 6 28day > quality <- factor(targets$quality, levels=c("Good","Low")) > time <- factor(targets$time, levels=c("0day","14day","28day")) > design <- model.matrix(~quality+quality:indv+quality:time) > dge <- estimateGLMCommonDisp(dge,design) > dge <- estimateGLMTrendedDisp(dge,design) > dge <- estimateGLMTagwiseDisp(dge,design) > fit <- glmFit(dge, design) > lrt <- glmLRT(fit, coef="qualityGood:time14day") > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Good_14dayvs0day.csv") > lrt <- glmLRT(fit, coef="qualityGood:time28day") > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Good_28dayvs0day.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Good_28dayvs14day.csv") > lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Low_14dayvs0day.csv") > lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Low_28dayvs0day.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Low_28dayvs14day.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="28day_LowvsGood.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="14day_LowvsGood.csv") > lrt <- glmLRT(fit, coef="qualityLow") > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="0day_LowvsGood.csv") I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs. Hypotheses of test are: 1.Mean miRNA expression is the same regardless of quality at any time point 2.Mean miRNA expression is the same at different time point within the same quality group Below is the detail of the analysis, which I used in edgeR. My question is, am I in the write track? How can I check whether the assumption of repeated measurments met or not? As you can see from the head of the data table individual S4 looks an outlier, which is also clear after normalization in a plot (plot is not shown). My question is that should I remove it? Is there any effect in case of removing or missing samples? Does sample imbalance affect the model? R version 3.1.0 beta (2014-03-28 r65330) Platform: x86_64-pc-linux-gnu (64-bit) > y <- read.table(file="salmon",row.names="miRNA", header=TRUE) > head(y) S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44 let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59 6 27 let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22 0 8 let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6 0 1 let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3 0 2 let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8 0 7 let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2 2 0 S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90 let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26 6 18 let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1 0 3 let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0 0 0 let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2 0 11 let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4 2 1 let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0 0 3 > time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day", "0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day")) > quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low", "Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low ","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","G ood","Low","Low","Low","Low","Low","Low")) > indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F 434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F 429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F 445","F446","F429","F431","F434","F440","F447","F448")) > dim(y) [1] 140 35 > dge <- DGEList(counts=y) > dge <- calcNormFactors(dge) > m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size) > keep <- rowSums(m > 1) >= 3 > dge <- dge[keep,] > dim(dge) [1] 135 35 > indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2 ","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1", "2","3","4","5","6")) > targets <- data.frame(quality,indv,time) > targets quality indv time 1 Good 1 0day 2 Good 2 0day 3 Good 3 0day 4 Good 4 0day 5 Good 5 0day 6 Low 1 0day 7 Low 2 0day 8 Low 3 0day 9 Low 4 0day 10 Low 5 0day 11 Low 6 0day 12 Good 1 14day 13 Good 2 14day 14 Good 3 14day 15 Good 4 14day 16 Good 5 14day 17 Good 6 14day 18 Low 1 14day 19 Low 2 14day 20 Low 3 14day 21 Low 4 14day 22 Low 5 14day 23 Low 6 14day 24 Good 1 28day 25 Good 2 28day 26 Good 3 28day 27 Good 4 28day 28 Good 5 28day 29 Good 6 28day 30 Low 1 28day 31 Low 2 28day 32 Low 3 28day 33 Low 4 28day 34 Low 5 28day 35 Low 6 28day > quality <- factor(targets$quality, levels=c("Good","Low")) > time <- factor(targets$time, levels=c("0day","14day","28day")) > design <- model.matrix(~quality+quality:indv+quality:time) > dge <- estimateGLMCommonDisp(dge,design) > dge <- estimateGLMTrendedDisp(dge,design) > dge <- estimateGLMTagwiseDisp(dge,design) > fit <- glmFit(dge, design) > lrt <- glmLRT(fit, coef="qualityGood:time14day") > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Good_14dayvs0day.csv") > lrt <- glmLRT(fit, coef="qualityGood:time28day") > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Good_28dayvs0day.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Good_28dayvs14day.csv") > lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Low_14dayvs0day.csv") > lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Low_28dayvs0day.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="Low_28dayvs14day.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="28day_LowvsGood.csv") > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0)) > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="14day_LowvsGood.csv") > lrt <- glmLRT(fit, coef="qualityLow") > topTags(lrt) > tab <- topTags(lrt, n=Inf) > write.table(tab, file="0day_LowvsGood.csv") -- output of sessionInfo(): R version 3.1.0 beta (2014-03-28 r65330) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.4.2 limma_3.18.13 loaded via a namespace (and not attached): [1] tools_3.1.0 -- Sent via the guest posting facility at bioconductor.org.
miRNA Normalization edgeR miRNA Normalization edgeR • 1.4k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia
> Date: Fri, 4 Apr 2014 01:06:47 -0700 (PDT) > From: "teshe [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, ttb at uin.no > Subject: [BioC] EdgeR repeated measurment > > > I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs. > Hypotheses of test are: > > 1.Mean miRNA expression is the same regardless of quality at any time point > 2.Mean miRNA expression is the same at different time point within the same quality group > > Below is the detail of the analysis, which I used in edgeR. > > My question is, am I in the write track? I don't have time to go through your analysis, but I'd recommend that you used the built-in function cpm() to compute 'm' near the beginning. > How can I check whether the assumption of repeated measurments met or > not? You are not making distributional assumptions about the repeated measures, so there is nothing to check. > As you can see from the head of the data table individual S4 looks an > outlier, which is also clear after normalization in a plot (plot is not > shown). My question is that should I remove it? Is there any effect in > case of removing or missing samples? Does sample imbalance affect the > model? edgeR is able to handle imbalanced numbers of samples without any problem. Best wishes Gordon > R version 3.1.0 beta (2014-03-28 r65330) > Platform: x86_64-pc-linux-gnu (64-bit) > >> y <- read.table(file="salmon",row.names="miRNA", header=TRUE) >> head(y) > S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44 > let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59 6 27 > let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22 0 8 > let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6 0 1 > let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3 0 2 > let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8 0 7 > let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2 2 0 > S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90 > let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26 6 18 > let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1 0 3 > let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0 0 0 > let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2 0 11 > let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4 2 1 > let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0 0 3 >> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day" ,"0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day")) >> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low" ,"Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Lo w","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good"," Good","Low","Low","Low","Low","Low","Low")) >> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431"," F434","F440","F447","F448","F427","F432","F437","F438","F445","F446"," F429","F431","F434","F440","F447","F448","F427","F432","F437","F438"," F445","F446","F429","F431","F434","F440","F447","F448")) >> dim(y) > [1] 140 35 >> dge <- DGEList(counts=y) >> dge <- calcNormFactors(dge) >> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size) >> keep <- rowSums(m > 1) >= 3 >> dge <- dge[keep,] >> dim(dge) > [1] 135 35 >> indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1"," 2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1" ,"2","3","4","5","6")) >> targets <- data.frame(quality,indv,time) >> targets > quality indv time > 1 Good 1 0day > 2 Good 2 0day > 3 Good 3 0day > 4 Good 4 0day > 5 Good 5 0day > 6 Low 1 0day > 7 Low 2 0day > 8 Low 3 0day > 9 Low 4 0day > 10 Low 5 0day > 11 Low 6 0day > 12 Good 1 14day > 13 Good 2 14day > 14 Good 3 14day > 15 Good 4 14day > 16 Good 5 14day > 17 Good 6 14day > 18 Low 1 14day > 19 Low 2 14day > 20 Low 3 14day > 21 Low 4 14day > 22 Low 5 14day > 23 Low 6 14day > 24 Good 1 28day > 25 Good 2 28day > 26 Good 3 28day > 27 Good 4 28day > 28 Good 5 28day > 29 Good 6 28day > 30 Low 1 28day > 31 Low 2 28day > 32 Low 3 28day > 33 Low 4 28day > 34 Low 5 28day > 35 Low 6 28day >> quality <- factor(targets$quality, levels=c("Good","Low")) >> time <- factor(targets$time, levels=c("0day","14day","28day")) >> design <- model.matrix(~quality+quality:indv+quality:time) >> dge <- estimateGLMCommonDisp(dge,design) >> dge <- estimateGLMTrendedDisp(dge,design) >> dge <- estimateGLMTagwiseDisp(dge,design) >> fit <- glmFit(dge, design) >> lrt <- glmLRT(fit, coef="qualityGood:time14day") >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Good_14dayvs0day.csv") > >> lrt <- glmLRT(fit, coef="qualityGood:time28day") >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Good_28dayvs0day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Good_28dayvs14day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Low_14dayvs0day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Low_28dayvs0day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Low_28dayvs14day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="28day_LowvsGood.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="14day_LowvsGood.csv") > >> lrt <- glmLRT(fit, coef="qualityLow") >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="0day_LowvsGood.csv") > > -- output of sessionInfo(): > > R version 3.1.0 beta (2014-03-28 r65330) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > loaded via a namespace (and not attached): > [1] tools_3.1.0 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Thanks, On the model that I used quality <- factor(targets$quality, levels=c("Good","Low")) time <- factor(targets$time, levels=c("0day","14day","28day")) design <- model.matrix(~quality+quality:indv+quality:time) I found that LR values from Low vs Good at 0 day, Low vs good at 14 day, and Low vs Good at 28 day have similar values, so I wonder how this LR value be similar? Tesh ________________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Sent: Saturday, April 5, 2014 9:09 AM To: Teshome Tilahun Bizuayehu Cc: Bioconductor mailing list Subject: EdgeR repeated measurment > Date: Fri, 4 Apr 2014 01:06:47 -0700 (PDT) > From: "teshe [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, ttb at uin.no > Subject: [BioC] EdgeR repeated measurment > > > I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs. > Hypotheses of test are: > > 1.Mean miRNA expression is the same regardless of quality at any time point > 2.Mean miRNA expression is the same at different time point within the same quality group > > Below is the detail of the analysis, which I used in edgeR. > > My question is, am I in the write track? I don't have time to go through your analysis, but I'd recommend that you used the built-in function cpm() to compute 'm' near the beginning. > How can I check whether the assumption of repeated measurments met or > not? You are not making distributional assumptions about the repeated measures, so there is nothing to check. > As you can see from the head of the data table individual S4 looks an > outlier, which is also clear after normalization in a plot (plot is not > shown). My question is that should I remove it? Is there any effect in > case of removing or missing samples? Does sample imbalance affect the > model? edgeR is able to handle imbalanced numbers of samples without any problem. Best wishes Gordon > R version 3.1.0 beta (2014-03-28 r65330) > Platform: x86_64-pc-linux-gnu (64-bit) > >> y <- read.table(file="salmon",row.names="miRNA", header=TRUE) >> head(y) > S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44 > let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59 6 27 > let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22 0 8 > let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6 0 1 > let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3 0 2 > let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8 0 7 > let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2 2 0 > S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90 > let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26 6 18 > let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1 0 3 > let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0 0 0 > let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2 0 11 > let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4 2 1 > let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0 0 3 >> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day" ,"0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day")) >> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low" ,"Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Lo w","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good"," Good","Low","Low","Low","Low","Low","Low")) >> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431"," F434","F440","F447","F448","F427","F432","F437","F438","F445","F446"," F429","F431","F434","F440","F447","F448","F427","F432","F437","F438"," F445","F446","F429","F431","F434","F440","F447","F448")) >> dim(y) > [1] 140 35 >> dge <- DGEList(counts=y) >> dge <- calcNormFactors(dge) >> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size) >> keep <- rowSums(m > 1) >= 3 >> dge <- dge[keep,] >> dim(dge) > [1] 135 35 >> indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1"," 2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1" ,"2","3","4","5","6")) >> targets <- data.frame(quality,indv,time) >> targets > quality indv time > 1 Good 1 0day > 2 Good 2 0day > 3 Good 3 0day > 4 Good 4 0day > 5 Good 5 0day > 6 Low 1 0day > 7 Low 2 0day > 8 Low 3 0day > 9 Low 4 0day > 10 Low 5 0day > 11 Low 6 0day > 12 Good 1 14day > 13 Good 2 14day > 14 Good 3 14day > 15 Good 4 14day > 16 Good 5 14day > 17 Good 6 14day > 18 Low 1 14day > 19 Low 2 14day > 20 Low 3 14day > 21 Low 4 14day > 22 Low 5 14day > 23 Low 6 14day > 24 Good 1 28day > 25 Good 2 28day > 26 Good 3 28day > 27 Good 4 28day > 28 Good 5 28day > 29 Good 6 28day > 30 Low 1 28day > 31 Low 2 28day > 32 Low 3 28day > 33 Low 4 28day > 34 Low 5 28day > 35 Low 6 28day >> quality <- factor(targets$quality, levels=c("Good","Low")) >> time <- factor(targets$time, levels=c("0day","14day","28day")) >> design <- model.matrix(~quality+quality:indv+quality:time) >> dge <- estimateGLMCommonDisp(dge,design) >> dge <- estimateGLMTrendedDisp(dge,design) >> dge <- estimateGLMTagwiseDisp(dge,design) >> fit <- glmFit(dge, design) >> lrt <- glmLRT(fit, coef="qualityGood:time14day") >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Good_14dayvs0day.csv") > >> lrt <- glmLRT(fit, coef="qualityGood:time28day") >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Good_28dayvs0day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Good_28dayvs14day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Low_14dayvs0day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Low_28dayvs0day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="Low_28dayvs14day.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="28day_LowvsGood.csv") > >> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0)) >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="14day_LowvsGood.csv") > >> lrt <- glmLRT(fit, coef="qualityLow") >> topTags(lrt) >> tab <- topTags(lrt, n=Inf) >> write.table(tab, file="0day_LowvsGood.csv") > > -- output of sessionInfo(): > > R version 3.1.0 beta (2014-03-28 r65330) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.13 > > loaded via a namespace (and not attached): > [1] tools_3.1.0 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
ADD REPLY

Login before adding your answer.

Traffic: 403 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6