edgeR for RNA-seq: nested paired analysis with uneven groups
0
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.5 years ago
Hello, I am new to the field and have a question regarding analysis of RNA- seq data with the edgeR package. I am looking at the difference in gene expression between two groups of patients (control and disease) after stimuli. I need to perform a paired analysis for each participant within each group, and then compare groups (different individuals between groups). My problem is that I have a different number of participants between in each group. Here???s a summary of my experimental design: 1) Paired analysis for control group (n=7) control patient 1: measurement 1 -> disease stimuli -> measurement 2 control patient 2: measurement 1 -> disease stimuli -> measurement 2 ???. Control group expression (n=7) 2) Paired analysis for disease group (n=6) diseased patient1: measurement 1 -> disease stimuli -> measurement 2 diseased patient2: measurement 1 -> disease stimuli -> measurement 2 ??? Disease group expression (n=6) 3) Differential expression Diseases vs healthy groups So I am basically looking for genes that demonstrate the highest variation after stimuli between the two groups. Based on the section 3.5 of the edgeR users guide, I first came up with the following nested paired analysis: ---------------------------------------------------------------------- ---------------------------------- #Individual patients > patient <- factor(c('patient1', 'patient1', 'patient2', 'patient2', 'patient3', 'patient3', 'patient4', 'patient4', 'patient5', 'patient5','patient6', 'patient6', 'patient7', 'patient7', 'patient8', 'patient8', 'patient9', 'patient9', 'patient10', 'patient10', 'patient11', 'patient11', 'patient12', 'patient12', 'patient13', 'patient13' )) #I re-numbered each patient > n <- factor(c(1, 1,2, 2, 3, 3, 4, 4,5, 5, 6, 6, 7, 7,1, 1,2, 2, 3, 3, 4, 4, 5, 5, 6, 6), levels=c(1,2,3,4,5,6,7)) #I assigned them to their respective group > group <- factor(c('control', 'control','control', 'control','control ','control','control','control','control','control','control','control ','control','control', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease','disease', 'disease', 'disease', 'disease'), levels=c("control","disease")) # time 1 refers to measurement before stimuli and time 2 refers to measurement after stimuli > time <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2), levels=c(1,2)) > data.frame(sample=colnames(y),patient,n,time,group) sample patient n time group 1 patient1 1 1 control 2 patient1 1 2 control 3 patient2 2 1 control 4 patient2 2 2 control 5 patient3 3 1 control 6 patient3 3 2 control 7 patient4 4 1 control 8 patient4 4 2 control 9 patient5 5 1 control 10 patient5 5 2 control 11 patient6 6 1 control 12 patient6 6 2 control 13 patient7 7 1 control 14 patient7 7 2 control 15 patient8 1 1 disease 16 patient8 1 2 disease 17 patient9 2 1 disease 18 patient9 2 2 disease 19 patient10 3 1 disease 20 patient10 3 2 disease 21 patient11 4 1 disease 22 patient11 4 2 disease 23 patient12 5 1 disease 24 patient12 5 2 disease 25 patient13 6 1 disease 26 patient13 6 2 disease > design <- model.matrix(~group+group:n+group:time) > colnames(design) [1] "(Intercept)" "groupdisease" "groupcontrol:n2" "groupdisease:n2" "groupcontrol:n3" "groupdisease:n3" "groupcontrol:n4" "groupdisease:n4" [9] "groupcontrol:n5" "groupdisease:n5" "groupcontrol:n6" "groupdisease:n6" "groupcontrol:n7" "groupdisease:n7" "groupcontrol:time2" "groupdisease:time2" #Estimate dispersion > y <- estimateGLMCommonDisp(y, design, verbose=TRUE) > y$common.dispersion > y <- estimateGLMTrendedDisp(y, design) > names(y) > summary(y$tagwise.dispersion) > y <- estimateGLMTagwiseDisp(y, design) #Differential expression > fit <- glmFit(y, design) > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1)) ---------------------------------------------------------------------- ----------------------- Now, it obviously didn't work since my groups are not of equal number and that re-numbering my sample this way and the contrast matrix make no sense. So I thought of re-numbering them this way: >n <- factor(c(1, 1,2, 2, 3, 3, 4, 4,5, 5, 6, 6, 7, 7,8, 8,9, 9, 10, 10, 11, 11, 12, 12, 13, 13), levels=c(1,2,3,4,5,6,7,8,9,10,11,12,13)) And designing my matrix this way: > design <- model.matrix(~group+group:n+n:time) > colnames(design) With the entire analysis: ---------------------------------------------------------------------- ----------------------- > patient <- factor(c('patient1', 'patient1', 'patient2', 'patient2', 'patient3', 'patient3', 'patient4', 'patient4', 'patient5', 'patient5','patient6', 'patient6', 'patient7', 'patient7', 'patient8', 'patient8', 'patient9', 'patient9', 'patient10', 'patient10', 'patient11', 'patient11', 'patient12', 'patient12', 'patient13', 'patient13' )) > n <- factor(c(1, 1,2, 2, 3, 3, 4, 4,5, 5, 6, 6, 7, 7,8, 8,9, 9, 10, 10, 11, 11, 12, 12, 13, 13), levels=c(1,2,3,4,5,6,7,8,9,10,11,12,13)) > group <- factor(c('control', 'control','control', 'control','control ','control','control','control','control','control','control','control ','control','control', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease','disease', 'disease', 'disease', 'disease'), levels=c("control","disease")) > time <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2), levels=c(1,2)) sample patient n time group 1 patient1 1 1 control 2 patient1 1 2 control 3 patient2 2 1 control 4 patient2 2 2 control 5 patient3 3 1 control 6 patient3 3 2 control 7 patient4 4 1 control 8 patient4 4 2 control 9 patient5 5 1 control 10 patient5 5 2 control 11 patient6 6 1 control 12 patient6 6 2 control 13 patient7 7 1 control 14 patient7 7 2 control 15 patient8 8 1 disease 16 patient8 8 2 disease 17 patient9 9 1 disease 18 patient9 9 2 disease 19 patient10 10 1 disease 20 patient10 10 2 disease 21 patient11 11 1 disease 22 patient11 11 2 disease 23 patient12 12 1 disease 24 patient12 12 2 disease 25 patient13 13 1 disease 26 patient13 13 2 disease >design <- model.matrix(~group+group:n+n:time) > colnames(design) [1] "(Intercept)" "groupdisease" "groupcontrol:n2" "groupdisease:n2" "groupcontrol:n3" "groupdisease:n3" "groupcontrol:n4" "groupdisease:n4" "groupcontrol:n5" [10] "groupdisease:n5" "groupcontrol:n6" "groupdisease:n6" "groupcontrol:n7" "groupdisease:n7" "groupcontrol:n8" "groupdisease:n8" "groupcontrol:n9" "groupdisease:n9" [19] "groupcontrol:n10" "groupdisease:n10" "groupcontrol:n11" "groupdisease:n11" "groupcontrol:n12" "groupdisease:n12" "groupcontrol:n13" "groupdisease:n13" "n1:time2" [28] "n2:time2" "n3:time2" "n4:time2" "n5:time2" "n6:time2" "n7:time2" "n8:time2" "n9:time2" "n10:time2" [37] "n11:time2" "n12:time2" "n13:time2" ---------------------------------------------------------------------- ----------------------- However, now I am not sure if my matrix is right and I am confused as to what would be the best way to compare the two groups. Any ideas would be greatly appreciated. Thank you very much, L -- output of sessionInfo(): > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.7.1 limma_3.21.3 -- Sent via the guest posting facility at bioconductor.org.
edgeR edgeR • 1.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 668 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6