Question

edgeR for RNA-seq: nested paired analysis with uneven groups

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hello, I am new to the field and have a question regarding analysis of RNA- seq data with the edgeR package. I am looking at the difference in gene expression between two groups of patients (control and disease) after stimuli. I need to perform a paired analysis for each participant within each group, and then compare groups (different individuals between groups). My problem is that I have a different number of participants between in each group. Here???s a summary of my experimental design: 1) Paired analysis for control group (n=7) control patient 1: measurement 1 -> disease stimuli -> measurement 2 control patient 2: measurement 1 -> disease stimuli -> measurement 2 ???. Control group expression (n=7) 2) Paired analysis for disease group (n=6) diseased patient1: measurement 1 -> disease stimuli -> measurement 2 diseased patient2: measurement 1 -> disease stimuli -> measurement 2 ??? Disease group expression (n=6) 3) Differential expression Diseases vs healthy groups So I am basically looking for genes that demonstrate the highest variation after stimuli between the two groups. Based on the section 3.5 of the edgeR users guide, I first came up with the following nested paired analysis: ---------------------------------------------------------------------- ---------------------------------- #Individual patients > patient <- factor(c('patient1', 'patient1', 'patient2', 'patient2', 'patient3', 'patient3', 'patient4', 'patient4', 'patient5', 'patient5','patient6', 'patient6', 'patient7', 'patient7', 'patient8', 'patient8', 'patient9', 'patient9', 'patient10', 'patient10', 'patient11', 'patient11', 'patient12', 'patient12', 'patient13', 'patient13' )) #I re-numbered each patient > n <- factor(c(1, 1,2, 2, 3, 3, 4, 4,5, 5, 6, 6, 7, 7,1, 1,2, 2, 3, 3, 4, 4, 5, 5, 6, 6), levels=c(1,2,3,4,5,6,7)) #I assigned them to their respective group > group <- factor(c('control', 'control','control', 'control','control ','control','control','control','control','control','control','control ','control','control', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease','disease', 'disease', 'disease', 'disease'), levels=c("control","disease")) # time 1 refers to measurement before stimuli and time 2 refers to measurement after stimuli > time <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2), levels=c(1,2)) > data.frame(sample=colnames(y),patient,n,time,group) sample patient n time group 1 patient1 1 1 control 2 patient1 1 2 control 3 patient2 2 1 control 4 patient2 2 2 control 5 patient3 3 1 control 6 patient3 3 2 control 7 patient4 4 1 control 8 patient4 4 2 control 9 patient5 5 1 control 10 patient5 5 2 control 11 patient6 6 1 control 12 patient6 6 2 control 13 patient7 7 1 control 14 patient7 7 2 control 15 patient8 1 1 disease 16 patient8 1 2 disease 17 patient9 2 1 disease 18 patient9 2 2 disease 19 patient10 3 1 disease 20 patient10 3 2 disease 21 patient11 4 1 disease 22 patient11 4 2 disease 23 patient12 5 1 disease 24 patient12 5 2 disease 25 patient13 6 1 disease 26 patient13 6 2 disease > design <- model.matrix(~group+group:n+group:time) > colnames(design) [1] "(Intercept)" "groupdisease" "groupcontrol:n2" "groupdisease:n2" "groupcontrol:n3" "groupdisease:n3" "groupcontrol:n4" "groupdisease:n4" [9] "groupcontrol:n5" "groupdisease:n5" "groupcontrol:n6" "groupdisease:n6" "groupcontrol:n7" "groupdisease:n7" "groupcontrol:time2" "groupdisease:time2" #Estimate dispersion > y <- estimateGLMCommonDisp(y, design, verbose=TRUE) > y$common.dispersion > y <- estimateGLMTrendedDisp(y, design) > names(y) > summary(y$tagwise.dispersion) > y <- estimateGLMTagwiseDisp(y, design) #Differential expression > fit <- glmFit(y, design) > lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1)) ---------------------------------------------------------------------- ----------------------- Now, it obviously didn't work since my groups are not of equal number and that re-numbering my sample this way and the contrast matrix make no sense. So I thought of re-numbering them this way: >n <- factor(c(1, 1,2, 2, 3, 3, 4, 4,5, 5, 6, 6, 7, 7,8, 8,9, 9, 10, 10, 11, 11, 12, 12, 13, 13), levels=c(1,2,3,4,5,6,7,8,9,10,11,12,13)) And designing my matrix this way: > design <- model.matrix(~group+group:n+n:time) > colnames(design) With the entire analysis: ---------------------------------------------------------------------- ----------------------- > patient <- factor(c('patient1', 'patient1', 'patient2', 'patient2', 'patient3', 'patient3', 'patient4', 'patient4', 'patient5', 'patient5','patient6', 'patient6', 'patient7', 'patient7', 'patient8', 'patient8', 'patient9', 'patient9', 'patient10', 'patient10', 'patient11', 'patient11', 'patient12', 'patient12', 'patient13', 'patient13' )) > n <- factor(c(1, 1,2, 2, 3, 3, 4, 4,5, 5, 6, 6, 7, 7,8, 8,9, 9, 10, 10, 11, 11, 12, 12, 13, 13), levels=c(1,2,3,4,5,6,7,8,9,10,11,12,13)) > group <- factor(c('control', 'control','control', 'control','control ','control','control','control','control','control','control','control ','control','control', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease', 'disease','disease', 'disease', 'disease', 'disease'), levels=c("control","disease")) > time <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2), levels=c(1,2)) sample patient n time group 1 patient1 1 1 control 2 patient1 1 2 control 3 patient2 2 1 control 4 patient2 2 2 control 5 patient3 3 1 control 6 patient3 3 2 control 7 patient4 4 1 control 8 patient4 4 2 control 9 patient5 5 1 control 10 patient5 5 2 control 11 patient6 6 1 control 12 patient6 6 2 control 13 patient7 7 1 control 14 patient7 7 2 control 15 patient8 8 1 disease 16 patient8 8 2 disease 17 patient9 9 1 disease 18 patient9 9 2 disease 19 patient10 10 1 disease 20 patient10 10 2 disease 21 patient11 11 1 disease 22 patient11 11 2 disease 23 patient12 12 1 disease 24 patient12 12 2 disease 25 patient13 13 1 disease 26 patient13 13 2 disease >design <- model.matrix(~group+group:n+n:time) > colnames(design) [1] "(Intercept)" "groupdisease" "groupcontrol:n2" "groupdisease:n2" "groupcontrol:n3" "groupdisease:n3" "groupcontrol:n4" "groupdisease:n4" "groupcontrol:n5" [10] "groupdisease:n5" "groupcontrol:n6" "groupdisease:n6" "groupcontrol:n7" "groupdisease:n7" "groupcontrol:n8" "groupdisease:n8" "groupcontrol:n9" "groupdisease:n9" [19] "groupcontrol:n10" "groupdisease:n10" "groupcontrol:n11" "groupdisease:n11" "groupcontrol:n12" "groupdisease:n12" "groupcontrol:n13" "groupdisease:n13" "n1:time2" [28] "n2:time2" "n3:time2" "n4:time2" "n5:time2" "n6:time2" "n7:time2" "n8:time2" "n9:time2" "n10:time2" [37] "n11:time2" "n12:time2" "n13:time2" ---------------------------------------------------------------------- ----------------------- However, now I am not sure if my matrix is right and I am confused as to what would be the best way to compare the two groups. Any ideas would be greatly appreciated. Thank you very much, L -- output of sessionInfo(): > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.7.1 limma_3.21.3 -- Sent via the guest posting facility at bioconductor.org.

edgeR edgeR • 1.2k views

ADD COMMENT • link 9.9 years ago Guest User ★ 13k