edgeR glmFit

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 9.7 years ago

I am working my way through the edgeR User's Guide and following section 3.3, I encounter a problem. I run the following commands - using my own counts data: ## Chapt 3.3 i Users Guide cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",he ader=T) head(cots) data = as.matrix(cots); y = round(data); head(y) dim(data) targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) targets # setting up a combined factor Group<-factor(paste(targets$Treat,targets$Time,sep=".")) cbind(targets,Group=Group) design<-model.matrix(~0+Group) colnames(design)<-levels(Group) design fit<-glmFit(y,design) sessionInfo() Resulting in 'Error in glmFit.default(y, design) : No dispersion values provided.' I do not quite see where my mistake is and would appreciate help. Thank you. jahn -- output of sessionInfo(): cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",h eader=T) > head(cots) E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 comp168996_c1_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 comp442719_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 comp436057_c0_seq13 0.00 7.38 21.04 0.00 0.00 0.21 21.65 0.00 0.00 0.00 0.86 0.57 9.69 18.73 0.00 0.00 0.00 7.14 comp415319_c0_seq20 28.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 comp428135_c0_seq8 249.73 228.53 172.18 57.52 104.48 113.96 187.05 94.64 84.65 134.65 99.82 0.00 65.46 82.83 219.37 59.18 134.04 81.56 comp73458_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 J01R2 J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 comp168996_c1_seq1 1.00 0.00 1.00 0.00 0.00 0.00 0 0.00 0.00 0.00 comp442719_c0_seq1 0.00 0.00 35.09 0.00 0.00 0.00 0 0.00 0.00 0.00 comp436057_c0_seq13 3.07 0.00 0.05 0.00 0.00 19.09 0 0.60 0.00 0.00 comp415319_c0_seq20 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 comp428135_c0_seq8 87.00 188.03 177.74 115.65 75.66 26.50 0 41.02 131.62 101.48 comp73458_c0_seq1 0.00 1.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > data = as.matrix(cots); > y = round(data); > head(y) E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 comp168996_c1_seq1 0 1 0 0 0 0 0 0 0 comp442719_c0_seq1 0 35 0 0 0 0 0 0 0 comp436057_c0_seq13 0 0 0 0 19 0 1 0 0 comp415319_c0_seq20 0 0 0 0 0 0 0 0 0 comp428135_c0_seq8 188 178 116 76 26 0 41 132 101 comp73458_c0_seq1 1 0 0 0 0 0 0 0 0 > > dim(data) [1] 621694 28 > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > targets Sample Treat Time 1 E00R1 Els 00h 2 E00R2 Els 00h 3 E00R3 Els 00h 4 E01R1 Els 01h 5 E01R2 Els 01h 6 E01R3 Els 01h 7 E05R1 Els 05h 8 E05R2 Els 05h 9 E05R3 Els 05h 10 E48R2 Els 48h 11 E48R3 Els 48h 12 E96R1 Els 96h 13 E96R2 Els 96h 14 E96R3 Els 96h 15 J00R1 Jon 00h 16 J00R2 Jon 00h 17 J00R3 Jon 00h 18 J01R1 Jon 01h 19 J01R2 Jon 01h 20 J01R3 Jon 01h 21 J05R1 Jon 05h 22 J05R2 Jon 05h 23 J05R3 Jon 05h 24 J48R2 Jon 48h 25 J48R3 Jon 48h 26 J96R1 Jon 96h 27 J96R2 Jon 96h 28 J96R3 Jon 96h > > # setting up a combined factor > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > cbind(targets,Group=Group) Sample Treat Time Group 1 E00R1 Els 00h Els.00h 2 E00R2 Els 00h Els.00h 3 E00R3 Els 00h Els.00h 4 E01R1 Els 01h Els.01h 5 E01R2 Els 01h Els.01h 6 E01R3 Els 01h Els.01h 7 E05R1 Els 05h Els.05h 8 E05R2 Els 05h Els.05h 9 E05R3 Els 05h Els.05h 10 E48R2 Els 48h Els.48h 11 E48R3 Els 48h Els.48h 12 E96R1 Els 96h Els.96h 13 E96R2 Els 96h Els.96h 14 E96R3 Els 96h Els.96h 15 J00R1 Jon 00h Jon.00h 16 J00R2 Jon 00h Jon.00h 17 J00R3 Jon 00h Jon.00h 18 J01R1 Jon 01h Jon.01h 19 J01R2 Jon 01h Jon.01h 20 J01R3 Jon 01h Jon.01h 21 J05R1 Jon 05h Jon.05h 22 J05R2 Jon 05h Jon.05h 23 J05R3 Jon 05h Jon.05h 24 J48R2 Jon 48h Jon.48h 25 J48R3 Jon 48h Jon.48h 26 J96R1 Jon 96h Jon.96h 27 J96R2 Jon 96h Jon.96h 28 J96R3 Jon 96h Jon.96h > > design<-model.matrix(~0+Group) > colnames(design)<-levels(Group) > design Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h 1 1 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 3 1 0 0 0 0 0 0 0 0 0 4 0 1 0 0 0 0 0 0 0 0 5 0 1 0 0 0 0 0 0 0 0 6 0 1 0 0 0 0 0 0 0 0 7 0 0 1 0 0 0 0 0 0 0 8 0 0 1 0 0 0 0 0 0 0 9 0 0 1 0 0 0 0 0 0 0 10 0 0 0 1 0 0 0 0 0 0 11 0 0 0 1 0 0 0 0 0 0 12 0 0 0 0 1 0 0 0 0 0 13 0 0 0 0 1 0 0 0 0 0 14 0 0 0 0 1 0 0 0 0 0 15 0 0 0 0 0 1 0 0 0 0 16 0 0 0 0 0 1 0 0 0 0 17 0 0 0 0 0 1 0 0 0 0 18 0 0 0 0 0 0 1 0 0 0 19 0 0 0 0 0 0 1 0 0 0 20 0 0 0 0 0 0 1 0 0 0 21 0 0 0 0 0 0 0 1 0 0 22 0 0 0 0 0 0 0 1 0 0 23 0 0 0 0 0 0 0 1 0 0 24 0 0 0 0 0 0 0 0 1 0 25 0 0 0 0 0 0 0 0 1 0 26 0 0 0 0 0 0 0 0 0 1 27 0 0 0 0 0 0 0 0 0 1 28 0 0 0 0 0 0 0 0 0 1 attr(,"assign") [1] 1 1 1 1 1 1 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$Group [1] "contr.treatment" > > fit<-glmFit(y,design) Error in glmFit.default(y, design) : No dispersion values provided. > > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Norwegian (Bokm??l)_Norway.1252 LC_CTYPE=Norwegian (Bokm??l)_Norway.1252 LC_MONETARY=Norwegian (Bokm??l)_Norway.1252 [4] LC_NUMERIC=C LC_TIME=Norwegian (Bokm??l)_Norway.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 [7] IRanges_1.20.6 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 [13] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 -- Sent via the guest posting facility at bioconductor.org.

edgeR edgeR • 6.0k views

ADD COMMENT • link updated 10.4 years ago by Mark Robinson ▴ 880 • written 10.4 years ago by Guest User ★ 13k

0

Entering edit mode

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 5.5 years ago

Hi Jahn, Chapter 3 demonstrates the setup of design matrices; 3.1 says "In this chapter, we outline the principles for setting up the design matrix and forming contrasts for some typical experimental designs". You would then combine that with what is shown in "1.4 Quick start" to do all the steps, something like: > design <- model.matrix(~<add yours="" here="">) > y <- estimateGLMCommonDisp(y,design) > y <- estimateGLMTrendedDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > fit <- glmFit(y,design) > lrt <- glmLRT(fit,coef=<add yours="" here="">) > topTags(lrt) In particular, you might have a look at some of the case studies in Chapter 4, for example 4.4 (but using your design matrix). Hope that helps. Best, Mark ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://ow.ly/riRea On 18.12.2013, at 11:21, Jahn Davik [guest] <guest at="" bioconductor.org=""> wrote: > > I am working my way through the edgeR User's Guide and following section 3.3, I encounter a problem. I run the following commands - using my own counts data: > ## Chapt 3.3 i Users Guide > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix", header=T) > head(cots) > data = as.matrix(cots); > y = round(data); > head(y) > > dim(data) > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > targets > > # setting up a combined factor > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > cbind(targets,Group=Group) > > design<-model.matrix(~0+Group) > colnames(design)<-levels(Group) > design > > fit<-glmFit(y,design) > > sessionInfo() > > Resulting in 'Error in glmFit.default(y, design) : No dispersion values provided.' > > I do not quite see where my mistake is and would appreciate help. > Thank you. > > jahn > > > > > > > > > -- output of sessionInfo(): > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix", header=T) >> head(cots) > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 > comp168996_c1_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > comp442719_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > comp436057_c0_seq13 0.00 7.38 21.04 0.00 0.00 0.21 21.65 0.00 0.00 0.00 0.86 0.57 9.69 18.73 0.00 0.00 0.00 7.14 > comp415319_c0_seq20 28.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > comp428135_c0_seq8 249.73 228.53 172.18 57.52 104.48 113.96 187.05 94.64 84.65 134.65 99.82 0.00 65.46 82.83 219.37 59.18 134.04 81.56 > comp73458_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > J01R2 J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > comp168996_c1_seq1 1.00 0.00 1.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > comp442719_c0_seq1 0.00 0.00 35.09 0.00 0.00 0.00 0 0.00 0.00 0.00 > comp436057_c0_seq13 3.07 0.00 0.05 0.00 0.00 19.09 0 0.60 0.00 0.00 > comp415319_c0_seq20 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > comp428135_c0_seq8 87.00 188.03 177.74 115.65 75.66 26.50 0 41.02 131.62 101.48 > comp73458_c0_seq1 0.00 1.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 >> data = as.matrix(cots); >> y = round(data); >> head(y) > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 > comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 > comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 > comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 > comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 > J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > comp168996_c1_seq1 0 1 0 0 0 0 0 0 0 > comp442719_c0_seq1 0 35 0 0 0 0 0 0 0 > comp436057_c0_seq13 0 0 0 0 19 0 1 0 0 > comp415319_c0_seq20 0 0 0 0 0 0 0 0 0 > comp428135_c0_seq8 188 178 116 76 26 0 41 132 101 > comp73458_c0_seq1 1 0 0 0 0 0 0 0 0 >> >> dim(data) > [1] 621694 28 >> >> targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) >> targets > Sample Treat Time > 1 E00R1 Els 00h > 2 E00R2 Els 00h > 3 E00R3 Els 00h > 4 E01R1 Els 01h > 5 E01R2 Els 01h > 6 E01R3 Els 01h > 7 E05R1 Els 05h > 8 E05R2 Els 05h > 9 E05R3 Els 05h > 10 E48R2 Els 48h > 11 E48R3 Els 48h > 12 E96R1 Els 96h > 13 E96R2 Els 96h > 14 E96R3 Els 96h > 15 J00R1 Jon 00h > 16 J00R2 Jon 00h > 17 J00R3 Jon 00h > 18 J01R1 Jon 01h > 19 J01R2 Jon 01h > 20 J01R3 Jon 01h > 21 J05R1 Jon 05h > 22 J05R2 Jon 05h > 23 J05R3 Jon 05h > 24 J48R2 Jon 48h > 25 J48R3 Jon 48h > 26 J96R1 Jon 96h > 27 J96R2 Jon 96h > 28 J96R3 Jon 96h >> >> # setting up a combined factor >> >> Group<-factor(paste(targets$Treat,targets$Time,sep=".")) >> cbind(targets,Group=Group) > Sample Treat Time Group > 1 E00R1 Els 00h Els.00h > 2 E00R2 Els 00h Els.00h > 3 E00R3 Els 00h Els.00h > 4 E01R1 Els 01h Els.01h > 5 E01R2 Els 01h Els.01h > 6 E01R3 Els 01h Els.01h > 7 E05R1 Els 05h Els.05h > 8 E05R2 Els 05h Els.05h > 9 E05R3 Els 05h Els.05h > 10 E48R2 Els 48h Els.48h > 11 E48R3 Els 48h Els.48h > 12 E96R1 Els 96h Els.96h > 13 E96R2 Els 96h Els.96h > 14 E96R3 Els 96h Els.96h > 15 J00R1 Jon 00h Jon.00h > 16 J00R2 Jon 00h Jon.00h > 17 J00R3 Jon 00h Jon.00h > 18 J01R1 Jon 01h Jon.01h > 19 J01R2 Jon 01h Jon.01h > 20 J01R3 Jon 01h Jon.01h > 21 J05R1 Jon 05h Jon.05h > 22 J05R2 Jon 05h Jon.05h > 23 J05R3 Jon 05h Jon.05h > 24 J48R2 Jon 48h Jon.48h > 25 J48R3 Jon 48h Jon.48h > 26 J96R1 Jon 96h Jon.96h > 27 J96R2 Jon 96h Jon.96h > 28 J96R3 Jon 96h Jon.96h >> >> design<-model.matrix(~0+Group) >> colnames(design)<-levels(Group) >> design > Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h > 1 1 0 0 0 0 0 0 0 0 0 > 2 1 0 0 0 0 0 0 0 0 0 > 3 1 0 0 0 0 0 0 0 0 0 > 4 0 1 0 0 0 0 0 0 0 0 > 5 0 1 0 0 0 0 0 0 0 0 > 6 0 1 0 0 0 0 0 0 0 0 > 7 0 0 1 0 0 0 0 0 0 0 > 8 0 0 1 0 0 0 0 0 0 0 > 9 0 0 1 0 0 0 0 0 0 0 > 10 0 0 0 1 0 0 0 0 0 0 > 11 0 0 0 1 0 0 0 0 0 0 > 12 0 0 0 0 1 0 0 0 0 0 > 13 0 0 0 0 1 0 0 0 0 0 > 14 0 0 0 0 1 0 0 0 0 0 > 15 0 0 0 0 0 1 0 0 0 0 > 16 0 0 0 0 0 1 0 0 0 0 > 17 0 0 0 0 0 1 0 0 0 0 > 18 0 0 0 0 0 0 1 0 0 0 > 19 0 0 0 0 0 0 1 0 0 0 > 20 0 0 0 0 0 0 1 0 0 0 > 21 0 0 0 0 0 0 0 1 0 0 > 22 0 0 0 0 0 0 0 1 0 0 > 23 0 0 0 0 0 0 0 1 0 0 > 24 0 0 0 0 0 0 0 0 1 0 > 25 0 0 0 0 0 0 0 0 1 0 > 26 0 0 0 0 0 0 0 0 0 1 > 27 0 0 0 0 0 0 0 0 0 1 > 28 0 0 0 0 0 0 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 1 1 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$Group > [1] "contr.treatment" > >> >> fit<-glmFit(y,design) > Error in glmFit.default(y, design) : No dispersion values provided. >> >> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=Norwegian (Bokm??l)_Norway.1252 LC_CTYPE=Norwegian (Bokm??l)_Norway.1252 LC_MONETARY=Norwegian (Bokm??l)_Norway.1252 > [4] LC_NUMERIC=C LC_TIME=Norwegian (Bokm??l)_Norway.1252 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 > [7] IRanges_1.20.6 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 > [13] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.4 years ago Mark Robinson ▴ 880

0

Entering edit mode

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 5.5 years ago

Hi Jahn, I've cc'd the BioC mailing list; it's best to keep a public record of these exchanges, so others can benefit. So, I believe your problem is a matter of the class of the object that you are sending to estimateGLMCommonDisp(), etc.. Here is an example: > y <- matrix(rnbinom(1000,mu=10,size=10),ncol=4) > d <- DGEList(counts=y,group=c(1,1,2,2)) > design <- model.matrix(~group, data=d$samples) > d1 <- estimateGLMCommonDisp(y, design, verbose=TRUE) Disp = 0.08132 , BCV = 0.2852 > class(d1) [1] "numeric" > d1 <- estimateGLMCommonDisp(d, design, verbose=TRUE) Disp = 0.08132 , BCV = 0.2852 > class(d1) [1] "DGEList" attr(,"package") [1] "edgeR" (if you send a count table, you get a different returned object than if you send a DGEList object) Sorry, I didn't completely spell it out in my last email, but it's there in the Quick Start and also in the case studies; typically, the starting point is creating a 'DGEList' object of the count table, which is the container that stores all the downstream components as well. So, I would change your code to (roughly): [? read data in ?] y <- y[keep,] d <- DGEList(counts=y,group=Group) d <- calcNormFactors(d) [? define design matrix ?] plotMDS(d) # it's always a good plan to look at this d <- estimateGLMCommonDisp(d,design) d <- estimateGLMTrendedDisp(d,design) d <- estimateGLMTagwiseDisp(d,design) And, so on ? Cheers, Mark ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://ow.ly/riRea On 19.12.2013, at 13:55, Jahn Davik <jahn.davik at="" bioforsk.no=""> wrote: > Mark, > I'm afraid I have to bother you again. Sorry about that. > Anyway. An indication of where I loose it is appreciated. > > This is what I send to R: > > y<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",hea der=T) > head(y) > y = as.matrix(cots); > y = round(data); > head(y) > > dim(y) > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > targets > > # XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXX > # Filtrering. > # Make sure how you come up with these number !!! > # z<-data > > keep<-rowSums(cpm(y)>2)>=3 > y<-y[keep,] > table(keep) > # XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXxXX > > > ## Chapt 3.3.1 i Users Guide > > # setting up a combined factor > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > cbind(targets,Group=Group) > > design<-model.matrix(~0+Group) > design > colnames(design)<-levels(Group) > design > > y<-estimateGLMCommonDisp(y,design) > y<-estimateGLMTrendedDisp(y,design) > y<-estimateGLMTagwiseDisp(y,design) > fit<-glmFit(y,design) > > > # IT WORKS FINE (I BELIEVE) ALL THE WAY TO estimate GLMTrendedDisp(y,design), WHERE IT RETURNS: > > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y, design) : > No residual df: cannot estimate dispersion > > > THE OUTPUT IS LIKE THIS: > > > > y = as.matrix(cots); > > y = round(data); > > head(y) > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 J01R3 J05R1 J05R2 > comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 > comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 0 > comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 0 0 0 > comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 188 178 116 > comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 > J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > comp168996_c1_seq1 0 0 0 0 0 0 > comp442719_c0_seq1 0 0 0 0 0 0 > comp436057_c0_seq13 0 19 0 1 0 0 > comp415319_c0_seq20 0 0 0 0 0 0 > comp428135_c0_seq8 76 26 0 41 132 101 > comp73458_c0_seq1 0 0 0 0 0 0 > > > > dim(y) > [1] 621694 28 > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > targets > Sample Treat Time > 1 E00R1 Els 00h > 2 E00R2 Els 00h > 3 E00R3 Els 00h > 4 E01R1 Els 01h > 5 E01R2 Els 01h > 6 E01R3 Els 01h > 7 E05R1 Els 05h > 8 E05R2 Els 05h > 9 E05R3 Els 05h > 10 E48R2 Els 48h > 11 E48R3 Els 48h > 12 E96R1 Els 96h > 13 E96R2 Els 96h > 14 E96R3 Els 96h > 15 J00R1 Jon 00h > 16 J00R2 Jon 00h > 17 J00R3 Jon 00h > 18 J01R1 Jon 01h > 19 J01R2 Jon 01h > 20 J01R3 Jon 01h > 21 J05R1 Jon 05h > 22 J05R2 Jon 05h > 23 J05R3 Jon 05h > 24 J48R2 Jon 48h > 25 J48R3 Jon 48h > 26 J96R1 Jon 96h > 27 J96R2 Jon 96h > 28 J96R3 Jon 96h > > > > # XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXX > > # Filtrering. > > # Make sure how you come up with these number !!! > > # z<-data > > > > keep<-rowSums(cpm(y)>2)>=3 > > y<-y[keep,] > > table(keep) > keep > FALSE TRUE > 558829 62865 > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > cbind(targets,Group=Group) > Sample Treat Time Group > 1 E00R1 Els 00h Els.00h > 2 E00R2 Els 00h Els.00h > 3 E00R3 Els 00h Els.00h > 4 E01R1 Els 01h Els.01h > 5 E01R2 Els 01h Els.01h > 6 E01R3 Els 01h Els.01h > 7 E05R1 Els 05h Els.05h > 8 E05R2 Els 05h Els.05h > 9 E05R3 Els 05h Els.05h > 10 E48R2 Els 48h Els.48h > 11 E48R3 Els 48h Els.48h > 12 E96R1 Els 96h Els.96h > 13 E96R2 Els 96h Els.96h > 14 E96R3 Els 96h Els.96h > 15 J00R1 Jon 00h Jon.00h > 16 J00R2 Jon 00h Jon.00h > 17 J00R3 Jon 00h Jon.00h > 18 J01R1 Jon 01h Jon.01h > 19 J01R2 Jon 01h Jon.01h > 20 J01R3 Jon 01h Jon.01h > 21 J05R1 Jon 05h Jon.05h > 22 J05R2 Jon 05h Jon.05h > 23 J05R3 Jon 05h Jon.05h > 24 J48R2 Jon 48h Jon.48h > 25 J48R3 Jon 48h Jon.48h > 26 J96R1 Jon 96h Jon.96h > 27 J96R2 Jon 96h Jon.96h > 28 J96R3 Jon 96h Jon.96h > > design<-model.matrix(~0+Group) > > design > GroupEls.00h GroupEls.01h GroupEls.05h GroupEls.48h GroupEls.96h GroupJon.00h GroupJon.01h GroupJon.05h GroupJon.48h GroupJon.96h > 1 1 0 0 0 0 0 0 0 0 0 > 2 1 0 0 0 0 0 0 0 0 0 > 3 1 0 0 0 0 0 0 0 0 0 > 4 0 1 0 0 0 0 0 0 0 0 > 5 0 1 0 0 0 0 0 0 0 0 > 6 0 1 0 0 0 0 0 0 0 0 > 7 0 0 1 0 0 0 0 0 0 0 > 8 0 0 1 0 0 0 0 0 0 0 > 9 0 0 1 0 0 0 0 0 0 0 > 10 0 0 0 1 0 0 0 0 0 0 > 11 0 0 0 1 0 0 0 0 0 0 > 12 0 0 0 0 1 0 0 0 0 0 > 13 0 0 0 0 1 0 0 0 0 0 > 14 0 0 0 0 1 0 0 0 0 0 > 15 0 0 0 0 0 1 0 0 0 0 > 16 0 0 0 0 0 1 0 0 0 0 > 17 0 0 0 0 0 1 0 0 0 0 > 18 0 0 0 0 0 0 1 0 0 0 > 19 0 0 0 0 0 0 1 0 0 0 > 20 0 0 0 0 0 0 1 0 0 0 > 21 0 0 0 0 0 0 0 1 0 0 > 22 0 0 0 0 0 0 0 1 0 0 > 23 0 0 0 0 0 0 0 1 0 0 > 24 0 0 0 0 0 0 0 0 1 0 > 25 0 0 0 0 0 0 0 0 1 0 > 26 0 0 0 0 0 0 0 0 0 1 > 27 0 0 0 0 0 0 0 0 0 1 > 28 0 0 0 0 0 0 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 1 1 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$Group > [1] "contr.treatment" > > > colnames(design)<-levels(Group) > > design > Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h > 1 1 0 0 0 0 0 0 0 0 0 > 2 1 0 0 0 0 0 0 0 0 0 > 3 1 0 0 0 0 0 0 0 0 0 > 4 0 1 0 0 0 0 0 0 0 0 > 5 0 1 0 0 0 0 0 0 0 0 > 6 0 1 0 0 0 0 0 0 0 0 > 7 0 0 1 0 0 0 0 0 0 0 > 8 0 0 1 0 0 0 0 0 0 0 > 9 0 0 1 0 0 0 0 0 0 0 > 10 0 0 0 1 0 0 0 0 0 0 > 11 0 0 0 1 0 0 0 0 0 0 > 12 0 0 0 0 1 0 0 0 0 0 > 13 0 0 0 0 1 0 0 0 0 0 > 14 0 0 0 0 1 0 0 0 0 0 > 15 0 0 0 0 0 1 0 0 0 0 > 16 0 0 0 0 0 1 0 0 0 0 > 17 0 0 0 0 0 1 0 0 0 0 > 18 0 0 0 0 0 0 1 0 0 0 > 19 0 0 0 0 0 0 1 0 0 0 > 20 0 0 0 0 0 0 1 0 0 0 > 21 0 0 0 0 0 0 0 1 0 0 > 22 0 0 0 0 0 0 0 1 0 0 > 23 0 0 0 0 0 0 0 1 0 0 > 24 0 0 0 0 0 0 0 0 1 0 > 25 0 0 0 0 0 0 0 0 1 0 > 26 0 0 0 0 0 0 0 0 0 1 > 27 0 0 0 0 0 0 0 0 0 1 > 28 0 0 0 0 0 0 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 1 1 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$Group > [1] "contr.treatment" > > > y<-estimateGLMCommonDisp(y,design) > > y<-estimateGLMTrendedDisp(y,design) > Error in return(NA, ntags) : multi-argument returns are not permitted > In addition: Warning message: > In estimateGLMTrendedDisp.default(y, design) : > No residual df: cannot estimate dispersion > > y<-estimateGLMTagwiseDisp(y,design) > Warning message: > In estimateGLMTagwiseDisp.default(y, design) : > No residual df: setting dispersion to NA > > > > > Thank you > jahn > > > > > > -----Opprinnelig melding----- > Fra: Mark Robinson [mailto:mark.robinson at imls.uzh.ch] > Sendt: 18. desember 2013 16:12 > Til: Jahn Davik [guest] > Kopi: bioconductor at r-project.org; Jahn Davik > Emne: Re: [BioC] edgeR glmFit > > Hi Jahn, > > Chapter 3 demonstrates the setup of design matrices; 3.1 says "In this chapter, we outline the principles for setting up the design matrix and forming contrasts for some typical experimental designs". You would then combine that with what is shown in "1.4 Quick start" to do all the steps, something like: > > > design <- model.matrix(~<add yours="" here="">) y <- > > estimateGLMCommonDisp(y,design) y <- estimateGLMTrendedDisp(y,design) > > y <- estimateGLMTagwiseDisp(y,design) fit <- glmFit(y,design) lrt <- > > glmLRT(fit,coef=<add yours="" here="">) > > topTags(lrt) > > In particular, you might have a look at some of the case studies in Chapter 4, for example 4.4 (but using your design matrix). > > Hope that helps. > > Best, Mark > > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://ow.ly/riRea > > > > > > > On 18.12.2013, at 11:21, Jahn Davik [guest] <guest at="" bioconductor.org=""> wrote: > > > > > I am working my way through the edgeR User's Guide and following section 3.3, I encounter a problem. I run the following commands - using my own counts data: > > ## Chapt 3.3 i Users Guide > > > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",he > > ader=T) > > head(cots) > > data = as.matrix(cots); > > y = round(data); > > head(y) > > > > dim(data) > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > targets > > > > # setting up a combined factor > > > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > cbind(targets,Group=Group) > > > > design<-model.matrix(~0+Group) > > colnames(design)<-levels(Group) > > design > > > > fit<-glmFit(y,design) > > > > sessionInfo() > > > > Resulting in 'Error in glmFit.default(y, design) : No dispersion values provided.' > > > > I do not quite see where my mistake is and would appreciate help. > > Thank you. > > > > jahn > > > > > > > > > > > > > > > > > > -- output of sessionInfo(): > > > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",he > > ader=T) > >> head(cots) > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 > > comp168996_c1_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > comp442719_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > comp436057_c0_seq13 0.00 7.38 21.04 0.00 0.00 0.21 21.65 0.00 0.00 0.00 0.86 0.57 9.69 18.73 0.00 0.00 0.00 7.14 > > comp415319_c0_seq20 28.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > comp428135_c0_seq8 249.73 228.53 172.18 57.52 104.48 113.96 187.05 94.64 84.65 134.65 99.82 0.00 65.46 82.83 219.37 59.18 134.04 81.56 > > comp73458_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > J01R2 J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > comp168996_c1_seq1 1.00 0.00 1.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > comp442719_c0_seq1 0.00 0.00 35.09 0.00 0.00 0.00 0 0.00 0.00 0.00 > > comp436057_c0_seq13 3.07 0.00 0.05 0.00 0.00 19.09 0 0.60 0.00 0.00 > > comp415319_c0_seq20 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > comp428135_c0_seq8 87.00 188.03 177.74 115.65 75.66 26.50 0 41.02 131.62 101.48 > > comp73458_c0_seq1 0.00 1.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > >> data = as.matrix(cots); > >> y = round(data); > >> head(y) > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 > > comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 > > comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 > > comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 > > comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 > > J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > comp168996_c1_seq1 0 1 0 0 0 0 0 0 0 > > comp442719_c0_seq1 0 35 0 0 0 0 0 0 0 > > comp436057_c0_seq13 0 0 0 0 19 0 1 0 0 > > comp415319_c0_seq20 0 0 0 0 0 0 0 0 0 > > comp428135_c0_seq8 188 178 116 76 26 0 41 132 101 > > comp73458_c0_seq1 1 0 0 0 0 0 0 0 0 > >> > >> dim(data) > > [1] 621694 28 > >> > >> targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > >> targets > > Sample Treat Time > > 1 E00R1 Els 00h > > 2 E00R2 Els 00h > > 3 E00R3 Els 00h > > 4 E01R1 Els 01h > > 5 E01R2 Els 01h > > 6 E01R3 Els 01h > > 7 E05R1 Els 05h > > 8 E05R2 Els 05h > > 9 E05R3 Els 05h > > 10 E48R2 Els 48h > > 11 E48R3 Els 48h > > 12 E96R1 Els 96h > > 13 E96R2 Els 96h > > 14 E96R3 Els 96h > > 15 J00R1 Jon 00h > > 16 J00R2 Jon 00h > > 17 J00R3 Jon 00h > > 18 J01R1 Jon 01h > > 19 J01R2 Jon 01h > > 20 J01R3 Jon 01h > > 21 J05R1 Jon 05h > > 22 J05R2 Jon 05h > > 23 J05R3 Jon 05h > > 24 J48R2 Jon 48h > > 25 J48R3 Jon 48h > > 26 J96R1 Jon 96h > > 27 J96R2 Jon 96h > > 28 J96R3 Jon 96h > >> > >> # setting up a combined factor > >> > >> Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > >> cbind(targets,Group=Group) > > Sample Treat Time Group > > 1 E00R1 Els 00h Els.00h > > 2 E00R2 Els 00h Els.00h > > 3 E00R3 Els 00h Els.00h > > 4 E01R1 Els 01h Els.01h > > 5 E01R2 Els 01h Els.01h > > 6 E01R3 Els 01h Els.01h > > 7 E05R1 Els 05h Els.05h > > 8 E05R2 Els 05h Els.05h > > 9 E05R3 Els 05h Els.05h > > 10 E48R2 Els 48h Els.48h > > 11 E48R3 Els 48h Els.48h > > 12 E96R1 Els 96h Els.96h > > 13 E96R2 Els 96h Els.96h > > 14 E96R3 Els 96h Els.96h > > 15 J00R1 Jon 00h Jon.00h > > 16 J00R2 Jon 00h Jon.00h > > 17 J00R3 Jon 00h Jon.00h > > 18 J01R1 Jon 01h Jon.01h > > 19 J01R2 Jon 01h Jon.01h > > 20 J01R3 Jon 01h Jon.01h > > 21 J05R1 Jon 05h Jon.05h > > 22 J05R2 Jon 05h Jon.05h > > 23 J05R3 Jon 05h Jon.05h > > 24 J48R2 Jon 48h Jon.48h > > 25 J48R3 Jon 48h Jon.48h > > 26 J96R1 Jon 96h Jon.96h > > 27 J96R2 Jon 96h Jon.96h > > 28 J96R3 Jon 96h Jon.96h > >> > >> design<-model.matrix(~0+Group) > >> colnames(design)<-levels(Group) > >> design > > Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h > > 1 1 0 0 0 0 0 0 0 0 0 > > 2 1 0 0 0 0 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 0 0 0 > > 4 0 1 0 0 0 0 0 0 0 0 > > 5 0 1 0 0 0 0 0 0 0 0 > > 6 0 1 0 0 0 0 0 0 0 0 > > 7 0 0 1 0 0 0 0 0 0 0 > > 8 0 0 1 0 0 0 0 0 0 0 > > 9 0 0 1 0 0 0 0 0 0 0 > > 10 0 0 0 1 0 0 0 0 0 0 > > 11 0 0 0 1 0 0 0 0 0 0 > > 12 0 0 0 0 1 0 0 0 0 0 > > 13 0 0 0 0 1 0 0 0 0 0 > > 14 0 0 0 0 1 0 0 0 0 0 > > 15 0 0 0 0 0 1 0 0 0 0 > > 16 0 0 0 0 0 1 0 0 0 0 > > 17 0 0 0 0 0 1 0 0 0 0 > > 18 0 0 0 0 0 0 1 0 0 0 > > 19 0 0 0 0 0 0 1 0 0 0 > > 20 0 0 0 0 0 0 1 0 0 0 > > 21 0 0 0 0 0 0 0 1 0 0 > > 22 0 0 0 0 0 0 0 1 0 0 > > 23 0 0 0 0 0 0 0 1 0 0 > > 24 0 0 0 0 0 0 0 0 1 0 > > 25 0 0 0 0 0 0 0 0 1 0 > > 26 0 0 0 0 0 0 0 0 0 1 > > 27 0 0 0 0 0 0 0 0 0 1 > > 28 0 0 0 0 0 0 0 0 0 1 > > attr(,"assign") > > [1] 1 1 1 1 1 1 1 1 1 1 > > attr(,"contrasts") > > attr(,"contrasts")$Group > > [1] "contr.treatment" > > > >> > >> fit<-glmFit(y,design) > > Error in glmFit.default(y, design) : No dispersion values provided. > >> > >> sessionInfo() > > R version 3.0.2 (2013-09-25) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=Norwegian (Bokm??l)_Norway.1252 LC_CTYPE=Norwegian (Bokm??l)_Norway.1252 LC_MONETARY=Norwegian (Bokm??l)_Norway.1252 > > [4] LC_NUMERIC=C LC_TIME=Norwegian (Bokm??l)_Norway.1252 > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0 > > > > loaded via a namespace (and not attached): > > [1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 > > [7] IRanges_1.20.6 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 > > [13] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 > > > > -- > > Sent via the guest posting facility at bioconductor.org. > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.4 years ago Mark Robinson ▴ 880

0

Entering edit mode

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 5.5 years ago

Hi Jahn, A few things: 1. Please don't email me directly. USE THE BIOC MAILING LIST. This is the third time I've mentioned this. I've cc'd it again. 2. You are using exactTest() with GLM-based dispersion estimation. This is probably ok, although a bit non-standard. Did you read and follow the case study examples? 3. I think your code is ok. You are comparing J and E at time point 0 and using different subsets of the whole dataset, you get different results. The reason is that the dispersion is estimated differently over different sets of replicates. I can't really tell what is happening without seeing the MDS plot and the BCV curves, but perhaps some of your replicates are wonky, or at least more variable in some conditions than others. 4. 620000 is a lot of transcripts. Lots of issues here, such as adaptor trimming before assembly (among others): http://pathogenomics.bham.ac.uk/blog/2013/04/adaptor-trim-or-die- experiences-with-nextera-libraries/ Plus, I'm not sure how your counts were done, but you may want to consolidate the transcripts into clusters for gene-level counting. In my experience, this is never a clean cut process, but you might consider using Corset or something similar: https://code.google.com/p/corset-project/ Best wishes, Mark ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://ow.ly/riRea On 20.12.2013, at 10:13, Jahn Davik <jahn.davik at="" bioforsk.no=""> wrote: > Hi Mark, > Thanks for your feedback. > I had to take a couple of steps back in order to get this working, and I guess I still have some way to go here. I come from the SAS world and R is a little cryptic to me. Anyway, following your advice I apparently got the code working. > Still, there is confusion (in my head). > > The data I eventually want to play with is the output from Trinity containing ~620000 transcripts and 30 experimental cells (-> two genotypes sampled at 5 time points with 3 biological replicates, giving a 620000 by 30 matrix). > So, to build on your last input, I made a subsample of this matrix and did only one time point. > > CODE RUN LIKE THIS: > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix", header=T) > head(cots) > > data = as.matrix(cots); > data = round(data); > head(data) > # write.table(data, file="H:/bip1/RNAseq/Data/round.matrix.txt", quote=F, sep="\t") > > keep<-rowSums(cpm(data)>2)>=3 > cots<-data[keep,] > table(keep) > > x<-cots[,c("E00R1","E00R2","E00R3","J00R1","J00R2","J00R3")] > head(x) > > dim(x) > > Group<-factor(substring(colnames(x),1,1)) > Group > > # QUICK START: > # Classic part > > y<-DGEList(counts=x,group=Group) > y<-calcNormFactors(y) > > plotMDS(y) > > # estimating dispersion > # over all genes > y<-estimateGLMCommonDisp(y) > > # genewise > y<-estimateGLMTrendedDisp(y) > y<-estimateGLMTagwiseDisp(y) > > plotBCV(y) > > et<-exactTest(y) > topTags(et) > > > YIELDING THIS OUTPUT AMONG FACTORS: > > topTags(et) > Comparison of groups: J-E > logFC logCPM PValue FDR > comp406486_c0_seq1 -8.08 5.78 5.47e-05 0.117 > comp420150_c0_seq1 9.16 4.97 5.73e-05 0.117 > comp418672_c0_seq18 -16.39 6.73 6.11e-05 0.117 > comp391070_c0_seq1 7.70 6.70 6.20e-05 0.117 > comp421831_c2_seq2 -6.18 6.31 6.59e-05 0.117 > comp418672_c0_seq7 15.97 6.32 6.72e-05 0.117 > comp437190_c0_seq3 -5.80 6.79 7.13e-05 0.117 > comp431729_c0_seq18 -15.80 6.14 7.27e-05 0.117 > comp391070_c0_seq7 7.55 8.99 7.55e-05 0.117 > comp426823_c0_seq5 -15.55 5.89 8.29e-05 0.117 > Then I expanded the matrix with another time point (the last one in my experiment), and wanted to do the same comparison, e.g., comparing the J and E at time point 00. > Here is the code I ran: > > library("edgeR") > > # setwd("H:/bip1/RNASeq/isoforms") > # getwd() > > options(digits=3) > > > ## Read data > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix", header=T) > head(cots) > > data = as.matrix(cots); > data = round(data); > head(data) > # write.table(data, file="H:/bip1/RNAseq/Data/round.matrix.txt", quote=F, sep="\t") > > keep<-rowSums(cpm(data)>2)>=3 > cots<-data[keep,] > table(keep) > > x<-cots[,c("E00R1","E00R2","E00R3","J00R1","J00R2","J00R3","E96R1"," E96R2","E96R3","J96R1","J96R2","J96R3")] > head(x) > > dim(x) > > Group<-factor(substring(colnames(x),1,3)) > Group > > # QUICK START: > # Classic part > > y<-DGEList(counts=x,group=Group) > y$samples > > y<-calcNormFactors(y) > > plotMDS(y) > > # estimating dispersion > # over all genes > y<-estimateGLMCommonDisp(y) > > # genewise > y<-estimateGLMTrendedDisp(y) > y<-estimateGLMTagwiseDisp(y) > > plotBCV(y) > > et<-exactTest(y, pair=c("E00","J00")) > topTags(et) > > > GIVING THE FOLLOWING OUTPUT: > > Comparison of groups: J00-E00 > logFC logCPM PValue FDR > comp436540_c0_seq6 -13.48 4.66 1.54e-05 0.416 > comp437851_c0_seq10 -12.68 3.14 2.63e-05 0.416 > comp436243_c0_seq45 12.79 3.65 3.78e-05 0.416 > comp434611_c0_seq48 12.51 3.12 4.02e-05 0.416 > comp420222_c0_seq1 12.21 3.12 4.58e-05 0.416 > comp425919_c0_seq13 -11.99 2.75 5.65e-05 0.416 > comp435680_c0_seq21 6.35 4.84 5.83e-05 0.416 > comp435275_c0_seq9 -6.80 4.93 6.03e-05 0.416 > comp434001_c0_seq7 -12.18 2.79 6.44e-05 0.416 > comp437026_c0_seq13 12.81 3.68 8.56e-05 0.416 > > > > > Now, I wonder why not these two runs give the same results? One possibility is of course that I code erroneously, but I realize that there may also be statistical explanations. > I would be very happy to hear from you. > > Best regards > Jahn > > > > > > > > > > > -----Opprinnelig melding----- > Fra: Mark Robinson [mailto:mark.robinson at imls.uzh.ch] > Sendt: 19. desember 2013 22:15 > Til: Jahn Davik > Kopi: bioconductor at r-project.org mailman > Emne: Re: [BioC] edgeR glmFit > > Hi Jahn, > > I've cc'd the BioC mailing list; it's best to keep a public record of these exchanges, so others can benefit. > > So, I believe your problem is a matter of the class of the object that you are sending to estimateGLMCommonDisp(), etc.. Here is an example: > > > y <- matrix(rnbinom(1000,mu=10,size=10),ncol=4) > > d <- DGEList(counts=y,group=c(1,1,2,2)) > > design <- model.matrix(~group, data=d$samples) > > d1 <- estimateGLMCommonDisp(y, design, verbose=TRUE) > Disp = 0.08132 , BCV = 0.2852 > > class(d1) > [1] "numeric" > > d1 <- estimateGLMCommonDisp(d, design, verbose=TRUE) > Disp = 0.08132 , BCV = 0.2852 > > class(d1) > [1] "DGEList" > attr(,"package") > [1] "edgeR" > > (if you send a count table, you get a different returned object than if you send a DGEList object) > > Sorry, I didn't completely spell it out in my last email, but it's there in the Quick Start and also in the case studies; typically, the starting point is creating a 'DGEList' object of the count table, which is the container that stores all the downstream components as well. > > So, I would change your code to (roughly): > > [? read data in ?] > > y <- y[keep,] > > d <- DGEList(counts=y,group=Group) > d <- calcNormFactors(d) > > [? define design matrix ?] > > plotMDS(d) # it's always a good plan to look at this > > d <- estimateGLMCommonDisp(d,design) > d <- estimateGLMTrendedDisp(d,design) > d <- estimateGLMTagwiseDisp(d,design) > > And, so on ? > > Cheers, Mark > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://ow.ly/riRea > > > > > > > On 19.12.2013, at 13:55, Jahn Davik <jahn.davik at="" bioforsk.no=""> wrote: > > > Mark, > > I'm afraid I have to bother you again. Sorry about that. > > Anyway. An indication of where I loose it is appreciated. > > > > This is what I send to R: > > > > y<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",heade > > r=T) > > head(y) > > y = as.matrix(cots); > > y = round(data); > > head(y) > > > > dim(y) > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > targets > > > > # > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > XXXXXXX > > # Filtrering. > > # Make sure how you come up with these number !!! > > # z<-data > > > > keep<-rowSums(cpm(y)>2)>=3 > > y<-y[keep,] > > table(keep) > > # > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > XXXXxXX > > > > > > ## Chapt 3.3.1 i Users Guide > > > > # setting up a combined factor > > > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > cbind(targets,Group=Group) > > > > design<-model.matrix(~0+Group) > > design > > colnames(design)<-levels(Group) > > design > > > > y<-estimateGLMCommonDisp(y,design) > > y<-estimateGLMTrendedDisp(y,design) > > y<-estimateGLMTagwiseDisp(y,design) > > fit<-glmFit(y,design) > > > > > > # IT WORKS FINE (I BELIEVE) ALL THE WAY TO estimate GLMTrendedDisp(y,design), WHERE IT RETURNS: > > > > Error in return(NA, ntags) : multi-argument returns are not permitted > > In addition: Warning message: > > In estimateGLMTrendedDisp.default(y, design) : > > No residual df: cannot estimate dispersion > > > > > > THE OUTPUT IS LIKE THIS: > > > > > > > y = as.matrix(cots); > > > y = round(data); > > > head(y) > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 J01R3 J05R1 J05R2 > > comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 > > comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 0 > > comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 0 0 0 > > comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 188 178 116 > > comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 > > J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > comp168996_c1_seq1 0 0 0 0 0 0 > > comp442719_c0_seq1 0 0 0 0 0 0 > > comp436057_c0_seq13 0 19 0 1 0 0 > > comp415319_c0_seq20 0 0 0 0 0 0 > > comp428135_c0_seq8 76 26 0 41 132 101 > > comp73458_c0_seq1 0 0 0 0 0 0 > > > > > > dim(y) > > [1] 621694 28 > > > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > > targets > > Sample Treat Time > > 1 E00R1 Els 00h > > 2 E00R2 Els 00h > > 3 E00R3 Els 00h > > 4 E01R1 Els 01h > > 5 E01R2 Els 01h > > 6 E01R3 Els 01h > > 7 E05R1 Els 05h > > 8 E05R2 Els 05h > > 9 E05R3 Els 05h > > 10 E48R2 Els 48h > > 11 E48R3 Els 48h > > 12 E96R1 Els 96h > > 13 E96R2 Els 96h > > 14 E96R3 Els 96h > > 15 J00R1 Jon 00h > > 16 J00R2 Jon 00h > > 17 J00R3 Jon 00h > > 18 J01R1 Jon 01h > > 19 J01R2 Jon 01h > > 20 J01R3 Jon 01h > > 21 J05R1 Jon 05h > > 22 J05R2 Jon 05h > > 23 J05R3 Jon 05h > > 24 J48R2 Jon 48h > > 25 J48R3 Jon 48h > > 26 J96R1 Jon 96h > > 27 J96R2 Jon 96h > > 28 J96R3 Jon 96h > > > > > > # > > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > > XXXXXXXXX > > > # Filtrering. > > > # Make sure how you come up with these number !!! > > > # z<-data > > > > > > keep<-rowSums(cpm(y)>2)>=3 > > > y<-y[keep,] > > > table(keep) > > keep > > FALSE TRUE > > 558829 62865 > > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > > cbind(targets,Group=Group) > > Sample Treat Time Group > > 1 E00R1 Els 00h Els.00h > > 2 E00R2 Els 00h Els.00h > > 3 E00R3 Els 00h Els.00h > > 4 E01R1 Els 01h Els.01h > > 5 E01R2 Els 01h Els.01h > > 6 E01R3 Els 01h Els.01h > > 7 E05R1 Els 05h Els.05h > > 8 E05R2 Els 05h Els.05h > > 9 E05R3 Els 05h Els.05h > > 10 E48R2 Els 48h Els.48h > > 11 E48R3 Els 48h Els.48h > > 12 E96R1 Els 96h Els.96h > > 13 E96R2 Els 96h Els.96h > > 14 E96R3 Els 96h Els.96h > > 15 J00R1 Jon 00h Jon.00h > > 16 J00R2 Jon 00h Jon.00h > > 17 J00R3 Jon 00h Jon.00h > > 18 J01R1 Jon 01h Jon.01h > > 19 J01R2 Jon 01h Jon.01h > > 20 J01R3 Jon 01h Jon.01h > > 21 J05R1 Jon 05h Jon.05h > > 22 J05R2 Jon 05h Jon.05h > > 23 J05R3 Jon 05h Jon.05h > > 24 J48R2 Jon 48h Jon.48h > > 25 J48R3 Jon 48h Jon.48h > > 26 J96R1 Jon 96h Jon.96h > > 27 J96R2 Jon 96h Jon.96h > > 28 J96R3 Jon 96h Jon.96h > > > design<-model.matrix(~0+Group) > > > design > > GroupEls.00h GroupEls.01h GroupEls.05h GroupEls.48h GroupEls.96h GroupJon.00h GroupJon.01h GroupJon.05h GroupJon.48h GroupJon.96h > > 1 1 0 0 0 0 0 0 0 0 0 > > 2 1 0 0 0 0 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 0 0 0 > > 4 0 1 0 0 0 0 0 0 0 0 > > 5 0 1 0 0 0 0 0 0 0 0 > > 6 0 1 0 0 0 0 0 0 0 0 > > 7 0 0 1 0 0 0 0 0 0 0 > > 8 0 0 1 0 0 0 0 0 0 0 > > 9 0 0 1 0 0 0 0 0 0 0 > > 10 0 0 0 1 0 0 0 0 0 0 > > 11 0 0 0 1 0 0 0 0 0 0 > > 12 0 0 0 0 1 0 0 0 0 0 > > 13 0 0 0 0 1 0 0 0 0 0 > > 14 0 0 0 0 1 0 0 0 0 0 > > 15 0 0 0 0 0 1 0 0 0 0 > > 16 0 0 0 0 0 1 0 0 0 0 > > 17 0 0 0 0 0 1 0 0 0 0 > > 18 0 0 0 0 0 0 1 0 0 0 > > 19 0 0 0 0 0 0 1 0 0 0 > > 20 0 0 0 0 0 0 1 0 0 0 > > 21 0 0 0 0 0 0 0 1 0 0 > > 22 0 0 0 0 0 0 0 1 0 0 > > 23 0 0 0 0 0 0 0 1 0 0 > > 24 0 0 0 0 0 0 0 0 1 0 > > 25 0 0 0 0 0 0 0 0 1 0 > > 26 0 0 0 0 0 0 0 0 0 1 > > 27 0 0 0 0 0 0 0 0 0 1 > > 28 0 0 0 0 0 0 0 0 0 1 > > attr(,"assign") > > [1] 1 1 1 1 1 1 1 1 1 1 > > attr(,"contrasts") > > attr(,"contrasts")$Group > > [1] "contr.treatment" > > > > > colnames(design)<-levels(Group) > > > design > > Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h > > 1 1 0 0 0 0 0 0 0 0 0 > > 2 1 0 0 0 0 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 0 0 0 > > 4 0 1 0 0 0 0 0 0 0 0 > > 5 0 1 0 0 0 0 0 0 0 0 > > 6 0 1 0 0 0 0 0 0 0 0 > > 7 0 0 1 0 0 0 0 0 0 0 > > 8 0 0 1 0 0 0 0 0 0 0 > > 9 0 0 1 0 0 0 0 0 0 0 > > 10 0 0 0 1 0 0 0 0 0 0 > > 11 0 0 0 1 0 0 0 0 0 0 > > 12 0 0 0 0 1 0 0 0 0 0 > > 13 0 0 0 0 1 0 0 0 0 0 > > 14 0 0 0 0 1 0 0 0 0 0 > > 15 0 0 0 0 0 1 0 0 0 0 > > 16 0 0 0 0 0 1 0 0 0 0 > > 17 0 0 0 0 0 1 0 0 0 0 > > 18 0 0 0 0 0 0 1 0 0 0 > > 19 0 0 0 0 0 0 1 0 0 0 > > 20 0 0 0 0 0 0 1 0 0 0 > > 21 0 0 0 0 0 0 0 1 0 0 > > 22 0 0 0 0 0 0 0 1 0 0 > > 23 0 0 0 0 0 0 0 1 0 0 > > 24 0 0 0 0 0 0 0 0 1 0 > > 25 0 0 0 0 0 0 0 0 1 0 > > 26 0 0 0 0 0 0 0 0 0 1 > > 27 0 0 0 0 0 0 0 0 0 1 > > 28 0 0 0 0 0 0 0 0 0 1 > > attr(,"assign") > > [1] 1 1 1 1 1 1 1 1 1 1 > > attr(,"contrasts") > > attr(,"contrasts")$Group > > [1] "contr.treatment" > > > > > y<-estimateGLMCommonDisp(y,design) > > > y<-estimateGLMTrendedDisp(y,design) > > Error in return(NA, ntags) : multi-argument returns are not permitted > > In addition: Warning message: > > In estimateGLMTrendedDisp.default(y, design) : > > No residual df: cannot estimate dispersion > > > y<-estimateGLMTagwiseDisp(y,design) > > Warning message: > > In estimateGLMTagwiseDisp.default(y, design) : > > No residual df: setting dispersion to NA > > > > > > > > > Thank you > > jahn > > > > > > > > > > > > -----Opprinnelig melding----- > > Fra: Mark Robinson [mailto:mark.robinson at imls.uzh.ch] > > Sendt: 18. desember 2013 16:12 > > Til: Jahn Davik [guest] > > Kopi: bioconductor at r-project.org; Jahn Davik > > Emne: Re: [BioC] edgeR glmFit > > > > Hi Jahn, > > > > Chapter 3 demonstrates the setup of design matrices; 3.1 says "In this chapter, we outline the principles for setting up the design matrix and forming contrasts for some typical experimental designs". You would then combine that with what is shown in "1.4 Quick start" to do all the steps, something like: > > > > > design <- model.matrix(~<add yours="" here="">) y <- > > > estimateGLMCommonDisp(y,design) y <- > > > estimateGLMTrendedDisp(y,design) y <- > > > estimateGLMTagwiseDisp(y,design) fit <- glmFit(y,design) lrt <- > > > glmLRT(fit,coef=<add yours="" here="">) > > > topTags(lrt) > > > > In particular, you might have a look at some of the case studies in Chapter 4, for example 4.4 (but using your design matrix). > > > > Hope that helps. > > > > Best, Mark > > > > > > ---------- > > Prof. Dr. Mark Robinson > > Bioinformatics, Institute of Molecular Life Sciences University of > > Zurich http://ow.ly/riRea > > > > > > > > > > > > > > On 18.12.2013, at 11:21, Jahn Davik [guest] <guest at="" bioconductor.org=""> wrote: > > > > > > > > I am working my way through the edgeR User's Guide and following section 3.3, I encounter a problem. I run the following commands - using my own counts data: > > > ## Chapt 3.3 i Users Guide > > > > > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix", > > > he > > > ader=T) > > > head(cots) > > > data = as.matrix(cots); > > > y = round(data); > > > head(y) > > > > > > dim(data) > > > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > > targets > > > > > > # setting up a combined factor > > > > > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > > cbind(targets,Group=Group) > > > > > > design<-model.matrix(~0+Group) > > > colnames(design)<-levels(Group) > > > design > > > > > > fit<-glmFit(y,design) > > > > > > sessionInfo() > > > > > > Resulting in 'Error in glmFit.default(y, design) : No dispersion values provided.' > > > > > > I do not quite see where my mistake is and would appreciate help. > > > Thank you. > > > > > > jahn > > > > > > > > > > > > > > > > > > > > > > > > > > > -- output of sessionInfo(): > > > > > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix", > > > he > > > ader=T) > > >> head(cots) > > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 > > > comp168996_c1_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > comp442719_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > comp436057_c0_seq13 0.00 7.38 21.04 0.00 0.00 0.21 21.65 0.00 0.00 0.00 0.86 0.57 9.69 18.73 0.00 0.00 0.00 7.14 > > > comp415319_c0_seq20 28.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > comp428135_c0_seq8 249.73 228.53 172.18 57.52 104.48 113.96 187.05 94.64 84.65 134.65 99.82 0.00 65.46 82.83 219.37 59.18 134.04 81.56 > > > comp73458_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > J01R2 J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > > comp168996_c1_seq1 1.00 0.00 1.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > > comp442719_c0_seq1 0.00 0.00 35.09 0.00 0.00 0.00 0 0.00 0.00 0.00 > > > comp436057_c0_seq13 3.07 0.00 0.05 0.00 0.00 19.09 0 0.60 0.00 0.00 > > > comp415319_c0_seq20 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > > comp428135_c0_seq8 87.00 188.03 177.74 115.65 75.66 26.50 0 41.02 131.62 101.48 > > > comp73458_c0_seq1 0.00 1.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > >> data = as.matrix(cots); > > >> y = round(data); > > >> head(y) > > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 > > > comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 > > > comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > > comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 > > > comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > > comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 > > > comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 > > > J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > > comp168996_c1_seq1 0 1 0 0 0 0 0 0 0 > > > comp442719_c0_seq1 0 35 0 0 0 0 0 0 0 > > > comp436057_c0_seq13 0 0 0 0 19 0 1 0 0 > > > comp415319_c0_seq20 0 0 0 0 0 0 0 0 0 > > > comp428135_c0_seq8 188 178 116 76 26 0 41 132 101 > > > comp73458_c0_seq1 1 0 0 0 0 0 0 0 0 > > >> > > >> dim(data) > > > [1] 621694 28 > > >> > > >> targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > >> targets > > > Sample Treat Time > > > 1 E00R1 Els 00h > > > 2 E00R2 Els 00h > > > 3 E00R3 Els 00h > > > 4 E01R1 Els 01h > > > 5 E01R2 Els 01h > > > 6 E01R3 Els 01h > > > 7 E05R1 Els 05h > > > 8 E05R2 Els 05h > > > 9 E05R3 Els 05h > > > 10 E48R2 Els 48h > > > 11 E48R3 Els 48h > > > 12 E96R1 Els 96h > > > 13 E96R2 Els 96h > > > 14 E96R3 Els 96h > > > 15 J00R1 Jon 00h > > > 16 J00R2 Jon 00h > > > 17 J00R3 Jon 00h > > > 18 J01R1 Jon 01h > > > 19 J01R2 Jon 01h > > > 20 J01R3 Jon 01h > > > 21 J05R1 Jon 05h > > > 22 J05R2 Jon 05h > > > 23 J05R3 Jon 05h > > > 24 J48R2 Jon 48h > > > 25 J48R3 Jon 48h > > > 26 J96R1 Jon 96h > > > 27 J96R2 Jon 96h > > > 28 J96R3 Jon 96h > > >> > > >> # setting up a combined factor > > >> > > >> Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > >> cbind(targets,Group=Group) > > > Sample Treat Time Group > > > 1 E00R1 Els 00h Els.00h > > > 2 E00R2 Els 00h Els.00h > > > 3 E00R3 Els 00h Els.00h > > > 4 E01R1 Els 01h Els.01h > > > 5 E01R2 Els 01h Els.01h > > > 6 E01R3 Els 01h Els.01h > > > 7 E05R1 Els 05h Els.05h > > > 8 E05R2 Els 05h Els.05h > > > 9 E05R3 Els 05h Els.05h > > > 10 E48R2 Els 48h Els.48h > > > 11 E48R3 Els 48h Els.48h > > > 12 E96R1 Els 96h Els.96h > > > 13 E96R2 Els 96h Els.96h > > > 14 E96R3 Els 96h Els.96h > > > 15 J00R1 Jon 00h Jon.00h > > > 16 J00R2 Jon 00h Jon.00h > > > 17 J00R3 Jon 00h Jon.00h > > > 18 J01R1 Jon 01h Jon.01h > > > 19 J01R2 Jon 01h Jon.01h > > > 20 J01R3 Jon 01h Jon.01h > > > 21 J05R1 Jon 05h Jon.05h > > > 22 J05R2 Jon 05h Jon.05h > > > 23 J05R3 Jon 05h Jon.05h > > > 24 J48R2 Jon 48h Jon.48h > > > 25 J48R3 Jon 48h Jon.48h > > > 26 J96R1 Jon 96h Jon.96h > > > 27 J96R2 Jon 96h Jon.96h > > > 28 J96R3 Jon 96h Jon.96h > > >> > > >> design<-model.matrix(~0+Group) > > >> colnames(design)<-levels(Group) > > >> design > > > Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h > > > 1 1 0 0 0 0 0 0 0 0 0 > > > 2 1 0 0 0 0 0 0 0 0 0 > > > 3 1 0 0 0 0 0 0 0 0 0 > > > 4 0 1 0 0 0 0 0 0 0 0 > > > 5 0 1 0 0 0 0 0 0 0 0 > > > 6 0 1 0 0 0 0 0 0 0 0 > > > 7 0 0 1 0 0 0 0 0 0 0 > > > 8 0 0 1 0 0 0 0 0 0 0 > > > 9 0 0 1 0 0 0 0 0 0 0 > > > 10 0 0 0 1 0 0 0 0 0 0 > > > 11 0 0 0 1 0 0 0 0 0 0 > > > 12 0 0 0 0 1 0 0 0 0 0 > > > 13 0 0 0 0 1 0 0 0 0 0 > > > 14 0 0 0 0 1 0 0 0 0 0 > > > 15 0 0 0 0 0 1 0 0 0 0 > > > 16 0 0 0 0 0 1 0 0 0 0 > > > 17 0 0 0 0 0 1 0 0 0 0 > > > 18 0 0 0 0 0 0 1 0 0 0 > > > 19 0 0 0 0 0 0 1 0 0 0 > > > 20 0 0 0 0 0 0 1 0 0 0 > > > 21 0 0 0 0 0 0 0 1 0 0 > > > 22 0 0 0 0 0 0 0 1 0 0 > > > 23 0 0 0 0 0 0 0 1 0 0 > > > 24 0 0 0 0 0 0 0 0 1 0 > > > 25 0 0 0 0 0 0 0 0 1 0 > > > 26 0 0 0 0 0 0 0 0 0 1 > > > 27 0 0 0 0 0 0 0 0 0 1 > > > 28 0 0 0 0 0 0 0 0 0 1 > > > attr(,"assign") > > > [1] 1 1 1 1 1 1 1 1 1 1 > > > attr(,"contrasts") > > > attr(,"contrasts")$Group > > > [1] "contr.treatment" > > > > > >> > > >> fit<-glmFit(y,design) > > > Error in glmFit.default(y, design) : No dispersion values provided. > > >> > > >> sessionInfo() > > > R version 3.0.2 (2013-09-25) > > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > > > locale: > > > [1] LC_COLLATE=Norwegian (Bokm??l)_Norway.1252 LC_CTYPE=Norwegian (Bokm??l)_Norway.1252 LC_MONETARY=Norwegian (Bokm??l)_Norway.1252 > > > [4] LC_NUMERIC=C LC_TIME=Norwegian (Bokm??l)_Norway.1252 > > > > > > attached base packages: > > > [1] parallel stats graphics grDevices utils datasets methods base > > > > > > other attached packages: > > > [1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0 > > > > > > loaded via a namespace (and not attached): > > > [1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 > > > [7] IRanges_1.20.6 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 > > > [13] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 > > > > > > -- > > > Sent via the guest posting facility at bioconductor.org. > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at r-project.org > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > > > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.4 years ago Mark Robinson ▴ 880

0

Entering edit mode

Hi Mark, -----Opprinnelig melding----- Fra: Mark Robinson [mailto:mark.robinson at imls.uzh.ch] Sendt: 20. desember 2013 15:27 Til: Jahn Davik Kopi: bioconductor at r-project.org mailman Emne: Re: [BioC] edgeR glmFit Hi Jahn, A few things: 1. Please don't email me directly. USE THE BIOC MAILING LIST. This is the third time I've mentioned this. I've cc'd it again. SORRY ABOUT THAT. WON'T HAPPEN AGAIN. I GET YOUR REPLY AND I JUST PRESS THE 'ANSWER' BUTTON. 2. You are using exactTest() with GLM-based dispersion estimation. I CHECKED THAT PART. AND ACTUALLY, WHEN I USED THE 'NON-GLM-BASED' ESTIMATION CALLS, THE DISCREPANCIES WERE NOT AS STRIKING. This is probably ok, although a bit non-standard. Did you read and follow the case study examples? I DID FOLLOW THE EXAMPLES. THEY DO NOT TOTALLY COVER MY DATA. IN PARTICULAR THE TIME POINT ELEMENT AND THE FACT THAT SUCH SAMPLES SHOULD BE CORRELATED DOES NOT SEEM TREATED. A TWO-WAY ANOVA APPROACH HAS BEEN SUGGESTED, BUT THAT DOES NOT TAKE THIS CORRELATION ISSUE INTO CONSIDERATION. I WOULD LOVE TO HEAR YOUR TAKE ON THIS. 3. I think your code is ok. You are comparing J and E at time point 0 and using different subsets of the whole dataset, you get different results. The reason is that the dispersion is estimated differently over different sets of replicates. I can't really tell what is happening without seeing the MDS plot and the BCV curves, but perhaps some of your replicates are wonky, or at least more variable in some conditions than others. I SUSPECTED SOMETHING LIKE THAT. ON THE MDS PLOTS THE REPS LOOK JUST A LITTLE DIFFERENT. 4. 620000 is a lot of transcripts. Lots of issues here, such as adaptor trimming before assembly (among others): THAT NUMBER OF TRANSCRIPTS IS PRETTY AVERAGE ACCORDING TO THE TRINITY PEOPLE. IN THE edgeR CONTEXT, HOWEVER, I FILTER OUT TRANSCRIPTS WITH LOW HIT RATE. http://pathogenomics.bham.ac.uk/blog/2013/04/adaptor-trim-or-die- experiences-with-nextera-libraries/ Plus, I'm not sure how your counts were done, but you may want to consolidate the transcripts into clusters for gene-level counting. In my experience, this is never a clean cut process, but you might consider using Corset or something similar: https://code.google.com/p/corset-project/ i'LL CHECK THIS OUT. ADAPTOR TRIMMING, ETC, IS DONE BY OUR SEQUENCE PROVIDOR. IN ADDITION I'VE USE TRIMMOMATIC TO GET RID OF LOW QUALITY READS. Best wishes, Mark THANKS. HAVE A GOOD HOLYDAY ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://ow.ly/riRea On 20.12.2013, at 10:13, Jahn Davik <jahn.davik at="" bioforsk.no=""> wrote: > Hi Mark, > Thanks for your feedback. > I had to take a couple of steps back in order to get this working, and I guess I still have some way to go here. I come from the SAS world and R is a little cryptic to me. Anyway, following your advice I apparently got the code working. > Still, there is confusion (in my head). > > The data I eventually want to play with is the output from Trinity containing ~620000 transcripts and 30 experimental cells (-> two genotypes sampled at 5 time points with 3 biological replicates, giving a 620000 by 30 matrix). > So, to build on your last input, I made a subsample of this matrix and did only one time point. > > CODE RUN LIKE THIS: > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",he > ader=T) > head(cots) > > data = as.matrix(cots); > data = round(data); > head(data) > # write.table(data, file="H:/bip1/RNAseq/Data/round.matrix.txt", > quote=F, sep="\t") > > keep<-rowSums(cpm(data)>2)>=3 > cots<-data[keep,] > table(keep) > > x<-cots[,c("E00R1","E00R2","E00R3","J00R1","J00R2","J00R3")] > head(x) > > dim(x) > > Group<-factor(substring(colnames(x),1,1)) > Group > > # QUICK START: > # Classic part > > y<-DGEList(counts=x,group=Group) > y<-calcNormFactors(y) > > plotMDS(y) > > # estimating dispersion > # over all genes > y<-estimateGLMCommonDisp(y) > > # genewise > y<-estimateGLMTrendedDisp(y) > y<-estimateGLMTagwiseDisp(y) > > plotBCV(y) > > et<-exactTest(y) > topTags(et) > > > YIELDING THIS OUTPUT AMONG FACTORS: > > topTags(et) > Comparison of groups: J-E > logFC logCPM PValue FDR > comp406486_c0_seq1 -8.08 5.78 5.47e-05 0.117 > comp420150_c0_seq1 9.16 4.97 5.73e-05 0.117 > comp418672_c0_seq18 -16.39 6.73 6.11e-05 0.117 > comp391070_c0_seq1 7.70 6.70 6.20e-05 0.117 > comp421831_c2_seq2 -6.18 6.31 6.59e-05 0.117 > comp418672_c0_seq7 15.97 6.32 6.72e-05 0.117 > comp437190_c0_seq3 -5.80 6.79 7.13e-05 0.117 > comp431729_c0_seq18 -15.80 6.14 7.27e-05 0.117 > comp391070_c0_seq7 7.55 8.99 7.55e-05 0.117 > comp426823_c0_seq5 -15.55 5.89 8.29e-05 0.117 > Then I expanded the matrix with another time point (the last one in my experiment), and wanted to do the same comparison, e.g., comparing the J and E at time point 00. > Here is the code I ran: > > library("edgeR") > > # setwd("H:/bip1/RNASeq/isoforms") > # getwd() > > options(digits=3) > > > ## Read data > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",he > ader=T) > head(cots) > > data = as.matrix(cots); > data = round(data); > head(data) > # write.table(data, file="H:/bip1/RNAseq/Data/round.matrix.txt", > quote=F, sep="\t") > > keep<-rowSums(cpm(data)>2)>=3 > cots<-data[keep,] > table(keep) > > x<-cots[,c("E00R1","E00R2","E00R3","J00R1","J00R2","J00R3","E96R1","E9 > 6R2","E96R3","J96R1","J96R2","J96R3")] > head(x) > > dim(x) > > Group<-factor(substring(colnames(x),1,3)) > Group > > # QUICK START: > # Classic part > > y<-DGEList(counts=x,group=Group) > y$samples > > y<-calcNormFactors(y) > > plotMDS(y) > > # estimating dispersion > # over all genes > y<-estimateGLMCommonDisp(y) > > # genewise > y<-estimateGLMTrendedDisp(y) > y<-estimateGLMTagwiseDisp(y) > > plotBCV(y) > > et<-exactTest(y, pair=c("E00","J00")) > topTags(et) > > > GIVING THE FOLLOWING OUTPUT: > > Comparison of groups: J00-E00 > logFC logCPM PValue FDR > comp436540_c0_seq6 -13.48 4.66 1.54e-05 0.416 > comp437851_c0_seq10 -12.68 3.14 2.63e-05 0.416 > comp436243_c0_seq45 12.79 3.65 3.78e-05 0.416 > comp434611_c0_seq48 12.51 3.12 4.02e-05 0.416 > comp420222_c0_seq1 12.21 3.12 4.58e-05 0.416 > comp425919_c0_seq13 -11.99 2.75 5.65e-05 0.416 > comp435680_c0_seq21 6.35 4.84 5.83e-05 0.416 > comp435275_c0_seq9 -6.80 4.93 6.03e-05 0.416 > comp434001_c0_seq7 -12.18 2.79 6.44e-05 0.416 > comp437026_c0_seq13 12.81 3.68 8.56e-05 0.416 > > > > > Now, I wonder why not these two runs give the same results? One possibility is of course that I code erroneously, but I realize that there may also be statistical explanations. > I would be very happy to hear from you. > > Best regards > Jahn > > > > > > > > > > > -----Opprinnelig melding----- > Fra: Mark Robinson [mailto:mark.robinson at imls.uzh.ch] > Sendt: 19. desember 2013 22:15 > Til: Jahn Davik > Kopi: bioconductor at r-project.org mailman > Emne: Re: [BioC] edgeR glmFit > > Hi Jahn, > > I've cc'd the BioC mailing list; it's best to keep a public record of these exchanges, so others can benefit. > > So, I believe your problem is a matter of the class of the object that you are sending to estimateGLMCommonDisp(), etc.. Here is an example: > > > y <- matrix(rnbinom(1000,mu=10,size=10),ncol=4) > > d <- DGEList(counts=y,group=c(1,1,2,2)) > > design <- model.matrix(~group, data=d$samples) > > d1 <- estimateGLMCommonDisp(y, design, verbose=TRUE) > Disp = 0.08132 , BCV = 0.2852 > > class(d1) > [1] "numeric" > > d1 <- estimateGLMCommonDisp(d, design, verbose=TRUE) > Disp = 0.08132 , BCV = 0.2852 > > class(d1) > [1] "DGEList" > attr(,"package") > [1] "edgeR" > > (if you send a count table, you get a different returned object than > if you send a DGEList object) > > Sorry, I didn't completely spell it out in my last email, but it's there in the Quick Start and also in the case studies; typically, the starting point is creating a 'DGEList' object of the count table, which is the container that stores all the downstream components as well. > > So, I would change your code to (roughly): > > [. read data in .] > > y <- y[keep,] > > d <- DGEList(counts=y,group=Group) > d <- calcNormFactors(d) > > [. define design matrix .] > > plotMDS(d) # it's always a good plan to look at this > > d <- estimateGLMCommonDisp(d,design) > d <- estimateGLMTrendedDisp(d,design) > d <- estimateGLMTagwiseDisp(d,design) > > And, so on . > > Cheers, Mark > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics, Institute of Molecular Life Sciences University of > Zurich http://ow.ly/riRea > > > > > > > On 19.12.2013, at 13:55, Jahn Davik <jahn.davik at="" bioforsk.no=""> wrote: > > > Mark, > > I'm afraid I have to bother you again. Sorry about that. > > Anyway. An indication of where I loose it is appreciated. > > > > This is what I send to R: > > > > y<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix",hea > > de > > r=T) > > head(y) > > y = as.matrix(cots); > > y = round(data); > > head(y) > > > > dim(y) > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > targets > > > > # > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > XX > > XXXXXXX > > # Filtrering. > > # Make sure how you come up with these number !!! > > # z<-data > > > > keep<-rowSums(cpm(y)>2)>=3 > > y<-y[keep,] > > table(keep) > > # > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > XX > > XXXXxXX > > > > > > ## Chapt 3.3.1 i Users Guide > > > > # setting up a combined factor > > > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > cbind(targets,Group=Group) > > > > design<-model.matrix(~0+Group) > > design > > colnames(design)<-levels(Group) > > design > > > > y<-estimateGLMCommonDisp(y,design) > > y<-estimateGLMTrendedDisp(y,design) > > y<-estimateGLMTagwiseDisp(y,design) > > fit<-glmFit(y,design) > > > > > > # IT WORKS FINE (I BELIEVE) ALL THE WAY TO estimate GLMTrendedDisp(y,design), WHERE IT RETURNS: > > > > Error in return(NA, ntags) : multi-argument returns are not > > permitted In addition: Warning message: > > In estimateGLMTrendedDisp.default(y, design) : > > No residual df: cannot estimate dispersion > > > > > > THE OUTPUT IS LIKE THIS: > > > > > > > y = as.matrix(cots); > > > y = round(data); > > > head(y) > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 J01R3 J05R1 J05R2 > > comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 > > comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 0 > > comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 0 0 0 > > comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 188 178 116 > > comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 > > J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > comp168996_c1_seq1 0 0 0 0 0 0 > > comp442719_c0_seq1 0 0 0 0 0 0 > > comp436057_c0_seq13 0 19 0 1 0 0 > > comp415319_c0_seq20 0 0 0 0 0 0 > > comp428135_c0_seq8 76 26 0 41 132 101 > > comp73458_c0_seq1 0 0 0 0 0 0 > > > > > > dim(y) > > [1] 621694 28 > > > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > > targets > > Sample Treat Time > > 1 E00R1 Els 00h > > 2 E00R2 Els 00h > > 3 E00R3 Els 00h > > 4 E01R1 Els 01h > > 5 E01R2 Els 01h > > 6 E01R3 Els 01h > > 7 E05R1 Els 05h > > 8 E05R2 Els 05h > > 9 E05R3 Els 05h > > 10 E48R2 Els 48h > > 11 E48R3 Els 48h > > 12 E96R1 Els 96h > > 13 E96R2 Els 96h > > 14 E96R3 Els 96h > > 15 J00R1 Jon 00h > > 16 J00R2 Jon 00h > > 17 J00R3 Jon 00h > > 18 J01R1 Jon 01h > > 19 J01R2 Jon 01h > > 20 J01R3 Jon 01h > > 21 J05R1 Jon 05h > > 22 J05R2 Jon 05h > > 23 J05R3 Jon 05h > > 24 J48R2 Jon 48h > > 25 J48R3 Jon 48h > > 26 J96R1 Jon 96h > > 27 J96R2 Jon 96h > > 28 J96R3 Jon 96h > > > > > > # > > > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > > XX > > > XXXXXXXXX > > > # Filtrering. > > > # Make sure how you come up with these number !!! > > > # z<-data > > > > > > keep<-rowSums(cpm(y)>2)>=3 > > > y<-y[keep,] > > > table(keep) > > keep > > FALSE TRUE > > 558829 62865 > > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > > cbind(targets,Group=Group) > > Sample Treat Time Group > > 1 E00R1 Els 00h Els.00h > > 2 E00R2 Els 00h Els.00h > > 3 E00R3 Els 00h Els.00h > > 4 E01R1 Els 01h Els.01h > > 5 E01R2 Els 01h Els.01h > > 6 E01R3 Els 01h Els.01h > > 7 E05R1 Els 05h Els.05h > > 8 E05R2 Els 05h Els.05h > > 9 E05R3 Els 05h Els.05h > > 10 E48R2 Els 48h Els.48h > > 11 E48R3 Els 48h Els.48h > > 12 E96R1 Els 96h Els.96h > > 13 E96R2 Els 96h Els.96h > > 14 E96R3 Els 96h Els.96h > > 15 J00R1 Jon 00h Jon.00h > > 16 J00R2 Jon 00h Jon.00h > > 17 J00R3 Jon 00h Jon.00h > > 18 J01R1 Jon 01h Jon.01h > > 19 J01R2 Jon 01h Jon.01h > > 20 J01R3 Jon 01h Jon.01h > > 21 J05R1 Jon 05h Jon.05h > > 22 J05R2 Jon 05h Jon.05h > > 23 J05R3 Jon 05h Jon.05h > > 24 J48R2 Jon 48h Jon.48h > > 25 J48R3 Jon 48h Jon.48h > > 26 J96R1 Jon 96h Jon.96h > > 27 J96R2 Jon 96h Jon.96h > > 28 J96R3 Jon 96h Jon.96h > > > design<-model.matrix(~0+Group) > > > design > > GroupEls.00h GroupEls.01h GroupEls.05h GroupEls.48h GroupEls.96h GroupJon.00h GroupJon.01h GroupJon.05h GroupJon.48h GroupJon.96h > > 1 1 0 0 0 0 0 0 0 0 0 > > 2 1 0 0 0 0 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 0 0 0 > > 4 0 1 0 0 0 0 0 0 0 0 > > 5 0 1 0 0 0 0 0 0 0 0 > > 6 0 1 0 0 0 0 0 0 0 0 > > 7 0 0 1 0 0 0 0 0 0 0 > > 8 0 0 1 0 0 0 0 0 0 0 > > 9 0 0 1 0 0 0 0 0 0 0 > > 10 0 0 0 1 0 0 0 0 0 0 > > 11 0 0 0 1 0 0 0 0 0 0 > > 12 0 0 0 0 1 0 0 0 0 0 > > 13 0 0 0 0 1 0 0 0 0 0 > > 14 0 0 0 0 1 0 0 0 0 0 > > 15 0 0 0 0 0 1 0 0 0 0 > > 16 0 0 0 0 0 1 0 0 0 0 > > 17 0 0 0 0 0 1 0 0 0 0 > > 18 0 0 0 0 0 0 1 0 0 0 > > 19 0 0 0 0 0 0 1 0 0 0 > > 20 0 0 0 0 0 0 1 0 0 0 > > 21 0 0 0 0 0 0 0 1 0 0 > > 22 0 0 0 0 0 0 0 1 0 0 > > 23 0 0 0 0 0 0 0 1 0 0 > > 24 0 0 0 0 0 0 0 0 1 0 > > 25 0 0 0 0 0 0 0 0 1 0 > > 26 0 0 0 0 0 0 0 0 0 1 > > 27 0 0 0 0 0 0 0 0 0 1 > > 28 0 0 0 0 0 0 0 0 0 1 > > attr(,"assign") > > [1] 1 1 1 1 1 1 1 1 1 1 > > attr(,"contrasts") > > attr(,"contrasts")$Group > > [1] "contr.treatment" > > > > > colnames(design)<-levels(Group) > > > design > > Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h > > 1 1 0 0 0 0 0 0 0 0 0 > > 2 1 0 0 0 0 0 0 0 0 0 > > 3 1 0 0 0 0 0 0 0 0 0 > > 4 0 1 0 0 0 0 0 0 0 0 > > 5 0 1 0 0 0 0 0 0 0 0 > > 6 0 1 0 0 0 0 0 0 0 0 > > 7 0 0 1 0 0 0 0 0 0 0 > > 8 0 0 1 0 0 0 0 0 0 0 > > 9 0 0 1 0 0 0 0 0 0 0 > > 10 0 0 0 1 0 0 0 0 0 0 > > 11 0 0 0 1 0 0 0 0 0 0 > > 12 0 0 0 0 1 0 0 0 0 0 > > 13 0 0 0 0 1 0 0 0 0 0 > > 14 0 0 0 0 1 0 0 0 0 0 > > 15 0 0 0 0 0 1 0 0 0 0 > > 16 0 0 0 0 0 1 0 0 0 0 > > 17 0 0 0 0 0 1 0 0 0 0 > > 18 0 0 0 0 0 0 1 0 0 0 > > 19 0 0 0 0 0 0 1 0 0 0 > > 20 0 0 0 0 0 0 1 0 0 0 > > 21 0 0 0 0 0 0 0 1 0 0 > > 22 0 0 0 0 0 0 0 1 0 0 > > 23 0 0 0 0 0 0 0 1 0 0 > > 24 0 0 0 0 0 0 0 0 1 0 > > 25 0 0 0 0 0 0 0 0 1 0 > > 26 0 0 0 0 0 0 0 0 0 1 > > 27 0 0 0 0 0 0 0 0 0 1 > > 28 0 0 0 0 0 0 0 0 0 1 > > attr(,"assign") > > [1] 1 1 1 1 1 1 1 1 1 1 > > attr(,"contrasts") > > attr(,"contrasts")$Group > > [1] "contr.treatment" > > > > > y<-estimateGLMCommonDisp(y,design) > > > y<-estimateGLMTrendedDisp(y,design) > > Error in return(NA, ntags) : multi-argument returns are not > > permitted In addition: Warning message: > > In estimateGLMTrendedDisp.default(y, design) : > > No residual df: cannot estimate dispersion > > > y<-estimateGLMTagwiseDisp(y,design) > > Warning message: > > In estimateGLMTagwiseDisp.default(y, design) : > > No residual df: setting dispersion to NA > > > > > > > > > Thank you > > jahn > > > > > > > > > > > > -----Opprinnelig melding----- > > Fra: Mark Robinson [mailto:mark.robinson at imls.uzh.ch] > > Sendt: 18. desember 2013 16:12 > > Til: Jahn Davik [guest] > > Kopi: bioconductor at r-project.org; Jahn Davik > > Emne: Re: [BioC] edgeR glmFit > > > > Hi Jahn, > > > > Chapter 3 demonstrates the setup of design matrices; 3.1 says "In this chapter, we outline the principles for setting up the design matrix and forming contrasts for some typical experimental designs". You would then combine that with what is shown in "1.4 Quick start" to do all the steps, something like: > > > > > design <- model.matrix(~<add yours="" here="">) y <- > > > estimateGLMCommonDisp(y,design) y <- > > > estimateGLMTrendedDisp(y,design) y <- > > > estimateGLMTagwiseDisp(y,design) fit <- glmFit(y,design) lrt <- > > > glmLRT(fit,coef=<add yours="" here="">) > > > topTags(lrt) > > > > In particular, you might have a look at some of the case studies in Chapter 4, for example 4.4 (but using your design matrix). > > > > Hope that helps. > > > > Best, Mark > > > > > > ---------- > > Prof. Dr. Mark Robinson > > Bioinformatics, Institute of Molecular Life Sciences University of > > Zurich http://ow.ly/riRea > > > > > > > > > > > > > > On 18.12.2013, at 11:21, Jahn Davik [guest] <guest at="" bioconductor.org=""> wrote: > > > > > > > > I am working my way through the edgeR User's Guide and following section 3.3, I encounter a problem. I run the following commands - using my own counts data: > > > ## Chapt 3.3 i Users Guide > > > > > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix > > > ", > > > he > > > ader=T) > > > head(cots) > > > data = as.matrix(cots); > > > y = round(data); > > > head(y) > > > > > > dim(data) > > > > > > targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > > targets > > > > > > # setting up a combined factor > > > > > > Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > > cbind(targets,Group=Group) > > > > > > design<-model.matrix(~0+Group) > > > colnames(design)<-levels(Group) > > > design > > > > > > fit<-glmFit(y,design) > > > > > > sessionInfo() > > > > > > Resulting in 'Error in glmFit.default(y, design) : No dispersion values provided.' > > > > > > I do not quite see where my mistake is and would appreciate help. > > > Thank you. > > > > > > jahn > > > > > > > > > > > > > > > > > > > > > > > > > > > -- output of sessionInfo(): > > > > > > cots<-read.table("H:/bip1/RNAseq/Data/transcriptsRed.counts.matrix > > > ", > > > he > > > ader=T) > > >> head(cots) > > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 > > > comp168996_c1_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > comp442719_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > comp436057_c0_seq13 0.00 7.38 21.04 0.00 0.00 0.21 21.65 0.00 0.00 0.00 0.86 0.57 9.69 18.73 0.00 0.00 0.00 7.14 > > > comp415319_c0_seq20 28.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > comp428135_c0_seq8 249.73 228.53 172.18 57.52 104.48 113.96 187.05 94.64 84.65 134.65 99.82 0.00 65.46 82.83 219.37 59.18 134.04 81.56 > > > comp73458_c0_seq1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > > J01R2 J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > > comp168996_c1_seq1 1.00 0.00 1.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > > comp442719_c0_seq1 0.00 0.00 35.09 0.00 0.00 0.00 0 0.00 0.00 0.00 > > > comp436057_c0_seq13 3.07 0.00 0.05 0.00 0.00 19.09 0 0.60 0.00 0.00 > > > comp415319_c0_seq20 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > > comp428135_c0_seq8 87.00 188.03 177.74 115.65 75.66 26.50 0 41.02 131.62 101.48 > > > comp73458_c0_seq1 0.00 1.00 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 > > >> data = as.matrix(cots); > > >> y = round(data); > > >> head(y) > > > E00R1 E00R2 E00R3 E01R1 E01R2 E01R3 E05R1 E05R2 E05R3 E48R2 E48R3 E96R1 E96R2 E96R3 J00R1 J00R2 J00R3 J01R1 J01R2 > > > comp168996_c1_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 > > > comp442719_c0_seq1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > > comp436057_c0_seq13 0 7 21 0 0 0 22 0 0 0 1 1 10 19 0 0 0 7 3 > > > comp415319_c0_seq20 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > > comp428135_c0_seq8 250 229 172 58 104 114 187 95 85 135 100 0 65 83 219 59 134 82 87 > > > comp73458_c0_seq1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 > > > J01R3 J05R1 J05R2 J05R3 J48R2 J48R3 J96R1 J96R2 J96R3 > > > comp168996_c1_seq1 0 1 0 0 0 0 0 0 0 > > > comp442719_c0_seq1 0 35 0 0 0 0 0 0 0 > > > comp436057_c0_seq13 0 0 0 0 19 0 1 0 0 > > > comp415319_c0_seq20 0 0 0 0 0 0 0 0 0 > > > comp428135_c0_seq8 188 178 116 76 26 0 41 132 101 > > > comp73458_c0_seq1 1 0 0 0 0 0 0 0 0 > > >> > > >> dim(data) > > > [1] 621694 28 > > >> > > >> targets<-read.table("H:/bip1/RNAseq/Data/targets.txt",header=T) > > >> targets > > > Sample Treat Time > > > 1 E00R1 Els 00h > > > 2 E00R2 Els 00h > > > 3 E00R3 Els 00h > > > 4 E01R1 Els 01h > > > 5 E01R2 Els 01h > > > 6 E01R3 Els 01h > > > 7 E05R1 Els 05h > > > 8 E05R2 Els 05h > > > 9 E05R3 Els 05h > > > 10 E48R2 Els 48h > > > 11 E48R3 Els 48h > > > 12 E96R1 Els 96h > > > 13 E96R2 Els 96h > > > 14 E96R3 Els 96h > > > 15 J00R1 Jon 00h > > > 16 J00R2 Jon 00h > > > 17 J00R3 Jon 00h > > > 18 J01R1 Jon 01h > > > 19 J01R2 Jon 01h > > > 20 J01R3 Jon 01h > > > 21 J05R1 Jon 05h > > > 22 J05R2 Jon 05h > > > 23 J05R3 Jon 05h > > > 24 J48R2 Jon 48h > > > 25 J48R3 Jon 48h > > > 26 J96R1 Jon 96h > > > 27 J96R2 Jon 96h > > > 28 J96R3 Jon 96h > > >> > > >> # setting up a combined factor > > >> > > >> Group<-factor(paste(targets$Treat,targets$Time,sep=".")) > > >> cbind(targets,Group=Group) > > > Sample Treat Time Group > > > 1 E00R1 Els 00h Els.00h > > > 2 E00R2 Els 00h Els.00h > > > 3 E00R3 Els 00h Els.00h > > > 4 E01R1 Els 01h Els.01h > > > 5 E01R2 Els 01h Els.01h > > > 6 E01R3 Els 01h Els.01h > > > 7 E05R1 Els 05h Els.05h > > > 8 E05R2 Els 05h Els.05h > > > 9 E05R3 Els 05h Els.05h > > > 10 E48R2 Els 48h Els.48h > > > 11 E48R3 Els 48h Els.48h > > > 12 E96R1 Els 96h Els.96h > > > 13 E96R2 Els 96h Els.96h > > > 14 E96R3 Els 96h Els.96h > > > 15 J00R1 Jon 00h Jon.00h > > > 16 J00R2 Jon 00h Jon.00h > > > 17 J00R3 Jon 00h Jon.00h > > > 18 J01R1 Jon 01h Jon.01h > > > 19 J01R2 Jon 01h Jon.01h > > > 20 J01R3 Jon 01h Jon.01h > > > 21 J05R1 Jon 05h Jon.05h > > > 22 J05R2 Jon 05h Jon.05h > > > 23 J05R3 Jon 05h Jon.05h > > > 24 J48R2 Jon 48h Jon.48h > > > 25 J48R3 Jon 48h Jon.48h > > > 26 J96R1 Jon 96h Jon.96h > > > 27 J96R2 Jon 96h Jon.96h > > > 28 J96R3 Jon 96h Jon.96h > > >> > > >> design<-model.matrix(~0+Group) > > >> colnames(design)<-levels(Group) > > >> design > > > Els.00h Els.01h Els.05h Els.48h Els.96h Jon.00h Jon.01h Jon.05h Jon.48h Jon.96h > > > 1 1 0 0 0 0 0 0 0 0 0 > > > 2 1 0 0 0 0 0 0 0 0 0 > > > 3 1 0 0 0 0 0 0 0 0 0 > > > 4 0 1 0 0 0 0 0 0 0 0 > > > 5 0 1 0 0 0 0 0 0 0 0 > > > 6 0 1 0 0 0 0 0 0 0 0 > > > 7 0 0 1 0 0 0 0 0 0 0 > > > 8 0 0 1 0 0 0 0 0 0 0 > > > 9 0 0 1 0 0 0 0 0 0 0 > > > 10 0 0 0 1 0 0 0 0 0 0 > > > 11 0 0 0 1 0 0 0 0 0 0 > > > 12 0 0 0 0 1 0 0 0 0 0 > > > 13 0 0 0 0 1 0 0 0 0 0 > > > 14 0 0 0 0 1 0 0 0 0 0 > > > 15 0 0 0 0 0 1 0 0 0 0 > > > 16 0 0 0 0 0 1 0 0 0 0 > > > 17 0 0 0 0 0 1 0 0 0 0 > > > 18 0 0 0 0 0 0 1 0 0 0 > > > 19 0 0 0 0 0 0 1 0 0 0 > > > 20 0 0 0 0 0 0 1 0 0 0 > > > 21 0 0 0 0 0 0 0 1 0 0 > > > 22 0 0 0 0 0 0 0 1 0 0 > > > 23 0 0 0 0 0 0 0 1 0 0 > > > 24 0 0 0 0 0 0 0 0 1 0 > > > 25 0 0 0 0 0 0 0 0 1 0 > > > 26 0 0 0 0 0 0 0 0 0 1 > > > 27 0 0 0 0 0 0 0 0 0 1 > > > 28 0 0 0 0 0 0 0 0 0 1 > > > attr(,"assign") > > > [1] 1 1 1 1 1 1 1 1 1 1 > > > attr(,"contrasts") > > > attr(,"contrasts")$Group > > > [1] "contr.treatment" > > > > > >> > > >> fit<-glmFit(y,design) > > > Error in glmFit.default(y, design) : No dispersion values provided. > > >> > > >> sessionInfo() > > > R version 3.0.2 (2013-09-25) > > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > > > locale: > > > [1] LC_COLLATE=Norwegian (Bokm??l)_Norway.1252 LC_CTYPE=Norwegian (Bokm??l)_Norway.1252 LC_MONETARY=Norwegian (Bokm??l)_Norway.1252 > > > [4] LC_NUMERIC=C LC_TIME=Norwegian (Bokm??l)_Norway.1252 > > > > > > attached base packages: > > > [1] parallel stats graphics grDevices utils datasets methods base > > > > > > other attached packages: > > > [1] edgeR_3.4.2 limma_3.18.6 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biobase_2.22.0 BiocGenerics_0.8.0 > > > > > > loaded via a namespace (and not attached): > > > [1] annotate_1.40.0 AnnotationDbi_1.24.0 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 > > > [7] IRanges_1.20.6 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 > > > [13] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 > > > > > > -- > > > Sent via the guest posting facility at bioconductor.org. > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at r-project.org > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > > > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 10.4 years ago Jahn Davik ▴ 110

Login before adding your answer.