Question: TCC::ERROR: Need the design matrix for GLM.
5.4 years ago, Pankaj Agarwal (United States) wrote:
Hi,

I have RNA-seq data consisting of matched tumor/normal samples from two patients. To normalize the counts, I am following the steps in the TCC vignette section "3.3 Normalization of two-group count data without replicates (paired)". The output from the commands is as follows:

> data=read.delim("count_bt2_iGenomes_Ensembl.tsv")
> head(data)
                A.sorted.bam B.sorted.bam C.sorted.bam D.sorted.bam
ENSG00000000003         2400         1130           12           72
ENSG00000000005            2            3            0            0
ENSG00000000419         1819          575          938         1608
ENSG00000000457         1317         1262          821         1469
ENSG00000000460          799         1743          367          800
ENSG00000000938          203           41        33303        16355
> group <- c(1,1,2,2)
> pair <- c(1,2,1,2)
> c1 <- data.frame(group=group, pair=pair)
> colnames(data) <- c("T_BRPC13.1118", "T_BRPC_13.764", "N_DU04_PBMC", "N_DU06_PBMC")
> tcc <- new("TCC", data, c1)
> tcc <- calcNormFactors(tcc, norm.method="tmm", test.method="edger", iteration=1, FDR=0.1, floorPDEG=0.05, paired=TRUE)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 1 )
Error in .testByEdger.3(design = design, coef = coef, contrast = contrast) :
  TCC::ERROR: Need the design matrix for GLM.

Reading further about the steps needed to run edgeR without TCC, I saw something about a design matrix and tried it, but got the same error:

> design <- model.matrix(~ group + pair)
> tcc <- new("TCC", data, c1)
> tcc <- calcNormFactors(tcc, norm.method="tmm", test.method="edger", iteration=1, FDR=0.1, floorPDEG=0.05, paired=TRUE)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 1 )
Error in .testByEdger.3(design = design, coef = coef, contrast = contrast) :
  TCC::ERROR: Need the design matrix for GLM.

I would appreciate help understanding the cause of the error.
The output from sessionInfo() and the package description is as follows:

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
>
> packageDescription("TCC")
Package: TCC
Type: Package
Title: TCC: Differential expression analysis for tag count data with robust normalization strategies
Version: 1.2.0
Author: Jianqiang Sun, Tomoaki Nishiyama, Kentaro Shimizu, and Koji Kadota
Maintainer: Jianqiang Sun <wukong@bi.a.u-tokyo.ac.jp>, Tomoaki Nishiyama <tomoakin@staff.kanazawa-u.ac.jp>
Description: This package provides a series of functions for performing differential expression analysis from RNA-seq count data using robust normalization strategy (called DEGES). The basic idea of DEGES is that potential differentially expressed genes or transcripts (DEGs) among compared samples should be removed before data normalization to obtain a well-ranked gene list where true DEGs are top-ranked and non-DEGs are bottom ranked. This can be done by performing a multi-step normalization strategy (called DEGES for DEG elimination strategy). A major characteristic of TCC is to provide the robust normalization methods for several kinds of count data (two-group with or without replicates, multi-group/multi-factor, and so on) by virtue of the use of combinations of functions in other sophisticated packages (especially edgeR, DESeq, and baySeq).
Depends: R (>= 2.15), methods, DESeq, edgeR, baySeq, ROC
Imports: EBSeq, samr
Suggests: RUnit, BiocGenerics
Enhances: snow
biocViews: HighThroughputSequencing, DifferentialExpression, RNAseq
License: GPL-2
Copyright: Authors listed above
Packaged: 2013-10-15 05:31:33 UTC; biocbuild
Built: R 3.0.3; ; 2014-03-31 20:00:49 UTC; unix

-- File: /general/installs/R/R-3.0.3/lib64/R/library/TCC/Meta/package.rds

Thank you,
- Pankaj

Pankaj Agarwal, M.S
Bioinformatician
Bioinformatics Shared Resource
Duke Cancer Institute
Duke University
919-681-6573
p.agarwal@duke.edu
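Note that the second attempt creates a `design` object in the workspace but never passes it to anything: the `calcNormFactors()` call is identical to the first one, so an unchanged result is expected. For reference, the sketch below shows how a paired tumor/normal design is usually written when edgeR's GLM functions are used directly (patient as a blocking factor, tissue as the effect of interest). It uses only base R. The tumor/normal matching simply mirrors the `group` and `pair` vectors above and is an assumption about the samples; whether TCC 1.2.0 accepts such a matrix at all is not shown in this thread.

```r
# Sample labels as renamed in the question. The pairing below mirrors the
# user's vectors (assumed: T_BRPC13.1118 <-> N_DU04_PBMC, T_BRPC_13.764 <-> N_DU06_PBMC).
samples <- c("T_BRPC13.1118", "T_BRPC_13.764", "N_DU04_PBMC", "N_DU06_PBMC")
group <- factor(c(1, 1, 2, 2))  # 1 = tumor, 2 = normal (inferred from the T_/N_ prefixes)
pair  <- factor(c(1, 2, 1, 2))  # patient identifier shared by each tumor/normal pair

# Sanity check: every patient contributes exactly one sample to each group.
stopifnot(all(table(pair, group) == 1))

# Paired design: patient blocking term first, tissue effect last, so the
# final coefficient ("group2") is the normal-vs-tumor effect within patients.
design <- model.matrix(~ pair + group)
rownames(design) <- samples
print(design)

# 4 samples minus 3 estimated coefficients leaves 1 residual degree of freedom.
residual_df <- nrow(design) - qr(design)$rank
print(residual_df)
```

Coding `group` and `pair` as factors is the usual edgeR convention. With only two levels each, the numeric coding in the question spans the same column space, but with three or more patients a numeric `pair` would wrongly be treated as a continuous covariate.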
modified 5.4 years ago by Jianqiang SUN • written 5.4 years ago by Pankaj Agarwal
Answer: TCC::ERROR: Need the design matrix for GLM.
5.4 years ago, Jianqiang SUN wrote:
Hi, Pankaj,

I can see from your packageDescription("TCC") output that you are using TCC version 1.2.0. However, the "paired" approaches are only supported as of version 1.4.0. Could you update TCC to version 1.4.0 and try again?

To install TCC version 1.4.0, I recommend that you first install R 3.1.0 (you are using R 3.0.3 now), then start R and execute the following commands:

> source("http://bioconductor.org/biocLite.R")
> biocLite("TCC")

If you get any errors, please let me know.

Thank you.
Jianqiang SUN.
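Since the error comes from running `paired = TRUE` on a TCC release that predates paired support, a version check before the call can catch it early. The helper below, `has_paired_support()`, is hypothetical (it is not part of TCC) and relies only on base R's `numeric_version()` and `utils::packageVersion()`; the 1.4.0 threshold is taken from the reply above. On current R releases, `biocLite()` has been retired, and Bioconductor packages are installed with `BiocManager::install("TCC")` instead.

```r
# Hypothetical helper (not part of TCC): TRUE when a TCC version string is
# at least 1.4.0, the release named in the reply as adding paired support.
has_paired_support <- function(tcc_version, minimum = "1.4.0") {
  numeric_version(tcc_version) >= numeric_version(minimum)
}

# The version reported by packageDescription("TCC") in the question:
has_paired_support("1.2.0")   # FALSE
has_paired_support("1.4.0")   # TRUE

# Only if TCC is installed: check the installed copy before calling
# calcNormFactors(..., paired = TRUE).
if (requireNamespace("TCC", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("TCC"))
  if (!has_paired_support(installed)) {
    message("TCC ", installed, " is too old for paired = TRUE; please update.")
  }
}
```

`numeric_version()` compares each dot-separated component as a number, so "1.10.0" correctly counts as newer than "1.4.0", whereas a plain string comparison would get that wrong.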