Search
Question: glmTreat Errors for multi coefficient and analysis of deviance situations.
0
5 months ago by
BharathAnanth10 wrote:

Hi

I am fitting some time course RNA-seq data using glmQLFit and I then wanted to use glmTreat to test either of two coefficients with 0.5 lfc. Here is the simplified version (with fewer samples).

set.seed(0)
T <- 24
t <- rep(seq(3,42, by=3), 11)
c <- cos(2*pi*t/T)  # a time course basis function
s <- sin(2*pi*t/T)
batch_effect <- factor(ceiling(runif(14*11)*11))
subjects <- sprintf("P%02d", rep(1:11, each = 14))
design1 <- model.matrix(~ batch_effect + c + s)

fit <- glmQLFit(y, design, robust=TRUE)

These work

treat1 <- glmTreat(fit, coef="c", lfc=0.5)
treat2 <- glmTreat(fit, coef=c("c", "s"), lfc=0)

but this gives an error

treat3 <- glmTreat(fit, coef=c("c","s"), lfc=0.5)

matrix dimensions are not consistent for non-repeating number of columns

Instead, if I use contrasts instead of specifying coefficients, I get a different error

treat4 <- glmTreat(fit, contrast = con, lfc=0.5)

error: Error in data.frame(logCPM = glmfit\$AveLogCPM, PValue = p.value, row.names = rownames(glmfit)) :
row names supplied are of the wrong length

Is glmTreat not design to be used for multi-coefficient or analysis of deviance situtations? The help page seems to suggested multiple coefficients at least are permitted.

My sessionInfo is

R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.12.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bindrcpp_0.2               sva_3.24.4                 BiocParallel_1.10.1        genefilter_1.58.1          mgcv_1.8-22
[6] nlme_3.1-131               SummarizedExperiment_1.6.5 DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2
[11] GenomicRanges_1.28.4       GenomeInfoDb_1.12.3        IRanges_2.10.5             S4Vectors_0.14.7           BiocGenerics_0.22.1
[16] ggplot2_2.2.1              PoiClaClu_1.0.2            magrittr_1.5               edgeR_3.18.1               limma_3.32.5

loaded via a namespace (and not attached):
[1] tidyr_0.7.2             bit64_0.9-7             splines_3.4.2           Formula_1.2-2           assertthat_0.2.0        statmod_1.4.30
[7] latticeExtra_0.6-28     blob_1.1.0              cellranger_1.1.0        GenomeInfoDbData_0.99.0 yaml_2.1.15             RSQLite_2.0
[13] backports_1.1.1         lattice_0.20-35         glue_1.2.0              digest_0.6.12           RColorBrewer_1.1-2      XVector_0.16.0
[19] checkmate_1.8.5         colorspace_1.3-2        htmltools_0.3.6         Matrix_1.2-11           plyr_1.8.4              XML_3.98-1.9
[25] pkgconfig_2.0.1         zlibbioc_1.22.0         purrr_0.2.4             xtable_1.8-2            scales_0.5.0            pracma_2.1.1
[31] htmlTable_1.11.0        tibble_1.3.4            annotate_1.54.0         nnet_7.3-12             lazyeval_0.2.1          readxl_1.0.0
[37] survival_2.41-3         memoise_1.1.0           foreign_0.8-69          tools_3.4.2             data.table_1.10.4-3     stringr_1.2.0
[43] munsell_0.4.3           locfit_1.5-9.1          cluster_2.0.6           AnnotationDbi_1.38.2    compiler_3.4.2          rlang_0.1.2
[49] grid_3.4.2              RCurl_1.95-4.8          rstudioapi_0.7          htmlwidgets_0.9         labeling_0.3            bitops_1.0-6
[55] base64enc_0.1-3         gtable_0.2.0            DBI_0.7                 R6_2.2.2                gridExtra_2.3           knitr_1.17
[61] dplyr_0.7.4             bit_1.1-12              bindr_0.1               Hmisc_4.0-3             stringi_1.1.6           Rcpp_0.12.12
[67] geneplotter_1.54.0      rpart_4.1-11            acepack_1.4.1           tidyselect_0.2.3

modified 5 months ago by Ryan C. Thompson6.7k • written 5 months ago by BharathAnanth10
0
5 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.7k wrote:

As far as I know, the treat method is only defined for a single coefficient or contrast at a time. I'm not even sure what the null hypothesis would be for testing multiple coefficients at once.

Normally the null hypothesis for the glm would be (in my case) H0: c=0 AND s =0.

What I want to test is the null hypothesis H0: |c| < lfc AND |s| <lfc using glmTreat. Is this inconceivable?

The problem with that is that the region defined by your null hypothesis is now dependent on how you've parametrized your design matrix in ways that are probably not intuitive. That is, tests using different design matrices that would be equivalent in glmLRT or glmQLFTest would give different results in glmTreat.

Just to give a few other possible ways to define the null hypothesis for multi-coefficient treat that make equal sense, consider: |c| + |s| < threshold, or c^2 + s^2 < threshold^2. Of course, all of these are still dependent on the parametrization.

To elaborate on Ryan's example, say we have two contrasts, A - B and A - C. If we test A - B = 0 and A - C = 0 via an ANODEV, this is the same null hypothesis as A - B = 0 and B - C = 0. However, if we ask for |A - B| <= 1 and |A - C| <= 1, this is not equivalent to |A - B| <= 1 and |B - C| <= 1. (For example, the first version is true while the second is not when A = 0, B = 1 and C = -1.)

This isn't an insurmountable difficulty - it just requires some care when phrasing the contrasts - but it does highlight the issues with dealing with composite null hypotheses. Indeed, when you start stacking together contrasts to try to perform a log-fold-change-aware equivalent of an ANODEV, you will become increasingly conservative if you only perform testing at the lfc limit, like treat; or you will need to consider the correlation in the true effect sizes between contrasts to obtain an appropriate p-value, which does not seem possible on a gene-by-gene basis. Maybe something could be made to work by sharing information across genes with empirical Bayes, but this would require some thought.