I have RNA-seq data from 5 males and 5 females for 4 tissue types (skin, liver, brain, and gonads), and I would like to: 1. Identify genes/transcripts with skin-biased expression 2. Identify genes/transcripts with sex-biased expression 3. Identify genes/transcripts with sex-biased expression in the skin only
I would like to use the GLM approach to analyze my data as described in the edgeR User's Guide, but I am not confident that I am correctly specifying the appropriate contrasts or using the best glm model, so I'm requesting feedback on my model setup here.
I have a DGEList object where I have assigned individuals to a group that is tissuetype_sex. Here is how I've specified the design matrix:
design<-model.matrix(~0+group, data=y$samples) head(design) groupbrains_female groupbrains_male groupgonads_female groupgonads_male groupliver_female groupliver_male groupskin_female groupskin_male brains_SSF1 1 0 0 0 0 0 0 0 brains_SSF2 1 0 0 0 0 0 0 0 brains_SSF3 1 0 0 0 0 0 0 0 brains_SSF4 1 0 0 0 0 0 0 0 brains_SSF6 1 0 0 0 0 0 0 0 brains_SSNP1 0 1 0 0 0 0 0 0
(sorry for the messy formatting of the output, I couldn't figure out how to make it look better)
I then fit the model using glmQLFit:
fit <- glmQLFit(y, design)
To identify skin-specific genes, I then do:
skinVbrains <- glmTreat(fit, contrast=makeContrasts((groupskin_female+groupskin_male)-(groupbrains_female+groupbrains_male), levels=design), lfc=2)
skinVliver <- glmTreat(fit, contrast=makeContrasts((groupskin_female+groupskin_male)-(groupliver_female+groupliver_male), levels=design), lfc=2)
skinVgonads <- glmTreat(fit, contrast=makeContrasts((groupskin_female+groupskin_male)-(groupgonads_female+groupgonads_male), levels=design), lfc=2)
and the genes that are significantly different in each of these three tests would be genes with skin-biased expression. To identify sex-biased genes and skin-specific sex-biased genes, I follow a similar approach:
Sexfit <- glmTreat(fit,
contrast=makeContrasts((groupskin_female+groupbrains_female+groupliver_female+groupgonads_female)-(groupskin_male+groupbrains_male+groupliver_male+groupgonads_male),
levels=design), lfc=2)
skinSex <- glmTreat(fit, contrast=makeContrasts(groupskin_female-groupskin_male, levels=design), lfc=2)
where those genes with significantly different expression in skinSex that are also in the previously-identified skin-biased expression gene set would be genes that are sex-biased in the skin only.
Is this the best way to set up these models? Would I be better off trying to use an interaction formula instead? Any advice on how I can improve this analysis would be much appreciated!
Thank you in advance, Sarah
P.S. In case it helps, here is my sessionInfo():
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_NZ.UTF-8 LC_NUMERIC=C LC_TIME=en_NZ.UTF-8 LC_COLLATE=en_NZ.UTF-8 LC_MONETARY=en_NZ.UTF-8 LC_MESSAGES=en_NZ.UTF-8
[7] LC_PAPER=en_NZ.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.23 edgeR_3.26.4 limma_3.40.2
loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1 yaml_2.2.0 Rcpp_1.0.1 splines_3.6.1 highr_0.8 grid_3.6.1 locfit_1.5-9.1 xfun_0.7 lattice_0.20-38
I fixed the formating of the output for you by replacing the markdown fencing (three back-ticks) with pre-formatted html tags instead. The fenced block will wrap long lines but the pre-formatted block doesn't. It's a trick that is seldom mentioned in documentation but can be useful.