Entering edit mode
Ryan C. Thompson
★
7.9k
@ryan-c-thompson-5618
Last seen 5 weeks ago
Icahn School of Medicine at Mount Sinaiā¦
Hi all,
I am working with an RNA-seq dataset with the following design:
design <- structure(c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1), .Dim = c(36L, 14L), .Dimnames
=
list(
c("B1S1", "B1S2", "B1S3", "B1S4", "B1S5", "B1S6", "B1S7",
"B1S8", "B1S9", "B1S10", "B1S11", "B1S12", "B2S1", "B2S2",
"B2S3", "B2S4", "B2S5", "B2S6", "B2S7", "B2S8", "B2S9", "B2S10",
"B2S11", "B2S12", "B3S1", "B3S2", "B3S3", "B3S4", "B3S5",
"B3S6", "B3S7", "B3S8", "B3S9", "B3S10", "B3S11", "B3S12"
), c("S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9",
"S10", "S11", "S12", "Batch1", "Batch2")), assign = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), contrasts =
structure(list(
S = "contr.treatment", Batch = "contr.sum"), .Names = c("S",
"Batch")))
There are 12 conditions, and the 12-condition experiment is replicated
3
times. One thing I'd like to do with this dataset is to identify a set
of "housekeeping" genes specific to this data, i.e. genes whose levels
do not change very much across all the samples. Is there a good way to
get a single number for each gene that represents its level of
variability? Should I just estimate per-gene dispersions with edgeR
using an intercept-only design (i.e. model.matrix(~1) and use the
dispersion as a measure of variability, or is there a good method that
takes the design into account and is independent of the specific
parametrization of the design?
-Ryan Thompson