Hi, I am doing some experiment and at a loss modeling by DESeq2. I want to estimate abundance of genes across the three conditions (namely treatment1, treatment2 and treatment3), adjusted for covariates of age. I make DESeq object, and estimate size factors, estimate dispersions, and do Wald test as follows where data is raw count matrix.
dds = DESeqDataSetFromMatrix(data, colData=mapping, design=~age_binned+treatment) dds = DESeq(dds, fitType="local")
at this time, I can extract results of differential expression test by Wald test between treatment1, 2 and 3.
However, when I want to know adjusted abundance for all groups, how can I proceed? I did plotCounts(returnData=TRUE) which normalizes counts by sequencing depth and adds a pseudocount of 1/2, but in my understanding this doesn't account for covariates information like age, which are included in dispersion estimate process?
I tried also variance stabilized transformation (VST) for data, which transform size-factor normalized data by using estimated dispersion using design formula, but I don't know this already accounts for covariates information and can be interpreted directly as "age adjusted abundance", and thus comparable directly across groups.
My try so far is to extract size factors normalized values by counts(dds, normalize=TRUE), and make generalized linear model by feeding this data as dependent variables, and covariates as independent variable, then use predict() to get age-adjusted data. Is this the correct way to do such thing?
Or should I use VST data from design ~1 to feed for GLM or something?
To summarize, If I want to use abundance adjusted for sequence depth or library size variance, and age effect for comparison between groups, is this correct to use:
- VST data directly (design ~age_binned + treatment)
- Model VST data (design ~age_binned + treatment) in GLM or something to adjust for age effect
- Model VST data (design ~1) in GLM or something to adjust for age effect
- Model normalized counts by size factors in GLM or something to adjust for age effect
- Model plotCounts() data frame in GLM or something to adjust for age effect
Certainly I'm missing something or poorly understood, thank you for help.