Hi all,
I'm currently doing differential gene expression analysis on NanoString data.
For this I'm using the package NanoStringDiff.
I am trying to detect the effect of a treatment vs. no treatment on a set of ex vivo biopsy samples.
Because of strong heterogeneity between individuals, the "sample" effect is much stronger than the "treatment" effect: for a given individual, gene expression is similar in the treated and untreated samples, but it differs greatly from one individual to another.
To take that into account, I think I need to include "sample" in the model as a confounding (blocking) factor.
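To illustrate why blocking on the individual matters, here is a minimal base-R sketch. It uses a plain linear model (`lm`) on made-up log-expression values for a single gene, not NanoStringDiff's negative-binomial GLM, and the individual offsets, the 0.7 treatment effect and the noise level are all invented for illustration. When the model ignores the individual, the large between-individual differences end up in the residual error and swamp the treatment effect; adding `sample_ID` to the formula absorbs them.

```r
# Hypothetical paired design: 4 individuals, each measured untreated and treated.
set.seed(1)
sample_ID <- factor(rep(c("S1", "S2", "S3", "S4"), each = 2))
treatment <- factor(rep(c("untreated", "treated"), times = 4),
                    levels = c("untreated", "treated"))

# Made-up log-expression for one gene: large per-individual offsets,
# a small treatment effect of 0.7, and a little measurement noise.
individual_offset <- c(S1 = 0, S2 = 4, S3 = -3, S4 = 8)
y <- 5 + individual_offset[as.character(sample_ID)] +
  0.7 * (treatment == "treated") + rnorm(8, sd = 0.1)
y <- unname(y)

# Model ignoring the individual vs. model blocking on it
fit_unblocked <- lm(y ~ treatment)
fit_blocked   <- lm(y ~ sample_ID + treatment)

# Standard error of the treatment coefficient under each model
se_unblocked <- coef(summary(fit_unblocked))["treatmenttreated", "Std. Error"]
se_blocked   <- coef(summary(fit_blocked))["treatmenttreated", "Std. Error"]
c(unblocked = se_unblocked, blocked = se_blocked)
```

In this toy example the treatment estimate is similar in both fits, but its standard error is far smaller once `sample_ID` is in the model, which is what makes the treatment effect detectable.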
With my limited statistical background, I haven't been able to set up this kind of analysis with NanoStringDiff, even after reading the vignette.
If I understand correctly, the design would be something like this:
```
pheno <- pData(nanostring_data)
treatment <- pheno$treatment
sample_ID <- pheno$sample_ID
design.full <- model.matrix(~ sample_ID + treatment)
design.full
  (Intercept) sample_1 sample_2 sample_3 Treatment
            1        1        0        0         0
            1        1        0        0         1
            1        0        1        0         0
            1        0        1        0         1
            1        0        0        1         0
            1        0        0        1         1
```
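For reference, here is what `model.matrix()` produces for a small hypothetical phenotype table (three individuals `S1` to `S3`, each untreated and treated; these names are made up and stand in for `pData(nanostring_data)`). With R's default treatment contrasts, the first level of each factor is absorbed into the intercept, so there is no column for `S1`; a matrix with an intercept plus one column per individual, as sketched above, would be rank-deficient. The treatment column comes last because `treatment` is the last term in the formula.

```r
# Hypothetical phenotype table: 3 individuals, each untreated and treated.
pheno <- data.frame(
  sample_ID = factor(rep(c("S1", "S2", "S3"), each = 2)),
  treatment = factor(rep(c("untreated", "treated"), times = 3),
                     levels = c("untreated", "treated"))
)

design.full <- model.matrix(~ sample_ID + treatment, data = pheno)
print(design.full)

# Columns are (Intercept), sample_IDS2, sample_IDS3, treatmenttreated:
# S1 is the reference level, so it has no column of its own.
colnames(design.full)

# Full column rank, i.e. every coefficient is estimable.
qr(design.full)$rank == ncol(design.full)
```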
But I'm not sure which "contrast" and "Beta" arguments I should pass to the glm.LRT() function that comes next to run the differential expression analysis.
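A sketch of two ways to point at the treatment coefficient, using the same kind of hypothetical three-individual design. As far as I understand the NanoStringDiff documentation, `Beta` gives the index of the coefficient(s) tested equal to zero and defaults to `ncol(design.full)`, i.e. the last column, which here is the treatment effect adjusted for individual; a contrast vector with a 1 on the treatment column and 0 elsewhere expresses the same hypothesis. The `glm.LRT()` calls are left commented out and are assumptions to check against `?glm.LRT` for your installed version; `NanoStringData` stands for a NanoStringSet object prepared as in the vignette.

```r
# Hypothetical design, rebuilt so this block runs on its own.
pheno <- data.frame(
  sample_ID = factor(rep(c("S1", "S2", "S3"), each = 2)),
  treatment = factor(rep(c("untreated", "treated"), times = 3),
                     levels = c("untreated", "treated"))
)
design.full <- model.matrix(~ sample_ID + treatment, data = pheno)

# Index of the coefficient of interest: the treatment column, which
# model.matrix places last because treatment is last in the formula.
beta_index <- which(colnames(design.full) == "treatmenttreated")
beta_index == ncol(design.full)

# The same hypothesis as a contrast vector: 1 on treatment, 0 elsewhere.
contrast <- as.numeric(colnames(design.full) == "treatmenttreated")
names(contrast) <- colnames(design.full)
contrast

# Assumed NanoStringDiff usage (not run; check ?glm.LRT for your version):
# result <- glm.LRT(NanoStringData, design.full, Beta = beta_index)
# result <- glm.LRT(NanoStringData, design.full, contrast = unname(contrast))
```

If the default `Beta` is indeed the last column, putting `treatment` last in the formula means the default call already tests treatment while adjusting for individual.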
If anyone knows how to use NanoStringDiff with a confounding factor, or has an alternative solution, I would be very grateful to hear from you.
Guillaume
Thank you for your answer and your advice. OK, I will try with the default parameters. Do you know whether, by default, the "sample" covariate will be treated as a confounder in the analysis?
The short answer is yes. However...
What you are asking doesn't really make sense. There is no 'by default' when asking about a design matrix that you have specified. The design matrix defines a set of linear equations that you are telling R to solve, and it has a set of coefficients, each of which has a certain interpretation. Ideally you would already know that. But if you don't, you either need to learn a bit more about what you are doing or find somebody who does who can help.
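To make "each coefficient has a certain interpretation" concrete, here is a small base-R sketch with made-up, noise-free log-expression values for one gene in a paired design (the individual names and effect sizes are hypothetical). Solving the linear system defined by the design matrix recovers each coefficient exactly, and the comments spell out what each one means.

```r
# Hypothetical noise-free log-expression for one gene: baseline 5 for S1,
# S2 sits 2 units higher, S3 1 unit lower, and treatment adds 0.7
# within every individual.
sample_ID <- factor(rep(c("S1", "S2", "S3"), each = 2))
treatment <- factor(rep(c("untreated", "treated"), times = 3),
                    levels = c("untreated", "treated"))
individual_offset <- c(S1 = 0, S2 = 2, S3 = -1)
y <- unname(5 + individual_offset[as.character(sample_ID)] +
              0.7 * (treatment == "treated"))

X <- model.matrix(~ sample_ID + treatment)

# Solve X %*% beta = y by least squares (the fit is exact here).
beta <- qr.coef(qr(X), y)
names(beta) <- colnames(X)
round(beta, 10)
# (Intercept)      : untreated level of the reference individual S1
# sample_IDS2      : S2 minus S1, at the same treatment status
# sample_IDS3      : S3 minus S1, at the same treatment status
# treatmenttreated : treated minus untreated within an individual
```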
Simply having a tool that is useful for a task is no substitute for knowing how to use the tool.
Thanks a lot for the tips. I agree I need to level up on statistics... If by any chance you know of any pedagogical online resources that would let me learn the theory behind design matrices, and how to analyse complex gene expression datasets efficiently, that would be very helpful.
Both the limma User's Guide and the edgeR User's Guide are full of examples of different design matrices and what they estimate. A simple Google search for 'R design matrix' brought up 570M results (in 0.1 seconds!), a fair portion of which may be useful.