NanoStringDiff analysis with confounding factors
@guillaume-robert-18902
Last seen 16 months ago
France/Nantes/Inovarion

Hi all,

I'm currently doing differential gene expression analysis on Nanostring data.

For this I'm using the NanoStringDiff package.

I am trying to detect the effect of a treatment versus no treatment on a set of ex vivo biopsy samples.

Due to strong heterogeneity between individuals, the "sample" effect is much stronger than the "treatment" effect: for a given sample, gene expression is similar between treated and non-treated, but very different from the treated and non-treated measurements of the other samples.

To take that into account, I think I need to include the "sample" variable in the model as a confounding factor.

With my small statistical background, I haven't been able to implement this kind of analysis with NanoStringDiff, even after reading the vignette.

If I understand correctly, the design might look like this:

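library(NanoStringDiff)
library(Biobase)  # provides pData()
pheno <- pData(nanostring_data)
treatment <- pheno$treatment
sample_ID <- pheno$sample_ID
# additive model: sample as a covariate, treatment as the last term
design.full <- model.matrix(~ sample_ID + treatment)
design.full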

(Intercept)  sample_1  sample_2  sample_3  Treatment
1            1         0         0         0
1            1         0         0         1
1            0         1         0         0
1            0         1         0         1
1            0         0         1         0
1            0         0         1         1

But I'm not sure which "contrast" and "Beta" parameters I should give to the glm.LRT() function that comes after that to do the differential expression analysis.

If anyone knows how to use NanoStringDiff with a confounding factor, or has an alternative solution for doing that, I would be very grateful to hear from you.

Guillaume

@james-w-macdonald-5106
Last seen 8 hours ago
United States

In your parameterization, the coefficient of interest is Treatment, which is already a comparison between treated and (presumably) control. To test it you use the Beta argument of glm.LRT, which by default points to the final column of your design matrix. So you don't really have to do anything but run glm.LRT using the defaults. But you should read and understand the help for that function, regardless.
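As a rough illustration of that advice, here is a minimal sketch in base R. The phenotype table is made up (three hypothetical donors, each with a control and a treated biopsy), and the final glm.LRT call is left commented out because it needs a real NanoStringSet object, assumed here to be called nanostring_data as in the question and already prepared as the vignette describes. It shows which columns model.matrix actually builds and that the treatment coefficient ends up in the last one.

```r
# Hypothetical phenotype table: 3 donors, one control and one treated biopsy each.
pheno <- data.frame(
  sample_ID = factor(rep(c("s1", "s2", "s3"), each = 2)),
  treatment = factor(rep(c("control", "treated"), times = 3),
                     levels = c("control", "treated"))
)

# Additive design: donor as a covariate, treatment as the last term.
design.full <- model.matrix(~ sample_ID + treatment, data = pheno)

# The first donor is absorbed into the intercept, so there are 4 columns:
# "(Intercept)" "sample_IDs2" "sample_IDs3" "treatmenttreated"
print(colnames(design.full))

# The treatment coefficient sits in the final column, which is what the
# answer above says the Beta argument points to by default.
beta_index <- ncol(design.full)

# Not run here (needs NanoStringDiff and real data); per the answer above,
# the defaults should be enough:
# result <- glm.LRT(nanostring_data, design.full)
# or, spelling out the same choice explicitly:
# result <- glm.LRT(nanostring_data, design.full, Beta = beta_index)
```

Because the reference donor is folded into the intercept, the matrix built this way is full rank, and putting treatment last in the formula is what makes the default Beta line up with the coefficient of interest.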


Thank you for your answer and your advice. OK, I will try with the default parameters. Do you know if, by default, the "sample" covariate will be treated as a confounder in the analysis?


The short answer is yes. However...

What you are asking doesn't really make sense. There is no 'by default' when asking about a design matrix that you have specified. The design matrix defines a set of linear equations that you are telling R to solve, and it has a set of coefficients, each of which has a certain interpretation. Ideally you would already know that. But if you don't, you either need to learn a bit more about what you are doing or find somebody who does and can help.

Simply having a tool that is useful for a task is no substitute for knowing how to use the tool.
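To make the point about coefficient interpretation concrete, here is a small sketch using an ordinary linear model on invented numbers. Note the assumptions: lm() is a Gaussian fit, not the model NanoStringDiff uses, and the expression values are made up for a single hypothetical gene. It only illustrates what the treatment coefficient means once the sample term is in the design: in a balanced paired layout it equals the average within-sample difference, and the large between-sample differences no longer inflate its standard error.

```r
# Invented log-scale expression for one gene: donors differ a lot,
# treatment adds a small shift within each donor.
toy <- data.frame(
  sample_ID = factor(rep(c("s1", "s2", "s3"), each = 2)),
  treatment = factor(rep(c("control", "treated"), times = 3),
                     levels = c("control", "treated")),
  expr = c(2.0, 2.5, 6.0, 6.4, 10.0, 10.6)
)

# Fit with the donor term in the design.
fit_blocked <- lm(expr ~ sample_ID + treatment, data = toy)
coef_blocked <- coef(fit_blocked)[["treatmenttreated"]]

# Average treated-minus-control difference within each donor
# (rows are ordered so the pairs line up).
paired_diff <- with(toy, expr[treatment == "treated"] - expr[treatment == "control"])
mean_diff <- mean(paired_diff)

# Same comparison without the donor term, for contrast.
fit_unblocked <- lm(expr ~ treatment, data = toy)
se_blocked <- summary(fit_blocked)$coefficients["treatmenttreated", "Std. Error"]
se_unblocked <- summary(fit_unblocked)$coefficients["treatmenttreated", "Std. Error"]

print(c(coef_blocked = coef_blocked, mean_diff = mean_diff))
print(c(se_blocked = se_blocked, se_unblocked = se_unblocked))
```

Both fits estimate the same treatment shift here, but only the fit with the donor term separates it from the donor-to-donor variation, which is the sense in which the sample covariate is "taken into account".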


Thanks a lot for the tips. I agree I need to level up on statistics... If by any chance you know of any good online teaching resources that would help me learn the theory behind design matrices, and how to analyse complex gene expression datasets efficiently, that would be very helpful.


Both the limma User's Guide and the edgeR User's Guide are full of examples of different design matrices and what they estimate. A simple Google search for 'R design matrix' brought up 570M results (in 0.1 seconds!), a fair portion of which may be useful.