Hi,
I'm writing again about a paired-sample design.
The experimental setting involves 9 patients, 3 disease stages and microarray expression data, as described in the target file read below:
target <- readTargets("targetPT.txt")
head(target)
Genotype <- factor(target$Genotype)
Disease <- factor(target$Disease, levels=c("stageA", "stageB", "stageC"))
I have performed a paired samples analysis using
design <- model.matrix(~Genotype+Disease)
in order to find genes differentially expressed between, for example, stages A and B. However, I noticed that the first patient and the first disease stage (in alphabetical order) disappear from the fit when I inspect colnames(fit).
I tried to use
design <- model.matrix(~0+Genotype+Disease)
to make the coefficient absorbed in the intercept explicit, and then the first Disease stage disappears.
I tried again
design <- model.matrix(~0+Disease+Genotype)
and again the first patient in alphabetical order disappears
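The behaviour described above can be reproduced on a small made-up example. The sketch below assumes a hypothetical balanced target table with 3 patients (P1 to P3) instead of the real 9, and uses only base R; the patient labels are invented for illustration.

```r
# Hypothetical toy target table: 3 patients x 3 stages (not the real targetPT.txt).
target <- expand.grid(Genotype = c("P1", "P2", "P3"),
                      Disease  = c("stageA", "stageB", "stageC"))
Genotype <- factor(target$Genotype)
Disease  <- factor(target$Disease, levels = c("stageA", "stageB", "stageC"))

d1 <- model.matrix(~ Genotype + Disease)      # intercept absorbs P1 and stageA
d2 <- model.matrix(~ 0 + Genotype + Disease)  # all patients shown, stageA dropped
d3 <- model.matrix(~ 0 + Disease + Genotype)  # all stages shown, P1 dropped

colnames(d1)
colnames(d2)
colnames(d3)

# Each design has 5 columns, and stacking them side by side still gives rank 5:
# the three formulas describe the same fitted model, only labelled differently.
qr(cbind(d1, d2, d3))$rank
```

Whichever factor comes first after `0+` keeps all of its levels; the other factor always loses its first level, because a column for it would be a linear combination of the columns already present.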
I do not have sufficient mathematical background to understand exactly which model fits my needs.
I would prefer this last model formula, because it makes all the disease stages explicit, so that I can use a contrast matrix to extract the genes differentially expressed between stages while accounting for the variability due to different patients.
In any case, I would like to ask what the best way to address this problem is, and what mistakes might be behind it (e.g. I do not have all disease conditions for all 9 patients).
Thank you very much for your attention,
Michela
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] limma_3.18.13

loaded via a namespace (and not attached):
[1] tools_3.0.2
Hi,
thanks for your quick answer, I get the point.
I had in fact already concluded that the missing levels were absorbed into the intercept, which is why I went on to put 0+ in the model.
In any case, I'm certainly not a statistician, and I cannot move on from this.
I'm not yet convinced about the meaning of DiseaseB, even though, following your indication, it is exactly what I'm interested in. The point is that biologically the first patient is not a baseline. For this reason I would not consider a model in which both patient and disease are put together in the intercept. Disease stageA is a baseline for disease, but the first patient is not, as often happens in clinical settings.
In this light, could you suggest another formula to extract, in a paired-sample way (somehow accounting for each patient's own variability), the genes that differ significantly among the 3 disease stages?
e.g. Disease stageA vs stageB
Disease stageA vs stageC
Disease stageB vs stageC?
Due to the high inter-patient variability, it is very difficult to obtain results with unpaired, disease-only modeling.
Thank you very much for your patience; I hope you can give me feedback.
Thanks a lot,
Michela
Dr. Michela Riba
Genome Function Unit
Center for Translational Genomics and Bioinformatics
San Raffaele Scientific Institute
Via Olgettina 58
20132 Milano
Italy
Hi Riba,
> I'm not yet convinced about the meaning of DiseaseB, even though,
> following your indication, it is exactly what I'm interested in.
> The point is that biologically the first patient is not a baseline.
> For this reason I would not consider a model in which both patient and
> disease are put together in the intercept.
> Disease stageA is a baseline for disease, but the first patient is not, as
> often happens in clinical settings.
And this is exactly why I suggest you consult with a local statistician. What you have done is perfectly acceptable, but you don't understand enough to realize that.
Baseline in this context has nothing to do with any biological meaning for the term. Instead, it simply means that all other groups are compared to the baseline. You can use relevel() to change the baseline at will.
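A minimal sketch of what relevel() does to the design, assuming a made-up factor with two hypothetical patients (P1, P2); the variable name Disease2 is only for illustration:

```r
# Made-up example: 2 hypothetical patients, each measured at 3 stages.
Genotype <- factor(rep(c("P1", "P2"), each = 3))
Disease  <- factor(rep(c("stageA", "stageB", "stageC"), times = 2),
                   levels = c("stageA", "stageB", "stageC"))

levels(Disease)[1]                      # current baseline: "stageA"
colnames(model.matrix(~ Genotype + Disease))

# Make stageB the baseline; the remaining levels keep their original order.
Disease2 <- relevel(Disease, ref = "stageB")
levels(Disease2)[1]                     # new baseline: "stageB"
colnames(model.matrix(~ Genotype + Disease2))
```

After releveling, the disease coefficients are Disease2stageA (A minus B) and Disease2stageC (C minus B). The fitted model is the same; only the comparisons that appear directly as coefficients change.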
And the parameterization you are using to account for pairs requires that one of the subjects be considered a baseline. This is algebraically identical to fitting a conventional paired t-test, and you will not be able to fit a model that accounts for pairs without absorbing one subject into a baseline.
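The equivalence with a paired t-test can be checked numerically for one gene and two stages. The following is a sketch on simulated data (the effect sizes, seed and patient labels are all invented), using ordinary lm() rather than limma so that the comparison with t.test() is direct:

```r
set.seed(1)
n <- 9                                             # 9 simulated patients
patient <- factor(rep(paste0("P", 1:n), times = 2))
stage   <- factor(rep(c("stageA", "stageB"), each = n))

# Simulated expression of one gene: large patient effects, small stage effect.
patient_effect <- rnorm(n, sd = 2)
y <- patient_effect[as.integer(patient)] +
  0.5 * (stage == "stageB") + rnorm(2 * n, sd = 0.3)

# Additive model with patient as a blocking factor (P1 absorbed in the intercept).
fit    <- lm(y ~ patient + stage)
lm_row <- coef(summary(fit))["stagestageB", ]

# Conventional paired t-test on the same data (B minus A within each patient).
tt <- t.test(y[stage == "stageB"], y[stage == "stageA"], paired = TRUE)

all.equal(unname(lm_row["Estimate"]), unname(tt$estimate))   # same effect estimate
all.equal(unname(lm_row["t value"]),  unname(tt$statistic))  # same t-statistic

# Choosing a different reference patient leaves the stage coefficient unchanged.
fit_ref5 <- lm(y ~ relevel(patient, ref = "P5") + stage)
all.equal(coef(fit)["stagestageB"], coef(fit_ref5)["stagestageB"])
```

The last comparison is the relevant one for the concern raised above: which patient ends up in the intercept has no influence on the estimated stage difference.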
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
Hi Michela,
Well, as Jim said, the coefficient "DiseasestageB" is B-A; accordingly, "DiseasestageC" is C-A.
To get B-C you have to extract the contrast "DiseasestageB - DiseasestageC", which is (B-A) - (C-A) = B-C.
In this factor model you assume that the effects are additive, so contrasting two coefficients that are relative to the same genotype base level gives you the difference in mean expression explained by disease, independent of genotype.
I sent you in another mail some teaching material of mine that explains this in more detail. (I will put it on GitHub soon.)
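As a rough sketch of how the three pairwise comparisons could be extracted in limma, assuming the design ~Genotype+Disease from the original question: the expression matrix below is simulated noise for a balanced 9 x 3 layout (the real data are not balanced, which the additive model still accommodates), and the contrast names BvsA, CvsA and BvsC are arbitrary labels chosen here.

```r
library(limma)  # Bioconductor package used in this thread

set.seed(1)
# Simulated layout: 9 hypothetical patients, each with all 3 stages.
Genotype <- factor(rep(paste0("P", 1:9), times = 3))
Disease  <- factor(rep(c("stageA", "stageB", "stageC"), each = 9),
                   levels = c("stageA", "stageB", "stageC"))
# Simulated log-expression matrix: 100 genes x 27 arrays (pure noise).
expr <- matrix(rnorm(100 * 27), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))

design <- model.matrix(~ Genotype + Disease)
# Rename "(Intercept)" so every column name is a valid name for makeContrasts().
colnames(design)[1] <- "Intercept"

fit <- lmFit(expr, design)

# DiseasestageB is B-A, DiseasestageC is C-A, and their difference is B-C.
cont <- makeContrasts(BvsA = DiseasestageB,
                      CvsA = DiseasestageC,
                      BvsC = DiseasestageB - DiseasestageC,
                      levels = design)

fit2 <- eBayes(contrasts.fit(fit, cont))
topTable(fit2, coef = "BvsC")   # top genes for stage B vs stage C
```

The patient coefficients never appear in the contrasts; they are in the model only to absorb between-patient differences, which is what makes the analysis paired.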
Best wishes,
Bernd
Thanks a lot for the explanations. I'm happy to go into more detail by following the mail and studying the material!
Thank you so much,
Michela