Hi all,
I have this data set https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92776, in which there are 337 blood samples from patients. This is from a Lupus study and so Group A consists of 41 patients who were waned off their Standard of Care (SoC) medications till a flare (aggravation of SLE) occurs, Group B consists of 61 SLE patients who provide a normal blood sample only, and Group C consists of 55 healthy volunteers.
In a nutshell, the un-normalized gene counts provided are as follows:
The 337 samples are organized as columns while genes are organized as row names.
The sample columns from Group A consist of a Baseline profile (column), an improving/other profile and they all end with a Flare profile.
The sample columns from Group B consist of 1 Baseline Profile from each patient only.
The sample columns from Group C consist of 2 profiles from each volunteer over a period of time.
Therefore columns can look like this:
If my objective was to find out what genes are enriched in SLE patients as compared to healthy individuals (baseline/flare conditions), what is the best way to do this?
I stored all the additional information regarding each sample (Group_ID, Donor_ID etc.) as phenoData when creating an expression set object in R for Limma DEG analysis.
My countData looks like this:
B0100V1.BASELINE B0101V1.BASELINE B0102V1.BASELINE ABCA1 26.73609 27.60309 28.08309 ACP1 28.38355 27.44355 26.71755
and my phenoData looks like this (part of it):
patient_and_visit profile donor_id group_id B0100V1.BASELINE 100 Baseline 100 B B0101V1.BASELINE 101 Baseline 101 B
So far, I have been thinking that that perhaps I should compare Group B patients (Baseline) to Group C (healthy):
groupBC <- es[,es$group_id == 'B'|es$group_id == 'C']
unique(groupBC$profile)
design <- model.matrix(~0+groupBC$profile)
colnames(design) <- c('Baseline', 'Flare', 'Healthy', 'Improving', 'Other')
fit <- lmFit(groupBC,design)
contrast.matrix <- makeContrasts(Baseline-Healthy, levels=design)
fit2 <- contrasts.fit(fit,contrast.matrix)
fit2 <- eBayes(fit2)
#compare Baseline-Healthy
results<-topTable(fit2, coef=1, adjust='BH', n=Inf)
Would you all have better suggestions of creating contrast matrices to reach the objective I want? Would I be able to compare the gene profiles per patient in Group A for different conditions (Baseline, Improving, Flare etc.) ?
I apologize for the long question, but any advice would be much appreciated!
Please respond to existing answers using the "Add comment" button, unless you're answering your own question.
Anyway: the results you've shown don't make much sense to me.
"variable lengths differ"
error when you only have one variable (groups
) in yourmodel.matrix
call. I can only suggest making sure thatgroups
is non-empty and contains noNA
values, though these shouldn't be the cause of the problem.Levels
. Why are you only getting theLevels
listing? Are you copy-pasting the output correctly?Hi Aaron,
I am not sure why the error pops up.
I first entered the command:
which gives me:
and then:
which gives the same error:
Just to check, when I enter:
There do not seem to be any NA values in groups.
You should be doing
~ 0 + groups
.Hi Aaron, I am so sorry about this oversight, I did not realize. Thanks for your patience!