Should I use factors for covariates in a limma design matrix?
@msveldhuis96-23194

I have seen quite a few different ways of constructing a design matrix with limma. My problem concerns RNA-seq data, for which I want to find the top DE genes between "good" and "poor" responders. This variable is stored in a metadata file with information about each cell, and all of these cells are present in the expression data, which has counts for each gene. I also want to account for the processing batches, as well as the patient each cell came from.

I have created factors for all three variables, but I am not sure whether this is necessary:

responder_group <- factor(meta_data$Responder_status)
patients <- factor(meta_data$Patient_id)
batches <- factor(meta_data$processing_date)
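One way to check whether the factor() calls matter is to look at the class of each metadata column: model.matrix() expands character and factor columns into indicator columns, but treats numeric and integer columns as a single continuous term. The meta_data below is a hypothetical stand-in (patient IDs and dates read in as numbers is an assumption about the real file); converting the columns inside the data frame means any formula that refers to those column names will see factors.

```r
# Hypothetical stand-in for the real metadata: one row per cell,
# with patient IDs and processing dates read in as numbers
meta_data <- data.frame(
  Responder_status = c("good", "good", "good", "poor", "poor", "poor"),
  Patient_id       = c(1L, 1L, 2L, 3L, 3L, 4L),
  processing_date  = c(20190101, 20190215, 20190101, 20190215, 20190101, 20190215)
)

# "integer"/"numeric" columns would enter a formula as continuous terms
sapply(meta_data, class)

# Convert the grouping columns to factors inside the data frame, so any
# formula that uses these column names gets indicator columns
cols <- c("Responder_status", "Patient_id", "processing_date")
meta_data[cols] <- lapply(meta_data[cols], factor)

sapply(meta_data, class)
sapply(meta_data, nlevels)
```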

Here are the two design matrices I saw most frequently for modelling the differences between my responders while correcting for differences between batches and patients.

  1. design <- model.matrix(~0 + responder_group + batches + patients, meta_data)

  2. design <- model.matrix(~0 + responder_group + processing_date + Patient_id, meta_data)

So option 1 uses the factors I created from the meta_data columns, while option 2 uses the original meta_data columns directly.

Finally, I fit the model and test the contrast:

fit <- lmFit(data_filtered, design, correlation = NULL)
cont_matrix <- makeContrasts("responder_grouppoor-responder_groupgood",  levels=design)
fit2 <- contrasts.fit(fit, cont_matrix)
fit3 <- eBayes(fit2)
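The string given to makeContrasts() must match column names in the design matrix. Because the formula has no intercept (~0), model.matrix() gives each responder level its own column, named by pasting the variable name onto the level. This base-R sketch uses hypothetical labels to show where responder_grouppoor and responder_groupgood come from:

```r
# Hypothetical responder labels; factor levels sort alphabetically ("good", "poor")
responder_group <- factor(c("good", "poor", "good", "poor"))

# With ~0 there is no intercept, so each level gets its own indicator column
design <- model.matrix(~0 + responder_group)
colnames(design)

# Both terms of the contrast used above must be design columns
contrast <- "responder_grouppoor-responder_groupgood"
terms_used <- strsplit(contrast, "-", fixed = TRUE)[[1]]
all(terms_used %in% colnames(design))
```

With this ordering (poor minus good), a positive logFC in topTable() means higher expression in poor responders than in good responders.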

When I look at the topTable() results, they differ between these two options. Can someone explain the difference and which one I should use?

@james-w-macdonald-5106

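You want both processing_date and Patient_id to be factors. If they are numeric, they will be fitted as continuous predictors, which doesn't make sense in this context. That is likely why your two options give different results: option 2 uses the raw columns, so if they are stored as numbers they are not treated as factors.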

Put a different way, the design matrix should have N - 1 columns each for the batches and the patients (where N is the number of batches or the number of patients, respectively), containing only 1s and 0s. If you instead see a single column of numbers, R is fitting that variable as a continuous covariate.
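The difference between a numeric and a factor covariate can be checked directly with model.matrix() from base R's stats package. This sketch uses hypothetical toy values for responder_group and processing_date (dates stored as numbers, which is an assumption about the real metadata) and shows only the batch term; Patient_id is coded the same way.

```r
# Hypothetical toy data: 6 cells from 3 processing dates stored as numbers
responder_group <- factor(c("good", "poor", "good", "poor", "good", "poor"))
processing_date <- c(20190101, 20190101, 20190215, 20190215, 20190320, 20190320)

# Numeric date: a single column holding the raw values (one continuous slope)
design_numeric <- model.matrix(~0 + responder_group + processing_date)

# Factor date: N - 1 indicator columns of 0s and 1s
batches <- factor(processing_date)
design_factor <- model.matrix(~0 + responder_group + batches)

colnames(design_numeric)
colnames(design_factor)

# Columns after the two responder columns belong to the batch term
batch_cols <- design_factor[, -(1:2), drop = FALSE]
ncol(batch_cols) == nlevels(batches) - 1
all(batch_cols %in% c(0, 1))
```

The numeric version estimates a single slope per unit increase of the yyyymmdd-style date code, whereas the factor version estimates a separate offset for each non-reference batch, so the two fits generally give different topTable() results.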
