Question: Agilent Single-Channel Data: Questions about model statement and model building.
0
18 months ago by
CantExitVIM10
CantExitVIM10 wrote:

I am attempting to analyze single-channel Agilent micro array data from two different groups. I have numerous covariates for each individual within these groups and have specified categorical covariates using the factor() method.

Sex <- targets[,"Sex"] #Note: targets is the target file containing array filenames and covariates.
Levels <-c("1", "2")
Sex <- factor(Sex, levels=Levels)

In order to build my model matrix with Class (the thing I am ultimately interested in comparing), batch, sex, and pH... I use the following command.

design <-model.matrix(~Class + Batch + Sex + pH)

And if I view this model matrix...

   (Intercept) ClassControl Batch2 Sex2   pH
1            1            0      1    0 6.29
2            1            0      1    1 6.44
3            1            1      1    1 6.20

Question 1: Why does the model matrix statement add "control" to the first variable and "2" to the two... but not the last?

To fit the model, I use the following code...

fit <- lmFit(y0, design) #Note: y0 has been normalized and background corrected
fit <-eBayes(fit, trend=TRUE)
results <- topTable(fit, n=Inf, coef=2)
write.table(results, 'results.txt', sep='\t', col.names=TRUE, row.names=FALSE)

Question 2: From what I read, the contrast statement is made by topTable() and not the model matrix. Thus, by specifying coef=2, I am telling limma that I'm interested in comparing the 2nd column, which corresponds to "Class." But, because the other covariates were including in the model statement, the model is adjusting for those covariates. Is that correct?

Question 3: Is there any way to view the fit (AIC/BIC) for the topTable() method. Essentially, I have other covariates and would like some objective measure on how the model is effected when they are added to the model.

modified 18 months ago by Gordon Smyth39k • written 18 months ago by CantExitVIM10
2
18 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Answer 1. Class, Batch and Sex are all factors with two levels, whereas pH is a numeric covariate. When the factors enter the design matrix, the first level of the factor is absorbed into the intercept and you get a design column for the second and subsequent levels of the factor. The column is called XXYY where XX is the name of the factor and YY is the level. For example, Sex has levels 1 and 2, so the column for the second level is called "Sex2". Continuous covariates don't have levels -- for them the column just has the same name as the variable.

Answer 3. The model is not fitted by AIC or BIC, so these measures are not meaningful here. To judge whether other covariates are needed, just try putting them in the model and see if they generate DE genes.

Thank you for the feedback.

Follow up Question: I want topTable() to show the regression slopes. Seeing that this is not an option in the documentation, I need to extract slopes from the MArrayLM object and pertinent information from topTable.

unsortedTT <-topTable(fit, n=Inf, coef=2, sort.by='none')
Probes <- unsortedTT$ProbeName Slopes <- fit$coefficients[,"ClassControl"] #Note:I read that $coefficients contain the slopes Sig <-unsortedTT$P.Value
TTwithSlopes <- cbind(Probes, Slopes)
TTwithSlopes <- cbind(TTwithSlopes, Sig)
TTwithSlopes <- TTwithSlopes[order(Sig),] #Reorder matrix by p-value, similarly to topTable()

Does this look correct? It is a bit difficult for me to follow the information in the MAarrayLM object, so I am a bit unsure about my approach.

If there is a more efficient way to do this, I would love to hear it (I am new to R).

1

The regression slope is already in the topTable output. It is the column called "logFC".

So... yea... that's probably an indication that I should get some sleep.

Thanks again =)