Question: Agilent Single-Channel Data: Questions about model statement and model building.
0
gravatar for CantExitVIM
16 months ago by
CantExitVIM10
CantExitVIM10 wrote:

I am attempting to analyze single-channel Agilent micro array data from two different groups. I have numerous covariates for each individual within these groups and have specified categorical covariates using the factor() method.

Sex <- targets[,"Sex"] #Note: targets is the target file containing array filenames and covariates.
Levels <-c("1", "2")
Sex <- factor(Sex, levels=Levels)

In order to build my model matrix with Class (the thing I am ultimately interested in comparing), batch, sex, and pH... I use the following command.

design <-model.matrix(~Class + Batch + Sex + pH)

And if I view this model matrix...

   (Intercept) ClassControl Batch2 Sex2   pH
1            1            0      1    0 6.29
2            1            0      1    1 6.44
3            1            1      1    1 6.20

Question 1: Why does the model matrix statement add "control" to the first variable and "2" to the two... but not the last?

To fit the model, I use the following code...

fit <- lmFit(y0, design) #Note: y0 has been normalized and background corrected
fit <-eBayes(fit, trend=TRUE)
results <- topTable(fit, n=Inf, coef=2)
write.table(results, 'results.txt', sep='\t', col.names=TRUE, row.names=FALSE)

Question 2: From what I read, the contrast statement is made by topTable() and not the model matrix. Thus, by specifying coef=2, I am telling limma that I'm interested in comparing the 2nd column, which corresponds to "Class." But, because the other covariates were including in the model statement, the model is adjusting for those covariates. Is that correct?

Question 3: Is there any way to view the fit (AIC/BIC) for the topTable() method. Essentially, I have other covariates and would like some objective measure on how the model is effected when they are added to the model.

 

I appreciate your help/feedback.

ADD COMMENTlink modified 16 months ago by Gordon Smyth38k • written 16 months ago by CantExitVIM10
Answer: Agilent Single-Channel Data: Questions about model statement and model building.
2
gravatar for Gordon Smyth
16 months ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

Answer 1. Class, Batch and Sex are all factors with two levels, whereas pH is a numeric covariate. When the factors enter the design matrix, the first level of the factor is absorbed into the intercept and you get a design column for the second and subsequent levels of the factor. The column is called XXYY where XX is the name of the factor and YY is the level. For example, Sex has levels 1 and 2, so the column for the second level is called "Sex2". Continuous covariates don't have levels -- for them the column just has the same name as the variable.

Answer 2. Yes.

Answer 3. The model is not fitted by AIC or BIC, so these measures are not meaningful here. To judge whether other covariates are needed, just try putting them in the model and see if they generate DE genes.

ADD COMMENTlink modified 16 months ago • written 16 months ago by Gordon Smyth38k

Thank you for the feedback.

Follow up Question: I want topTable() to show the regression slopes. Seeing that this is not an option in the documentation, I need to extract slopes from the MArrayLM object and pertinent information from topTable.

unsortedTT <-topTable(fit, n=Inf, coef=2, sort.by='none')
Probes <- unsortedTT$ProbeName
Slopes <-  fit$coefficients[,"ClassControl"] #Note:I read that $coefficients contain the slopes
Sig <-unsortedTT$P.Value
TTwithSlopes <- cbind(Probes, Slopes)
TTwithSlopes <- cbind(TTwithSlopes, Sig)
TTwithSlopes <- TTwithSlopes[order(Sig),] #Reorder matrix by p-value, similarly to topTable()

 

Does this look correct? It is a bit difficult for me to follow the information in the MAarrayLM object, so I am a bit unsure about my approach.

If there is a more efficient way to do this, I would love to hear it (I am new to R).

ADD REPLYlink written 16 months ago by CantExitVIM10
1

The regression slope is already in the topTable output. It is the column called "logFC".

ADD REPLYlink written 16 months ago by Gordon Smyth38k

So... yea... that's probably an indication that I should get some sleep.

Thanks again =)

ADD REPLYlink written 16 months ago by CantExitVIM10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 276 users visited in the last hour