Michael Love Hi. I am following one of your courses about Introduction to Bioconductor. I have a question regarding the code that is given to answer an exercise. I am required to perform the following code but on a different GO:
library(GEOquery)
e = getGEO("GSE34313")[[1]]
e$condition = e$characteristics_ch1.2
levels(e$condition) = c("dex24","dex4","control")
table(e$condition)
names(fData(e))
fData(e)$GO_ID[1:4]
lvls = c("control", "dex4")
es = e[,e$condition %in% lvls]
es$condition = factor(es$condition, levels=lvls)
library(limma)
library(qvalue)
design = model.matrix(~ es$condition)
**fit = lmFit(es, design=design)**
fit = eBayes(fit)
topTable(fit)[,c(6,7,18,22)]
set.seed(1)
idx = grep("GO:0006955", fData(es)$GO_ID)
length(idx)
**r1 = roast(es, idx, design)**
r1
However, when I type fit = lmFit(es, design=design) it shows "Coefficients not estimable: (Intercept) es$conditiondex4 Error in lm.fit(design, t(M)) : 0 (non-NA) cases". I wonder why is that. Since fit is not functioning properly, the roast function is not working properly as well. It shows: "Error in .lmEffects(y = y, design = design, contrast = contrast, array.weights = Dots$array.weights, : No residual degrees of freedom"
In the video of the course, there seems to be no problem with the code, but now there is. If I could get a better insight about what is happening, and how I could correct these functions, I would be thankful. Best regards!
e$condition stands for an expression set that comes from the data in GEO. It was created to store the data in e$characteristics_ch1.2. If I run it, it gives:
It was done like that in a video of the course
What do you get for
design
after runningdesign = model.matrix(~ es$condition)
I get the following
It looks like you only have a single sample.
Mm what could I do about it then? I had followed what was given but then, how would I add more samples?
I see, the problem is that base R has changed its behavior since the code was written. This will fix, changing line 3 to:
Thank you. It actually worked! Now, I think the codes and therefore the answers may be not updated. This exercise had 2 parts. I hope you allow me to present this in advance as well. The 1st part is now solved. The 2nd part asks me to execute the following code
All of this is correct. In fact, I get a data.frame. However, maybe I'm misinterpreting the results. If head(r2) is run, NGenes stands for number of genes, correct? Then, the answer to the question: "Find the GO term that has the the largest proportion of upregulated genes. How many genes does this term have?" would need to biggest value in Ngenes correct?
If this is true, the biggest value in NGenes is 11990, but it's not correct according to the exercise. Maybe I'm not filtering correctly, so how could I filter for the biggest value?
I won’t be able to help more here but look up the help for the function producing the object. This advice really applies to all Bioconductor packages and functions.
Thank you. I hope I can still let you know if I have issues with your exercises. They are very challenging but useful! :)
No I can’t help with exercises. Just to be clear I’m not affiliated with Harvard and have no supported effort in relation to the edx classes. I try to preserve time answering questions here about software my lab supports.