Trying to follow a gene set analyisis in R
1
0
Entering edit mode
@953c4872
Last seen 3 months ago
Mexico

Michael Love Hi. I am following one of your courses about Introduction to Bioconductor. I have a question regarding the code that is given to answer an exercise. I am required to perform the following code but on a different GO:

library(GEOquery)
e = getGEO("GSE34313")[[1]]
e$condition = e$characteristics_ch1.2
levels(e$condition) = c("dex24","dex4","control")
table(e$condition)
names(fData(e))
fData(e)$GO_ID[1:4]
lvls = c("control", "dex4")
es = e[,e$condition %in% lvls]
es$condition = factor(es$condition, levels=lvls)
library(limma)
library(qvalue)
design = model.matrix(~ es$condition)
**fit = lmFit(es, design=design)**
fit = eBayes(fit)
topTable(fit)[,c(6,7,18,22)]
set.seed(1)
idx = grep("GO:0006955", fData(es)$GO_ID)
length(idx)
**r1 = roast(es, idx, design)**
r1

However, when I type fit = lmFit(es, design=design) it shows "Coefficients not estimable: (Intercept) es$conditiondex4 Error in lm.fit(design, t(M)) : 0 (non-NA) cases". I wonder why is that. Since fit is not functioning properly, the roast function is not working properly as well. It shows: "Error in .lmEffects(y = y, design = design, contrast = contrast, array.weights = Dots$array.weights, : No residual degrees of freedom"

In the video of the course, there seems to be no problem with the code, but now there is. If I could get a better insight about what is happening, and how I could correct these functions, I would be thankful. Best regards!

roast-function • 443 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

What is e$condition after line 3?

ADD COMMENT
0
Entering edit mode

e$condition stands for an expression set that comes from the data in GEO. It was created to store the data in e$characteristics_ch1.2. If I run it, it gives:

e$characteristics_ch1.2 [1] "treatment: none"
[2] "treatment: none"
[3] "treatment: none"
[4] "treatment: none"
[5] "treatment: dexamethasone for 4 hr" [6] "treatment: dexamethasone for 4 hr" [7] "treatment: dexamethasone for 4 hr" [8] "treatment: dexamethasone for 24 hr" [9] "treatment: dexamethasone for 24 hr" [10] "treatment: dexamethasone for 24 hr"

It was done like that in a video of the course

ADD REPLY
0
Entering edit mode

What do you get for design after running design = model.matrix(~ es$condition)

ADD REPLY
0
Entering edit mode

I get the following

design = model.matrix(~ es$condition)

design (Intercept) es$conditiondex4 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$es$condition [1] "contr.treatment"

ADD REPLY
0
Entering edit mode

It looks like you only have a single sample.

ADD REPLY
0
Entering edit mode

Mm what could I do about it then? I had followed what was given but then, how would I add more samples?

ADD REPLY
0
Entering edit mode

I see, the problem is that base R has changed its behavior since the code was written. This will fix, changing line 3 to:

e$condition <- factor(e$characteristics_ch1.2)
ADD REPLY
0
Entering edit mode

Thank you. It actually worked! Now, I think the codes and therefore the answers may be not updated. This exercise had 2 parts. I hope you allow me to present this in advance as well. The 1st part is now solved. The 2nd part asks me to execute the following code

library(org.Hs.eg.db)
org.Hs.egGO2EG
go2eg = as.list(org.Hs.egGO2EG)
head(go2eg)
govector = unlist(go2eg)
golengths = sapply(go2eg, length)
head(fData(es)$GENE)

idxvector = match(govector, fData(es)$GENE);table(is.na(idxvector))
##This is the organized list of indexes for genes per GO term:
idx = split(idxvector, rep(names(go2eg), golengths))
##We can see the genes like this:
go2eg[[1]]
fData(es)$GENE[idx[[1]]]
##removal of gene sets with less than 10 genes and have NA values
idxclean = lapply(idx, function(x) x[!is.na(x)])
idxlengths = sapply(idxclean, length)
idxsub = idxclean[idxlengths > 10]
length(idxsub)
##an mroast is run
set.seed(1)
r2 = mroast(es, idxsub, design)
head(r2)

All of this is correct. In fact, I get a data.frame. However, maybe I'm misinterpreting the results. If head(r2) is run, NGenes stands for number of genes, correct? Then, the answer to the question: "Find the GO term that has the the largest proportion of upregulated genes. How many genes does this term have?" would need to biggest value in Ngenes correct?

If this is true, the biggest value in NGenes is 11990, but it's not correct according to the exercise. Maybe I'm not filtering correctly, so how could I filter for the biggest value?

ADD REPLY
0
Entering edit mode

I won’t be able to help more here but look up the help for the function producing the object. This advice really applies to all Bioconductor packages and functions.

ADD REPLY
0
Entering edit mode

Thank you. I hope I can still let you know if I have issues with your exercises. They are very challenging but useful! :)

ADD REPLY
0
Entering edit mode

No I can’t help with exercises. Just to be clear I’m not affiliated with Harvard and have no supported effort in relation to the edx classes. I try to preserve time answering questions here about software my lab supports.

ADD REPLY

Login before adding your answer.

Traffic: 1211 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6