Hello,
I am working with Affymetrix 1.1 ST zebrafish gene arrays. The script I have attached works but the pd.zebgene.1.1.st annotation package for these particular arrays do not contain Entrez ID's for each probeset on the platform. I have seen that expression sets can be annotated using, for instance, library(affycoretools) and library(hugene11sttranscriptcluster.db). My problem is that the zebrafish Affy 1.1 ST gene arrays do not contain transcriptcluster.db or probesetcluster.db packages that are available for the human/mouse/rat analogues. My question is: Can anyone explain/provide a reference as to how I can obtain Entrez ID's for my limma topTable results? I fear that I am not good enough at "R" to create my own annotation packages for this array.
When I apply the "annotateEset" function, the Entrez ID's are not returned in my topTable results which look like:
PROBEID | ID | SYMBOL | GENENAME | logFC | AveExpr | t | P.Value | adj.P.Val | B |
13305218 | AF109369 | opn1mw1 | opsin 1 (cone pigments), medium-wave-sensitive, 1 | -6.1018 | 8.754473 | -10.8657 | 1.08E-08 | 1.98E-05 | 10.13624 |
13187576 | AF109373 | opn1sw1 | opsin 1 (cone pigments), short-wave-sensitive 1 | -5.19865 | 8.99628 | -9.02191 | 1.36E-07 | 9.19E-05 | 7.81309 |
13276068 | NA | NA | NA | -4.76802 | 7.529454 | -7.27885 | 2.11E-06 | 0.00053 | 5.207056 |
12988871 | BC134193 | gngt2b | guanine nucleotide binding protein (G protein), gamma transducing activity polypeptide 2b | -4.75024 | 7.834161 | -11.5697 | 4.49E-09 | 1.21E-05 | 10.92152 |
13282126 | BC059650 | arr3b | arrestin 3b, retinal (X-arrestin) | -4.297 | 6.427947 | -6.55812 | 7.38E-06 | 0.001151 | 3.99979 |
setwd("C:\\Users\\mat149\\Desktop\\GG")
library(oligo)
CELlist <- list.celfiles("C:\\Users\\mat149\\Desktop\\GG\\CEL", full.names=TRUE, pattern=NULL, all.file=FALSE)
pdat<-read.AnnotatedDataFrame("C:\\Users\\mat149\\Desktop\\GG\\CEL\\pheno16.txt",header=TRUE,row.name="Filename",sep="\t")
CELdat <- read.celfiles(filenames = CELlist,experimentData=TRUE,phenoData=pdat,verbose=TRUE)
ph = CELdat@phenoData
ph@data[ ,1] = c("WT1","WT2","WT3","WT4","WT5","WT6","WT7","WT8","lepA_MO1","lepA_MO2","lepA_MO3","lepA_MO4","lepA_rescue1","lepA_rescue2","lepA_rescue3","lepA_rescue4")
ph@data[ ,2] = c("CONTROL","CONTROL","CONTROL","CONTROL","CONTROL","CONTROL","CONTROL","CONTROL","morphant","morphant","morphant","morphant","rescue","rescue","rescue","rescue")
colnames(ph@data)[2]="Treatment"
colnames(ph@data)[1]="Sample"
groups = ph@data$Treatment
f = factor(groups,levels=c("CONTROL","morphant","rescue"))
eset<-rma(CELdat, background=TRUE, normalize=TRUE, subset=NULL, target="core")
featureData(eset)<-getNetAffx(eset, type = "transcript")
data.matrix = exprs(eset)
library(affycoretools)
librarypd.zebgene.1.1.st)
eset.annot <- annotateEset(eset, annotation(eset))
estt <- annotateEset(eset.annot, pd.zebgene.1.1.st,columns = c("PROBEID","ENSEMBL","ALIAS","ENTREZID","SYMBOL","GENENAME"),multivals = filter,annocols=1, probecol=2)
library(limma)
design <- model.matrix(~ 0+factor(c(1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3)))
colnames(design) <- c("CONTROL","lepA_MO","lepA_RS")
fit <- lmFit(estt, design)
contrast.matrix <- makeContrasts(lepA_MO-CONTROL, lepA_RS-CONTROL, lepA_MO-lepA_RS, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
results <- decideTests(fit2,p.value=0.01,adjust="BH",lfc=2)
MOWT<-topTable(fit2, coef=1,adjust="BH",p.value=0.01,lfc=2,sort.by="P",n=Inf)
Thanks for the reply. I have added your script but I am still getting an error message:
"Error in `$<-.data.frame`(`*tmp*`, "ENTREZID", value = list(`30037` = 14730L, :
replacement has 37237 rows, data has 75212"
The script I ran was:
setwd("C:\\Users\\mat149\\Desktop\\GG")
library(oligo)
CELlist <- list.celfiles("C:\\Users\\mat149\\Desktop\\GG\\CEL", full.names=TRUE, pattern=NULL, all.file=FALSE)
pdat<-read.AnnotatedDataFrame("C:\\Users\\mat149\\Desktop\\GG\\CEL\\pheno16.txt",header=TRUE,row.name="Filename",sep="\t")
CELdat <- read.celfiles(filenames = CELlist,experimentData=TRUE,phenoData=pdat,verbose=TRUE)
eset<-rma(CELdat, background=TRUE, normalize=TRUE, subset=NULL, target="core")
featureData(eset)<-getNetAffx(eset, type = "transcript")
data.matrix = exprs(eset)
library(affycoretools)
librarypd.zebgene.1.1.st)
eset.annot <- annotateEset(eset, annotation(eset))
eset1 <- annotateEset(eset.annot, pd.zebgene.1.1.st,columns = c("PROBEID","ZFIN","ENSEMBL","ALIAS","ENTREZID","SYMBOL","GENENAME"),multiVals = filter,annocols=1, probecol=2)
library(org.Dr.eg.db)
keys <- keys(org.Dr.eg.db)
fd <- fData(eset1)
fd$ENTREZID <- mapIds(org.Dr.eg.db,keys=keys,fd$SYMBOL,column="ENTREZID",keytype="ENTREZID", multiVals = list) -----> error after this line
fData(eset) <- fd
You are doing a bunch of extra things that don't really help your cause. This:
and this
do very similar things. Is there a particular rationale to do both? And then you do this:
which is a mixture of arguments for two different methods, and probably doesn't do what you think it is doing. In the end you have three (!) different ExpressionSets that are annotated using three closely related methods. You can really only use one at a time, so having three is unnecessarily complex.
The final error comes from the fact that you are doing something that makes no sense at all, and doesn't even remotely resemble what I recommended you do. You are trying to find the Entrez Gene ID for all the genes on your array, right? But then you do this:
That gets all the Entrez Gene IDs in the org.Dr.eg.db package, which corresponds to all known D. rerio Entrez Gene IDs. You then do
Which, translated to English is saying 'give me all the Entrez Gene IDs for D. rerio that are in the org.Dr.eg.db, based on this set of Entrez Gene IDs that I just extracted from that package'. The fact that you put fd$SYMBOL in that function call, without using an argument, means that it is in essence ignored.
You can use positional argument matching, where you put the arguments in the correct position, or you can use direct argument matching with the equals sign, but using a mixture of both will often have unintended consequences.
I gave you three lines of code that would have done exactly what you wanted to do, but you didn't use it! This doesn't have to be as complicated as you are making it, and you should NEVER be accessing S4 slots directly using the @ symbol! Rather than doing all this extra stuff that is not helpful, you can do what you want in very few lines of code.
This is what your script should look like:
Note that adding in all the phenoData isn't strictly necessary, ESPECIALLY if you add it all in and then never use it for anything! Your goal should be to simplify what you do, limit extra steps, and understand what you are doing and why.
Hello James, thanks again for your help. I am not a computer scientist by any means so forgive my ignorance. I have followed the lines of code you have listed but I ran into this error when I tried to run the line:
Error in .testForValidKeys(x, keys, keytype, fks) :
'keys' must be a character vector
I think I need to tell AnnotationDbi which keys/keytype from the expression set feature data to map to Entrez ID's available in org.Dre.eg.db, but I dont know how to do that (the eset contains "PROBEID" identifiers in the first column). To answer your question, yes I am trying to obtain Entrez ID's for all genes on the array.
You are misinterpreting the problem. The error you get is
Which is telling you that the 'keys' argument needs to be a character vector. If we look at the arguments for
mapIds
, we have:So the 'keys' argument is the second argument, for which you are passing fd$SYMBOL. And the error message is telling you that this must be character. You could hypothetically test to see what class that is, using
class(fd$SYMBOL)
(it's a factor), but that won't tell you how to 'fix' the situation. BUT Google will, and Google is almost always your best friend! Part of being an informed programmer is knowing how to answer questions. A simple Google search for 'R convert factor to character' will turn up any number of links that will tell you to useas.character
.At which point you will know that the answer is
Thank you very much; this worked and I also learned a few neat things in the process of you explaining my fails.
Vouch for James!