Hello,
I am running a limma linear model analysis using expression data produced from affymetrix 1.1 st zebrafish gene arrays. I am filtering differentially expressed genes between three contrasts and when I call the topTable function, many probesets with NULL/NA identifiers are returned (see below). I would like to filter all probesets containing N/A from the expression set object. I have read about using the genefilter package but when I run the script I get an error message that I am having trouble interpreting.
eset<-rma(CELdat, background=TRUE, normalize=TRUE, subset=NULL, target="core") library(affycoretools) eset <- annotateEset(eset, annotation(eset)) library(org.Dr.eg.db) fd <- fData(eset) fd$ENTREZID <- mapIds(org.Dr.eg.db, as.character(fd$SYMBOL), "ENTREZID","SYMBOL",multiVals="first") fData(eset) <- fd eset<- nsFilter(eset, require.entrez=TRUE,remove.dupEntrez=TRUE,feature.exclude="^AFFX")$eset ##ERROR MESSAGE HERE: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'columns' for signature '"AffyGenePDInfo"' library(limma) design = model.matrix(~ 0 + f) colnames(design)=c("control","morphant","rescue") design data.fit = lmFit(eset,design) contrast.matrix = makeContrasts(morphant-control,rescue-control,morphant-rescue,levels=design) data.fit.con = contrasts.fit(data.fit,contrast.matrix) data.fit.eb = eBayes(data.fit.con) TT<-topTable(data.fit.eb,coef=1,number=Inf,adjust="BH",p.value=0.01,lfc=1.5)
|
PROBEID |
ID |
SYMBOL |
GENENAME |
logFC |
AveExpr |
t |
P.Value |
adj.P.Val |
B |
13298616 |
13298616 |
#N/A |
#N/A |
#N/A |
3.709468 |
3.126387 |
26.0062 |
2.7E-14 |
2.03E-09 |
21.27417 |
13032560 |
13032560 |
ENSDART00000149240 |
SI:DKEY-24I24.3 |
--- |
3.599239 |
2.981615 |
24.849 |
5.42E-14 |
2.04E-09 |
20.78372 |
12930895 |
12930895 |
#N/A |
#N/A |
#N/A |
4.583036 |
4.352965 |
22.63093 |
2.25E-13 |
4.24E-09 |
19.73525 |
12934103 |
12934103 |
#N/A |
#N/A |
#N/A |
4.583036 |
4.352965 |
22.63093 |
2.25E-13 |
4.24E-09 |
19.73525 |
13281654 |
13281654 |
ENSDART00000129395 |
uvrag |
UV radiation resistance associated gene |
4.483284 |
5.017916 |
20.09213 |
1.37E-12 |
2.06E-08 |
18.32921 |
13243932 |
13243932 |
#N/A |
#N/A |
#N/A |
2.114799 |
2.408192 |
19.44796 |
2.23E-12 |
2.8E-08 |
17.93177 |
13004206 |
13004206 |
#N/A |
#N/A |
#N/A |
3.201574 |
5.338119 |
16.65424 |
2.27E-11 |
2.2E-07 |
15.98024 |
|
|
|
|
|
|
|
|
|
|
|
Secondary to filtering N/A probesets from the expression set, I cannot figure out how to write out my topTable after mapping the gene symbols to Entrez ID's.
TT<-topTable(data.fit.eb,coef=1,number=Inf,adjust="BH",p.value=0.01,lfc=1.5) library(xlsx) write.xlsx(TT,file="TT.xlsx",showNA=FALSE)
I get the error:
Error in if (is.na(value)) { : argument is of length zero In addition: Warning message: In is.na(value) : is.na() applied to non-(list or vector) of type 'NULL'
I hope that my questions are clear, if not I can attempt to state them more clearly (#1 - How to remove N/A probesets from the limma analysis and #2 - how to coerce the toptable object to a writeable file which contains probeset Entrez ID's ...mapping probeset gene symbols to entrez ID's blocks me from doing this and i dont know why). Thanks for any help you can provide. Best,
Matt
Thanks, Gordon, this code worked for removing the N/A's as i hoped. I am still considering removing duplicate identifiers... but I am uncertain how removing duplicate Symbol/Entrez ID's may influence the analysis.
Unless you have special needs, there is no need to remove multiple probes for the same gene. It's easy to do however if you need to.