Question: How to filter probesets from the expression set that do not contain Entrez/Gene Symbol identifiers
2.9 years ago by
mat14940 wrote:

Hello,

I am running a limma linear model analysis using expression data produced from Affymetrix Zebrafish Gene 1.1 ST arrays. I am testing for differentially expressed genes across three contrasts, and when I call the topTable function, many probesets with NULL/NA identifiers are returned (see below). I would like to remove all probesets with NA identifiers from the expression set object. I have read about using the genefilter package, but when I run the script I get an error message that I am having trouble interpreting.

eset<-rma(CELdat, background=TRUE, normalize=TRUE, subset=NULL, target="core")
library(affycoretools)
eset <- annotateEset(eset, annotation(eset))

library(org.Dr.eg.db)
fd <- fData(eset)
fd$ENTREZID <- mapIds(org.Dr.eg.db, as.character(fd$SYMBOL), "ENTREZID","SYMBOL",multiVals="first")
fData(eset) <- fd

library(genefilter)  # provides nsFilter()
eset <- nsFilter(eset, require.entrez=TRUE, remove.dupEntrez=TRUE, feature.exclude="^AFFX")$eset
## ERROR MESSAGE HERE:
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'columns' for signature '"AffyGenePDInfo"'

library(limma)
design = model.matrix(~ 0 + f)
colnames(design) = c("control","morphant","rescue")
design
data.fit = lmFit(eset, design)
contrast.matrix = makeContrasts(morphant-control, rescue-control, morphant-rescue, levels=design)
data.fit.con = contrasts.fit(data.fit, contrast.matrix)
data.fit.eb = eBayes(data.fit.con)
TT <- topTable(data.fit.eb, coef=1, number=Inf, adjust="BH", p.value=0.01, lfc=1.5)

          PROBEID   ID                  SYMBOL           GENENAME                                 logFC     AveExpr   t         P.Value   adj.P.Val  B
13298616  13298616  #N/A                #N/A             #N/A                                     3.709468  3.126387  26.0062   2.7E-14   2.03E-09   21.27417
13032560  13032560  ENSDART00000149240  SI:DKEY-24I24.3  ---                                      3.599239  2.981615  24.849    5.42E-14  2.04E-09   20.78372
12930895  12930895  #N/A                #N/A             #N/A                                     4.583036  4.352965  22.63093  2.25E-13  4.24E-09   19.73525
12934103  12934103  #N/A                #N/A             #N/A                                     4.583036  4.352965  22.63093  2.25E-13  4.24E-09   19.73525
13281654  13281654  ENSDART00000129395  uvrag            UV radiation resistance associated gene  4.483284  5.017916  20.09213  1.37E-12  2.06E-08   18.32921
13243932  13243932  #N/A                #N/A             #N/A                                     2.114799  2.408192  19.44796  2.23E-12  2.8E-08    17.93177
13004206  13004206  #N/A                #N/A             #N/A                                     3.201574  5.338119  16.65424  2.27E-11  2.2E-07    15.98024

Secondary to filtering NA probesets from the expression set, I cannot figure out how to write out my topTable after mapping the gene symbols to Entrez IDs.

TT <- topTable(data.fit.eb, coef=1, number=Inf, adjust="BH", p.value=0.01, lfc=1.5)
library(xlsx)
write.xlsx(TT, file="TT.xlsx", showNA=FALSE)

I get the error:

Error in if (is.na(value)) { : argument is of length zero
In addition: Warning message:
In is.na(value) : is.na() applied to non-(list or vector) of type 'NULL'

I hope that my questions are clear; if not, I can attempt to state them more clearly:

#1 - How to remove NA probesets from the limma analysis.
#2 - How to coerce the topTable object to a writeable file which contains probeset Entrez IDs. Mapping probeset gene symbols to Entrez IDs blocks me from doing this and I don't know why.

Thanks for any help you can provide.

Best,
Matt

ADD COMMENTlink modified 2.9 years ago by James W. MacDonald51k • written 2.9 years ago by mat14940
Answer: How to filter probesets from the expression set that do not contain Entrez/Gene
2.9 years ago by
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Please don't use the genefilter package. It just takes something that is very simple and turns it into something hard.

To remove NA values, you just use standard subsetting in R. For example,

i <- is.na(fData(eset)$SYMBOL)
data.fit <- lmFit(eset[!i,], design)

Learning how to use standard R operations like this will pay dividends throughout your projects.
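
As a minimal, self-contained sketch of the same idea on toy data (the feature table and expression matrix below are made up for illustration and do not come from the poster's arrays), a logical index built with is.na() can subset the annotation and the expression rows together:

```r
# Toy feature annotation; all values are hypothetical, for illustration only
fd <- data.frame(
  PROBEID = c("p1", "p2", "p3", "p4"),
  SYMBOL  = c("uvrag", NA, "tp53", NA),
  stringsAsFactors = FALSE
)

# Toy expression matrix with one row per probeset
set.seed(1)
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(fd$PROBEID, c("s1", "s2", "s3")))

# TRUE where the probeset has no gene symbol
i <- is.na(fd$SYMBOL)

# Keep only annotated probesets, using the same index for both objects
fd.keep   <- fd[!i, , drop = FALSE]
expr.keep <- expr[!i, , drop = FALSE]
```

With an ExpressionSet, eset[!i, ] performs both subsetting steps at once, which is what the lmFit() call above relies on.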

If there are still problems with this data, then you might ask a question of the annotateEset() author.
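
On the second question (writing the table to a file), one possible cause of the is.na() error is that the ENTREZID column ended up as a list containing NULL entries rather than a character vector. This is an assumption; nothing in this thread confirms it. A minimal sketch on toy data that flattens any such list column and then writes the table with base R's write.csv() (the helper name flatten is hypothetical):

```r
# Toy table standing in for topTable output (hypothetical values)
TT <- data.frame(PROBEID = c("p1", "p2", "p3"),
                 logFC   = c(3.7, 3.6, 4.6),
                 stringsAsFactors = FALSE)

# Assumption: the mapped IDs arrived as a list with NULL for unmapped symbols
TT$ENTREZID <- list("100", NULL, "200")

# Hypothetical helper: first value of each element, NA where it is empty/NULL
flatten <- function(x) {
  vapply(x,
         function(v) if (length(v) == 0) NA_character_ else as.character(v[[1]]),
         character(1))
}

# Flatten every list column so the data frame holds only atomic vectors
is.list.col <- vapply(TT, is.list, logical(1))
TT[is.list.col] <- lapply(TT[is.list.col], flatten)

# Write to a temporary CSV file; NA is written as an empty cell
out <- tempfile(fileext = ".csv")
write.csv(TT, file = out, row.names = FALSE, na = "")
```

Checking str(TT) before writing shows whether any column is a list in the first place.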

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Gordon Smyth39k

Thanks, Gordon, this code worked for removing the NAs as I hoped. I am still considering removing duplicate identifiers, but I am uncertain how removing duplicate Symbol/Entrez IDs may influence the analysis.

ADD REPLYlink written 2.9 years ago by mat14940

Unless you have special needs, there is no need to remove multiple probes for the same gene. It's easy to do, however, if you need to.
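
A minimal sketch of one way to do it, assuming you want to keep the probeset with the highest average expression for each gene symbol (that choice, and the toy table below, are assumptions for illustration):

```r
# Toy table standing in for probeset-level results (hypothetical values)
tt <- data.frame(
  PROBEID = c("p1", "p2", "p3", "p4", "p5"),
  SYMBOL  = c("uvrag", "uvrag", "tp53", "tp53", "mycn"),
  AveExpr = c(5.0, 6.2, 4.4, 3.1, 7.0),
  stringsAsFactors = FALSE
)

# Sort so the most highly expressed probeset for each gene comes first
o <- order(tt$AveExpr, decreasing = TRUE)
tt.sorted <- tt[o, ]

# duplicated() flags every occurrence after the first, so dropping the
# flagged rows keeps one probeset per symbol
tt.unique <- tt.sorted[!duplicated(tt.sorted$SYMBOL), ]
```

The same order() and duplicated() pattern should carry over to the rows of an ExpressionSet or a fitted model object, since both support row subsetting.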

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Gordon Smyth39k
Answer: How to filter probesets from the expression set that do not contain Entrez/Gene
2.9 years ago by
United States
James W. MacDonald51k wrote:

You should probably use getMainProbes first, as the top probeset that you show is a 'rescue' probeset, which by definition isn't particularly interesting. As Gordon notes, the genefilter package is rather complex, and it could be argued that some of the defaults, particularly for nsFilter, are not really that useful and are not intended for the random primer style arrays (such as the one you are using).

As an aside, the Affymetrix CSV files have been changed and now contain the Entrez Gene ID in an easily parseable format - I have updated annotateEset to reflect those changes - it will take a day or two for these changes to propagate through the build servers.

While it's tempting to want to remove all the NA probesets (after removing the controls), I am not sure this is really the best way to proceed. It certainly makes interpretation easier, but it assumes that all those apparently differentially expressed probesets that are not readily annotated are uninteresting. If you have filtered out any low-expressing probesets, and you still have lots of unannotated probesets in your top table, it might be worthwhile to try to figure out what you are measuring.
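
A minimal sketch of that workflow on toy data, assuming a log2 expression cutoff of 4 reached in at least 2 samples (both numbers are arbitrary placeholders, not recommendations): filter out low-expressing probesets first, then list the unannotated ones that remain for a closer look.

```r
# Toy log2 expression matrix, one row per probeset (hypothetical values)
expr <- rbind(
  p1 = c(6.1, 5.8, 6.4),
  p2 = c(2.0, 2.3, 1.9),
  p3 = c(4.5, 3.2, 5.0),
  p4 = c(3.0, 4.2, 3.5)
)
colnames(expr) <- c("s1", "s2", "s3")

# Toy gene symbols in the same row order; NA marks an unannotated probeset
symbol <- c(p1 = "uvrag", p2 = NA, p3 = NA, p4 = "tp53")

# Placeholder thresholds (assumptions, chosen only for this example)
cutoff      <- 4
min.samples <- 2

# A probeset counts as expressed if it exceeds the cutoff in enough samples
expressed <- rowSums(expr > cutoff) >= min.samples
expr.keep <- expr[expressed, , drop = FALSE]

# Unannotated probesets that survive the expression filter
unannotated <- names(symbol)[expressed & is.na(symbol)]
```

The probesets named in unannotated are the ones worth investigating rather than discarding outright.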

ADD COMMENTlink written 2.9 years ago by James W. MacDonald51k