How to filter probesets from the expression set that do not contain Entrez/Gene Symbol identifiers
2
0
Entering edit mode
mat149 ▴ 70
@mat149-11450
Last seen 16 days ago
United States

Hello,

 

I am running a limma linear model analysis using expression data produced from affymetrix 1.1 st zebrafish gene arrays.  I am filtering differentially expressed genes between three contrasts and when I call the topTable function, many probesets with NULL/NA identifiers are returned (see below).  I would like to filter all probesets containing N/A from the expression set object.  I have read about using the genefilter package but when I run the script I get an error message that I am having trouble interpreting.

 

 

eset<-rma(CELdat, background=TRUE, normalize=TRUE, subset=NULL, target="core")     
library(affycoretools)
eset <- annotateEset(eset, annotation(eset))

library(org.Dr.eg.db)
fd <- fData(eset)
fd$ENTREZID <- mapIds(org.Dr.eg.db, as.character(fd$SYMBOL), "ENTREZID","SYMBOL",multiVals="first")
fData(eset) <- fd

eset<- nsFilter(eset, require.entrez=TRUE,remove.dupEntrez=TRUE,feature.exclude="^AFFX")$eset
##ERROR MESSAGE HERE:
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'columns' for signature '"AffyGenePDInfo"'

library(limma)
design = model.matrix(~ 0 + f)
colnames(design)=c("control","morphant","rescue")
design
data.fit = lmFit(eset,design)
contrast.matrix = makeContrasts(morphant-control,rescue-control,morphant-rescue,levels=design)
data.fit.con = contrasts.fit(data.fit,contrast.matrix)
data.fit.eb = eBayes(data.fit.con)

TT<-topTable(data.fit.eb,coef=1,number=Inf,adjust="BH",p.value=0.01,lfc=1.5)

 

PROBEID

ID

SYMBOL

GENENAME

logFC

AveExpr

t

P.Value

adj.P.Val

B

13298616

13298616

#N/A

#N/A

#N/A

3.709468

3.126387

26.0062

2.7E-14

2.03E-09

21.27417

13032560

13032560

ENSDART00000149240

SI:DKEY-24I24.3

---

3.599239

2.981615

24.849

5.42E-14

2.04E-09

20.78372

12930895

12930895

#N/A

#N/A

#N/A

4.583036

4.352965

22.63093

2.25E-13

4.24E-09

19.73525

12934103

12934103

#N/A

#N/A

#N/A

4.583036

4.352965

22.63093

2.25E-13

4.24E-09

19.73525

13281654

13281654

ENSDART00000129395

uvrag

UV radiation resistance associated gene

4.483284

5.017916

20.09213

1.37E-12

2.06E-08

18.32921

13243932

13243932

#N/A

#N/A

#N/A

2.114799

2.408192

19.44796

2.23E-12

2.8E-08

17.93177

13004206

13004206

#N/A

#N/A

#N/A

3.201574

5.338119

16.65424

2.27E-11

2.2E-07

15.98024

 

 

 

 

 

 

 

 

 

 

 

Secondary to filtering N/A probesets from the expression set, I cannot figure out how to write out my topTable after mapping the gene symbols to Entrez ID's.  

TT<-topTable(data.fit.eb,coef=1,number=Inf,adjust="BH",p.value=0.01,lfc=1.5)

library(xlsx)

write.xlsx(TT,file="TT.xlsx",showNA=FALSE)

I get the error:

Error in if (is.na(value)) { : argument is of length zero
In addition: Warning message:
In is.na(value) : is.na() applied to non-(list or vector) of type 'NULL'

 

I hope that my questions are clear, if not I can attempt to state them more clearly (#1 - How to remove N/A probesets from the limma analysis and #2 - how to coerce the toptable object to a writeable file which contains probeset Entrez ID's ...mapping probeset gene symbols to entrez ID's blocks me from doing this and i dont know why).  Thanks for any help you can provide. Best,

Matt

oligo • 2.2k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 7 minutes ago
WEHI, Melbourne, Australia

Please don't use the genefilter package. It just takes something that is very simple and turns it into something hard.

To remove NA values, you just use standard subsetting in R. For example,

i <- is.na( fData(eset)$SYMBOL )
data.fit <- lmFit(eset[!i,], design)

Learning how to use standard R operations like this will pay dividends throughout your projects.

If there are still problems with this data, then you might ask a question of the annotateEset() author.

ADD COMMENT
0
Entering edit mode

Thanks, Gordon, this code worked for removing the N/A's as i hoped.  I am still considering removing duplicate identifiers... but I am uncertain how removing duplicate Symbol/Entrez ID's may influence the analysis.

ADD REPLY
0
Entering edit mode

Unless you have special needs, there is no need to remove multiple probes for the same gene. It's easy to do however if you need to.

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

You should probably use getMainProbes first, as the top probeset that you show is a 'rescue' probeset, which by definition isn't particularly interesting. As Gordon notes, the genefilter package is rather complex, and it could be argued that some of the defaults, particularly for nsFilter, are not really that useful and are not intended for the random primer style arrays (such as the one you are using).

As an aside, the Affymetrix CSV files have been changed and now contain the Entrez Gene ID in an easily parseable format - I have updated annotateEset to reflect those changes - it will take a day or two for these changes to propagate through the build servers.

While it's tempting to want to remove all the NA probesets (after removing the controls), I am not sure this is really the best way to proceed. It certainly makes interpretation easier, but it assumes that all those apparently differentially expressed probesets that are not readily annotated are uninteresting. If you have filtered out any low-expressing probesets, and you still have lots of unannotated probesets in your top table, it might be worthwhile to try to figure out what you are measuring.

ADD COMMENT

Login before adding your answer.

Traffic: 537 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6