Question: How to filter probesets from the expression set that do not contain Entrez/Gene Symbol identifiers
0
gravatar for mat149
2.9 years ago by
mat14940
mat14940 wrote:

Hello,

 

I am running a limma linear model analysis using expression data produced from affymetrix 1.1 st zebrafish gene arrays.  I am filtering differentially expressed genes between three contrasts and when I call the topTable function, many probesets with NULL/NA identifiers are returned (see below).  I would like to filter all probesets containing N/A from the expression set object.  I have read about using the genefilter package but when I run the script I get an error message that I am having trouble interpreting.

 

 

eset<-rma(CELdat, background=TRUE, normalize=TRUE, subset=NULL, target="core")     
library(affycoretools)
eset <- annotateEset(eset, annotation(eset))

library(org.Dr.eg.db)
fd <- fData(eset)
fd$ENTREZID <- mapIds(org.Dr.eg.db, as.character(fd$SYMBOL), "ENTREZID","SYMBOL",multiVals="first")
fData(eset) <- fd

eset<- nsFilter(eset, require.entrez=TRUE,remove.dupEntrez=TRUE,feature.exclude="^AFFX")$eset
##ERROR MESSAGE HERE:
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'columns' for signature '"AffyGenePDInfo"'

library(limma)
design = model.matrix(~ 0 + f)
colnames(design)=c("control","morphant","rescue")
design
data.fit = lmFit(eset,design)
contrast.matrix = makeContrasts(morphant-control,rescue-control,morphant-rescue,levels=design)
data.fit.con = contrasts.fit(data.fit,contrast.matrix)
data.fit.eb = eBayes(data.fit.con)

TT<-topTable(data.fit.eb,coef=1,number=Inf,adjust="BH",p.value=0.01,lfc=1.5)

 

PROBEID

ID

SYMBOL

GENENAME

logFC

AveExpr

t

P.Value

adj.P.Val

B

13298616

13298616

#N/A

#N/A

#N/A

3.709468

3.126387

26.0062

2.7E-14

2.03E-09

21.27417

13032560

13032560

ENSDART00000149240

SI:DKEY-24I24.3

---

3.599239

2.981615

24.849

5.42E-14

2.04E-09

20.78372

12930895

12930895

#N/A

#N/A

#N/A

4.583036

4.352965

22.63093

2.25E-13

4.24E-09

19.73525

12934103

12934103

#N/A

#N/A

#N/A

4.583036

4.352965

22.63093

2.25E-13

4.24E-09

19.73525

13281654

13281654

ENSDART00000129395

uvrag

UV radiation resistance associated gene

4.483284

5.017916

20.09213

1.37E-12

2.06E-08

18.32921

13243932

13243932

#N/A

#N/A

#N/A

2.114799

2.408192

19.44796

2.23E-12

2.8E-08

17.93177

13004206

13004206

#N/A

#N/A

#N/A

3.201574

5.338119

16.65424

2.27E-11

2.2E-07

15.98024

 

 

 

 

 

 

 

 

 

 

 

Secondary to filtering N/A probesets from the expression set, I cannot figure out how to write out my topTable after mapping the gene symbols to Entrez ID's.  

TT<-topTable(data.fit.eb,coef=1,number=Inf,adjust="BH",p.value=0.01,lfc=1.5)

library(xlsx)

write.xlsx(TT,file="TT.xlsx",showNA=FALSE)

I get the error:

Error in if (is.na(value)) { : argument is of length zero
In addition: Warning message:
In is.na(value) : is.na() applied to non-(list or vector) of type 'NULL'

 

I hope that my questions are clear, if not I can attempt to state them more clearly (#1 - How to remove N/A probesets from the limma analysis and #2 - how to coerce the toptable object to a writeable file which contains probeset Entrez ID's ...mapping probeset gene symbols to entrez ID's blocks me from doing this and i dont know why).  Thanks for any help you can provide. Best,

Matt

oligo • 852 views
ADD COMMENTlink modified 2.9 years ago by James W. MacDonald51k • written 2.9 years ago by mat14940
Answer: How to filter probesets from the expression set that do not contain Entrez/Gene
0
gravatar for Gordon Smyth
2.9 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Please don't use the genefilter package. It just takes something that is very simple and turns it into something hard.

To remove NA values, you just use standard subsetting in R. For example,

i <- is.na( fData(eset)$SYMBOL )
data.fit <- lmFit(eset[!i,], design)

Learning how to use standard R operations like this will pay dividends throughout your projects.

If there are still problems with this data, then you might ask a question of the annotateEset() author.

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Gordon Smyth39k

Thanks, Gordon, this code worked for removing the N/A's as i hoped.  I am still considering removing duplicate identifiers... but I am uncertain how removing duplicate Symbol/Entrez ID's may influence the analysis.

ADD REPLYlink written 2.9 years ago by mat14940

Unless you have special needs, there is no need to remove multiple probes for the same gene. It's easy to do however if you need to.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Gordon Smyth39k
Answer: How to filter probesets from the expression set that do not contain Entrez/Gene
0
gravatar for James W. MacDonald
2.9 years ago by
United States
James W. MacDonald51k wrote:

You should probably use getMainProbes first, as the top probeset that you show is a 'rescue' probeset, which by definition isn't particularly interesting. As Gordon notes, the genefilter package is rather complex, and it could be argued that some of the defaults, particularly for nsFilter, are not really that useful and are not intended for the random primer style arrays (such as the one you are using).

As an aside, the Affymetrix CSV files have been changed and now contain the Entrez Gene ID in an easily parseable format - I have updated annotateEset to reflect those changes - it will take a day or two for these changes to propagate through the build servers.

While it's tempting to want to remove all the NA probesets (after removing the controls), I am not sure this is really the best way to proceed. It certainly makes interpretation easier, but it assumes that all those apparently differentially expressed probesets that are not readily annotated are uninteresting. If you have filtered out any low-expressing probesets, and you still have lots of unannotated probesets in your top table, it might be worthwhile to try to figure out what you are measuring.

ADD COMMENTlink written 2.9 years ago by James W. MacDonald51k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 292 users visited in the last hour