Question

How to remove dublicate probesets matching to the same gene, after toptable in order to have a genelist without dublicate gene Symbols for further GO analysis ?

0

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 6 hours ago

Germany/Heidelberg/German Cancer Resear…

Dear All,

i have used the limma software package to implement an paired statistical analysis for a dataset regarding DE expression between control & cancer samples. Heres my code:

library(limma)

library(hgu133a.db)

conditions <- data.trusted.eset$condition
condition <- factor(conditions, levels(condition)[c(2,1)])
pairs <- factor(rep(1:13, each = 2))

design <- model.matrix(~condition+pairs)
fit <- lmFit(data.trusted.eset, design)

fit2 <- eBayes(fit)

library(hgu133a.db)

symbols <- unlist(mget(featureNames(data.trusted.eset), env=hgu133aSYMBOL))

top <- topTable(fit2, coef="conditionCancer", number=nrow(fit2), adjust.method="fdr", genelist=symbols)

select <- top[which(abs(top$logFC) >1 & top$ adj.P.Val < 0.05),]

My main question(as a beginner in R/bioconductor), is because my platform is Affymetrix and some probesets match to the same gene, how i can remove these dublicates(select$ID column) which have the same gene 2 or more times and remove them, so that i can extract my DE list without gene dublicate symbols for further analysis ?

Thank you in advance

bioconductor limma differential gene expression dublicate • 1.8k views

ADD COMMENT • link 10.0 years ago svlachavas ▴ 840

score 2 · Answer 1 · 2015-02-14

First of all, you probably shouldn't use "select" as a variable name, because it is already a function, and also the word is a verb, not a noun. I would suggest calling it "selected" instead.

Second, the function you're looking for is duplicated. You would do something like:

# Make sure the table is ordered with most the most 
# significant results at the top
selected <- selected[order(select$adj.P.Val),]
# Keep only the first occurrence of each ID
selected <- selected[!duplicated(select$ID),]

Thirdly, using separate cutoffs for fold change and significance (p-value or FDR) is not recommended. Consider using the treat function instead to test for significance relative to a nonzero threshold.

score 0 · Answer 2 · 2015-02-14

0

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 6 hours ago

Germany/Heidelberg/German Cancer Resear…

Thank you for your valuable answer and corrections !! . On the other hand, for the separate cutoffs you mean that i should not use simultaneously logFC and p-value cutoffs for sorting the results from topTable ? I thought that using logFC along with p-value cutoff gives more biologically meaningful results regarding the DE genes, instead of using just the adjusted p-value below a threshold. Moreover, one similar suggestion is mentioned in the paper of JJ Chen et al., 2007{....."The fold-change criterion can always be used as a secondary criterion to facilitate the interpretation of biological significance. Some researchers may impose that the P-value and the foldchange are equally important; in such cases all genes satisfying both criteria are selected...."}.

ADD COMMENT • link 10.0 years ago svlachavas ▴ 840

3

Entering edit mode

Go read the documentation and accompanying paper for the treat function. It provides a statistically principled way to combine the p-value and fold change cutoff into a single test.

ADD REPLY • link 10.0 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

thank you again for your recommendation-i'm definately going to read the paper to get a validated approach on my data set analysis !! i also used another approach,to get unique PROBEID, SYMBOL & ENTREZID but i'm not sure that is correct as the above :

after selected <- top[which(abs(top$logFC) >1 & top$ adj.P.Val < 0.05),]

ls("package:hgu133a.db")

columns(hgu133a.db)

keytypes(hgu133a.db)

res <- select(hgu133a.db, keys=rownames(selected),columns=c("ENTREZID", "SYMBOL"),
keytype="PROBEID")

idx <- match(rownames(selected), res$PROBEID)

res2 <- res[idx,]

ADD REPLY • link 10.0 years ago svlachavas ▴ 840