Question: How to remove duplicate probesets matching the same gene after topTable, in order to have a gene list without duplicate gene symbols for further GO analysis?
svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote 4.4 years ago:

Dear All,

I have used the limma package to run a paired analysis of differential expression (DE) between control and cancer samples. Here is my code:

library(limma)
library(hgu133a.db)

# Reverse the level order (assumes conditions is a factor with two levels)
conditions <- data.trusted.eset$condition
condition <- factor(conditions, levels = levels(conditions)[c(2, 1)])
# 13 pairs, two consecutive arrays per pair
pairs <- factor(rep(1:13, each = 2))

design <- model.matrix(~condition+pairs)
fit <- lmFit(data.trusted.eset, design)
fit2 <- eBayes(fit)

# Map each probeset ID to its gene symbol
symbols <- unlist(mget(featureNames(data.trusted.eset), envir = hgu133aSYMBOL))

top <- topTable(fit2, coef = "conditionCancer", number = nrow(fit2), adjust.method = "fdr", genelist = symbols)

select <- top[which(abs(top$logFC) > 1 & top$adj.P.Val < 0.05), ]

My main question (as a beginner in R/Bioconductor): because my platform is Affymetrix, some probesets map to the same gene. How can I remove these duplicates (in the select$ID column), where the same gene appears two or more times, so that I can extract my DE list without duplicate gene symbols for further analysis?

Thank you in advance

 

 

Answer: How to remove duplicate probesets matching to the same gene, after toptable in o
Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote 4.4 years ago:

First of all, you probably shouldn't use "select" as a variable name, because it is already a function, and also the word is a verb, not a noun. I would suggest calling it "selected" instead.

Second, the function you're looking for is duplicated. You would do something like:

# Make sure the table is ordered with the most
# significant results at the top
selected <- selected[order(selected$adj.P.Val), ]
# Keep only the first occurrence of each ID
selected <- selected[!duplicated(selected$ID), ]
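To see what the two steps above do, here is a minimal sketch on a toy table. The probeset names, gene symbols, and values are made up for illustration and only mimic the columns of a topTable result with an ID column.

```r
# Hypothetical toy table standing in for the filtered topTable output
selected <- data.frame(
  ID = c("TP53", "BRCA1", "TP53", "MYC", "BRCA1"),
  logFC = c(1.8, -1.2, 2.1, 1.5, -1.6),
  adj.P.Val = c(0.010, 0.030, 0.002, 0.020, 0.040),
  row.names = c("p1", "p2", "p3", "p4", "p5"),
  stringsAsFactors = FALSE
)

# Most significant probesets first
selected <- selected[order(selected$adj.P.Val), ]
# duplicated() flags every repeat after the first occurrence,
# so negating it keeps the best-ranked probeset per gene symbol
selected <- selected[!duplicated(selected$ID), ]

print(selected)
```

One thing to watch: probesets without a gene symbol have NA in the ID column, and duplicated() treats repeated NA values as duplicates of each other, so it is probably worth dropping NA symbols before this step.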

Thirdly, using separate cutoffs for fold change and significance (p-value or FDR) is not recommended. Consider using the treat function instead to test for significance relative to a nonzero threshold.
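As a rough sketch of what the treat approach could look like, the example below runs it on simulated data. The expression matrix, the "Control" level name, and the threshold lfc = 1 are assumptions chosen to mirror the question, not the original data set.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 200 probesets x 26 arrays (13 pairs).
# This is an illustrative stand-in for data.trusted.eset.
expr <- matrix(rnorm(200 * 26), nrow = 200,
               dimnames = list(paste0("probe", 1:200), NULL))
condition <- factor(rep(c("Control", "Cancer"), times = 13),
                    levels = c("Control", "Cancer"))
pairs <- factor(rep(1:13, each = 2))
# Shift the first 10 probesets up by 3 log2 units in the cancer arrays
expr[1:10, condition == "Cancer"] <- expr[1:10, condition == "Cancer"] + 3

design <- model.matrix(~ condition + pairs)
fit <- lmFit(expr, design)

# Test whether |logFC| exceeds 1, instead of testing logFC != 0
# and filtering on logFC afterwards
fit_treat <- treat(fit, lfc = 1)
top_treat <- topTreat(fit_treat, coef = "conditionCancer", number = Inf)

# A single FDR cutoff is then enough
selected <- top_treat[top_treat$adj.P.Val < 0.05, ]
```

Here treat() takes the place of eBayes(), and topTreat() takes the place of topTable(); the p-values already account for the fold-change threshold.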

Answer: How to remove duplicate probesets matching to the same gene, after toptable in o
svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote 4.4 years ago:

Thank you for your valuable answer and corrections! Regarding the separate cutoffs, do you mean that I should not use logFC and p-value cutoffs simultaneously to filter the results from topTable? I thought that combining a logFC cutoff with a p-value cutoff gives more biologically meaningful DE genes than using only an adjusted p-value threshold. A similar suggestion appears in the paper by JJ Chen et al., 2007: "The fold-change criterion can always be used as a secondary criterion to facilitate the interpretation of biological significance. Some researchers may impose that the P-value and the foldchange are equally important; in such cases all genes satisfying both criteria are selected."


Go read the documentation and accompanying paper for the treat function. It provides a statistically principled way to combine the p-value and fold change cutoff into a single test.

Reply written 4.4 years ago by Ryan C. Thompson

Thank you again for your recommendation. I'm definitely going to read the paper to get a validated approach for my data set analysis! I also tried another approach to get unique PROBEID, SYMBOL and ENTREZID values, but I'm not sure whether it is as correct as the one above:

After selected <- top[which(abs(top$logFC) > 1 & top$adj.P.Val < 0.05), ]:

ls("package:hgu133a.db")

columns(hgu133a.db)

keytypes(hgu133a.db)

res <- select(hgu133a.db, keys=rownames(selected),columns=c("ENTREZID", "SYMBOL"), 
keytype="PROBEID")

idx <- match(rownames(selected), res$PROBEID)

res2 <- res[idx,]

Reply written 4.4 years ago by svlachavas
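On whether the match() approach in the reply above removes duplicates, a toy sketch may help. The annotation table below is hypothetical and only imitates the shape of a select() result, where one probeset can return several rows and several probesets can share a symbol.

```r
# Hypothetical annotation result: probeset "201_at" maps to two genes,
# and two different probesets map to the same gene "GENE_C"
res <- data.frame(
  PROBEID = c("201_at", "201_at", "202_at", "203_at"),
  SYMBOL = c("GENE_A", "GENE_B", "GENE_C", "GENE_C"),
  stringsAsFactors = FALSE
)
probes <- c("201_at", "202_at", "203_at")

# match() returns the position of the first hit for each probeset
idx <- match(probes, res$PROBEID)
res2 <- res[idx, ]

# Each probeset now appears once...
one_row_per_probe <- !any(duplicated(res2$PROBEID))
# ...but the same gene symbol can still appear more than once
symbols_still_repeat <- any(duplicated(res2$SYMBOL))

print(res2)
```

In this sketch match() gives one row per probeset by keeping the first mapping, but it does not collapse probesets that share a gene symbol, so a duplicated() step on the symbol column would still be needed.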