I am Marco, a bioinformatician at a research company. I'm using HTA 2.0 microarrays to analyze cancer cells, and my question is how to properly annotate and filter genes in an HTA 2.0 analysis.
I have found different approaches online, but I don't know which one is the most appropriate.
For example, I followed the Bioconductor end-to-end microarray workflow (maEndToEnd): https://www.bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html
Based on it, I used this code:
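```r
library(oligo)          # read.celfiles(), rma()
library(affycoretools)  # annotateEset()
library(pd.hta.2.0)     # platform design package for HTA 2.0

# Read CEL files (here: all CEL files in the working directory)
dat <- read.celfiles(list.celfiles(full.names = TRUE))

# RMA normalization
eset <- rma(dat)

# Annotation: adds SYMBOL, GENENAME and related columns to fData(eset)
eset <- annotateEset(eset, pd.hta.2.0)

# Add the locus type from the NetAffx transcript annotation shipped with pd.hta.2.0
load(system.file("extdata", "netaffxTranscript.rda", package = "pd.hta.2.0"))
annot <- pData(netaffxTranscript)   # the loaded object, not the string "netaffxTranscript"
annot <- annot[featureNames(eset), ]
fdat <- fData(eset)
fdat$LOCUSTYPE <- annot$locustype
fData(eset) <- fdat
```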
After annotating the transcript clusters, I had the gene symbol (SYMBOL), a short description of the gene each cluster represents (GENENAME), and extra information such as the locus type (LOCUSTYPE). In a second step, I filtered out the transcript clusters that do not map to a gene:
```r
# Remove transcript clusters without a gene symbol
eset <- eset[!is.na(fData(eset)$SYMBOL), ]
```
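To gauge how large the duplicate-symbol issue is before choosing a strategy, here is a minimal base-R sketch on toy data. The transcript-cluster IDs, symbols and values are hypothetical stand-ins for `exprs(eset)` and `fData(eset)`; the sketch drops unannotated rows and lists the symbols that are still represented by more than one transcript cluster.

```r
# Hypothetical toy data standing in for exprs(eset) and fData(eset).
expr <- matrix(
  c(5.1, 5.3, 5.0, 5.2,
    7.8, 8.4, 6.9, 9.1,
    3.2, 3.1, 3.3, 3.0,
    6.0, 6.5, 5.8, 7.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("TC01", "TC02", "TC03", "TC04"), paste0("S", 1:4))
)
feat <- data.frame(
  PROBEID = rownames(expr),
  SYMBOL  = c("TP53", "MYC", NA, "MYC"),
  stringsAsFactors = FALSE
)

# Drop transcript clusters without a gene symbol (same idea as the NA filter above).
keep <- !is.na(feat$SYMBOL)
expr <- expr[keep, , drop = FALSE]
feat <- feat[keep, , drop = FALSE]

# Symbols still represented by more than one transcript cluster.
sym_counts <- table(feat$SYMBOL)
multi <- names(sym_counts[sym_counts > 1])
cat(nrow(expr), "annotated clusters;", length(multi), "symbol(s) with duplicates:", multi, "\n")
```

On the real data, the same `table()` check can be run on `fData(eset)$SYMBOL` after the NA filter.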
However, some genes have different IDs but the same gene name. Do I need to filter these as well? I thought of using the median absolute deviation (MAD) to remove duplicate gene names, keeping for each gene the transcript cluster of greatest interest. In our study, the probes of interest are those with the greatest variability across samples.
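The MAD idea can be sketched in base R as follows. This is a minimal sketch on a hypothetical toy matrix: the transcript-cluster IDs, symbols and expression values are made up, and on real data `expr` would be `exprs(eset)` and `symbol` something like `setNames(fData(eset)$SYMBOL, featureNames(eset))`. For each gene symbol it keeps only the transcript cluster with the largest median absolute deviation across samples.

```r
# Hypothetical toy data standing in for exprs(eset) (log2 scale) and fData(eset)$SYMBOL.
expr <- matrix(
  c(5.1, 5.3, 5.0, 5.2,   # TC01 -> TP53
    7.8, 8.4, 6.9, 9.1,   # TC02 -> MYC
    6.0, 6.5, 5.8, 7.2,   # TC04 -> MYC
    4.0, 4.1, 3.9, 4.0),  # TC05 -> GAPDH
  nrow = 4, byrow = TRUE,
  dimnames = list(c("TC01", "TC02", "TC04", "TC05"), paste0("S", 1:4))
)
symbol <- c(TC01 = "TP53", TC02 = "MYC", TC04 = "MYC", TC05 = "GAPDH")
symbol <- symbol[rownames(expr)]  # make sure the annotation is aligned with the rows

# Median absolute deviation of each transcript cluster across samples.
# stats::mad() scales by 1.4826 by default; this does not change the ranking.
row_mad <- apply(expr, 1, mad)

# Sort by symbol, then by decreasing MAD, and keep the first row of each symbol,
# i.e. the most variable transcript cluster per gene.
ord  <- order(symbol, -row_mad)
keep <- rownames(expr)[ord][!duplicated(symbol[ord])]

expr_collapsed <- expr[keep, , drop = FALSE]
rownames(expr_collapsed) <- unname(symbol[keep])
print(round(row_mad, 3))
print(expr_collapsed)
```

Keeping the most variable cluster per gene is one option; another common one is to average the clusters of each gene, for example with `limma::avereps()`. Clusters that share a symbol may measure different transcripts of the same gene, so either way of collapsing discards that distinction.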
What do you think about this approach?