What is the best way to annotate and filter transcript clusters in HTA 2.0?

Hello everybody.

I am Marco, working as a bioinformatician for a research company. I'm using HTA 2.0 microarrays to analyze cancer cells. My question is about how to properly annotate and filter genes in an HTA 2.0 analysis.

I found different ways to do it on the internet, but I don't know which one is the most appropriate.

For example, I was following this webpage: https://www.bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html

In this case, I used this code:

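    # Load the packages used below
    library(oligo)           # read.celfiles(), rma()
    library(affycoretools)   # annotateEset()
    library(pd.hta.2.0)      # HTA 2.0 platform design package

    # Read CEL files (assumes they are in the working directory)
    dat <- read.celfiles(list.celfiles())
    # RMA normalization
    eset <- rma(dat)
    # Add gene-level annotation (SYMBOL, GENENAME) to fData(eset)
    eset <- annotateEset(eset, pd.hta.2.0)
    # Load the NetAffx transcript annotation shipped with pd.hta.2.0
    load(system.file("extdata/netaffxTranscript.rda", package = "pd.hta.2.0"))
    annot <- pData(netaffxTranscript)
    annot <- annot[featureNames(eset), ]
    # Append the locus type to the feature data
    fdat <- fData(eset)
    fdat$LOCUSTYPE <- annot$locustype
    fData(eset) <- fdat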

After annotating the transcript clusters, I had the gene symbol (SYMBOL), a short description of the gene the cluster represents (GENENAME), and extra information (locus type, etc.). In a second step, I filtered out the probes that do not map to a gene.

    # Remove features whose SYMBOL is NA
    eset <- eset[!is.na(fData(eset)$SYMBOL), ]

And I got this:

[image not shown]

However, I have genes with different gene IDs but the same GeneName. Do I need to filter these as well? I thought of using the median absolute deviation (MAD) to eliminate duplicate GeneNames and keep the ones of greatest interest. The probes of interest in our study are those that show the greatest variability.
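
A minimal sketch of the MAD-based idea described above, using base R only and toy data. The matrix `expr`, the table `fdat`, and the gene symbols are hypothetical stand-ins for `exprs(eset)` and `fData(eset)`; this is one possible way to collapse duplicates, not a recommendation that it is the right filter for this study.

```r
# Toy stand-ins (hypothetical data): a 6 x 4 expression matrix and a
# feature table shaped like fData(eset), with repeated gene symbols.
expr <- matrix(
  c(5.0, 5.1, 5.0, 5.2,   # tc1, GENE_A, low variability
    4.0, 7.0, 3.5, 8.0,   # tc2, GENE_A, high variability
    6.0, 6.5, 6.2, 6.1,   # tc3, GENE_B
    2.0, 2.1, 9.0, 9.2,   # tc4, GENE_C, high variability
    3.0, 3.1, 3.0, 3.2,   # tc5, GENE_C, low variability
    7.0, 7.4, 7.1, 7.3),  # tc6, GENE_D
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("tc", 1:6), paste0("sample", 1:4))
)
fdat <- data.frame(
  PROBEID = rownames(expr),
  SYMBOL  = c("GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)

# Median absolute deviation of each row across samples (stats::mad)
row_mad <- apply(expr, 1, mad)

# Visit rows from highest to lowest MAD and keep the first row seen
# for each symbol, i.e. the most variable row per symbol
ord  <- order(row_mad, decreasing = TRUE)
keep <- sort(ord[!duplicated(fdat$SYMBOL[ord])])  # sort() restores row order

expr_unique <- expr[keep, , drop = FALSE]
fdat_unique <- fdat[keep, ]
print(fdat_unique)
```

With a real ExpressionSet, the same index could presumably be applied as `eset[keep, ]`, with `row_mad` computed from `exprs(eset)` and the symbols taken from `fData(eset)$SYMBOL`. Ties in MAD are broken by row order here, which is an arbitrary choice.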

What do you think about this?

HTA2.0 microarray affy r • 507 views

The HTA 2.0 arrays are Affy's 'answer' to RNA-Seq, and are intended to measure transcripts rather than genes. They are actually intended to allow people to detect transcript variants and stuff, so if you are just interested in differential gene expression, you are hunting squirrels with a bazooka. Ideally you would have used the Gene ST array, which is simpler and IMO more useful. But maybe your group got a really good deal? It seems like Affy can't give the HTA arrays away, so it's possible.

Anyway, you are asking a perennial question, and it is actually an analysis question rather than a software question. We can help you with how to get a package to do what you want, but cannot (and you shouldn't want us to) tell you how to analyze your data. If you are the analyst, then you are the analyst and it's up to you to analyze. Asking mostly pseudonymous randos on the interwebs if they think you are doing the right thing is probably not the ideal way to proceed.

I know that the HTA 2.0 arrays measure transcripts rather than genes, but from what my professors told me at the university, if you know the transcript, you can know the gene. The samples in my study came from cancer patients before they were treated with a drug and after treatment, so it was decided to use the HTA 2.0 arrays.

Of course I don't want you to tell me how to do the whole analysis, but as I said above, I have found guides that take totally different approaches and I don't know which one is best.

I can give you a piece of IKEA furniture without instructions... I know you'll probably end up assembling it without any problems, but the instructions, and some help from someone with experience, do help, don't they?

