My name is Maria Maqueda and I am working with some data from HuGene20st microarrays (at transcript cluster level). This is not the first time I have worked with these arrays, but I am struggling with the annotation again. I have two main questions:
1) Regarding lincRNA annotation. I obtain around 730 lincRNA-related transcripts through hugene20sttranscriptcluster.db (v8.3.0), while the annotation file from Affymetrix lists around 12k (mrna assignment category). Some time ago (late 2013) I already asked about this difference in lincRNA annotation (https://support.bioconductor.org/p/56347/#56349). Do you foresee any better alignment between the two?
2) Regarding the cross-hybridization category. I have obtained 2613 transcripts from hugene20sttranscriptcluster.db (v8.3.0) that have a "Mixed" cross-hybridization value in the Affymetrix annotation file. My initial idea was to keep only "main" and "unique" (X-hyb) transcripts for further analysis, but based on this result I have my doubts. Could it be an error in the Affymetrix annotation files? Does anyone have any suggestions about how to deal with these "mixed" X-hyb transcripts?
Many thanks in advance for any help you can offer.
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.1 (Yosemite)
attached base packages:
 parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
 hugene20sttranscriptcluster.db_8.3.0 org.Hs.eg.db_3.1.2
 RSQLite_1.0.0 DBI_0.3.1
 AnnotationDbi_1.30.1 GenomeInfoDb_1.4.0
 IRanges_2.2.1 S4Vectors_0.6.0
 Biobase_2.28.0 BiocGenerics_0.14.0
loaded via a namespace (and not attached):