How can I map each tag to a RefSeq ID or gene symbol? My goal is to build an expression matrix.
Unfortunately, I did not find any useful information in GPL9115. The SOFT formatted family file(s) on this page contain columns like those above for all of the available samples in GEO.
I would guess these were generated with a SAGE-seq-like protocol, similar to what was used in this paper.
I worked with this type of data a bit in grad school, and you can map these sequences to their host genes in much the same way you would analyze "normal" sequencing data. That is, either align the original data to the genome and count it as usual (via something like featureCounts), or align just the sequences in the first column of the data you've shown and then join the resulting gene hits to the counts in the table. Note that you will most likely have several rows of that data file resolving to the same gene, so you'll have to decide whether to sum them up to the gene level or analyze at the unique tag level (I previously rolled up to the gene level).
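The gene-level roll-up mentioned above can be sketched as follows. This is only an illustration in plain Python, assuming you already have a tag-to-gene mapping from an alignment step; the tag sequences, gene names, and the `roll_up_to_gene` helper are all made up for the example and are not taken from the GEO record.

```python
from collections import defaultdict

def roll_up_to_gene(tag_tpm, tag_to_gene):
    """Sum per-tag values into per-gene totals.

    Tags with no gene assignment are skipped (an assumption; you may
    prefer to keep them in a separate 'unmapped' bucket).
    """
    gene_totals = defaultdict(float)
    for tag, tpm in tag_tpm.items():
        gene = tag_to_gene.get(tag)
        if gene is None:
            continue  # tag did not map to any gene
        gene_totals[gene] += tpm
    return dict(gene_totals)

# Hypothetical toy data: tag sequence -> tpm for one sample
tag_tpm = {
    "CATGAAAAAAAAAAAAA": 12.5,
    "CATGCCCCCCCCCCCCC": 3.0,
    "CATGGGGGGGGGGGGGG": 7.5,
    "CATGTTTTTTTTTTTTT": 1.0,  # no gene hit in the mapping below
}

# Hypothetical mapping produced by an earlier alignment step
tag_to_gene = {
    "CATGAAAAAAAAAAAAA": "GENE_A",
    "CATGCCCCCCCCCCCCC": "GENE_A",
    "CATGGGGGGGGGGGGGG": "GENE_B",
}

gene_tpm = roll_up_to_gene(tag_tpm, tag_to_gene)
```

Running this per sample and then combining the per-gene dictionaries gives one column of the expression matrix per sample. Tags that hit more than one gene are not handled here and would need their own rule.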
Use Biostrings::matchPDict(), e.g., by following the example labelled "A. A SIMPLE EXAMPLE OF EXACT MATCHING" on its help page. But if the 'tpm' column was derived from a different mapping, then your results will not make any sense.
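To make the idea of exact matching concrete, here is a rough sketch in plain Python. It is not Biostrings and does not reproduce matchPDict(); it only shows the principle of looking up each tag as an exact substring of a set of reference sequences. The transcript IDs, sequences, and the `find_exact_hits` helper are hypothetical, and the sketch assumes the tags are on the same strand as the reference sequences.

```python
def find_exact_hits(tags, transcripts):
    """Return, for each tag, the IDs of transcripts containing it exactly.

    Naive substring search: fine for a toy example, far too slow for a
    real transcriptome, where an indexed matcher should be used instead.
    """
    hits = {tag: [] for tag in tags}
    for tag in tags:
        for tx_id, seq in transcripts.items():
            if tag in seq:  # exact match, no mismatches allowed
                hits[tag].append(tx_id)
    return hits

# Hypothetical reference sequences keyed by made-up transcript IDs
transcripts = {
    "TX_1": "GGGCATGAAACCCTTTGGGAAACCC",
    "TX_2": "TTTCATGCCCGGGAAATTTCCCGGG",
}

# Hypothetical 17-nt tags, as they might appear in the first column
tags = [
    "CATGAAACCCTTTGGGA",
    "CATGCCCGGGAAATTTC",
    "CATGTTTTTTTTTTTTT",
]

hits = find_exact_hits(tags, transcripts)
```

Tags with an empty hit list have no exact match, and tags with more than one hit are ambiguous. Both cases need a decision before the values are summed per gene.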
This question has also been posted to Biostars: https://www.biostars.org/p/165111
Yeah, but I could not find a specific answer there, e.g. R code or something like that.