Question

Probe sequences to Entrez Ids

theobroma22 ▴ 10

@theobroma22-11920

Last seen 7.3 years ago

Is there a package, or a way of adapting functions from an existing package, to get the Entrez IDs starting from a set of array probe sequences?

Thanks.

r • 929 views

ADD COMMENT • link updated 7.3 years ago by Gordon Smyth 50k • written 7.3 years ago by theobroma22 ▴ 10

hey theobroma22,

Can you provide the name of the array platform that you are using? If it is Affymetrix, try 'affycoretools'.

ADD REPLY • link 7.3 years ago mat149 ▴ 70

It's an apple fruit NimbleGen array with little annotation. I have all of the array probe sequences, and files of the DE gene probe sets from a limma result. The IDs consist of the contig number followed by a numerical code, such as Contig00001_2_f_100_1_200, which perhaps represents the nucleotide sequence position. With BLAST, I have to click a hyperlink to retrieve the Entrez gene ID, on both Windows and Linux systems. I was thinking of hacking an existing R function to parse out the Entrez ID from a sequence.

Thanks.

ADD REPLY • link 7.3 years ago theobroma22 ▴ 10

score 0 · Answer 1 · 2017-01-08

The natural solution would be to use an aligner to map the probe sequences to the apple genome. I would use a splice-aware aligner, although that might not be essential. You would have to create a FASTQ file containing all the probe sequences. You would also have to download the apple genome from NCBI, as well as gene annotation for the same species. Then align the probe sequences to the genome and assign each one, if possible, to a gene. The alignment and assignment can be done efficiently using the Rsubread package, for example (using Unix rather than Windows).
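
As a rough illustration of the first step only (creating the FASTQ file), here is a minimal sketch in Python. It assumes the probes are already available as (ID, sequence) pairs. Probe sequences carry no base qualities, so a placeholder quality character is written for every base. The function name `probes_to_fastq` and the example sequence are hypothetical and are not part of any package mentioned in this thread.

```python
def probes_to_fastq(probes, path, quality_char="I"):
    """Write (probe_id, sequence) pairs to a FASTQ file.

    Probes have no real base qualities, so every base gets the same
    placeholder quality character (an assumption of this sketch).
    Returns the number of records written.
    """
    written = 0
    with open(path, "w") as handle:
        for probe_id, seq in probes:
            seq = seq.strip().upper()
            if not seq:
                continue  # skip probes that have no sequence
            # One FASTQ record: header, sequence, separator, qualities
            handle.write(f"@{probe_id}\n{seq}\n+\n{quality_char * len(seq)}\n")
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical probe: the ID follows the pattern described above,
    # the sequence is made up for illustration.
    example = [("Contig00001_2_f_100_1_200", "ACGTACGTACGTACGTACGTACGT")]
    print(probes_to_fastq(example, "probes.fastq"), "probe(s) written")
```

The resulting file would then serve as the read input for the alignment step, with the probe ID carried through as the read name so that each alignment can be traced back to its probe.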