Hi all,
I'm having a bit of a problem with TxDb.Hsapiens.UCSC.hg19.knownGene. Most of the Entrez identifiers I use work fine with the function: transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")
For example (this one works):
------------------------------------------------------------------------------------------------------------------
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
MYOT <- '9499'  # Entrez Gene ID, stored as a character string
transcriptCoordsByGene.GRangesList.MYOT <-
  transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")[MYOT]
transcriptCoordsByGene.GRangesList.MYOT
#GRangesList object of length 1:
#$9499 
#GRanges object with 4 ranges and 2 metadata columns:
#    seqnames                 ranges strand |     tx_id     tx_name
#       <Rle>              <IRanges>  <Rle> | <integer> <character>
#[1]     chr5 [137022410, 137223540]      + |     21288  uc011cye.2
#[2]     chr5 [137203545, 137223540]      + |     21290  uc003lbv.3
#[3]     chr5 [137203545, 137223540]      + |     21291  uc011cyg.2
#[4]     chr5 [137203545, 137223540]      + |     21292  uc011cyh.2
---------------------------------------------------------------------------------------------------
However, for some other genes, such as 201625, which is the Entrez ID for the human DNAH12 gene (I looked it up with library(org.Hs.eg.db) and checked it against NCBI), I get:
Error: subscript contains out-of-bounds indices
Could you please tell me how I can solve this problem?
Or are there any other packages I can use to extract these genes' data?
I need the data so that I can analyse motifs using the rGADEM package.
I am a medical student and extremely new to R and Bioconductor.
The entire set of genes (Entrez identifiers) that I need to analyse is:
 [1]   1002  10233 114798 122481 126792 126820 128344 130827   1428 146845   1493 150483 150572
[14] 159989    183   1852 201625   2167  22824  22885  23676 254956 255101 257177  25992  26576
[27] 266629 283152 283726 285141  29895   3067 340286 340706   3860 387712 389125 389177   4617
[40]   4621   4625  51364   5144  51778   5212  54585  55815  56203  56849  56901  57494   6345
[53]  64102  64446 644890   6588   7042   7060   7138   7273   7322   7337    796  79933   8048
[66]   8091   8125  83450  83657  83894     88  89765   9172   9499
Like I said, some of these work perfectly, others don't.
Any help would be appreciated.
Thank you.

Ah that makes sense... thank you very much
I'll try that now.
I checked all my gene codes. There was only one mismatch, which I have already removed, and it still didn't work.
All the other codes do exist in that human genome file.
However, it won't work if I use any code with a "value" higher than 23459,
i.e. any code at or below that works (e.g. 23459, 23458, 1223),
but 23460 and above do not work (e.g. 23460, 23461, 124234).
Could you please tell me if there is any way I could correct this problem?
Thank you

That's most likely because you are subsetting your list using integer Entrez Gene IDs. But the names of your list aren't integers, they are character. In other words, if you subset with the bare number 124234,
you are saying 'give me the 124234th list item'. But there aren't 124,234 genes! There are only 22,000 or so. What you want to do is subset with the quoted string "124234",
which will give you the list item that has that Entrez Gene ID as its name.
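To make the difference concrete, here is a minimal sketch in base R. It assumes a small made-up named list (tx_by_gene, with invented values) standing in for the GRangesList, so it runs without any Bioconductor packages. The real object behaves the same way with respect to names versus positions, except that it raises the out-of-bounds error where a plain list quietly returns an empty slot.

```r
# Toy stand-in for the GRangesList: a plain named list whose names are
# Entrez Gene IDs stored as character strings (the values are made up).
tx_by_gene <- list("88" = "tx_a", "9499" = "tx_b", "201625" = "tx_c")

length(tx_by_gene)          # 3 elements, so valid positions are 1 to 3

# Character subscript: the element is looked up by its name.
by_name <- tx_by_gene["201625"]
names(by_name)              # "201625"

# Numeric subscript: interpreted as a position, not as a gene ID.
by_position <- tx_by_gene[2]
names(by_position)          # "9499": the 2nd element, not "gene 2"

# A numeric ID larger than the list length points past the end.
# A plain list returns an empty, NA-named slot here; a GRangesList
# reports the "out-of-bounds indices" error instead.
past_end <- tx_by_gene[201625]
is.na(names(past_end))      # TRUE
```

This would also explain the cutoff at 23459: presumably the list has 23459 elements, so numeric codes up to that value do not fail, but they return whatever gene happens to sit at that position rather than the gene with that ID.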

oh wow!
I'm just stupid then I guess....
ok so I removed a single code that wasn't valid and then used as.character to convert everything in my vector to character and it's now up and running!!
Thank you so much!!
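
For anyone finding this thread later, the fix described above might look roughly like the sketch below. The names tx_by_gene and my_genes and the ID 999999 are hypothetical stand-ins (a plain named list instead of the real GRangesList); the idea is just to convert the IDs to character and drop any that are not among the list's names before subsetting.

```r
# Hypothetical stand-ins: a named list keyed by Entrez Gene ID, and a
# numeric vector of IDs, one of which (999999) is not in the list.
tx_by_gene <- list("88" = "tx_a", "9499" = "tx_b", "201625" = "tx_c")
my_genes <- c(9499, 201625, 999999)

# Convert the numbers to character so they are matched against names.
ids <- as.character(my_genes)

# Keep only the IDs that occur among the names; collect the rest.
found     <- ids[ids %in% names(tx_by_gene)]
not_found <- setdiff(ids, names(tx_by_gene))

# Subset by name using the filtered character vector.
selected <- tx_by_gene[found]
names(selected)   # "9499" "201625"
not_found         # "999999"
```

Checking not_found first shows which codes are absent, instead of finding out from an error on the whole subset.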