Question

Genes missing from mta10sttranscriptcluster.db?

Asked by al.ivens (United Kingdom)

Hi,

Whilst looking at a dataset just now, I searched the annotation package for a couple of critical gene symbols (Foxp3 and Gata3) and couldn't find them:

grep("Foxp3",unlist(as.list(mta10sttranscriptclusterSYMBOL)))
integer(0)
grep("Foxp3",unlist(as.list(mta10sttranscriptclusterGENENAME)))
integer(0)

grep("Gata3",unlist(as.list(mta10sttranscriptclusterSYMBOL)))
integer(0)
grep("Gata3",unlist(as.list(mta10sttranscriptclusterGENENAME)))
integer(0)

As far as I am aware, I have the latest version of the mta10sttranscriptcluster.db package:

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

...

other attached packages:
 [1] mta10sttranscriptcluster.db_8.3.1 org.Mm.eg.db_3.1.2               
 [3] pd.mta.1.0_3.12.0                 oligo_1.32.0                     
 [5] Biostrings_2.36.1                 XVector_0.8.0                    
 [7] oligoClasses_1.30.0               GOstats_2.34.0                   

...

To check that they were on the array in the first place, I downloaded the Affymetrix annotation file (MTA-1_0.na35.mm10.transcript.csv) from the Affymetrix website and searched it:

egrep -m2 -i "gata3|foxp3" MTA-1_0.na35.mm10.transcript.csv | cut -d "," -f1,2,3,4,5,6,7,8
"TC0200002935.mm.1","TC0200002935.mm.1","chr2","-","9857078","9890034","248","NM_008091 // Gata3 ....
"TC0X00000058.mm.1","TC0X00000058.mm.1","chrX","+","7573600","7595243","241","NM_001199347 // Foxp3 ...
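For anyone who prefers to stay in R, the same cross-check can be sketched with read.csv() and grepl(). This is a minimal, self-contained sketch: it parses an inline copy of the two rows shown above instead of the real file, the column names in the header line are an assumption (check them against the actual header of MTA-1_0.na35.mm10.transcript.csv), and the gene_assignment values are cut short exactly as they are in the egrep output.

```r
# Inline copy of the two rows from the egrep output above.
# ASSUMPTION: header names are illustrative, not taken from the real file.
# gene_assignment is truncated, as in the egrep output.
ann_text <- '"transcript_cluster_id","probeset_id","seqname","strand","start","stop","total_probes","gene_assignment"
"TC0200002935.mm.1","TC0200002935.mm.1","chr2","-","9857078","9890034","248","NM_008091 // Gata3"
"TC0X00000058.mm.1","TC0X00000058.mm.1","chrX","+","7573600","7595243","241","NM_001199347 // Foxp3"'

ann <- read.csv(text = ann_text, stringsAsFactors = FALSE)

# Case-insensitive search of the gene assignment, like egrep -i "gata3|foxp3"
is_hit <- grepl("gata3|foxp3", ann$gene_assignment, ignore.case = TRUE)
hits <- ann[is_hit, c("transcript_cluster_id", "seqname", "strand", "start", "stop")]
print(hits)
```

To run it on the downloaded file, replace `text = ann_text` with the file path; if the file begins with #-prefixed metadata lines, `comment.char = "#"` may also be needed (an assumption about the file layout, not something verified here).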

So it seems that, for some weird reason, these loci didn't make it into the Bioconductor annotation package for this array! I haven't checked any other genes. Would it be possible to update the package, please?

Many thanks, cheers!

Al


Answer by James W. MacDonald (United States), 2015-07-01

> select(mta10sttranscriptcluster.db, c("Foxp3","Gata3"), c("SYMBOL","GENENAME","ENTREZID","PROBEID"), "SYMBOL")
  SYMBOL               GENENAME ENTREZID           PROBEID
1  Foxp3        forkhead box P3    20371 TC0X00000058.mm.1
2  Gata3 GATA binding protein 3    14462 TC0200002935.mm.1
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] mta10sttranscriptcluster.db_8.3.1 org.Mm.eg.db_3.1.2               
 [3] RSQLite_1.0.0                     DBI_0.3.1                        
 [5] AnnotationDbi_1.30.1              GenomeInfoDb_1.4.1               
 [7] IRanges_2.2.4                     S4Vectors_0.6.0                  
 [9] Biobase_2.28.0                    BiocGenerics_0.14.0              
[11] BiocInstaller_1.18.3             

loaded via a namespace (and not attached):
[1] tools_3.2.0
>



But is it surprising that they are not present in the traditional maps?

> mta10sttranscriptclusterSYMBOL[["TC0X00000058.mm.1"]]
[1] NA
> packageVersion("mta10sttranscriptcluster.db")
[1] '8.3.1'

Reply by Martin Morgan


Not really. The reason we (Marc Carlson and I, primarily) keep trying to shove people towards the new select() interface is that the old maps return NA for multi-mapping probes:

> mta10sttranscriptclusterSYMBOL[["TC0X00000058.mm.1"]]
[1] NA
> z <- toggleProbes(mta10sttranscriptclusterSYMBOL, "multiple")
> z[["TC0X00000058.mm.1"]]
[1] "Foxp3"     "Ppp1r3fos"
> select(mta10sttranscriptcluster.db, "TC0X00000058.mm.1", c("SYMBOL","ENTREZID","GENENAME"))
            PROBEID    SYMBOL ENTREZID
1 TC0X00000058.mm.1     Foxp3    20371
2 TC0X00000058.mm.1 Ppp1r3fos    78185
                                                       GENENAME
1                                               forkhead box P3
2 protein phosphatase 1, regulatory subunit 3F, opposite strand
>
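To make the NA behaviour concrete, here is a base-R sketch that emulates the two views of a map on a toy probe-to-symbol table. It does not use AnnotationDbi at all: as_single_map() and as_multiple_map() are hypothetical helpers written only for illustration, the two TC0X00000058.mm.1 rows come from the select() output above, and TC_HYPOTHETICAL/GeneA is a made-up single-mapping probe.

```r
# Toy long-format table, one row per probe/symbol pair (the shape select() returns).
# TC_HYPOTHETICAL / GeneA are invented for illustration.
tab <- data.frame(
  PROBEID = c("TC0X00000058.mm.1", "TC0X00000058.mm.1", "TC_HYPOTHETICAL"),
  SYMBOL  = c("Foxp3", "Ppp1r3fos", "GeneA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the default map: a probe with more than
# one distinct symbol is reported as NA.
as_single_map <- function(tab) {
  per_probe <- split(tab$SYMBOL, tab$PROBEID)
  lapply(per_probe, function(s) {
    s <- unique(s)
    if (length(s) == 1L) s else NA_character_
  })
}

# Hypothetical helper keeping every symbol for every probe; for
# TC0X00000058.mm.1 this matches the toggleProbes(..., "multiple") output above.
as_multiple_map <- function(tab) {
  lapply(split(tab$SYMBOL, tab$PROBEID), unique)
}

single   <- as_single_map(tab)
multiple <- as_multiple_map(tab)

single[["TC0X00000058.mm.1"]]     # NA
multiple[["TC0X00000058.mm.1"]]   # "Foxp3" "Ppp1r3fos"

# The grep() idiom from the question misses the multi-mapping probe:
grep("Foxp3", unlist(single))     # integer(0)
grep("Foxp3", unlist(multiple))   # one hit
```

Collapsing a one-to-many mapping to one value per probe forces a choice, and returning NA is that choice here, so a grep() over the flattened map silently skips every multi-mapping probe.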

Old habits die hard, and there is a wealth of easily Googleable web pages that will give naive users the idea that

unlist(as.list(mta10sttranscriptclusterSYMBOL))

is 'the way to go' rather than 'an archaic thing to do, left over for backwards compatibility'.
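The long-format alternative can also be sketched in base R: keep one row per probe/symbol pair, as select() does, and subset with an exact %in% match instead of grep(). probes_for_symbols() is a hypothetical helper, and the three rows are copied from the select() outputs in this thread; a real table would have many more.

```r
# Hypothetical helper: return every probe annotated to any of `symbols`,
# including probes that also map to other genes.
probes_for_symbols <- function(tab, symbols) {
  hits <- tab[tab$SYMBOL %in% symbols, c("SYMBOL", "PROBEID")]
  rownames(hits) <- NULL
  hits
}

# Rows taken from the select() outputs in this thread
tab <- data.frame(
  PROBEID = c("TC0X00000058.mm.1", "TC0X00000058.mm.1", "TC0200002935.mm.1"),
  SYMBOL  = c("Foxp3", "Ppp1r3fos", "Gata3"),
  stringsAsFactors = FALSE
)

res <- probes_for_symbols(tab, c("Foxp3", "Gata3"))
print(res)
```

Exact matching with %in% also sidesteps a second weakness of the grep() idiom: a pattern such as "Gata3" matches any longer symbol that contains it. Against the real package, the equivalent query is the select() call with keytype "SYMBOL" from the answer above.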

Reply by James W. MacDonald