Genes missing from mta10sttranscriptcluster.db?
al.ivens • 0
@alivens-8311
Last seen 9.3 years ago
United Kingdom

Hi,

Whilst looking at a dataset just now, I searched for a couple of critical gene symbols and couldn't find them.

grep("Foxp3",unlist(as.list(mta10sttranscriptclusterSYMBOL)))
integer(0)
grep("Foxp3",unlist(as.list(mta10sttranscriptclusterGENENAME)))
integer(0)

grep("Gata3",unlist(as.list(mta10sttranscriptclusterSYMBOL)))
integer(0)
grep("Gata3",unlist(as.list(mta10sttranscriptclusterGENENAME)))
integer(0)

I have the latest version for the mta10sttranscriptcluster.db package, as far as I am aware:

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

...

other attached packages:
 [1] mta10sttranscriptcluster.db_8.3.1 org.Mm.eg.db_3.1.2               
 [3] pd.mta.1.0_3.12.0                 oligo_1.32.0                     
 [5] Biostrings_2.36.1                 XVector_0.8.0                    
 [7] oligoClasses_1.30.0               GOstats_2.34.0                   

...

To check that they were actually on the array in the first place, I went to the Affymetrix website, downloaded their annotation file (MTA-1_0.na35.mm10.transcript.csv), and looked there:

egrep -m2 -i "gata3|foxp3" MTA-1_0.na35.mm10.transcript.csv | cut -d "," -f1,2,3,4,5,6,7,8
"TC0200002935.mm.1","TC0200002935.mm.1","chr2","-","9857078","9890034","248","NM_008091 // Gata3 ....
"TC0X00000058.mm.1","TC0X00000058.mm.1","chrX","+","7573600","7595243","241","NM_001199347 // Foxp3 ...

So it seems that, for some weird reason, these loci didn't make it into the Bioc annotation package for this array! I haven't checked for any other genes. Would it be possible to update the package, please?

Many thanks, cheers!

Al

 

annotation • 562 views
@james-w-macdonald-5106
Last seen 13 hours ago
United States
> select(mta10sttranscriptcluster.db, c("Foxp3","Gata3"), c("SYMBOL","GENENAME","ENTREZID","PROBEID"), "SYMBOL")
  SYMBOL               GENENAME ENTREZID           PROBEID
1  Foxp3        forkhead box P3    20371 TC0X00000058.mm.1
2  Gata3 GATA binding protein 3    14462 TC0200002935.mm.1
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] mta10sttranscriptcluster.db_8.3.1 org.Mm.eg.db_3.1.2               
 [3] RSQLite_1.0.0                     DBI_0.3.1                        
 [5] AnnotationDbi_1.30.1              GenomeInfoDb_1.4.1               
 [7] IRanges_2.2.4                     S4Vectors_0.6.0                  
 [9] Biobase_2.28.0                    BiocGenerics_0.14.0              
[11] BiocInstaller_1.18.3             

loaded via a namespace (and not attached):
[1] tools_3.2.0
>

 


But is it surprising that they are not present in the traditional maps?

> mta10sttranscriptclusterSYMBOL[["TC0X00000058.mm.1"]]
[1] NA
> packageVersion("mta10sttranscriptcluster.db")
[1] '8.3.1'

 


Not really. The reason we (Marc Carlson and I, primarily) keep trying to shove people towards the new select() interface is that the old maps return NA for multi-mapping probes:

> mta10sttranscriptclusterSYMBOL[["TC0X00000058.mm.1"]]
[1] NA
> z <- toggleProbes(mta10sttranscriptclusterSYMBOL, "multiple")
> z[["TC0X00000058.mm.1"]]
[1] "Foxp3"     "Ppp1r3fos"
> select(mta10sttranscriptcluster.db, "TC0X00000058.mm.1", c("SYMBOL","ENTREZID","GENENAME"))
            PROBEID    SYMBOL ENTREZID
1 TC0X00000058.mm.1     Foxp3    20371
2 TC0X00000058.mm.1 Ppp1r3fos    78185
                                                       GENENAME
1                                               forkhead box P3
2 protein phosphatase 1, regulatory subunit 3F, opposite strand
>
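
To get a feel for how much the default filter hides, something along these lines should work. It is only a sketch: it assumes the "all" setting of toggleProbes() exposes both single- and multi-mapping probesets, and the variable names (single_map, all_map) are just for illustration.

```r
library(mta10sttranscriptcluster.db)

# Default Bimap: probesets that map to more than one gene are filtered out,
# so they come back as NA.
single_map <- mta10sttranscriptclusterSYMBOL

# Lift the filter so that multi-mapping probesets are visible as well.
all_map <- toggleProbes(mta10sttranscriptclusterSYMBOL, "all")

# Number of probesets with at least one symbol under each view.
n_single <- length(mappedkeys(single_map))
n_all    <- length(mappedkeys(all_map))
c(default = n_single, unfiltered = n_all)

# Symbols for the Foxp3 probeset once the filter is lifted.
all_map[["TC0X00000058.mm.1"]]
```

The difference between the two counts is the number of probesets that the old-style maps silently report as NA.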

Old habits die hard, and there is a wealth of easily Googleable web pages that will give naive users the idea that

unlist(as.list(mta10sttranscriptclusterSYMBOL))

is 'the way to go' rather than 'an archaic thing to do, left over for backwards compatibility'.
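
For what it's worth, a select()-based replacement for the grep() search in the original question might look roughly like this. Treat it as a sketch rather than the canonical recipe: the case-insensitive pattern and the names all_symbols, hits and res are my own choices for illustration.

```r
library(mta10sttranscriptcluster.db)

# All gene symbols known to the package, regardless of how many
# probesets or genes each one is linked to.
all_symbols <- keys(mta10sttranscriptcluster.db, keytype = "SYMBOL")

# Case-insensitive exact match on the symbols of interest.
hits <- grep("^(foxp3|gata3)$", all_symbols, ignore.case = TRUE, value = TRUE)

# Map the matching symbols to probesets; select() returns one row per
# symbol/probeset pair, so multi-mapping probesets are not dropped.
res <- NULL
if (length(hits) > 0) {
  res <- select(mta10sttranscriptcluster.db,
                keys = hits,
                columns = c("PROBEID", "ENTREZID", "GENENAME"),
                keytype = "SYMBOL")
}
res
```

Because the search runs over the SYMBOL keys rather than over a filtered probe-to-symbol map, a symbol that shares its probeset with another gene still turns up.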

 
