Question

Mystery transcript clusters on Mouse Gene 2.0 ST

knaxerova ▴ 10

@knaxerova-7541

United States

Hi all,

I would like to learn more about certain "mystery" transcript clusters that pop up when I analyze Mouse Gene 2.0 ST data.

Brief summary: I use the oligo package to summarize at the transcript level, then I get rid of everything but the main array content with Jim MacDonald's suggestion:

load(paste0(path.package("pd.mogene.2.0.st"), "/extdata/netaffxTranscript.rda"))
annot <- pData(netaffxTranscript)
ind <- annot$category %in% "main"
esetfiltered <- eset[ind,]

Finally, I use limma to do group comparisons. Some probe sets that come up among my differentially expressed genes puzzle me, though. One example is "17290922". According to mogene20sttranscriptcluster.db, this probe set does not map to any accession numbers (none at all?). When I search for it in Affy's MoGene-2_0-st-v1.na33.mm10.transcript.csv file, it is not listed. NetAffx returns no hits for it, but instead directs me to the page of transcript cluster 17290921, where the bottom of the page says that 17290922 is a "related" exon probe set. So it seems to me that it should not even exist on my array. I checked the annotation for MoGene 2.1, and on that chip the 17290922 probe set seems to be an intronic control. But I am sure I am working with MoGene 2.0 (my CEL files say so).

I am very curious to learn more, could somebody perhaps give a pointer? Thanks so much in advance.

Kamila

sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] limma_3.22.7 pd.mogene.2.0.st_2.14.1
[3] oligo_1.30.0 Biostrings_2.34.1
[5] XVector_0.6.0 oligoClasses_1.28.0
[7] mogene20sttranscriptcluster.db_8.2.0 org.Mm.eg.db_3.0.0
[9] RSQLite_1.0.0 DBI_0.3.1
[11] AnnotationDbi_1.28.2 GenomeInfoDb_1.2.4
[13] IRanges_2.0.1 S4Vectors_0.4.0
[15] Biobase_2.26.0 BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] affxparser_1.38.0 affyio_1.34.0 BiocInstaller_1.16.2 bit_1.1-12
[5] codetools_0.2-11 ff_2.2-13 foreach_1.4.2 GenomicRanges_1.18.4
[9] iterators_1.0.7 preprocessCore_1.28.0 splines_3.1.3 zlibbioc_1.12.0


score 0 · Answer 1 · 2015-04-06

When you use the netaffxTranscript.rda file, you have to ensure that the probesets in the transcript file and the featureNames() of your ExpressionSet are in the same order. Otherwise the logical index lines up with the wrong rows and you filter the wrong stuff.

In other words, if you are still seeing this probeset after filtering, you did something wrong.
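As a minimal sketch of the ordering issue, the toy example below uses made-up stand-ins for pData(netaffxTranscript) and featureNames(eset). The object names `feature_ids`, `wrong_keep`, and `right_keep` and the ID "17200001" are hypothetical; only the `transcriptclusterid` and `category` column names come from the annotation table itself. It aligns the annotation to the feature IDs with match() before filtering, which is one way (not necessarily the only one) to guard against a mismatch in order.

```r
# Toy stand-in for pData(netaffxTranscript); values are illustrative only
annot <- data.frame(
  transcriptclusterid = c("17290921", "17290922", "17200001"),
  category            = c("main", "normgene->exon", "main"),
  stringsAsFactors    = FALSE
)

# Toy stand-in for featureNames(eset), deliberately in a different order
feature_ids <- c("17290922", "17200001", "17290921")

# Positional filtering: the index is built in annotation order but applied
# in feature order, so a non-main probeset slips through
wrong_keep <- feature_ids[annot$category %in% "main"]

# Align the annotation rows to the feature IDs first, then filter
idx <- match(feature_ids, annot$transcriptclusterid)
stopifnot(!anyNA(idx))                 # every feature must be annotated
annot_aligned <- annot[idx, ]
right_keep <- feature_ids[annot_aligned$category %in% "main"]

# With the real objects the same idea would read (assuming `eset` exists):
#   idx <- match(featureNames(eset), annot$transcriptclusterid)
#   esetfiltered <- eset[annot$category[idx] %in% "main", ]
```

In the toy data, `wrong_keep` still contains "17290922" while `right_keep` does not, which mirrors the symptom described in the question.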

> load(paste0(path.package("pd.mogene.2.0.st"), "/extdata/netaffxTranscript.rda"))
> annot <- pData(netaffxTranscript)
> annot[annot[,1] == "17290922", c(1,8,9,18)]
         transcriptclusterid geneassignment mrnaassignment       category
17290922            17290922           <NA>           <NA> normgene->exon

This is one of the probeset-level (or PSR-level) probesets that does 'double duty' on the MoGene 2.0 ST array. Its two probes are collected into the 17290921 probeset, where they are intended to measure Zkscan3. The same two probes are also summarized separately (just the two probes, as probeset 17290922), as a normgene->exon probeset. Which you can use for, umm, stuff. Or something.

score 0 · Answer 2 · 2015-04-06
