I'm trying to analyze some Affymetrix Rat Clariom D array data, and from what I understand, it's basically the same as the Rat Transcriptome Array. There are annotation packages in Bioconductor for both the human and mouse versions of this array platform (hta20sttranscriptcluster.db and mta10sttranscriptcluster.db), but nothing for the rat.
Are there any plans to create an annotation library for the rat arrays?
I've also tried using pdInfoBuilder/oligo with the necessary files downloaded from Affymetrix, but encountered an error. I'm running R version 3.2.4 on OS X (10.11.6) with Bioconductor version 3.2.
```r
library(pdInfoBuilder)
p <- new("AffyHTAPDInfoPkgSeed",
         clfFile     = "RTA-1_0.r3.clf",
         pgfFile     = "RTA-1_0.r3.pgf",
         coreMps     = "RTA-1_0.r3.Psrs.mps",
         transFile   = "RTA-1_0.na36.1.rn6.transcript.csv",
         probeFile   = "RTA-1_0.na36.1.rn6.probeset.csv",
         author      = "",
         email       = "someone@gmail dot com",
         genomebuild = "rn6",
         organism    = "Rattus norvegicus",
         species     = "Rattus norvegicus",
         version     = "0.0.1")
makePdInfoPackage(p, unlink = TRUE)
```

Output:

```
======================================================================
Building annotation package for Affymetrix HTA Array
PGF.........: RTA-1_0.r3.pgf
CLF.........: RTA-1_0.r3.clf
Probeset....: RTA-1_0.na36.1.rn6.probeset.csv
Transcript..: RTA-1_0.na36.1.rn6.transcript.csv
Core MPS....: RTA-1_0.r3.Psrs.mps
======================================================================
Parsing file: RTA-1_0.r3.pgf... OK
Parsing file: RTA-1_0.r3.clf... OK
Creating initial table for probes... OK
Creating dictionaries... OK
Parsing file: RTA-1_0.na36.1.rn6.probeset.csv... OK
Parsing file: RTA-1_0.r3.Psrs.mps... OK
Creating package in ./pd.rta.1.0
Inserting 2760 rows into table chrom_dict... OK
Inserting 5 rows into table level_dict... OK
Inserting 10 rows into table type_dict... OK
Inserting 425813 rows into table core_mps... OK
Inserting 726149 rows into table featureSet... OK
Inserting 6048415 rows into table pmfeature... OK
Counting rows in chrom_dict
Counting rows in core_mps
Counting rows in featureSet
Counting rows in level_dict
Counting rows in pmfeature
Counting rows in type_dict
Creating index idx_pmfsetid on pmfeature... OK
Creating index idx_pmfid on pmfeature... OK
Creating index idx_fsfsetid on featureSet... OK
Creating index idx_core_meta_fsetid on core_mps... OK
Creating index idx_core_fsetid on core_mps... OK
Saving DataFrame object for PM.
Saving NetAffx Annotation... Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘’
```
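The message itself is raised by base R: a data frame cannot have repeated row names, and the warning shows that the repeated value is the empty string ''. That suggests (an inference from the warning, not a confirmed diagnosis) that several rows of the NetAffx annotation being saved have an empty identifier. A toy reproduction in base R, where the table and the `probeset_1` ID are hypothetical:

```r
# Hypothetical annotation table; the empty IDs stand in for rows
# that have no identifier in the annotation file
annot <- data.frame(
  id     = c("probeset_1", "", ""),
  symbol = c("GeneA", NA, NA),
  stringsAsFactors = FALSE
)

captured_warning <- NULL
captured_error <- tryCatch(
  withCallingHandlers(
    {
      row.names(annot) <- annot$id  # fails: '' appears twice
      NULL
    },
    # record the "non-unique value" warning and keep going
    warning = function(w) {
      captured_warning <<- conditionMessage(w)
      invokeRestart("muffleWarning")
    }
  ),
  # the assignment then stops with the duplicate-row-names error
  error = function(e) conditionMessage(e)
)

print(captured_warning)
print(captured_error)
```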
Please don't post further questions using the 'Add your answer' box; a new question is by definition not an answer!
I'll take a look at this, but I doubt it will be this week.
I am the original asker of the post you referenced. Having dealt with this (but for the human array), I was able to find a workaround.
In my case, the problem turned out to be that when you annotate the object as an ExpressionSet, the names must be unique. Because this array has many probesets that do not map to a gene, you are going to get a fair number of NAs.
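A minimal base-R sketch of two ways around the uniqueness requirement, assuming you already have a vector of gene symbols aligned to the probeset IDs (all IDs and symbols below are hypothetical): keep the unique probeset IDs as row names and carry the symbols in a separate column, or, if you really need symbols as row names, fill in the blanks and disambiguate repeats with `make.unique()`.

```r
# Hypothetical probeset IDs and the symbols they map to ("" / NA = no gene)
probeset_ids <- c("ps_1", "ps_2", "ps_3", "ps_4", "ps_5")
symbols      <- c("GeneA", NA, "", "GeneA", "GeneB")

# Option 1: keep the (already unique) probeset IDs as row names and
# store the symbols as an ordinary column
feature_data <- data.frame(symbol = symbols, row.names = probeset_ids,
                           stringsAsFactors = FALSE)

# Option 2: use symbols as names, replacing missing ones with the
# probeset ID and suffixing repeats so every name is unique
filled <- ifelse(is.na(symbols) | symbols == "", probeset_ids, symbols)
unique_names <- make.unique(filled)
print(unique_names)
```

Option 1 loses nothing and is usually the safer choice; option 2 produces names such as `GeneA.1`, which are unique but no longer plain gene symbols.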
I can't promise that this is the right workaround, since I am just as new to this chip as you are, and possibly newer to microarray analysis in general, but this is what worked for me.
This comes after RMA normalization. You will also need to make a pData.txt file, in which the first row is just the names of your samples (and those names need to match the sample names in your expression set exactly). In my code I used ',' as the separator, but you can use a tab-delimited file instead (I believe that is the default).
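A minimal sketch of reading a phenotype file and checking it against the expression matrix. It assumes one common layout, and the one Biobase's `read.AnnotatedDataFrame()` expects by default: a header row followed by one row per sample, with the sample name in the first column. The file contents, sample names and the stand-in matrix are hypothetical; in real use the matrix would be `exprs()` of your RMA-normalized object.

```r
# Write a small hypothetical pData file (comma-separated, header row,
# sample names in the first column)
pdata_file <- tempfile(fileext = ".txt")
writeLines(c("sample,group",
             "Sample1.CEL,control",
             "Sample2.CEL,treated"),
           pdata_file)

# Read it back, using the first column as row (sample) names
pdata <- read.table(pdata_file, header = TRUE, sep = ",",
                    row.names = 1, stringsAsFactors = FALSE)

# Stand-in for the normalized expression matrix (probesets x samples)
expr_mat <- matrix(rnorm(6), nrow = 3,
                   dimnames = list(c("ps_1", "ps_2", "ps_3"),
                                   c("Sample1.CEL", "Sample2.CEL")))

# Sample names must match, in the same order, before combining them
stopifnot(identical(rownames(pdata), colnames(expr_mat)))
print(pdata)
```

With Biobase loaded, the same file could be read as an `AnnotatedDataFrame` with `read.AnnotatedDataFrame(pdata_file, sep = ",")`.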
Also, the strsplit code for assigning gene names may need to change slightly for you (if you need it at all). On the Clariom D Human array, the gene ID field is a concatenation of the gene ID, the full written-out gene name, and some other information. In my case, the gene ID I needed was always in the second position. Although Entrez IDs are also listed in some entries, I found that the NetAffx file for the Clariom D Human had not been updated since 2013, so I broke the mapping into two steps to get more up-to-date mappings for the Affymetrix IDs.
If that is not your speed, you could also look into extracting the sequence information (also contained in the NetAffx annotation; calling Var on the NetAffx object should let you see all the data types it contains) and remapping the probes yourself, but that is not something I know how to do.
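A minimal sketch of the two-step mapping, assuming the annotation column follows the NetAffx convention of `" // "` between fields within an entry, `" /// "` between entries, `"---"` for no assignment, and the gene symbol in the second field. The strings and the symbol-to-Entrez lookup table are hypothetical placeholders; in practice the lookup would come from whatever up-to-date source you trust.

```r
# Hypothetical annotation strings (all IDs are placeholders)
gene_assignment <- c(
  "NM_000001 // GeneA // gene A description // 1q11 // 1001 /// XM_000002 // GeneA // gene A description // 1q11 // 1001",
  "---",
  "NM_000003 // GeneB // gene B description // 2p21 // 1003"
)

# Step 1: keep the first entry and take its second field as the symbol
first_entry <- vapply(strsplit(gene_assignment, " /// ", fixed = TRUE),
                      `[`, character(1), 1)
fields <- strsplit(first_entry, " // ", fixed = TRUE)
symbol <- vapply(fields,
                 function(f) if (length(f) >= 2) f[2] else NA_character_,
                 character(1))

# Step 2: map symbols to Entrez IDs with a separate, more current table
lookup <- data.frame(symbol = c("GeneA", "GeneB"),
                     entrez = c("1001", "1003"),
                     stringsAsFactors = FALSE)
entrez <- lookup$entrez[match(symbol, lookup$symbol)]

print(data.frame(symbol, entrez, stringsAsFactors = FALSE))
```

Unassigned probesets (`"---"`) come out as `NA` in both columns, which is exactly why these values should not be used directly as row names.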