Issues about how to filter and annotate the MoGene-2_0-st and MoEx-1_0-st-v1 array probe sets

张超 ▴ 50

Dear list,

I would like to use paCalls() from the oligo package to filter out probe sets whose transcripts are absent. My data are from MoGene-2_0-st and MoEx-1_0-st-v1 arrays (Affymetrix). After reading the CEL files, my data are a GeneFeatureSet with 12 samples (6 control and 6 experimental). What should I do with the p-values computed by paCalls(..., "PSDABG") below?

    library(oligo)
    OligoRawData <- read.celfiles(CEL file lists)
    eset <- rma(OligoRawData)
    dagbPS <- paCalls(OligoRawData, "PSDABG")

What should I do next to filter the probe sets? Could you please send me a complete example with a detailed explanation?

In addition, moex10sttranscriptcluster.db can be used to annotate data from the MoEx-1_0-st-v1 array, and both mogene20stprobeset.db and mogene20sttranscriptcluster.db can be used for data from the MoGene-2_0-st array (including both gene and lncRNA lists). But only slightly more than half of the probe sets are annotated with gene symbols by the commands below:

    # DEGs determined by t tests
    results <- decideTests(fit2, method = "global", adjust.method = "fdr", p.value = 0.05, lfc = 0.5)
    # annotated by moex10sttranscriptcluster.db for data from the MoEx-1_0-st-v1 array
    genesymbol <- getText(aafSymbol(rownames(results), "moex10sttranscriptcluster.db"))

For the MoGene-2_0-st data, only 1217 and 24709 probe sets can be annotated by mogene20stprobeset.db and mogene20sttranscriptcluster.db, respectively (length(genesymbol[which(genesymbol != "")])), out of a total of 41345 (length(results)). For the MoEx-1_0-st-v1 data, only 14966 can be mapped by moex10sttranscriptcluster.db (the total is 23332, length(results)). Do I need to add more annotation packages?

BTW, I am a beginner in this field, and I have found very few documented examples of how to use the functions in the oligo package. Could you please also give me some suggestions?

Looking forward to your reply. I really appreciate any help. Thanks again.

Best regards,
Chao

James W. MacDonald 65k

Hi Chao,

On 6/8/2014 10:37 AM, 张超 wrote:
> [...] What should I do with the p-values computed by paCalls(..., "PSDABG") below?
>> library(oligo)
>> OligoRawData <- read.celfiles(CEL file lists)
>> eset <- rma(OligoRawData)
>> dagbPS <- paCalls(OligoRawData, "PSDABG")
> What should I do next to filter the probe sets? Could you please send me a complete example with a detailed explanation?

You need to decide what constitutes 'present' and how many samples have to be present in order to keep the probeset. So if I were to say that a p < 0.05 is present and I needed 20 such samples, I could do

    keep <- rowSums(dagbPS < 0.05) > 19
    eset <- eset[keep, ]

If the above code is mysterious to you, then you need to read 'An Introduction to R'.

> [...] For the MoGene-2_0-st data, only 1217 and 24709 probe sets can be annotated by mogene20stprobeset.db and mogene20sttranscriptcluster.db, respectively (length(genesymbol[which(genesymbol != "")])), out of a total of 41345 (length(results)). For the MoEx-1_0-st-v1 data, only 14966 can be mapped by moex10sttranscriptcluster.db (the total is 23332, length(results)). Do I need to add more annotation packages?

The annotation packages with 'transcriptcluster' in their names are for instances where you have summarized probesets at the transcript level (which is the default for rma() in oligo). If you want to summarize at the probeset level (which I would not recommend doing, btw), you need to use target = "probeset" in your call to rma(). In other words, you should only be using the transcriptcluster annotation packages. Please note, though, that the moex10sttranscriptcluster.db package is for the Mouse Exon 1.0 ST array, not the Gene ST array.

There are any number of reasons that only a subset of probesets on the array have symbols. First, there are lots of controls, which won't have gene symbols. Second, the lincRNA/snoRNA/miRNA probesets that Affy put on these arrays won't have gene symbols either (because they aren't genes). Third, there is still some speculative content on these arrays: things that might end up being genes, with gene names, in the future, but which are just hypothetical at this point in time. Fourth, the annaffy package uses the old-style methods of getting annotations, in which case any probeset that matches more than one gene symbol will be masked.

You will be much better served if you were to do something like

    library(mogene20sttranscriptcluster.db)
    gns <- select(mogene20sttranscriptcluster.db, featureNames(eset), c("ENTREZID", "SYMBOL", "GENENAME"))

which will result in a warning that you have multiple mappings. You will have to deal with those multiple mappings as you see fit. But after doing so (so that gns has exactly one row per row of your fitted model, in the same order), you can then do

    fit$genes <- gns

and your topTable output will then be populated with the annotations.
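The two steps above (filtering on the paCalls() p-values, then collapsing the multiple mappings from select() to one row per probe set before assigning fit$genes) can be sketched in base R. This is a minimal sketch on made-up toy data, not the workflow from the thread: the IDs tc1 to tc4, the p-values and the gene symbols are hypothetical, no Bioconductor packages are loaded, and keeping the first mapping per probe set is only one possible policy. It also assumes the row names of the p-value matrix are the same IDs as featureNames(eset); if paCalls() reports p-values at a different summarization level than rma() used, the two would first have to be mapped to each other, which this sketch does not do. Jim's 20-sample threshold is illustrative; with the 12 arrays in the question, the sketch requires presence in 6 arrays (one group's worth), again purely as an assumption.

```r
# Base-R sketch with hypothetical toy data; no Bioconductor packages loaded.
#   pvals : stand-in for the matrix from paCalls(OligoRawData, "PSDABG")
#   expr  : stand-in for exprs(eset) after rma()
#   gns   : stand-in for the data.frame returned by select(..., c("ENTREZID", "SYMBOL"))
samples <- paste0("s", 1:12)   # 6 control + 6 experimental arrays

# Toy detection p-values: tc1 present in all arrays, tc2 in none,
# tc3 in 6 arrays, tc4 in 3 arrays.
pvals <- rbind(
  tc1 = rep(0.01, 12),
  tc2 = rep(0.50, 12),
  tc3 = rep(c(0.01, 0.50), each = 6),
  tc4 = rep(c(0.01, 0.50), times = c(3, 9))
)
colnames(pvals) <- samples

# Toy expression matrix whose rows are in a different order from pvals.
expr <- matrix(8, nrow = 4, ncol = 12,
               dimnames = list(c("tc4", "tc3", "tc2", "tc1"), samples))

# Step 1: filter. "Present" means p < p_cut; keep a probe set if it is
# present in at least min_present arrays (assumed threshold: one group's size).
p_cut       <- 0.05
min_present <- 6
keep     <- rowSums(pvals < p_cut) >= min_present   # named logical vector
kept_ids <- names(keep)[keep]

# Subset by ID rather than by position so row order cannot misalign things.
expr_f <- expr[rownames(expr) %in% kept_ids, , drop = FALSE]

# Step 2: annotation with a 1:many mapping (tc1 -> two genes) and a
# probe set with no symbol (tc3), as select() can return.
gns <- data.frame(
  PROBEID  = c("tc1", "tc1", "tc3", "tc2"),
  ENTREZID = c("11", "22", NA, "33"),
  SYMBOL   = c("GeneA", "GeneB", NA, "GeneC"),
  stringsAsFactors = FALSE
)

# One simple policy: keep only the first mapping for each probe set.
gns1 <- gns[!duplicated(gns$PROBEID), ]

# Build exactly one annotation row per row of expr_f, in the same order.
idx   <- match(rownames(expr_f), gns1$PROBEID)
genes <- data.frame(
  PROBEID  = rownames(expr_f),
  ENTREZID = gns1$ENTREZID[idx],
  SYMBOL   = gns1$SYMBOL[idx],
  stringsAsFactors = FALSE
)
stopifnot(identical(genes$PROBEID, rownames(expr_f)))

# With real data, after lmFit() on the filtered eset: fit$genes <- genes
print(genes)
```

Matching by ID rather than relying on row positions is deliberate: it keeps the expression rows and the annotation rows aligned even when the objects are ordered differently. Pasting all symbols for a probe set together (for example with paste(unique(x), collapse = " // ")) is an alternative to keeping only the first mapping; which is appropriate depends on how the annotations will be used.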
You might then consider using the ReportingTools package, which is under active development and maintenance, rather than the annaffy package, which may still be actively maintained but is, AFAICT, no longer under active development.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
