Issues about how to filter and annotate the MoGene-2_0-st and MoEx-1_0-st-v1 array probe sets
3
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States
Hi Chao, Please don't take conversations off-list (e.g., use 'Reply-all' when responding). On 6/11/2014 5:02 PM, ?? wrote: > Hi Jim, > > Thanks a lot for your prompt reply and I really learned a lot from it. > But I don't understand what do you mean as you said below, > > >You will have to deal with those multiple mappings as you see fit. > But after doing so, you can then do > >>fit$genes <- gns > > >and your topTable object will then be populated with the annotations. > > My way to deal with those multiple mappings is to retain only the first > one probe set of the multiple mappings(i.e. gns_rm_multmap).And I have > finished it. But what to do next and what is fit$genes? Could you please > explain it in more details? I have found a way to populated with the > annotations with "for,if else" commands, and it costs too much time for > my machine to run these commands. It seems that your way is much easier, > but I haven't got it. The MArrayLM object (your 'fit2' object below) has a slot for gene annotations, and as long as the annotation object is in the correct order you can just put it in there. So you have done something like this: fit <- lmFit(eset, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) And then if you annotate those genes gns <- select(hugene10sttranscriptcluster.db, row.names(fit2$coef), c("ENTREZID","SYMBOL","GENENAME")) and then remove duplicates like you said you did gns <- gns[duplicated(gns[,1]),] you can check first that things line up all.equal(row.names(fit2$coef), gns[,1]) and if that is TRUE (it should be), you can now put those data into your fit2 object fit2$genes <- gns and then if you do topTable(fit2, 1) you will have all the annotation in the result as well. And then you can do something like idx <- HTMLReport("index", "Look at these results!") publish(fit2, idx, eset, coef=1, pvalueCutoff = 0.05) finish(idx) and you will have an index.html file that you can open to look at your results. See the ReportingTools vignettes for more information. > > In addition, as you said, you would not recommend summarizing the oligo > data at the probeset level. However if rma(target = "core") is used, how > to use paCalls, and if paCalls("DABG") is used, how to use rma to make > them to be consistent? I have no idea. I don't use paCalls(). Best, Jim > > Looking forward to your reply. Many thanks in advance.. > > Best Regards > > Chao > ? > > > > > > > > > At 2014-06-10 10:26:47, "James W. MacDonald" <jmacdon at="" uw.edu=""> wrote: >>Hi Chao, >> >>On 6/8/2014 10:37 AM, ?? wrote: >>> Dear list, >>> >>> >>> I would like to use the paCalls from oligo package for filtering probe sets with absence of transcripts. My data are from MoGene-2_0-st and MoEx-1_0-st-v1 array (Affymetrix). My data after reading CEL files is a GeneFeatureSet with 12 samples (6 for control groups, and 6 for experimental groups). What should I do with these data computed by paCalls(PSDABG) as below ? >>>> library(oligo) >>>> OligoRawData<-read.celfiles(CEL file lists) >>>> eset<-rma(OligoRawData) >>>> dagbPS <- paCalls(OligoRawData, "PSDABG") >>> What to do next to filter the probe sets? Could you please send me a complete examples and a detailed explanation for it? >>> >> >>You need to decide what constitutes 'present' and how many samples have >>to be present in order to keep the probeset. >> >>So if I were to say that a p < 0.05 is present and I needed 20 such >>samples, I could do >> >>keep <- rowSums(dagbPS < 0.05) > 19 >>eset <- eset[keep,] >> >>If the above code is mysterious to you, then you need to read 'An >>Introduction to R'. >> >>> >>> In addition, moex10sttranscriptcluster.db can be used for annotation of data from MoEx-1_0-st-v1 array, and both of mogene20stprobeset.db and mogene20sttranscriptcluster.db can be used for that of data from MoGene-2_0-st (including both of gene and lncRNA lists). But only more than half of the probe sets are anotated with gene symbols by below commands. >>>> results<-decideTests(fit2, method="global", adjust.method="fdr", p.value=0.05, lfc=0.5) #DEGs determination by t tests >>>> genesymbol = getText(aafSymbol(rownames(results), "moex10sttranscriptcluster.db" ));#annotated by moex10sttranscriptcluster.db for data get from MoEx-1_0-st-v1 array >>> Only 1217 and 24709 can be annotated by mogene20stprobeset.db and mogene20sttranscriptcluster.db seperately for data of MoGene-2_0-st (length(genesymbol[which(genesymbol!="")])). But the total num is 41345 (length(results)). Only 14966 can be mapped by moex10sttranscriptcluster.db for data of MoEx-1_0-st-v1 (total num is 23332 - length(results)). Should I need to add some more db for the annotation? >>> >> >>The annotation packages with 'transcriptcluster' in their names are for >>instances where you have summarized probesets at the transcript level >>(which is the default for rma() in oligo). If you want to summarize at >>the probeset level (which I would not recommend doing, btw), you need to >>use target = "probeset" in your call to rma(). >> >>In other words, you should only be using the transcriptcluster >>annotation packages. Although please note that the >>moex10transcriptcluster.db package is for the Mouse Exon 10 ST array, >>not the Gene ST array. >> >>There are any number of reasons that only a subset of probesets on the >>array have symbols. First, there are lots of controls, which won't have >>gene symbols. Second, the lincRNA/snoRNA/miRNA probesets that Affy put >>on these array won't have gene symbols either (because, they aren't >>genes). Third, there is still some speculative content on these arrays; >>things that might end up being genes, with gene names, in the future, >>but which are just hypothetical at this point in time. Fourth, the >>annaffy package uses the old style methods of getting annotations, in >>which case any probeset that matches more than one gene symbol will be >>masked. >> >>You will be much better served if you were to do something like >> >>gns <- select(mogene10sttranscriptcluster.db, featureNames(eset), >>c("ENTREZID","SYMBOL","GENENAME")) >> >>Which will result in a warning that you have multiple mappings. You will >>have to deal with those multiple mappings as you see fit. But after >>doing so, you can then do >> >>fit$genes <- gns >> >>and your topTable object will then be populated with the annotations. >>You might then consider using the ReportingTools package, which is under >>active development and maintenance, rather than the annaffy package >>which may still be actively maintained, but is no longer AFAICT under >>active development. >> >>Best, >> >>Jim >> >> >> >>> >>> BTW, I am a beginner of this field. I found there are too few documents for examples about how to use functions of oligo package. Could you please also give me some suggestions? Looking forword to your reply. I really appreciate for your any helps. >>> >>> >>> Thanks again. >>> >>> >>> Best regards. >>> >>> >>> Chao >>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >>-- >>James W. MacDonald, M.S. >>Biostatistician >>University of Washington >>Environmental and Occupational Health Sciences >>4225 Roosevelt Way NE, # 100 >>Seattle WA 98105-6099 > > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States
Hi Chao, One more thing; see below. On 6/12/2014 10:55 AM, James W. MacDonald wrote: > Hi Chao, > > Please don't take conversations off-list (e.g., use 'Reply-all' when > responding). > > On 6/11/2014 5:02 PM, ?? wrote: >> Hi Jim, >> >> Thanks a lot for your prompt reply and I really learned a lot from it. >> But I don't understand what do you mean as you said below, >> >> >You will have to deal with those multiple mappings as you see fit. >> But after doing so, you can then do >> >>> fit$genes <- gns >> >> >and your topTable object will then be populated with the annotations. >> >> My way to deal with those multiple mappings is to retain only the first >> one probe set of the multiple mappings(i.e. gns_rm_multmap).And I have >> finished it. But what to do next and what is fit$genes? Could you please >> explain it in more details? I have found a way to populated with the >> annotations with "for,if else" commands, and it costs too much time for >> my machine to run these commands. It seems that your way is much easier, >> but I haven't got it. > > The MArrayLM object (your 'fit2' object below) has a slot for gene > annotations, and as long as the annotation object is in the correct > order you can just put it in there. So you have done something like this: > > fit <- lmFit(eset, design) > fit2 <- contrasts.fit(fit, contrast) > fit2 <- eBayes(fit2) > > And then if you annotate those genes > > gns <- select(hugene10sttranscriptcluster.db, row.names(fit2$coef), > c("ENTREZID","SYMBOL","GENENAME")) > > and then remove duplicates like you said you did > > gns <- gns[duplicated(gns[,1]),] > > you can check first that things line up > > all.equal(row.names(fit2$coef), gns[,1]) > > and if that is TRUE (it should be), you can now put those data into your > fit2 object > > fit2$genes <- gns > > and then if you do > > topTable(fit2, 1) > > you will have all the annotation in the result as well. > > And then you can do something like > > idx <- HTMLReport("index", "Look at these results!") > publish(fit2, idx, eset, coef=1, pvalueCutoff = 0.05) > finish(idx) > > and you will have an index.html file that you can open to look at your > results. See the ReportingTools vignettes for more information. I should note that ReportingTools does the annotation of your data separately (last time I checked), so you don't necessarily have to annotate things by hand. However, the code used in ReportingTools is based on 'old school' methods from the annotate package that by default will not annotate any probeset that has a one-to-many mapping. Because of this, I tend to do things a different way, using some functions in my affycoretools package: idx <- HTMLReport("index", "Woah, nice data!") tab <- makeImages(topTable(fit2, 1), eset, , design, contrast, 1) publish(tab, idx, .modifyDF = list(entrezLinks, affyLinks)) finish(idx) Best, Jim > >> >> In addition, as you said, you would not recommend summarizing the oligo >> data at the probeset level. However if rma(target = "core") is used, how >> to use paCalls, and if paCalls("DABG") is used, how to use rma to make >> them to be consistent? > > I have no idea. I don't use paCalls(). > > Best, > > Jim > > >> >> Looking forward to your reply. Many thanks in advance.. >> >> Best Regards >> >> Chao >> ? >> >> >> >> >> >> >> >> >> At 2014-06-10 10:26:47, "James W. MacDonald" <jmacdon at="" uw.edu=""> wrote: >>> Hi Chao, >>> >>> On 6/8/2014 10:37 AM, ?? wrote: >>>> Dear list, >>>> >>>> >>>> I would like to use the paCalls from oligo package for filtering >>>> probe sets with absence of transcripts. My data are from >>>> MoGene-2_0-st and MoEx-1_0-st-v1 array (Affymetrix). My data after >>>> reading CEL files is a GeneFeatureSet with 12 samples (6 for control >>>> groups, and 6 for experimental groups). What should I do with these >>>> data computed by paCalls(PSDABG) as below ? >>>>> library(oligo) >>>>> OligoRawData<-read.celfiles(CEL file lists) >>>>> eset<-rma(OligoRawData) >>>>> dagbPS <- paCalls(OligoRawData, "PSDABG") >>>> What to do next to filter the probe sets? Could you please send me a >>>> complete examples and a detailed explanation for it? >>>> >>> >>> You need to decide what constitutes 'present' and how many samples have >>> to be present in order to keep the probeset. >>> >>> So if I were to say that a p < 0.05 is present and I needed 20 such >>> samples, I could do >>> >>> keep <- rowSums(dagbPS < 0.05) > 19 >>> eset <- eset[keep,] >>> >>> If the above code is mysterious to you, then you need to read 'An >>> Introduction to R'. >>> >>>> >>>> In addition, moex10sttranscriptcluster.db can be used for annotation >>>> of data from MoEx-1_0-st-v1 array, and both of mogene20stprobeset.db >>>> and mogene20sttranscriptcluster.db can be used for that of data from >>>> MoGene-2_0-st (including both of gene and lncRNA lists). But only >>>> more than half of the probe sets are anotated with gene symbols by >>>> below commands. >>>>> results<-decideTests(fit2, method="global", adjust.method="fdr", >>>>> p.value=0.05, lfc=0.5) #DEGs determination by t tests >>>>> genesymbol = getText(aafSymbol(rownames(results), >>>>> "moex10sttranscriptcluster.db" ));#annotated by >>>>> moex10sttranscriptcluster.db for data get from MoEx-1_0-st-v1 array >>>> Only 1217 and 24709 can be annotated by mogene20stprobeset.db and >>>> mogene20sttranscriptcluster.db seperately for data of MoGene- 2_0-st >>>> (length(genesymbol[which(genesymbol!="")])). But the total num is >>>> 41345 (length(results)). Only 14966 can be mapped by >>>> moex10sttranscriptcluster.db for data of MoEx-1_0-st-v1 (total num >>>> is 23332 - length(results)). Should I need to add some more db for >>>> the annotation? >>>> >>> >>> The annotation packages with 'transcriptcluster' in their names are for >>> instances where you have summarized probesets at the transcript level >>> (which is the default for rma() in oligo). If you want to summarize at >>> the probeset level (which I would not recommend doing, btw), you need to >>> use target = "probeset" in your call to rma(). >>> >>> In other words, you should only be using the transcriptcluster >>> annotation packages. Although please note that the >>> moex10transcriptcluster.db package is for the Mouse Exon 10 ST array, >>> not the Gene ST array. >>> >>> There are any number of reasons that only a subset of probesets on the >>> array have symbols. First, there are lots of controls, which won't have >>> gene symbols. Second, the lincRNA/snoRNA/miRNA probesets that Affy put >>> on these array won't have gene symbols either (because, they aren't >>> genes). Third, there is still some speculative content on these arrays; >>> things that might end up being genes, with gene names, in the future, >>> but which are just hypothetical at this point in time. Fourth, the >>> annaffy package uses the old style methods of getting annotations, in >>> which case any probeset that matches more than one gene symbol will be >>> masked. >>> >>> You will be much better served if you were to do something like >>> >>> gns <- select(mogene10sttranscriptcluster.db, featureNames(eset), >>> c("ENTREZID","SYMBOL","GENENAME")) >>> >>> Which will result in a warning that you have multiple mappings. You will >>> have to deal with those multiple mappings as you see fit. But after >>> doing so, you can then do >>> >>> fit$genes <- gns >>> >>> and your topTable object will then be populated with the annotations. >>> You might then consider using the ReportingTools package, which is under >>> active development and maintenance, rather than the annaffy package >>> which may still be actively maintained, but is no longer AFAICT under >>> active development. >>> >>> Best, >>> >>> Jim >>> >>> >>> >>>> >>>> BTW, I am a beginner of this field. I found there are too few >>>> documents for examples about how to use functions of oligo package. >>>> Could you please also give me some suggestions? Looking forword to >>>> your reply. I really appreciate for your any helps. >>>> >>>> >>>> Thanks again. >>>> >>>> >>>> Best regards. >>>> >>>> >>>> Chao >>>> >>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >> >> >> > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
0
Entering edit mode
Hi Jim, Thanks a lot for your great help. You are really a nice man and good teature. Best Chao At 2014-06-12 11:06:45, "James W. MacDonald" <jmacdon@uw.edu> wrote: >Hi Chao, > >One more thing; see below. > >On 6/12/2014 10:55 AM, James W. MacDonald wrote: >> Hi Chao, >> >> Please don't take conversations off-list (e.g., use 'Reply-all' when >> responding). >> >> On 6/11/2014 5:02 PM, å¼ è¶ wrote: >>> Hi Jim, >>> >>> Thanks a lot for your prompt reply and I really learned a lot from it. >>> But I don't understand what do you mean as you said below, >>> >>> >You will have to deal with those multiple mappings as you see fit. >>> But after doing so, you can then do >>> >>>> fit$genes <- gns >>> >>> >and your topTable object will then be populated with the annotations. >>> >>> My way to deal with those multiple mappings is to retain only the first >>> one probe set of the multiple mappings(i.e. gns_rm_multmap).And I have >>> finished it. But what to do next and what is fit$genes? Could you please >>> explain it in more details? I have found a way to populated with the >>> annotations with "for,if else" commands, and it costs too much time for >>> my machine to run these commands. It seems that your way is much easier, >>> but I haven't got it. >> >> The MArrayLM object (your 'fit2' object below) has a slot for gene >> annotations, and as long as the annotation object is in the correct >> order you can just put it in there. So you have done something like this: >> >> fit <- lmFit(eset, design) >> fit2 <- contrasts.fit(fit, contrast) >> fit2 <- eBayes(fit2) >> >> And then if you annotate those genes >> >> gns <- select(hugene10sttranscriptcluster.db, row.names(fit2$coef), >> c("ENTREZID","SYMBOL","GENENAME")) >> >> and then remove duplicates like you said you did >> >> gns <- gns[duplicated(gns[,1]),] >> >> you can check first that things line up >> >> all.equal(row.names(fit2$coef), gns[,1]) >> >> and if that is TRUE (it should be), you can now put those data into your >> fit2 object >> >> fit2$genes <- gns >> >> and then if you do >> >> topTable(fit2, 1) >> >> you will have all the annotation in the result as well. >> >> And then you can do something like >> >> idx <- HTMLReport("index", "Look at these results!") >> publish(fit2, idx, eset, coef=1, pvalueCutoff = 0.05) >> finish(idx) >> >> and you will have an index.html file that you can open to look at your >> results. See the ReportingTools vignettes for more information. > >I should note that ReportingTools does the annotation of your data >separately (last time I checked), so you don't necessarily have to >annotate things by hand. However, the code used in ReportingTools is >based on 'old school' methods from the annotate package that by default >will not annotate any probeset that has a one-to-many mapping. > >Because of this, I tend to do things a different way, using some >functions in my affycoretools package: > >idx <- HTMLReport("index", "Woah, nice data!") >tab <- makeImages(topTable(fit2, 1), eset, here>, design, contrast, 1) >publish(tab, idx, .modifyDF = list(entrezLinks, affyLinks)) >finish(idx) > >Best, > >Jim > > >> >>> >>> In addition, as you said, you would not recommend summarizing the oligo >>> data at the probeset level. However if rma(target = "core") is used, how >>> to use paCalls, and if paCalls("DABG") is used, how to use rma to make >>> them to be consistent? >> >> I have no idea. I don't use paCalls(). >> >> Best, >> >> Jim >> >> >>> >>> Looking forward to your reply. Many thanks in advance.. >>> >>> Best Regards >>> >>> Chao >>> â >>> >>> >>> >>> >>> >>> >>> >>> >>> At 2014-06-10 10:26:47, "James W. MacDonald" <jmacdon@uw.edu> wrote: >>>> Hi Chao, >>>> >>>> On 6/8/2014 10:37 AM, å¼ è¶ wrote: >>>>> Dear list, >>>>> >>>>> >>>>> I would like to use the paCalls from oligo package for filtering >>>>> probe sets with absence of transcripts. My data are from >>>>> MoGene-2_0-st and MoEx-1_0-st-v1 array (Affymetrix). My data after >>>>> reading CEL files is a GeneFeatureSet with 12 samples (6 for control >>>>> groups, and 6 for experimental groups). What should I do with these >>>>> data computed by paCalls(PSDABG) as below ? >>>>>> library(oligo) >>>>>> OligoRawData<-read.celfiles(CEL file lists) >>>>>> eset<-rma(OligoRawData) >>>>>> dagbPS <- paCalls(OligoRawData, "PSDABG") >>>>> What to do next to filter the probe sets? Could you please send me a >>>>> complete examples and a detailed explanation for it? >>>>> >>>> >>>> You need to decide what constitutes 'present' and how many samples have >>>> to be present in order to keep the probeset. >>>> >>>> So if I were to say that a p < 0.05 is present and I needed 20 such >>>> samples, I could do >>>> >>>> keep <- rowSums(dagbPS < 0.05) > 19 >>>> eset <- eset[keep,] >>>> >>>> If the above code is mysterious to you, then you need to read 'An >>>> Introduction to R'. >>>> >>>>> >>>>> In addition, moex10sttranscriptcluster.db can be used for annotation >>>>> of data from MoEx-1_0-st-v1 array, and both of mogene20stprobeset.db >>>>> and mogene20sttranscriptcluster.db can be used for that of data from >>>>> MoGene-2_0-st (including both of gene and lncRNA lists). But only >>>>> more than half of the probe sets are anotated with gene symbols by >>>>> below commands. >>>>>> results<-decideTests(fit2, method="global", adjust.method="fdr", >>>>>> p.value=0.05, lfc=0.5) #DEGs determination by t tests >>>>>> genesymbol = getText(aafSymbol(rownames(results), >>>>>> "moex10sttranscriptcluster.db" ));#annotated by >>>>>> moex10sttranscriptcluster.db for data get from MoEx-1_0-st-v1 array >>>>> Only 1217 and 24709 can be annotated by mogene20stprobeset.db and >>>>> mogene20sttranscriptcluster.db seperately for data of MoGene- 2_0-st >>>>> (length(genesymbol[which(genesymbol!="")])). But the total num is >>>>> 41345 (length(results)). Only 14966 can be mapped by >>>>> moex10sttranscriptcluster.db for data of MoEx-1_0-st-v1 (total num >>>>> is 23332 - length(results)). Should I need to add some more db for >>>>> the annotation? >>>>> >>>> >>>> The annotation packages with 'transcriptcluster' in their names are for >>>> instances where you have summarized probesets at the transcript level >>>> (which is the default for rma() in oligo). If you want to summarize at >>>> the probeset level (which I would not recommend doing, btw), you need to >>>> use target = "probeset" in your call to rma(). >>>> >>>> In other words, you should only be using the transcriptcluster >>>> annotation packages. Although please note that the >>>> moex10transcriptcluster.db package is for the Mouse Exon 10 ST array, >>>> not the Gene ST array. >>>> >>>> There are any number of reasons that only a subset of probesets on the >>>> array have symbols. First, there are lots of controls, which won't have >>>> gene symbols. Second, the lincRNA/snoRNA/miRNA probesets that Affy put >>>> on these array won't have gene symbols either (because, they aren't >>>> genes). Third, there is still some speculative content on these arrays; >>>> things that might end up being genes, with gene names, in the future, >>>> but which are just hypothetical at this point in time. Fourth, the >>>> annaffy package uses the old style methods of getting annotations, in >>>> which case any probeset that matches more than one gene symbol will be >>>> masked. >>>> >>>> You will be much better served if you were to do something like >>>> >>>> gns <- select(mogene10sttranscriptcluster.db, featureNames(eset), >>>> c("ENTREZID","SYMBOL","GENENAME")) >>>> >>>> Which will result in a warning that you have multiple mappings. You will >>>> have to deal with those multiple mappings as you see fit. But after >>>> doing so, you can then do >>>> >>>> fit$genes <- gns >>>> >>>> and your topTable object will then be populated with the annotations. >>>> You might then consider using the ReportingTools package, which is under >>>> active development and maintenance, rather than the annaffy package >>>> which may still be actively maintained, but is no longer AFAICT under >>>> active development. >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>> >>>>> >>>>> BTW, I am a beginner of this field. I found there are too few >>>>> documents for examples about how to use functions of oligo package. >>>>> Could you please also give me some suggestions? Looking forword to >>>>> your reply. I really appreciate for your any helps. >>>>> >>>>> >>>>> Thanks again. >>>>> >>>>> >>>>> Best regards. >>>>> >>>>> >>>>> Chao >>>>> >>>>> >>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor@r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>> >>> >>> >> > >-- >James W. MacDonald, M.S. >Biostatistician >University of Washington >Environmental and Occupational Health Sciences >4225 Roosevelt Way NE, # 100 >Seattle WA 98105-6099 [[alternative HTML version deleted]]
0
Entering edit mode

@-6595
Last seen 8.1 years ago
Hi Jim, Following your suggestions, I have resolved other issues. But there is some issue when I use HTMLReport to sholw the results. It shows warning that no such files listed as below, .\library\ReportingTools\extdata\csslib\imagesglyphicons-halflings- white.png .\library\ReportingTools\extdata\csslib\imagesglyphicons-halflings.png .\library\ReportingTools\extdata\csslib\imagessort_asc.png .\library\ReportingTools\extdata\csslib\imagessort_asc_disabled.png .\library\ReportingTools\extdata\csslib\imagessort_both.png .\library\ReportingTools\extdata\csslib\imagessort_desc.png .\library\ReportingTools\extdata\csslib\imagessort_desc_disabled.png I can not find these files even in the ReportingTools_2.4.0.zip package under temp directory. And when I run the next command, > publish(fit2, idx, eset2, coef=1, pvalueCutoff = 0.05) Error in .make.gene.plots(df, eSet, factor, figure.directory, ...) : Can't find expression data for some features In addition, when I run another command, > library(affycoretools ) > tab <- makeImages(topTable(fit2, 1), eset, factor(c(1,1,1,1,1,1,2,2,2,2,2,2)), design, contrast.matrix, 1) Error in png(minipng.file, height = 40, width = 200) : can not launch png() device warningï¼ 1: In png(minipng.file, height = 40, width = 200) : can not open 'reports/control_-_SRH/mini.17484091.png' text writing 2: In png(minipng.file, height = 40, width = 200) : opening device failed Is it a bug for HTMLReport? Thanks in advance for your help. Best Chao At 2014-06-12 10:55:59, "James W. MacDonald" <jmacdon@uw.edu> wrote: >Hi Chao, > >Please don't take conversations off-list (e.g., use 'Reply-all' when >responding). > >On 6/11/2014 5:02 PM, å¼ è¶ wrote: >> Hi Jim, >> >> Thanks a lot for your prompt reply and I really learned a lot from it. >> But I don't understand what do you mean as you said below, >> >> >You will have to deal with those multiple mappings as you see fit. >> But after doing so, you can then do >> >>>fit$genes <- gns >> >> >and your topTable object will then be populated with the annotations. >> >> My way to deal with those multiple mappings is to retain only the first >> one probe set of the multiple mappings(i.e. gns_rm_multmap).And I have >> finished it. But what to do next and what is fit$genes? Could you please >> explain it in more details? I have found a way to populated with the >> annotations with "for,if else" commands, and it costs too much time for >> my machine to run these commands. It seems that your way is much easier, >> but I haven't got it. > >The MArrayLM object (your 'fit2' object below) has a slot for gene >annotations, and as long as the annotation object is in the correct >order you can just put it in there. So you have done something like this: > >fit <- lmFit(eset, design) >fit2 <- contrasts.fit(fit, contrast) >fit2 <- eBayes(fit2) > >And then if you annotate those genes > >gns <- select(hugene10sttranscriptcluster.db, row.names(fit2$coef), >c("ENTREZID","SYMBOL","GENENAME")) > >and then remove duplicates like you said you did > >gns <- gns[duplicated(gns[,1]),] > >you can check first that things line up > >all.equal(row.names(fit2$coef), gns[,1]) > >and if that is TRUE (it should be), you can now put those data into your >fit2 object > >fit2$genes <- gns > >and then if you do > >topTable(fit2, 1) > >you will have all the annotation in the result as well. > >And then you can do something like > >idx <- HTMLReport("index", "Look at these results!") >publish(fit2, idx, eset, coef=1, pvalueCutoff = 0.05) >finish(idx) > >and you will have an index.html file that you can open to look at your >results. See the ReportingTools vignettes for more information. > >> >> In addition, as you said, you would not recommend summarizing the oligo >> data at the probeset level. However if rma(target = "core") is used, how >> to use paCalls, and if paCalls("DABG") is used, how to use rma to make >> them to be consistent? > >I have no idea. I don't use paCalls(). > >Best, > >Jim > > >> >> Looking forward to your reply. Many thanks in advance.. >> >> Best Regards >> >> Chao >> â >> >> >> >> >> >> >> >> >> At 2014-06-10 10:26:47, "James W. MacDonald" <jmacdon@uw.edu> wrote: >>>Hi Chao, >>> >>>On 6/8/2014 10:37 AM, å¼ è¶ wrote: >>>> Dear list, >>>> >>>> >>>> I would like to use the paCalls from oligo package for filtering probe sets with absence of transcripts. My data are from MoGene-2_0-st and MoEx-1_0-st-v1 array (Affymetrix). My data after reading CEL files is a GeneFeatureSet with 12 samples (6 for control groups, and 6 for experimental groups). What should I do with these data computed by paCalls(PSDABG) as below ? >>>>> library(oligo) >>>>> OligoRawData<-read.celfiles(CEL file lists) >>>>> eset<-rma(OligoRawData) >>>>> dagbPS <- paCalls(OligoRawData, "PSDABG") >>>> What to do next to filter the probe sets? Could you please send me a complete examples and a detailed explanation for it? >>>> >>> >>>You need to decide what constitutes 'present' and how many samples have >>>to be present in order to keep the probeset. >>> >>>So if I were to say that a p < 0.05 is present and I needed 20 such >>>samples, I could do >>> >>>keep <- rowSums(dagbPS < 0.05) > 19 >>>eset <- eset[keep,] >>> >>>If the above code is mysterious to you, then you need to read 'An >>>Introduction to R'. >>> >>>> >>>> In addition, moex10sttranscriptcluster.db can be used for annotation of data from MoEx-1_0-st-v1 array, and both of mogene20stprobeset.db and mogene20sttranscriptcluster.db can be used for that of data from MoGene-2_0-st (including both of gene and lncRNA lists). But only more than half of the probe sets are anotated with gene symbols by below commands. >>>>> results<-decideTests(fit2, method="global", adjust.method="fdr", p.value=0.05, lfc=0.5) #DEGs determination by t tests >>>>> genesymbol = getText(aafSymbol(rownames(results), "moex10sttranscriptcluster.db" ));#annotated by moex10sttranscriptcluster.db for data get from MoEx-1_0-st-v1 array >>>> Only 1217 and 24709 can be annotated by mogene20stprobeset.db and mogene20sttranscriptcluster.db seperately for data of MoGene-2_0-st (length(genesymbol[which(genesymbol!="")])). But the total num is 41345 (length(results)). Only 14966 can be mapped by moex10sttranscriptcluster.db for data of MoEx-1_0-st-v1 (total num is 23332 - length(results)). Should I need to add some more db for the annotation? >>>> >>> >>>The annotation packages with 'transcriptcluster' in their names are for >>>instances where you have summarized probesets at the transcript level >>>(which is the default for rma() in oligo). If you want to summarize at >>>the probeset level (which I would not recommend doing, btw), you need to >>>use target = "probeset" in your call to rma(). >>> >>>In other words, you should only be using the transcriptcluster >>>annotation packages. Although please note that the >>>moex10transcriptcluster.db package is for the Mouse Exon 10 ST array, >>>not the Gene ST array. >>> >>>There are any number of reasons that only a subset of probesets on the >>>array have symbols. First, there are lots of controls, which won't have >>>gene symbols. Second, the lincRNA/snoRNA/miRNA probesets that Affy put >>>on these array won't have gene symbols either (because, they aren't >>>genes). Third, there is still some speculative content on these arrays; >>>things that might end up being genes, with gene names, in the future, >>>but which are just hypothetical at this point in time. Fourth, the >>>annaffy package uses the old style methods of getting annotations, in >>>which case any probeset that matches more than one gene symbol will be >>>masked. >>> >>>You will be much better served if you were to do something like >>> >>>gns <- select(mogene10sttranscriptcluster.db, featureNames(eset), >>>c("ENTREZID","SYMBOL","GENENAME")) >>> >>>Which will result in a warning that you have multiple mappings. You will >>>have to deal with those multiple mappings as you see fit. But after >>>doing so, you can then do >>> >>>fit$genes <- gns >>> >>>and your topTable object will then be populated with the annotations. >>>You might then consider using the ReportingTools package, which is under >>>active development and maintenance, rather than the annaffy package >>>which may still be actively maintained, but is no longer AFAICT under >>>active development. >>> >>>Best, >>> >>>Jim >>> >>> >>> >>>> >>>> BTW, I am a beginner of this field. I found there are too few documents for examples about how to use functions of oligo package. Could you please also give me some suggestions? Looking forword to your reply. I really appreciate for your any helps. >>>> >>>> >>>> Thanks again. >>>> >>>> >>>> Best regards. >>>> >>>> >>>> Chao >>>> >>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>>-- >>>James W. MacDonald, M.S. >>>Biostatistician >>>University of Washington >>>Environmental and Occupational Health Sciences >>>4225 Roosevelt Way NE, # 100 >>>Seattle WA 98105-6099 >> >> >> > >-- >James W. MacDonald, M.S. >Biostatistician >University of Washington >Environmental and Occupational Health Sciences >4225 Roosevelt Way NE, # 100 >Seattle WA 98105-6099 [[alternative HTML version deleted]]
0
Entering edit mode
Jason Hackney ▴ 160
@jason-hackney-5882
Last seen 8.1 years ago
Hi Chao, You can always get the sessionInfo in an R session by running the sessionInfo() command. As for the ReportingTools error, I'm pretty sure that the png's were included in the package. I can see them by running $> unzip -l ReportingTools_2.4.0 The files are just misnamed in your error message: 8777 04-12-14 09:25 extdata/csslib/images/glyphicons-halflings-white.png 12799 04-12-14 09:25 extdata/csslib/images/glyphicons- halflings.png 253 04-12-14 09:25 extdata/csslib/images/sort_asc.png 1050 04-12-14 09:25 extdata/csslib/images/sort_asc_disabled.png 1136 04-12-14 09:25 extdata/csslib/images/sort_both.png 266 04-12-14 09:25 extdata/csslib/images/sort_desc.png 1045 04-12-14 09:25 extdata/csslib/images/sort_desc_disabled.png My guess is that this is a bug in ReportingTools on Windows, but I can't confirm just yet, as I lack a test machine. I'll look into this later this week. As for your other error. I'm not sure what's going on, but it looks like there's a mismatch between the rownames in your MArrayLM object (the result of calling eBayes) and the data.frame that ReportingTools is making from it. There are several reasons for this, but you might just check that R> all.equal(rownames(fit2), featureNames(eset)) If this is TRUE, then I might need to do a bit more thinking about this. Best, Jason On Thu, Jun 19, 2014 at 4:47 AM, å¼ è¶ <zhangchao0103@163.com> wrote: > Hi Jason, > > It is very nice for you to help resolve my issues. Thanks very much. > I have cought the session info as below. But it seems that when the > package was installed, these pngs were not included in the downloaded > ReportingTools_2.4.0.zip packages. I tried to reinstall this package to > catch the session info, but it showed that this package has been installed > and can not be installed again. I am a new beginner of R, so could you > please tell me what to do is helpful to you? Thanks again. > > > R version 3.1.0 (2014-04-10) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 > [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 > [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 > [4] LC_NUMERIC=C > [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] ReportingTools_2.4.0 BiocInstaller_1.14.2 > [3] knitr_1.6 limma_3.20.5 > [5] moex10stprobeset.db_2.14.0 org.Mm.eg.db_2.14.0 > [7] AnnotationDbi_1.26.0 GenomeInfoDb_1.0.2 > [9] pd.moex.1.0.st.v1_3.8.0 RSQLite_0.11.4 > [11] DBI_0.2-7 oligo_1.28.2 > [13] Biostrings_2.32.0 XVector_0.4.0 > [15] IRanges_1.22.9 Biobase_2.24.0 > [17] oligoClasses_1.26.0 BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] affxparser_1.36.0 affyio_1.32.0 > [3] annotate_1.42.0 AnnotationForge_1.6.1 > [5] BatchJobs_1.2 BBmisc_1.6 > [7] BiocParallel_0.6.1 biomaRt_2.20.0 > [9] biovizBase_1.12.1 bit_1.1-12 > [11] bitops_1.0-6 brew_1.0-6 > [13] BSgenome_1.32.0 Category_2.30.0 > [15] cluster_1.15.2 codetools_0.2-8 > [17] colorspace_1.2-4 DESeq2_1.4.5 > [19] dichromat_2.0-0 digest_0.6.4 > [21] edgeR_3.6.2 evaluate_0.5.5 > [23] fail_1.2 ff_2.2-13 > [25] foreach_1.4.2 formatR_0.10 > [27] Formula_1.1-1 genefilter_1.46.1 > [29] geneplotter_1.42.0 GenomicAlignments_1.0.1 > [31] GenomicFeatures_1.16.2 GenomicRanges_1.16.3 > [33] ggbio_1.12.4 ggplot2_1.0.0 > [35] GO.db_2.14.0 GOstats_2.30.0 > [37] graph_1.42.0 grid_3.1.0 > [39] gridExtra_0.9.1 GSEABase_1.26.0 > [41] gtable_0.1.2 Hmisc_3.14-4 > [43] hwriter_1.3 iterators_1.0.7 > [45] lattice_0.20-29 latticeExtra_0.6-26 > [47] locfit_1.5-9.1 MASS_7.3-33 > [49] Matrix_1.1-4 munsell_0.4.2 > [51] PFAM.db_2.14.0 plyr_1.8.1 > [53] preprocessCore_1.26.1 proto_0.3-10 > [55] R.methodsS3_1.6.1 R.oo_1.18.0 > [57] R.utils_1.32.4 RBGL_1.40.0 > [59] RColorBrewer_1.0-5 Rcpp_0.11.2 > [61] RcppArmadillo_0.4.300.8.0 RCurl_1.95-4.1 > [63] reshape2_1.4 Rsamtools_1.16.1 > [65] rtracklayer_1.24.2 scales_0.2.4 > [67] sendmailR_1.1-2 splines_3.1.0 > [69] stats4_3.1.0 stringr_0.6.2 > [71] survival_2.37-7 tools_3.1.0 > [73] VariantAnnotation_1.10.3 XML_3.98-1.1 > [75] xtable_1.7-3 zlibbioc_1.10.0 > > > Best > > Chao > > > > > > > > > -- Jason A. Hackney, Ph.D. Bioinformatics and Computational Biology Genentech hackney.jason@gene.com 650-467-5084 [[alternative HTML version deleted]] ADD COMMENT 0 Entering edit mode Hi Jason, Thanks a lot for your prompt reply. I will check my code as you said. Thanks again. Best, Chao At 2014-06-24 02:26:24, "Jason Hackney" <hackney.jason@gene.com> wrote: Hi Chao, You can always get the sessionInfo in an R session by running the sessionInfo() command. As for the ReportingTools error, I'm pretty sure that the png's were included in the package. I can see them by running$> unzip -l ReportingTools_2.4.0 The files are just misnamed in your error message: 8777 04-12-14 09:25 extdata/csslib/images/glyphicons- halflings-white.png 12799 04-12-14 09:25 extdata/csslib/images/glyphicons- halflings.png 253 04-12-14 09:25 extdata/csslib/images/sort_asc.png 1050 04-12-14 09:25 extdata/csslib/images/sort_asc_disabled.png 1136 04-12-14 09:25 extdata/csslib/images/sort_both.png 266 04-12-14 09:25 extdata/csslib/images/sort_desc.png 1045 04-12-14 09:25 extdata/csslib/images/sort_desc_disabled.png My guess is that this is a bug in ReportingTools on Windows, but I can't confirm just yet, as I lack a test machine. I'll look into this later this week. As for your other error. I'm not sure what's going on, but it looks like there's a mismatch between the rownames in your MArrayLM object (the result of calling eBayes) and the data.frame that ReportingTools is making from it. There are several reasons for this, but you might just check that R> all.equal(rownames(fit2), featureNames(eset)) If this is TRUE, then I might need to do a bit more thinking about this. Best, Jason On Thu, Jun 19, 2014 at 4:47 AM, ÕÅ³¬ <zhangchao0103@163.com> wrote: Hi Jason, It is very nice for you to help resolve my issues. Thanks very much. I have cought the session info as below. But it seems that when the package was installed, these pngs were not included in the downloaded ReportingTools_2.4.0.zip packages. I tried to reinstall this package to catch the session info, but it showed that this package has been installed and can not be installed again. I am a new beginner of R, so could you please tell me what to do is helpful to you? Thanks again. R version 3.1.0 (2014-04-10) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 [4] LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] ReportingTools_2.4.0 BiocInstaller_1.14.2 [3] knitr_1.6 limma_3.20.5 [5] moex10stprobeset.db_2.14.0 org.Mm.eg.db_2.14.0 [7] AnnotationDbi_1.26.0 GenomeInfoDb_1.0.2 [9] pd.moex.1.0.st.v1_3.8.0 RSQLite_0.11.4 [11] DBI_0.2-7 oligo_1.28.2 [13] Biostrings_2.32.0 XVector_0.4.0 [15] IRanges_1.22.9 Biobase_2.24.0 [17] oligoClasses_1.26.0 BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] affxparser_1.36.0 affyio_1.32.0 [3] annotate_1.42.0 AnnotationForge_1.6.1 [5] BatchJobs_1.2 BBmisc_1.6 [7] BiocParallel_0.6.1 biomaRt_2.20.0 [9] biovizBase_1.12.1 bit_1.1-12 [11] bitops_1.0-6 brew_1.0-6 [13] BSgenome_1.32.0 Category_2.30.0 [15] cluster_1.15.2 codetools_0.2-8 [17] colorspace_1.2-4 DESeq2_1.4.5 [19] dichromat_2.0-0 digest_0.6.4 [21] edgeR_3.6.2 evaluate_0.5.5 [23] fail_1.2 ff_2.2-13 [25] foreach_1.4.2 formatR_0.10 [27] Formula_1.1-1 genefilter_1.46.1 [29] geneplotter_1.42.0 GenomicAlignments_1.0.1 [31] GenomicFeatures_1.16.2 GenomicRanges_1.16.3 [33] ggbio_1.12.4 ggplot2_1.0.0 [35] GO.db_2.14.0 GOstats_2.30.0 [37] graph_1.42.0 grid_3.1.0 [39] gridExtra_0.9.1 GSEABase_1.26.0 [41] gtable_0.1.2 Hmisc_3.14-4 [43] hwriter_1.3 iterators_1.0.7 [45] lattice_0.20-29 latticeExtra_0.6-26 [47] locfit_1.5-9.1 MASS_7.3-33 [49] Matrix_1.1-4 munsell_0.4.2 [51] PFAM.db_2.14.0 plyr_1.8.1 [53] preprocessCore_1.26.1 proto_0.3-10 [55] R.methodsS3_1.6.1 R.oo_1.18.0 [57] R.utils_1.32.4 RBGL_1.40.0 [59] RColorBrewer_1.0-5 Rcpp_0.11.2 [61] RcppArmadillo_0.4.300.8.0 RCurl_1.95-4.1 [63] reshape2_1.4 Rsamtools_1.16.1 [65] rtracklayer_1.24.2 scales_0.2.4 [67] sendmailR_1.1-2 splines_3.1.0 [69] stats4_3.1.0 stringr_0.6.2 [71] survival_2.37-7 tools_3.1.0 [73] VariantAnnotation_1.10.3 XML_3.98-1.1 [75] xtable_1.7-3 zlibbioc_1.10.0 Best Chao -- Jason A. Hackney, Ph.D. Bioinformatics and Computational Biology Genentech hackney.jason@gene.com 650-467-5084 [[alternative HTML version deleted]]