Entering edit mode
Hi Chao,
Please don't take conversations off-list (e.g., use 'Reply-all' when
responding).
On 6/11/2014 5:02 PM, ?? wrote:
> Hi Jim,
>
> Thanks a lot for your prompt reply and I really learned a lot from
it.
> But I don't understand what do you mean as you said below,
>
> >You will have to deal with those multiple mappings as you see
fit.
> But after doing so, you can then do
>
>>fit$genes <- gns
>
> >and your topTable object will then be populated with the
annotations.
>
> My way to deal with those multiple mappings is to retain only the
first
> one probe set of the multiple mappings(i.e. gns_rm_multmap).And I
have
> finished it. But what to do next and what is fit$genes? Could you
please
> explain it in more details? I have found a way to populated with the
> annotations with "for,if else" commands, and it costs too much time
for
> my machine to run these commands. It seems that your way is much
easier,
> but I haven't got it.
The MArrayLM object (your 'fit2' object below) has a slot for gene
annotations, and as long as the annotation object is in the correct
order you can just put it in there. So you have done something like
this:
fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)
And then if you annotate those genes
gns <- select(hugene10sttranscriptcluster.db, row.names(fit2$coef),
c("ENTREZID","SYMBOL","GENENAME"))
and then remove duplicates like you said you did
gns <- gns[duplicated(gns[,1]),]
you can check first that things line up
all.equal(row.names(fit2$coef), gns[,1])
and if that is TRUE (it should be), you can now put those data into
your
fit2 object
fit2$genes <- gns
and then if you do
topTable(fit2, 1)
you will have all the annotation in the result as well.
And then you can do something like
idx <- HTMLReport("index", "Look at these results!")
publish(fit2, idx, eset, coef=1, pvalueCutoff = 0.05)
finish(idx)
and you will have an index.html file that you can open to look at your
results. See the ReportingTools vignettes for more information.
>
> In addition, as you said, you would not recommend summarizing the
oligo
> data at the probeset level. However if rma(target = "core") is used,
how
> to use paCalls, and if paCalls("DABG") is used, how to use rma to
make
> them to be consistent?
I have no idea. I don't use paCalls().
Best,
Jim
>
> Looking forward to your reply. Many thanks in advance..
>
> Best Regards
>
> Chao
> ?
>
>
>
>
>
>
>
>
> At 2014-06-10 10:26:47, "James W. MacDonald" <jmacdon at="" uw.edu="">
wrote:
>>Hi Chao,
>>
>>On 6/8/2014 10:37 AM, ?? wrote:
>>> Dear list,
>>>
>>>
>>> I would like to use the paCalls from oligo package for filtering
probe sets with absence of transcripts. My data are from MoGene-2_0-st
and MoEx-1_0-st-v1 array (Affymetrix). My data after reading CEL files
is a GeneFeatureSet with 12 samples (6 for control groups, and 6 for
experimental groups). What should I do with these data computed by
paCalls(PSDABG) as below ?
>>>> library(oligo)
>>>> OligoRawData<-read.celfiles(CEL file lists)
>>>> eset<-rma(OligoRawData)
>>>> dagbPS <- paCalls(OligoRawData, "PSDABG")
>>> What to do next to filter the probe sets? Could you please send me
a complete examples and a detailed explanation for it?
>>>
>>
>>You need to decide what constitutes 'present' and how many samples
have
>>to be present in order to keep the probeset.
>>
>>So if I were to say that a p < 0.05 is present and I needed 20 such
>>samples, I could do
>>
>>keep <- rowSums(dagbPS < 0.05) > 19
>>eset <- eset[keep,]
>>
>>If the above code is mysterious to you, then you need to read 'An
>>Introduction to R'.
>>
>>>
>>> In addition, moex10sttranscriptcluster.db can be used for
annotation of data from MoEx-1_0-st-v1 array, and both of
mogene20stprobeset.db and mogene20sttranscriptcluster.db can be used
for that of data from MoGene-2_0-st (including both of gene and lncRNA
lists). But only more than half of the probe sets are anotated with
gene symbols by below commands.
>>>> results<-decideTests(fit2, method="global", adjust.method="fdr",
p.value=0.05, lfc=0.5) #DEGs determination by t tests
>>>> genesymbol = getText(aafSymbol(rownames(results),
"moex10sttranscriptcluster.db" ));#annotated by
moex10sttranscriptcluster.db for data get from MoEx-1_0-st-v1 array
>>> Only 1217 and 24709 can be annotated by mogene20stprobeset.db and
mogene20sttranscriptcluster.db seperately for data of MoGene-2_0-st
(length(genesymbol[which(genesymbol!="")])). But the total num is
41345 (length(results)). Only 14966 can be mapped by
moex10sttranscriptcluster.db for data of MoEx-1_0-st-v1 (total num is
23332 - length(results)). Should I need to add some more db for the
annotation?
>>>
>>
>>The annotation packages with 'transcriptcluster' in their names are
for
>>instances where you have summarized probesets at the transcript
level
>>(which is the default for rma() in oligo). If you want to summarize
at
>>the probeset level (which I would not recommend doing, btw), you
need to
>>use target = "probeset" in your call to rma().
>>
>>In other words, you should only be using the transcriptcluster
>>annotation packages. Although please note that the
>>moex10transcriptcluster.db package is for the Mouse Exon 10 ST
array,
>>not the Gene ST array.
>>
>>There are any number of reasons that only a subset of probesets on
the
>>array have symbols. First, there are lots of controls, which won't
have
>>gene symbols. Second, the lincRNA/snoRNA/miRNA probesets that Affy
put
>>on these array won't have gene symbols either (because, they aren't
>>genes). Third, there is still some speculative content on these
arrays;
>>things that might end up being genes, with gene names, in the
future,
>>but which are just hypothetical at this point in time. Fourth, the
>>annaffy package uses the old style methods of getting annotations,
in
>>which case any probeset that matches more than one gene symbol will
be
>>masked.
>>
>>You will be much better served if you were to do something like
>>
>>gns <- select(mogene10sttranscriptcluster.db, featureNames(eset),
>>c("ENTREZID","SYMBOL","GENENAME"))
>>
>>Which will result in a warning that you have multiple mappings. You
will
>>have to deal with those multiple mappings as you see fit. But after
>>doing so, you can then do
>>
>>fit$genes <- gns
>>
>>and your topTable object will then be populated with the
annotations.
>>You might then consider using the ReportingTools package, which is
under
>>active development and maintenance, rather than the annaffy package
>>which may still be actively maintained, but is no longer AFAICT
under
>>active development.
>>
>>Best,
>>
>>Jim
>>
>>
>>
>>>
>>> BTW, I am a beginner of this field. I found there are too few
documents for examples about how to use functions of oligo package.
Could you please also give me some suggestions? Looking forword to
your reply. I really appreciate for your any helps.
>>>
>>>
>>> Thanks again.
>>>
>>>
>>> Best regards.
>>>
>>>
>>> Chao
>>>
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>--
>>James W. MacDonald, M.S.
>>Biostatistician
>>University of Washington
>>Environmental and Occupational Health Sciences
>>4225 Roosevelt Way NE, # 100
>>Seattle WA 98105-6099
>
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099