Hi Jim, I fully understand your comments. I thought I had used Reply-all,
but I will pay more attention next time. Sorry for the inconvenience.
Maria
> Date: Fri, 29 Nov 2013 14:33:36 -0500
> From: jmacdon@uw.edu
> To: mmaqueda@live.com
> CC: Bioconductor@r-project.org
> Subject: Re: [BioC] Analysis of Affymetrix Human Gene 2.0 ST arrays
>
> Hi Maria,
>
> Please don't take messages off-list (i.e., use Reply-all). We like to
> think of the list archives as a repository of information that people
> can search, and if messages become private, that hampers the usefulness
> of the archives.
>
>
> On 11/29/2013 1:38 PM, María Maqueda González wrote:
> > Hi Jim,
> > Many thanks for your quick and very comprehensive response.
> >
> > From your comments, I have one more question related:
> >
> > (1) I understand your comments about the intron control transcripts,
> > but I do not fully understand the rescue transcript category that I
> > have also obtained in my topTable transcripts.
>
> There are two things to think about here.
>
> First, there is the issue of statistical significance versus biological
> significance. Note that the t-statistic is a fraction, and in the
> numerator you have the difference between the means of two groups, and
> in the denominator you have the standard error of that difference. The
> standard error is based on the intra-group variability. So if you have a
> particular probeset and the intra-group variability for that probeset is
> extremely small, then you can end up with a statistically significant
> result even if the fold change isn't very large at all.
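
To make the point about small intra-group variability concrete, here is a minimal sketch in plain Python (not limma). The expression values are hypothetical, and the function is an ordinary equal-variance two-sample t-statistic, not the moderated statistic that eBayes reports.

```python
import math
import statistics


def t_statistic(group_a, group_b):
    """Ordinary two-sample t: mean difference divided by its standard error."""
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.mean(group_b) - statistics.mean(group_a)
    # Pooled within-group variance (assumes equal variances in both groups).
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff / std_err


# Hypothetical log2 expression values, three arrays per group.
tight_a = [7.00, 7.01, 6.99]   # tiny spread, log2 difference of only 0.1
tight_b = [7.10, 7.11, 7.09]
noisy_a = [6.0, 7.5, 6.6]      # large spread, log2 difference of 1.0
noisy_b = [7.8, 8.4, 6.9]

t_tight = t_statistic(tight_a, tight_b)
t_noisy = t_statistic(noisy_a, noisy_b)
print(f"tight probeset: t = {t_tight:.2f}")
print(f"noisy probeset: t = {t_noisy:.2f}")
```

The tight probeset gets a t-statistic of about 12 from a log2 difference of 0.1, while the noisy probeset gets about 1.6 from a log2 difference of 1.0.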
>
> The eBayes step is intended to protect against this to some extent, by
> adjusting 'too small' standard errors towards the overall variance
> estimate, but protecting against something and completely eliminating it
> are two different things. So it may be that the differences for these
> controls aren't that great, and it is just happenstance that the
> intra-group variance is small enough to get statistical significance.
> One thing you can do to protect against that sort of thing is to filter
> out probesets that don't really change expression very much in any
> samples (or just use getMainProbes and nuke all these controls in the
> first place, which is what I would do).
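
The adjustment towards the overall variance estimate can be sketched as a weighted average of the probeset's own variance and a prior variance. This is a plain-Python illustration of that idea, not limma's code: the prior variance and prior degrees of freedom below are hypothetical numbers, whereas eBayes estimates them from all probesets on the array.

```python
import math


def moderated_variance(s2, df, prior_s2, prior_df):
    """Shrink a probeset variance s2 (with df degrees of freedom)
    towards a prior variance, weighting each by its degrees of freedom."""
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


# Hypothetical probeset: 3 arrays per group, log2 difference 0.1,
# and a very small pooled within-group variance.
n_a, n_b = 3, 3
diff = 0.10
s2 = 0.0001
df = n_a + n_b - 2

# Hypothetical prior; in practice these are estimated from the data.
prior_s2 = 0.05
prior_df = 4.0

s2_post = moderated_variance(s2, df, prior_s2, prior_df)
scale = 1 / n_a + 1 / n_b
ordinary_t = diff / math.sqrt(s2 * scale)
moderated_t = diff / math.sqrt(s2_post * scale)
print(f"ordinary t  = {ordinary_t:.2f}")
print(f"moderated t = {moderated_t:.2f}")
```

With these assumed numbers the statistic drops from about 12 to below 1. How much a real probeset is shrunk depends on the estimated prior degrees of freedom, which is why some small-variance probesets can still come out significant.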
>
> Second, just because something shows up in a topTable, doesn't mean it
> is actually differentially expressed. I don't know how you are adjusting
> for multiple comparisons, but let's just assume you are using FDR. If
> you then take the probesets with an FDR < 0.05, you are accepting that,
> on average, up to 5% of the probesets in that list are false positives.
> In other words, as many as 5% of the probesets in that table may not
> really be differentially expressed; they just happen to have a large
> t-statistic by chance. Thus, the rescue probeset(s) that you have might
> just be false positives.
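
As a rough illustration of what an FDR cutoff means, here is a minimal plain-Python sketch of the Benjamini-Hochberg adjustment. It assumes BH is the FDR method in use, which the thread does not state, and the p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for ten probesets.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
adjusted = bh_adjust(p_values)

# Probesets that pass an FDR cutoff of 0.05.
selected = [i for i, q in enumerate(adjusted) if q < 0.05]
print("selected probesets:", selected)
print("adjusted p-values:", [round(q, 4) for q in adjusted])
```

Only the first two probesets pass the cutoff here, even though five raw p-values are below 0.05. The cutoff bounds the expected proportion of false positives in the selected list; it does not say which entries are the false ones.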
>
> Best,
>
> Jim
>
> >
> > No need to send the function, but thanks in any case for offering.
> >
> > Regards,
> >
> > Maria
> >
> > > Date: Fri, 29 Nov 2013 09:04:20 -0500
> > > From: jmacdon@uw.edu
> > > To: guest@bioconductor.org
> > > CC: bioconductor@r-project.org; mmaqueda@live.com
> > > Subject: Re: [BioC] Analysis of Affymetrix Human Gene 2.0 ST arrays
> > >
> > > Hi Maria,
> > >
> > >
> > > On 11/29/2013 6:18 AM, María Maqueda [guest] wrote:
> > > > Dear all,
> > > >
> > > > I am analyzing a set of Affymetrix Human Gene 2.0 ST arrays. This
> > > > is my first time working with this type of array, so I have a few
> > > > general questions. I would very much appreciate any advice you
> > > > could give.
> > > >
> > > > (1) I have obtained different lists of differentially expressed
> > > > genes (using eBayes() from limma). In those lists, some control
> > > > transcripts are popping up (e.g. normgene -> intron category, among
> > > > other categories). I was not expecting this type of transcript at
> > > > this point. In theory, after normalization, no control transcripts
> > > > should appear, am I right? Have you experienced this?
> > > > I have read that one possibility is to use getMainProbes before
> > > > topTable selection, but I wonder if there could be something wrong
> > > > from the beginning with my normalization process (I have used
> > > > rma() - transcript level - from oligo). What is your opinion?
> > >
> > > I don't think it has anything to do with the normalization. Instead, I
> > > think it is a combination of poorly designed probes and highly
> > > expressed genes for which there are sufficient unprocessed mRNA
> > > transcripts that still have their introns intact (remember that the
> > > processing of samples stops all enzymatic activity very quickly as a
> > > first step, so any mRNA that is in the process of being transcribed,
> > > or is just finishing transcription, will likely still have introns).
> > >
> > > >
> > > > (2) This type of array also includes lincRNA transcripts, and I am
> > > > interested in considering them for my analysis. The thing is that I
> > > > am using hugene20sttranscriptcluster.db for annotation and these
> > > > lincRNAs are not included. Would this library be able to handle them?
> > >
> > > Hypothetically yes, as of now not really. It doesn't seem like that
> > > many have been annotated with Entrez Gene IDs, and until that happens
> > > they won't appear in the annotation packages. And even for those that
> > > do have Entrez Gene IDs, the information stops there - you go to NCBI
> > > and it just says that the lincRNA is supposed to exist, but nothing
> > > else.
> > >
> > > >
> > > > (3) I tried to make my own annotation package through makeDBPackage
> > > > based on the .csv annotation file from Affy, but I got an error...:
> > > > Error in `[.data.frame`(csvFile, , GenBank IDName) : undefined columns selected
> > > > I have already read on this mailing list that makeDBPackage may
> > > > expect an HGU133plus2 annotation "style". Would the library
> > > > AnnotationForge be able to handle this?
> > >
> > > AnnotationForge cannot handle the csv files for these arrays directly,
> > > as they are completely different from the old style 3'-biased arrays
> > > like the hgu133plus2 that you mention. I have a function I can give
> > > you to make the input file for the annotation package, but I don't
> > > think it is worth it because it would be the function that I already
> > > used to make the annotation package you can get from BioC. So you
> > > could go through all the effort to make something you can already get.
> > >
> > > But if you want it, I will send it to you.
> > >
> > > Best,
> > >
> > > Jim
> > >
> > >
> > > >
> > > >
> > > > Many thanks in advance for any help!
> > > >
> > > >
> > > > María Maqueda
> > > >
> > > > Biomedical Engineering Research Centre (CREB)
> > > > Universitat Politècnica de Catalunya (UPC)
> > > >
> > > > -- output of sessionInfo():
> > > >
> > > >> sessionInfo()
> > > > R version 3.0.1 (2013-05-16)
> > > > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > > >
> > > > locale:
> > > > [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
> > > > [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
> > > > [5] LC_TIME=Spanish_Spain.1252
> > > >
> > > > attached base packages:
> > > > [1] parallel stats graphics grDevices utils datasets methods base
> > > >
> > > > other attached packages:
> > > > [1] human.db0_2.9.0 AnnotationForge_1.2.2
> > > > [3] hugene20sttranscriptcluster.db_2.12.1 org.Hs.eg.db_2.9.0
> > > > [5] AnnotationDbi_1.22.6 BiocInstaller_1.12.0
> > > > [7] limma_3.16.8 pd.hugene.2.0.st_3.8.0
> > > > [9] oligo_1.24.2 Biobase_2.20.1
> > > > [11] oligoClasses_1.22.0 BiocGenerics_0.6.0
> > > > [13] RSQLite_0.11.4 DBI_0.2-7
> > > >
> > > > loaded via a namespace (and not attached):
> > > > [1] affxparser_1.32.3 affyio_1.28.0 annotate_1.38.0
> > > > [4] Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8
> > > > [7] ff_2.2-12 foreach_1.4.1 genefilter_1.42.0
> > > > [10] GenomicRanges_1.12.5 IRanges_1.18.4 iterators_1.0.6
> > > > [13] preprocessCore_1.22.0 splines_3.0.1 stats4_3.0.1
> > > > [16] survival_2.37-4 tools_3.0.1 XML_3.98-1.1
> > > > [19] xtable_1.7-1 zlibbioc_1.6.0
> > > >
> > > > --
> > > > Sent via the guest posting facility at bioconductor.org.
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor@r-project.org
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > Search the archives:
> > > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> > > --
> > > James W. MacDonald, M.S.
> > > Biostatistician
> > > University of Washington
> > > Environmental and Occupational Health Sciences
> > > 4225 Roosevelt Way NE, # 100
> > > Seattle WA 98105-6099
> > >
>