Hi William,
Please keep the posts on the list.
You should certainly remove from the analysis those probes which are not
expressed in any of your samples, i.e. keep only the probes which are
expressed in at least one sample. You can do so by applying a detection
p-value cutoff (e.g. 0.05 or 0.01), or you may run the propexpr function
to estimate the proportion of expressed probes and then use that
information to filter out probes. See ?propexpr for more details.
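
As a minimal sketch of the detection p-value filter described above, the
base R snippet below keeps a probe when its detection p-value falls below
the cutoff in at least one sample. The helper name keep_expressed, the toy
matrix and the probe/sample names are hypothetical and only for
illustration. With real data the detection p-values would typically come
from the GenomeStudio export (for example, stored in x$other$Detection
when read.ilmn is called with other.columns="Detection"); that is an
assumption about your files, and some exports report 1 - p instead, so
check the column before filtering.

```r
# Hypothetical helper: flag probes that are "detected" in enough samples.
# detection_p: numeric matrix, rows = probes, columns = samples.
keep_expressed <- function(detection_p, cutoff = 0.05, min_samples = 1) {
  stopifnot(is.matrix(detection_p), is.numeric(detection_p))
  # Count, per probe, the samples whose detection p-value is below the cutoff
  rowSums(detection_p < cutoff, na.rm = TRUE) >= min_samples
}

# Toy detection p-values (made up for illustration only)
det <- matrix(
  c(0.001, 0.20,  0.60,
    0.30,  0.45,  0.90,
    0.04,  0.03,  0.02,
    0.50,  0.009, 0.70),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("probeA", "probeB", "probeC", "probeD"),
                  c("s1", "s2", "s3"))
)

keep_05 <- keep_expressed(det, cutoff = 0.05)  # more lenient cutoff
keep_01 <- keep_expressed(det, cutoff = 0.01)  # stricter cutoff

# Subset the probes; the same logical vector can subset an expression matrix
filtered <- det[keep_05, , drop = FALSE]
print(rownames(filtered))
```

Raising min_samples (for example to the size of the smallest group) makes
the filter stricter; which value is appropriate depends on the design.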
Best wishes,
Wei
On Jul 4, 2013, at 2:55 PM, William D'Avigdor wrote:
> Hi Wei,
>
> Many thanks for your response.
>
> I would like to ask you another question, specifically about probe
> filtering.
>
> So far I have performed all my analyses on UNFILTERED Illumina data
> from Genome Studio. Is it still VALID to use unfiltered Illumina data,
> as opposed to probes filtered against the background signal at a
> particular detection p-value (e.g. p=0.01, or 0.1 according to your
> paper "Illumina WG-6 BeadChip strips should be normalised
> separately")?
>
> I am assuming that when performing hierarchical clustering on the full
> data, the genes at background level will not contribute significantly
> to the clustering. However, I do notice that the clustering distances
> are visibly narrower, so the samples appear closer than they
> otherwise would.
>
> Further, when performing t-tests / LIMMA on the full data, those
> genes that are close to background level should not contribute to
> significant differences across groups. Is this correct? And is there
> anything I am missing, apart perhaps from an effect on the FDR?
>
> Many thanks,
> Wil
>
> On 2/07/2013 7:18 PM, Wei Shi wrote:
>> Dear William,
>>
>> What you have done is correct. As you have found, the 'ProbeID' is
>> the same as the Array_Address_ID. The 'ProbeID' column was used in the
>> old versions of Illumina BeadChip arrays, and it was later replaced
>> with 'PROBE_ID' in the newer versions of BeadChips.
>>
>> The neqc() function uses negative control probes to carry out
>> background correction. The 'TargetID' column in the control probe
>> profile file indicates the types of control probes, and the negative
>> control probes have the type 'NEGATIVE'. neqc() also uses all the
>> probes, including regular probes and all types of control probes
>> (negative controls, housekeeping, ...), to perform a quantile
>> between-array normalization.
>>
>> Best wishes,
>>
>> Wei
>>
>> On Jul 2, 2013, at 3:56 PM, William D'Avigdor wrote:
>>
>>> Hi,
>>>
>>> I am doing some Illumina analysis using HumanWG-6_V2 microarrays,
>>> and have been using the annotation file:
>>> HumanWG-6_V2_0_R4_11223189_A.bgx, and I am normalising using the
>>> neqc() function in the limma package.
>>>
>>> I know there are traditionally a number of Illumina identifiers
>>> and I am concerned that I may have been using the wrong ones, and
>>> I'm not sure whether this has affected the normalisation procedure,
>>> or anything at all.
>>>
>>> After summarisation in Genome Studio, when looking at the 'Sample
>>> Probe Profile', the main identifiers that come up (and which I have
>>> used in LIMMA) are 'PROBE_ID' and 'SYMBOL', the first row being
>>> ILMN_1762337 and 7A5 respectively. I also noticed that this PROBE_ID
>>> column was the one used in the Illumina example in the LIMMA manual.
>>>
>>> HOWEVER, in Genome Studio, there is also a column called
>>> 'ProbeID'. This does not exist in the original annotation file
>>> (HumanWG-6_V2_0_R4_11223189_A), but it is identical to the
>>> Array_Address_ID (except for the preceding 000s), the latter of which
>>> is both in Genome Studio and in the annotation file, and UNIQUE to
>>> the version of the microarray.
>>>
>>> IN CONTRAST, in the 'Control Probe Profile' in Genome Studio,
>>> there is only the 'TargetID' and the 'ProbeID' available, the latter
>>> of which (I believe) is the Array_Address_ID?
>>>
>>> HENCE, for the LIMMA input, I am wondering whether I am correct
>>> when I have included the Sample Probe ID text file (which includes
>>> PROBE_ID, that is, ILMN_1762337), and the Control Probe ID text file
>>> (which includes ProbeID instead, which is most likely the Array
>>> Address ID).
>>>
>>> Many thanks in advance,
>>> William d'Avigdor
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
______________________________________________________________________
>> The information in this email is confidential and intended solely
for the addressee.
>> You must not disclose, forward, print or use it without the
permission of the sender.
>>
______________________________________________________________________
>