Guido Hooiveld
Wageningen University, Wageningen, the …
Hi,
Just to confirm I am doing things properly:
I have created an ExpressionSet from a set of 120 Affymetrix arrays
and added some metadata (phenoData) to it. This all works fine. Now I
would like to subset the ExpressionSet based on one of the variables
described in the phenoData.
Although I am able to extract the proper arrays, I noticed something
unexpected when looking at the phenoData of the new, subsetted object:
the phenoData variable that was used for subsetting (Diet) *seems* to
still have 3 levels, whereas I expected only one. This also occurs for
the other factor variables in the phenoData data frame (i.e. more
levels are reported than are present). Can anyone explain whether this
is to be expected, or whether I am doing something wrong?
Thanks,
Guido
# read data & normalize
>library(affyPLM)
>pheno <- read.delim(file="A213_metadata.txt", row.names=1)
>affy.data <- ReadAffy(cdfname="mogene11stv1mmentrezg",
phenoData=as.data.frame(pheno))
>
> # check
>validObject(affy.data)
[1] TRUE
>
> # normalize
>x.norm <- fitPLM(affy.data)
># convert PLMset to eSet!
>x.norm <- pset2eset(x.norm)
> #check
> validObject(x.norm)
[1] TRUE
>
> x.norm
ExpressionSet (storageMode: lockedEnvironment)
assayData: 21225 features, 120 samples
element names: exprs, se.exprs
protocolData: none
phenoData
sampleNames: G014_A05_01_801_I1_chow.CEL G014_A07_09_809_I1_HF.CEL
... G020_H12_120_824_I10_HF.CEL (120 total)
varLabels: Simil Diet ... Labeling (5 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: mogene11stv1mmentrezg
> dim(x.norm)
Features Samples
21225 120
>
> #Check Diet assignment
> x.norm$Diet
[1] chow hfd lfd chow hfd lfd chow hfd lfd chow hfd lfd lfd chow hfd
[16] lfd chow hfd lfd chow hfd lfd chow hfd chow chow chow chow lfd lfd
[31] lfd lfd hfd hfd hfd hfd chow chow chow chow lfd lfd lfd lfd hfd
[46] hfd hfd hfd chow chow chow chow lfd lfd lfd lfd hfd hfd hfd hfd
[61] chow chow chow chow lfd lfd lfd lfd hfd hfd hfd hfd chow chow chow
[76] chow lfd lfd lfd lfd hfd hfd hfd hfd chow chow chow chow lfd lfd
[91] lfd lfd hfd hfd hfd hfd chow chow chow chow lfd lfd lfd lfd hfd
[106] hfd hfd hfd chow chow chow chow lfd lfd lfd lfd hfd hfd hfd hfd
Levels: chow hfd lfd
>
> str(x.norm)
<<snip>
..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 5 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr [1:5] NA NA NA NA ...
.. .. ..@ data :'data.frame': 120 obs. of 5 variables:
.. .. .. ..$ Simil : Factor w/ 10 levels "i1","i10","i2",..: 1 1 3 1 1 3 1 1 3 1 ...
.. .. .. ..$ Diet : Factor w/ 3 levels "chow","hfd","lfd": 1 2 3 1 2 3 1 2 3 1 ...
.. .. .. ..$ Group : Factor w/ 30 levels "i10_chow","i10_hfd",..: 4 5 9 4 5 9 4 5 9 4 ...
.. .. .. ..$ Plate : Factor w/ 2 levels "G014","G020": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ Labeling: int [1:120] 3 1 2 1 1 2 1 1 2 1 ...
So far, so good.
Now I would like to extract data of only the 40 chow samples by
subsetting x.norm on variable 'Diet'.
># backup x.norm
> x.norm2 <- x.norm
>
> #subset only chow samples
> x.norm <- x.norm2[,x.norm2$Diet=="chow"]
> dim(x.norm)
Features Samples
21225 40
Subsetting the samples seems to go OK...
> #Again check Diet assigment
> x.norm$Diet
[1] chow chow chow chow chow chow chow chow chow chow chow chow chow chow chow
[16] chow chow chow chow chow chow chow chow chow chow chow chow chow chow chow
[31] chow chow chow chow chow chow chow chow chow chow
Levels: chow hfd lfd
>
^^ Why are there still 3 levels? I expected only one level, namely
"chow".
> str(x.norm)
<<snip>
..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
.. .. ..@ varMetadata :'data.frame': 5 obs. of 1 variable:
.. .. .. ..$ labelDescription: chr [1:5] NA NA NA NA ...
.. .. ..@ data :'data.frame': 40 obs. of 5 variables:
.. .. .. ..$ Simil : Factor w/ 10 levels "i1","i10","i2",..: 1 1 1 1 3 3 3 3 4 4 ...
.. .. .. ..$ Diet : Factor w/ 3 levels "chow","hfd","lfd": 1 1 1 1 1 1 1 1 1 1 ...
.. .. .. ..$ Group : Factor w/ 30 levels "i10_chow","i10_hfd",..: 4 4 4 4 7 7 7 7 10 10 ...
.. .. .. ..$ Plate : Factor w/ 2 levels "G014","G020": 1 1 1 1 1 1 1 1 2 2 ...
.. .. .. ..$ Labeling: int [1:40] 3 1 1 1 2 2 2 2 3 3 ...
^^ Likewise: why are the original levels reported for all factor
variables, rather than only the levels present in the subset?
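For reference, the same thing can be seen without any arrays at all. The sketch below is only a toy reproduction in base R: the small data frame is a hypothetical stand-in for the phenoData data frame, not the real metadata. It suggests that keeping unused levels is general factor behaviour in R rather than something specific to the ExpressionSet, and that droplevels() (in base R since 2.12.0) removes them. The last, commented-out line is an assumed analogue for the actual object and was not run here.

```r
# Toy stand-in for the phenoData data frame (hypothetical values)
pheno.toy <- data.frame(
  Diet  = factor(c("chow", "hfd", "lfd", "chow", "hfd", "lfd")),
  Plate = factor(c("G014", "G014", "G014", "G020", "G020", "G020"))
)

# Row subsetting keeps the full set of levels of every factor column
chow.toy <- pheno.toy[pheno.toy$Diet == "chow", ]
levels(chow.toy$Diet)      # unused levels "hfd" and "lfd" are still listed
nlevels(chow.toy$Diet)

# droplevels() removes the levels that no longer occur in the data
chow.dropped <- droplevels(chow.toy)
levels(chow.dropped$Diet)

# Assumed analogue for the ExpressionSet itself (not run here):
# pData(x.norm) <- droplevels(pData(x.norm))
```

If this also holds for the subsetted ExpressionSet, the extra levels would be cosmetic for the subsetting itself, but they may matter later, for example when a factor with empty levels is used to build a design matrix.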
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SpeCond_1.8.0 RColorBrewer_1.0-5
[3] hwriter_1.3 fields_6.6.3
[5] spam_0.27-0 mclust_3.4.11
[7] mogene11stv1mmentrezgcdf_14.1.0 affyPLM_1.30.0
[9] preprocessCore_1.16.0 gcrma_2.26.0
[11] affy_1.33.2 Biobase_2.14.0
[13] BiocGenerics_0.1.3
loaded via a namespace (and not attached):
[1] affyio_1.22.0 BiocInstaller_1.2.1 Biostrings_2.22.0
[4] IRanges_1.12.5 splines_2.14.0 tools_2.14.0
[7] zlibbioc_1.0.0
>
---------------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld@wur.nl
internet: http://nutrigene.4t.com
http://scholar.google.com/citations?user=qFHaMnoAAAAJ
http://www.researcherid.com/rid/F-4912-2010