Entering edit mode
Jakub Stanislaw Nowak
▴
70
@jakub-stanislaw-nowak-6656
Last seen 10.2 years ago
Hello everyone,
This my first attempt so it may not be a perfect email. I am not very
advanced in bioinformatics so I tried to be very detailed.
Basically I am trying to annotate microarray dataset from Affymetrix
using bioconductor.
I used following steps:
1. loading libraries
> library(annotate)
> library(limma)
> library(mogene10sttranscriptcluster.db)
> library(affy)
2. I read my target files into annotated data frame using
>adf <- read.AnnotatedDataFrame("target.txt",header=TRUE,row.names=1
,as.is=TRUE)
3. I read my expression .CEL files into target files using
>mydata <- ReadAffy(filenames=pData(adf)$FileName,phenoData=adf)
> > mydata
> AffyBatch object
> size of arrays=1050x1050 features (19 kb)
> cdf=MoGene-1_0-st-v1 (34760 affyids)
> number of samples=6
> number of genes=34760
> annotation=mogene10stv1
> notes=
4. I normalised my data with ram
>eset <- rma(mydata)
when you check eset it looks ok
> > eset
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 34760 features, 6 samples
> element names: exprs
> protocolData
> sampleNames: GSM910962.CEL GSM910963.CEL ... GSM910967.CEL (6
total)
> varLabels: ScanDate
> varMetadata: labelDescription
> phenoData
> sampleNames: GSM910962.CEL GSM910963.CEL ... GSM910967.CEL (6
total)
> varLabels: FileName Description
> varMetadata: labelDescription
> featureData
> featureNames: 10338001 10338003 ... 10608724 (34760 total)
> fvarLabels: ID Symbol Name
> fvarMetadata: labelDescription
> experimentData: use 'experimentData(object)'
> Annotation: mogene10stv1
5. I was able to generate fold change vector for interesting samples
but I have problem annotating eset.
6. I tried annotate the eset file using annotation from above using
mogene10sttranscriptcluster.db or mogene10stprobes.db.
#I build an annotation table
ID <- featureNames(eset)
Symbol <- getSYMBOL(ID, "mogene10sttranscriptcluster.db")
Name <- as.character(lookUp(ID, "mogene10sttranscriptcluster.db",
"GENENAME"))
tmp <- data.frame(ID=ID, Symbol=Symbol, Name=Name,
stringsAsFactors=F)
tmp[tmp=="NA"] <- NA #fix padding with NA characters
The problem I have is that for large number of IDs - all initial 6500
- I am getting Symbol and Name annotated as NA
Here is the output for some of them
> > Name[6500:6550]
> [1] "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA"
"NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA"
> [22] "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA"
"NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA"
> [43] "NA" "NA" "NA" "NA" "NA" "NA" "NA" "NA" ?NA"
>
> > Symbol[6500:6550]
> 10344545 10344546 10344547 10344548 10344549 10344550 10344551
10344552 10344553 10344554 10344555 10344556
> NA NA NA NA NA NA NA
NA NA NA NA NA
> 10344557 10344558 10344559 10344560 10344561 10344562 10344563
10344564 10344565 10344566 10344567 10344568
> NA NA NA NA NA NA NA
NA NA NA NA NA
> 10344569 10344570 10344571 10344572 10344573 10344574 10344575
10344576 10344577 10344578 10344579 10344580
> NA NA NA NA NA NA NA
NA NA NA NA NA
> 10344581 10344582 10344583 10344584 10344585 10344586 10344587
10344588 10344589 10344590 10344591 10344592
> NA NA NA NA NA NA NA
NA NA NA NA NA
> 10344593 10344594 10344595
> NA NA NA
#So if I assign it as feature data of the current Eset
fData(eset) <- tmp
#and perform stat with limma
#Build the design matrix
design <- model.matrix(~-1+factor(c(1,1,2,2,3,3)))
colnames(design) <- c(?mock?,"siGFP?,"siLin28a",?")
> > design
> mock siGFP siLin28a
> 1 1 0 0
> 2 1 0 0
> 3 0 1 0
> 4 0 1 0
> 5 0 0 1
> 6 0 0 1
> attr(,"assign")
> [1] 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$`factor(c(1, 1, 2, 2, 3, 3))`
> [1] "contr.treatment"
# instructs Limma which comparisons to make
contrastmatrix <- makeContrasts(mock-siGFP,mock-siLin28a,siGFP-
siLin28a,levels=design)
> > contrastmatrix
> Contrasts
> Levels mock - siGFP mock - siLin28a siGFP - siLin28a
> mock 1 1 0
> siGFP -1 0 1
> siLin28a 0 -1 -1
# make the contrasts
fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrastmatrix)
fit2 <- eBayes(fit2)
#listed the top differentially expressed genes
> > topTable(fit2,coef=1,adjust="fdr")
> ID Symbol Name logFC AveExpr t
P.Value adj.P.Val B
> 10342604 10342604 <na> <na> 1.831809 2.845863 14.037480
1.248257e-05 0.4338942 -3.694032
> 10343224 10343224 <na> <na> -2.751868 2.658551 -10.992289
4.802754e-05 0.8347186 -3.715792
> 10339733 10339733 <na> <na> 1.703917 3.426402 9.860457
8.674159e-05 1.0000000 -3.728996
> 10341175 10341175 <na> <na> -2.861665 2.167471 -8.877976
1.526481e-04 1.0000000 -3.744350
> 10340405 10340405 <na> <na> -1.368074 2.944242 -8.245297
2.263752e-04 1.0000000 -3.756919
> 10343199 10343199 <na> <na> -1.238289 2.077839 -8.130892
2.437752e-04 1.0000000 -3.759472
> 10339048 10339048 <na> <na> 1.101959 2.129288 7.949683
2.746238e-04 1.0000000 -3.763713
> 10344413 10344413 <na> <na> -1.407850 2.259821 -7.810608
3.014011e-04 1.0000000 -3.767144
> 10343919 10343919 <na> <na> 1.134134 2.650166 7.644642
3.374231e-04 1.0000000 -3.771452
> 10338867 10338867 <na> <na> 1.162114 5.792621 7.454279
3.850651e-04 1.0000000 -3.776701
As you can see just this small portion is already missing information
about Symbol and Name.
So my question is do I use a correct .db library for annotation? As it
looks like I missing a lot of ID cannot be annotated
How can I fix that problem?
Many Thanks,
Jakub
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.