Question: Clariom D Assays, annotation
0
gravatar for anitabaery
17 months ago by
anitabaery0
anitabaery0 wrote:

Hi,

I am using Clariom D Assays arrays and have tried to annotate it using the following:

library(oligo)

rma <- oligo::rma(data)

library(affycoretools)

anno<- annotateEset(rma, pd.clariom.d.human)

write.table(anno, file = "anno.txt")

Unfortunately, I got not annotated txt file with samples as rows and TC* IDs as columns.

How can I get annotated file (with gene annotation and transcript genome position) and with samples as a column and IDs as a row.

Thank you

Anita

 

annotation • 576 views
ADD COMMENTlink modified 17 months ago by James W. MacDonald50k • written 17 months ago by anitabaery0
Answer: Clariom D Assays, annotation
0
gravatar for James W. MacDonald
17 months ago by
United States
James W. MacDonald50k wrote:

I don't think there is a write method for the ExpressionSet container, as the goal for that container is to make it easier to do your analysis within Bioconductor. Anyway, you could always just extract the fData and exprs slots into a data.frame and then output.

out <- data.frame(fData(anno), exprs(anno))
write.table(out, "anno.txt")

 

ADD COMMENTlink written 17 months ago by James W. MacDonald50k

Hi James.

Thank you for your reply. Yes, you are right "write" didn't work properly, using data.frame did.

but in my annotated file I have the following columns:

"PROBEID" "ID" "SYMBOL" "GENENAME" "Xsample1" "Xsample2" (...)

1. I do not know why "X" is added at the beginning of all my sample names

2.What I would like to have is a chromosome number and chromosome start position for a gene to do expression quantitative trait loci. Is this something I could get from annotateEset?

Thank you

Anita

ADD REPLYlink written 17 months ago by anitabaery0

1.) Usually an X is appended if the column names don't meet certain criteria, in which case make.names is called. But if your sampleNames are sample1 - sampleX, that shouldn't happen:

> make.names(paste0("sample", 1:5))
[1] "sample1" "sample2" "sample3" "sample4" "sample5"
> make.names(1:5)
[1] "X1" "X2" "X3" "X4" "X5"

Anyway, it's simple enough to strip off the X characters. See ?gsub.

2.) You will have to do an extra couple of steps to add chromosomal positions. The genomic locations aren't in the orgDb packages any longer, so you will need the Homo.sapiens package:

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")
ADD REPLYlink written 17 months ago by James W. MacDonald50k

Hi James,

Thanks very much. I have run:

data.rma <- oligo::rma(data)

library(affycoretools)

anno <- annotateEset(data.rma, pd.clariom.d.human)

library(Homo.sapiens)

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

out <- data.frame(fData(fd$CHRLOC), exprs(fd$CHRLOC))
write.table(out, "data.txt")

but got an error:

Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) :

mapIds must have at least one key to match against.

Calls: mapIds -> mapIds -> mapIds_base

Execution halted

 

 

ADD REPLYlink written 17 months ago by anitabaery0

The data in the pdInfoPackage doesn't include the Entrez Gene ID. You are better off using the clariomdhumantranscriptcluster.db package.

library(clariomdhumantranscriptcluster.db)

anno <- annotateEset(data.rma, clariomdhumantranscriptcluster.db)

And then proceed as I mentioned. But do note that this

out <- data.frame(fData(fd$CHRLOC), exprs(fd$CHRLOC))

isn't something I ever recommended you do, and whatever you think it will do, it won't. I leave it up to you to figure out why that is.

 

ADD REPLYlink written 17 months ago by James W. MacDonald50k

Thank you James, yes you are right fd$CHR and fd$CHR  are list objects.

What I did was to change anno into matrix, list into data.frame and try cbind but the number or rows do not much.

matrix <- exprs(anno)

df <- data.frame(matrix(unlist(fd$CHRLOC)) )

df2 <- data.frame(matrix(unlist(fd$CHR)) )

results <- cbind(df,df2, t(matrix))

Error in data.frame(..., check.names = FALSE) :

  arguments imply differing number of rows: 25762, 350

> dim(df2)

[1] 25762     1

> dim(df)

[1] 25762     1

> dim(matrix)

[1] 138745    350

ADD REPLYlink written 17 months ago by anitabaery0

Why didn't you simply use the code that I already provided?

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

fData(anno) <- fd

Then you would normally proceed to analysis using e.g., limma. Or if you really want, you can do

out <- data.frame(fd, exprs(anno))

and do whatever it is that you want to do with those data.

ADD REPLYlink written 17 months ago by James W. MacDonald50k

Thank you for that James. So, I run according to your suggestions, but the output doesn't look right:

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

'select()' returned 1:many mapping between keys and columns

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

'select()' returned 1:many mapping between keys and columns

fData(anno) <- fd

out <- data.frame(fd, exprs(anno))

write.table(out, "annotated.txt")

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  :

  unimplemented type 'list' in 'EncodeElement'

I guess the error is because I have list objects in my data.frame.

> typeof(out$PROBEID)

[1] "integer"

> typeof(out$ENTREZID)

[1] "integer"

> typeof(out$SYMBOL)

[1] "integer"

> typeof(out$GENENAME)

[1] "integer"

> typeof(out$CHR)

[1] "list"

> typeof(out$CHRLOC)

[1] "list"

and unfortunately the output is truncated and looks like that:

PROBEID

ENTREZID

SYMBOL

GENENAME

CHR

CHRLOC

Xaample1  Xsample2  (...)

AFFX-BkGr-GC03_st

AFFX-BkGr-GC03_st

NA

NA

NA

   

 

ADD REPLYlink written 17 months ago by anitabaery0

Hi James,

I have tried

df <- apply(df,5,as.character)

fwrite(out, file ="..")

out$col5 = as.character(out$col5)

but nothing is working ...

Can I also ask if the annotation is build human genome build 37, and if not is it possible to change to 37?

Thanks again

 

ADD REPLYlink written 17 months ago by anitabaery0

Hi again James,

I have modified the lines and it worked:

out$CHR = as.character(out$CHR)

out$CHRLOC = as.character(out$CHRLOC)

out = as.matrix(out)

write.csv(out, "data.csv")

However, I have 2 question

1. how to add CHRSTOP , I have tried CDSSTOP but had

Error in .testForValidCols(x, cols) :

  Invalid columns: CDSSTOP. Please use the columns method to see a listing of valid arguments.

2. Is the annotation based on human genome build 37?

Thanks!

 

 

ADD REPLYlink written 17 months ago by anitabaery0

So you got this message:

Error in .testForValidCols(x, cols) :

  Invalid columns: CDSSTOP. Please use the columns method to see a listing of valid arguments.

Which seems pretty clear to me; you asked for something that isn't valid, and here's how to get a list of things that are valid. If you are planning to do much of anything with R/Bioconductor, you will have to learn to figure things out for yourself, particularly those things where you get an error message that clearly spells out where you went wrong and how to fix it.

In addition, the code you show won't return a list, so your example code doesn't match what you must have done:

> typeof(mapIds(Homo.sapiens, keys(Homo.sapiens, "ENTREZID"), "CDSCHROM","ENTREZID"))
'select()' returned 1:many mapping between keys and columns
[1] "character"

So if your out$CHR and out$CHRLOC are lists, it is most likely that you are doing

out$CHR <- mapIds(Homo.sapiens, as.character(fData$ENTREZID), "CDSCHROM","ENTREZID", mapIds = "list")

which as you found doesn't work as you might expect. And to answer your final question, note that the Homo.sapiens package has a show method:

> Homo.sapiens
OrganismDb Object:
# Includes GODb Object:  GO.db
# With data about:  Gene Ontology
# Includes OrgDb Object:  org.Hs.eg.db
# Gene data about:  Homo sapiens
# Taxonomy Id:  9606
# Includes TxDb Object:  TxDb.Hsapiens.UCSC.hg19.knownGene
# Transcriptome data about:  Homo sapiens
# Based on genome:  hg19
# The OrgDb gene id ENTREZID is mapped to the TxDb gene id GENEID .

 

ADD REPLYlink written 17 months ago by James W. MacDonald50k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 120 users visited in the last hour