Question: Clariom D Assays, annotation
0
22 months ago by
anitabaery0 wrote:

Hi,

I am using Clariom D Assays arrays and have tried to annotate it using the following:

library(oligo)

rma <- oligo::rma(data)

library(affycoretools)

anno<- annotateEset(rma, pd.clariom.d.human)

write.table(anno, file = "anno.txt")

Unfortunately, I got not annotated txt file with samples as rows and TC* IDs as columns.

How can I get annotated file (with gene annotation and transcript genome position) and with samples as a column and IDs as a row.

Thank you

Anita

annotation • 665 views
modified 22 months ago by James W. MacDonald51k • written 22 months ago by anitabaery0
0
22 months ago by
United States
James W. MacDonald51k wrote:

I don't think there is a write method for the ExpressionSet container, as the goal for that container is to make it easier to do your analysis within Bioconductor. Anyway, you could always just extract the fData and exprs slots into a data.frame and then output.

out <- data.frame(fData(anno), exprs(anno))
write.table(out, "anno.txt")

Hi James.

Thank you for your reply. Yes, you are right "write" didn't work properly, using data.frame did.

but in my annotated file I have the following columns:

"PROBEID" "ID" "SYMBOL" "GENENAME" "Xsample1" "Xsample2" (...)

1. I do not know why "X" is added at the beginning of all my sample names

2.What I would like to have is a chromosome number and chromosome start position for a gene to do expression quantitative trait loci. Is this something I could get from annotateEset?

Thank you

Anita

1.) Usually an X is appended if the column names don't meet certain criteria, in which case make.names is called. But if your sampleNames are sample1 - sampleX, that shouldn't happen:

> make.names(paste0("sample", 1:5))
[1] "sample1" "sample2" "sample3" "sample4" "sample5"
> make.names(1:5)
[1] "X1" "X2" "X3" "X4" "X5"

Anyway, it's simple enough to strip off the X characters. See ?gsub.

2.) You will have to do an extra couple of steps to add chromosomal positions. The genomic locations aren't in the orgDb packages any longer, so you will need the Homo.sapiens package:

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

Hi James,

Thanks very much. I have run:

data.rma <- oligo::rma(data)

library(affycoretools)

anno <- annotateEset(data.rma, pd.clariom.d.human)

library(Homo.sapiens)

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

out <- data.frame(fData(fd$CHRLOC), exprs(fd$CHRLOC))
write.table(out, "data.txt")

but got an error:

Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) :

mapIds must have at least one key to match against.

Calls: mapIds -> mapIds -> mapIds_base

Execution halted

The data in the pdInfoPackage doesn't include the Entrez Gene ID. You are better off using the clariomdhumantranscriptcluster.db package.

library(clariomdhumantranscriptcluster.db)

anno <- annotateEset(data.rma, clariomdhumantranscriptcluster.db)

And then proceed as I mentioned. But do note that this

out <- data.frame(fData(fd$CHRLOC), exprs(fd$CHRLOC))

isn't something I ever recommended you do, and whatever you think it will do, it won't. I leave it up to you to figure out why that is.

Thank you James, yes you are right fd$CHR and fd$CHR  are list objects.

What I did was to change anno into matrix, list into data.frame and try cbind but the number or rows do not much.

matrix <- exprs(anno)

df <- data.frame(matrix(unlist(fd$CHRLOC)) ) df2 <- data.frame(matrix(unlist(fd$CHR)) )

results <- cbind(df,df2, t(matrix))

Error in data.frame(..., check.names = FALSE) :

arguments imply differing number of rows: 25762, 350

> dim(df2)

[1] 25762     1

> dim(df)

[1] 25762     1

> dim(matrix)

[1] 138745    350

Why didn't you simply use the code that I already provided?

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

fData(anno) <- fd

Then you would normally proceed to analysis using e.g., limma. Or if you really want, you can do

out <- data.frame(fd, exprs(anno))

and do whatever it is that you want to do with those data.

Thank you for that James. So, I run according to your suggestions, but the output doesn't look right:

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

'select()' returned 1:many mapping between keys and columns

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

'select()' returned 1:many mapping between keys and columns

fData(anno) <- fd

out <- data.frame(fd, exprs(anno))

write.table(out, "annotated.txt")

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  :

unimplemented type 'list' in 'EncodeElement'

I guess the error is because I have list objects in my data.frame.

> typeof(out$PROBEID) [1] "integer" > typeof(out$ENTREZID)

[1] "integer"

> typeof(out$SYMBOL) [1] "integer" > typeof(out$GENENAME)

[1] "integer"

> typeof(out$CHR) [1] "list" > typeof(out$CHRLOC)

[1] "list"

and unfortunately the output is truncated and looks like that:

 PROBEID ENTREZID SYMBOL GENENAME CHR CHRLOC Xaample1  Xsample2  (...) AFFX-BkGr-GC03_st AFFX-BkGr-GC03_st NA NA NA

Hi James,

I have tried

df <- apply(df,5,as.character)

fwrite(out, file ="..")

out$col5 = as.character(out$col5)

but nothing is working ...

Can I also ask if the annotation is build human genome build 37, and if not is it possible to change to 37?

Thanks again

Hi again James,

I have modified the lines and it worked:

out$CHR = as.character(out$CHR)

out$CHRLOC = as.character(out$CHRLOC)

out = as.matrix(out)

write.csv(out, "data.csv")

However, I have 2 question

1. how to add CHRSTOP , I have tried CDSSTOP but had

Error in .testForValidCols(x, cols) :

Invalid columns: CDSSTOP. Please use the columns method to see a listing of valid arguments.

2. Is the annotation based on human genome build 37?

Thanks!

So you got this message:

Error in .testForValidCols(x, cols) :

Invalid columns: CDSSTOP. Please use the columns method to see a listing of valid arguments.

Which seems pretty clear to me; you asked for something that isn't valid, and here's how to get a list of things that are valid. If you are planning to do much of anything with R/Bioconductor, you will have to learn to figure things out for yourself, particularly those things where you get an error message that clearly spells out where you went wrong and how to fix it.

In addition, the code you show won't return a list, so your example code doesn't match what you must have done:

> typeof(mapIds(Homo.sapiens, keys(Homo.sapiens, "ENTREZID"), "CDSCHROM","ENTREZID"))
'select()' returned 1:many mapping between keys and columns
[1] "character"

So if your out$CHR and out$CHRLOC are lists, it is most likely that you are doing

out$CHR <- mapIds(Homo.sapiens, as.character(fData$ENTREZID), "CDSCHROM","ENTREZID", mapIds = "list")

which as you found doesn't work as you might expect. And to answer your final question, note that the Homo.sapiens package has a show method:

> Homo.sapiens
OrganismDb Object:
# Includes GODb Object:  GO.db
# With data about:  Gene Ontology
# Includes OrgDb Object:  org.Hs.eg.db
# Gene data about:  Homo sapiens
# Taxonomy Id:  9606
# Includes TxDb Object:  TxDb.Hsapiens.UCSC.hg19.knownGene
# Transcriptome data about:  Homo sapiens
# Based on genome:  hg19
# The OrgDb gene id ENTREZID is mapped to the TxDb gene id GENEID .