Question

Clariom D Assays, annotation

0

Entering edit mode

anitabaery • 0

@anitabaery-14544

Last seen 6.3 years ago

Hi,

I am using Clariom D Assays arrays and have tried to annotate it using the following:

library(oligo)

rma <- oligo::rma(data)

library(affycoretools)

anno<- annotateEset(rma, pd.clariom.d.human)

write.table(anno, file = "anno.txt")

Unfortunately, I got not annotated txt file with samples as rows and TC* IDs as columns.

How can I get annotated file (with gene annotation and transcript genome position) and with samples as a column and IDs as a row.

Thank you

Anita

annotation • 2.7k views

ADD COMMENT • link updated 6.4 years ago by James W. MacDonald 65k • written 6.4 years ago by anitabaery • 0

score 0 · Answer 1 · 2017-12-05

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

I don't think there is a write method for the ExpressionSet container, as the goal for that container is to make it easier to do your analysis within Bioconductor. Anyway, you could always just extract the fData and exprs slots into a data.frame and then output.

out <- data.frame(fData(anno), exprs(anno))
write.table(out, "anno.txt")

ADD COMMENT • link 6.4 years ago James W. MacDonald 65k

0

Entering edit mode

Hi James.

Thank you for your reply. Yes, you are right "write" didn't work properly, using data.frame did.

but in my annotated file I have the following columns:

"PROBEID" "ID" "SYMBOL" "GENENAME" "Xsample1" "Xsample2" (...)

1. I do not know why "X" is added at the beginning of all my sample names

2.What I would like to have is a chromosome number and chromosome start position for a gene to do expression quantitative trait loci. Is this something I could get from annotateEset?

Thank you

Anita

ADD REPLY • link 6.4 years ago anitabaery • 0

0

Entering edit mode

1.) Usually an X is appended if the column names don't meet certain criteria, in which case make.names is called. But if your sampleNames are sample1 - sampleX, that shouldn't happen:

> make.names(paste0("sample", 1:5))
[1] "sample1" "sample2" "sample3" "sample4" "sample5"
> make.names(1:5)
[1] "X1" "X2" "X3" "X4" "X5"

Anyway, it's simple enough to strip off the X characters. See ?gsub.

2.) You will have to do an extra couple of steps to add chromosomal positions. The genomic locations aren't in the orgDb packages any longer, so you will need the Homo.sapiens package:

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

ADD REPLY • link 6.4 years ago James W. MacDonald 65k

0

Entering edit mode

Hi James,

Thanks very much. I have run:

data.rma <- oligo::rma(data)

library(affycoretools)

anno <- annotateEset(data.rma, pd.clariom.d.human)

library(Homo.sapiens)

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

out <- data.frame(fData(fd$CHRLOC), exprs(fd$CHRLOC))
write.table(out, "data.txt")

but got an error:

Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) :

mapIds must have at least one key to match against.

Calls: mapIds -> mapIds -> mapIds_base

Execution halted

ADD REPLY • link 6.4 years ago anitabaery • 0

0

Entering edit mode

The data in the pdInfoPackage doesn't include the Entrez Gene ID. You are better off using the clariomdhumantranscriptcluster.db package.

library(clariomdhumantranscriptcluster.db)

anno <- annotateEset(data.rma, clariomdhumantranscriptcluster.db)

And then proceed as I mentioned. But do note that this

out <- data.frame(fData(fd$CHRLOC), exprs(fd$CHRLOC))

isn't something I ever recommended you do, and whatever you think it will do, it won't. I leave it up to you to figure out why that is.

ADD REPLY • link 6.4 years ago James W. MacDonald 65k

0

Entering edit mode

Thank you James, yes you are right fd$CHR and fd$CHR are list objects.

What I did was to change anno into matrix, list into data.frame and try cbind but the number or rows do not much.

matrix <- exprs(anno)

df <- data.frame(matrix(unlist(fd$CHRLOC)) )

df2 <- data.frame(matrix(unlist(fd$CHR)) )

results <- cbind(df,df2, t(matrix))

Error in data.frame(..., check.names = FALSE) :

arguments imply differing number of rows: 25762, 350

> dim(df2)

[1] 25762 1

> dim(df)

[1] 25762 1

> dim(matrix)

[1] 138745 350

ADD REPLY • link 6.4 years ago anitabaery • 0

0

Entering edit mode

Why didn't you simply use the code that I already provided?

fd <- fData(anno)

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

fData(anno) <- fd

Then you would normally proceed to analysis using e.g., limma. Or if you really want, you can do

out <- data.frame(fd, exprs(anno))

and do whatever it is that you want to do with those data.

ADD REPLY • link 6.4 years ago James W. MacDonald 65k

0

Entering edit mode

Thank you for that James. So, I run according to your suggestions, but the output doesn't look right:

fd$CHR <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSCHROM", "ENTREZID")

'select()' returned 1:many mapping between keys and columns

fd$CHRLOC <- mapIds(Homo.sapiens, as.character(fd$ENTREZID), "CDSSTART", "ENTREZID")

'select()' returned 1:many mapping between keys and columns

fData(anno) <- fd

out <- data.frame(fd, exprs(anno))

write.table(out, "annotated.txt")

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :

unimplemented type 'list' in 'EncodeElement'

I guess the error is because I have list objects in my data.frame.

> typeof(out$PROBEID)

[1] "integer"

> typeof(out$ENTREZID)

[1] "integer"

> typeof(out$SYMBOL)

[1] "integer"

> typeof(out$GENENAME)

[1] "integer"

> typeof(out$CHR)

[1] "list"

> typeof(out$CHRLOC)

[1] "list"

and unfortunately the output is truncated and looks like that:

PROBEID	ENTREZID	SYMBOL	GENENAME	CHR	CHRLOC	Xaample1 Xsample2 (...)
AFFX-BkGr-GC03_st	AFFX-BkGr-GC03_st	NA	NA	NA

ADD REPLY • link 6.4 years ago anitabaery • 0

0

Entering edit mode

Hi James,

I have tried

df <- apply(df,5,as.character)

fwrite(out, file ="..")

out$col5 = as.character(out$col5)

but nothing is working ...

Can I also ask if the annotation is build human genome build 37, and if not is it possible to change to 37?

Thanks again

ADD REPLY • link 6.4 years ago anitabaery • 0

0

Entering edit mode

Hi again James,

I have modified the lines and it worked:

out$CHR = as.character(out$CHR)

out$CHRLOC = as.character(out$CHRLOC)

out = as.matrix(out)

write.csv(out, "data.csv")

However, I have 2 question

1. how to add CHRSTOP , I have tried CDSSTOP but had

Error in .testForValidCols(x, cols) :

Invalid columns: CDSSTOP. Please use the columns method to see a listing of valid arguments.

2. Is the annotation based on human genome build 37?

Thanks!

ADD REPLY • link 6.4 years ago anitabaery • 0

0

Entering edit mode

So you got this message:

Error in .testForValidCols(x, cols) :

  Invalid columns: CDSSTOP. Please use the columns method to see a listing of valid arguments.

Which seems pretty clear to me; you asked for something that isn't valid, and here's how to get a list of things that are valid. If you are planning to do much of anything with R/Bioconductor, you will have to learn to figure things out for yourself, particularly those things where you get an error message that clearly spells out where you went wrong and how to fix it.

In addition, the code you show won't return a list, so your example code doesn't match what you must have done:

> typeof(mapIds(Homo.sapiens, keys(Homo.sapiens, "ENTREZID"), "CDSCHROM","ENTREZID"))
'select()' returned 1:many mapping between keys and columns
[1] "character"

So if your out$CHR and out$CHRLOC are lists, it is most likely that you are doing

out$CHR <- mapIds(Homo.sapiens, as.character(fData$ENTREZID), "CDSCHROM","ENTREZID", mapIds = "list")

which as you found doesn't work as you might expect. And to answer your final question, note that the Homo.sapiens package has a show method:

> Homo.sapiens
OrganismDb Object:
# Includes GODb Object:  GO.db
# With data about:  Gene Ontology
# Includes OrgDb Object:  org.Hs.eg.db
# Gene data about:  Homo sapiens
# Taxonomy Id:  9606
# Includes TxDb Object:  TxDb.Hsapiens.UCSC.hg19.knownGene
# Transcriptome data about:  Homo sapiens
# Based on genome:  hg19
# The OrgDb gene id ENTREZID is mapped to the TxDb gene id GENEID .

ADD REPLY • link 6.4 years ago James W. MacDonald 65k