How to add more columns to ReportingTools output?
@jirkanov-14624

Hello,

I am doing a differential expression analysis on Illumina microarray data. The annotation package for the array (illuminaHumanv3.db) contains a lot of useful information that I would like to include in the ReportingTools HTML table (e.g. GO IDs). How can I add this information as extra columns? I have searched the documentation, but without success.

This is my current code:

library(limma)
library(lattice)
library(ReportingTools)
library(illuminaHumanv3.db)

# data_onco is my ExpressionSet; sample_info, N_REPORT_GENES and REPORT_PATH are defined elsewhere

# only "PROBEID", "SYMBOL", "GENENAME", "ENTREZID" will be included in final HTML :/
genes <- select(illuminaHumanv3.db,
                keys = featureNames(data_onco),
                columns = c("PROBEID", "SYMBOL", "GENENAME", "ENTREZID", "GO"),
                keytype = "PROBEID")
# keeps only the first annotation row (and so the first GO ID) per probe
genes <- genes[!duplicated(genes[, "PROBEID"]), ]
fData(data_onco) <- genes

sample_group_ <- sample_info$Sample_Group
design <- model.matrix(~0 + sample_group_)
colnames(design) <- levels(sample_group_)

fit <- lmFit(data_onco, design = design)
contrasts <- makeContrasts(levels = design,
                           NormalVsTumour = Tumour - Normal)
fit2 <- contrasts.fit(fit, contrasts)
fit2 <- eBayes(fit2)

topTable(fit2, coef = "NormalVsTumour", number = N_REPORT_GENES, sort.by = "logFC")

lattice.options(default.theme = reporting.theme())
deReport <- HTMLReport(shortName = "Oncogene",
                       title = "Oncogene Normal vs Tumour samples",
                       reportDirectory = REPORT_PATH)
p <- publish(fit2, deReport, eSet = data_onco, factor = sample_group_,
             n = N_REPORT_GENES, coef = c("NormalVsTumour"))
finish(deReport)
@james-w-macdonald-5106

I am not sure how useful what you are doing is. There are any number of GO IDs that will be appended to a given gene, and you are simply taking the first one.

> z <- mapIds(illuminaHumanv3.db, keys(illuminaHumanv3.db), "GO", "PROBEID", multiVals="list")
'select()' returned 1:many mapping between keys and columns

> table(table(sapply(z, length)))

1     2     3     4     5     6     7     8     9    10    11    12    13
17    11     6     6     6     2     3     5     3     4     2     1     1
14    15    16    17    18    19    20    25    26    29    30    33    36
1     1     4     1     2     1     1     3     2     1     2     3     1
44    45    47    52    58    60    62    71    72    77    90    91    97
1     1     1     1     1     1     1     1     1     1     3     2     1
103   135   140   146   163   178   179   197   213   252   272   281   302
1     1     1     1     1     1     1     1     1     1     1     1     1
336   342   408   432   483   571   624   655   690   829   830   877  1002
1     1     1     1     1     1     1     1     1     1     1     1     1
1022  1107  1145  1186  1191  1286  1323  1368  1470  1654  1677 22536
1     1     1     1     1     1     1     1     1     1     1     1

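So most probes that actually have a GO ID have two or more. (Strictly speaking, the nested table() call above tabulates how many distinct per-probe GO ID counts are shared by a given number of probes, so the 17 under "1" is not the number of probes with exactly one GO ID.) Just selecting a single GO ID seems... not useful?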
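A per-probe tally of GO IDs can be sketched with a single `table()` over the number of non-missing IDs per probe. The list `z_toy` below is an invented stand-in for the `mapIds(..., multiVals = "list")` result, and it assumes that probes without annotation come back as a single `NA`; with the real object you would replace `z_toy` with `z`.

```r
# Toy stand-in for mapIds(..., multiVals = "list"); probe and GO IDs are invented.
z_toy <- list(
  ILMN_0001 = c("GO:0006915", "GO:0008283", "GO:0007049"),
  ILMN_0002 = "GO:0006915",
  ILMN_0003 = NA_character_,  # assumed shape for a probe without GO annotation
  ILMN_0004 = c("GO:0007049", "GO:0008283")
)

# Number of real (non-NA) GO IDs attached to each probe.
n_go <- vapply(z_toy, function(ids) sum(!is.na(ids)), integer(1))

# One table() call: names are "GO IDs per probe", values are "number of probes".
go_tally <- table(n_go)
print(go_tally)
```

Read this way, the entry named `1` is the number of probes with exactly one GO ID, and the entry named `0` is the number of probes with none.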

Anyway, please note that publish dispatches on the class of its input and then coerces that input (an MArrayLM object in your case) into a data.frame, based on some defaults. I think there are ways to change the defaults, but it has never been readily apparent to me how one does that, so I tend to just build the data.frame I want using topTable and then feed that to publish, rather than relying on the internal coercion that ReportingTools does. If you want to include the little glyphs, you could use makeImages from my affycoretools package to add those to your data.frame.
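A minimal sketch of that approach, under some assumptions: `top` is an invented stand-in for what `topTable(fit2, coef = "NormalVsTumour", number = N_REPORT_GENES)` might return once the annotation columns are in place, and the report name and directory are hypothetical. The publishing step only runs if ReportingTools is installed.

```r
# Toy stand-in for a topTable() result; all values are invented for illustration.
top <- data.frame(
  PROBEID   = c("ILMN_0001", "ILMN_0002"),
  SYMBOL    = c("GENE_A", "GENE_B"),
  GO        = c("GO:0006915, GO:0008283", "GO:0007049"),
  logFC     = c(2.137, -1.412),
  adj.P.Val = c(0.001, 0.03),
  stringsAsFactors = FALSE
)

# Choose exactly the columns (and order) that should appear in the HTML table.
report_df <- top[, c("PROBEID", "SYMBOL", "GO", "logFC", "adj.P.Val")]
report_df$logFC <- round(report_df$logFC, 2)

# Publish the data.frame directly instead of the MArrayLM object.
if (requireNamespace("ReportingTools", quietly = TRUE)) {
  library(ReportingTools)
  de_report <- HTMLReport(shortName = "toy_report",       # hypothetical name
                          title = "Toy DE table",
                          reportDirectory = "reports")    # hypothetical directory
  publish(report_df, de_report)
  finish(de_report)
}
```

Because the table is an ordinary data.frame at the point of publishing, any column present in it, including a GO column, ends up in the report.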


Thanks, good solution! I didn't realize that publish internally uses topTable, which in fact produces a data.frame. I will definitely look at your package; it looks handy.

I didn't mention that I only need genes with specific GO IDs (I know I am deleting duplicates in my example; that line of code shouldn't be there). So I can put a few GO IDs into one row and the table will still be well arranged.
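One way to put several GO IDs into one row, sketched in base R under stated assumptions: `genes_long` is an invented stand-in for the one-row-per-probe-and-GO-ID output of `select()`, and `go_of_interest` is a hypothetical list of wanted GO IDs.

```r
# Toy stand-in for select() output (one row per probe/GO pair); values are invented.
genes_long <- data.frame(
  PROBEID = c("ILMN_1", "ILMN_1", "ILMN_2", "ILMN_3", "ILMN_3"),
  SYMBOL  = c("A", "A", "B", "C", "C"),
  GO      = c("GO:0006915", "GO:0008283", NA, "GO:0006915", "GO:0007049"),
  stringsAsFactors = FALSE
)
go_of_interest <- c("GO:0006915", "GO:0007049")  # hypothetical selection

# Keep only the annotation rows whose GO ID is of interest.
keep <- genes_long[genes_long$GO %in% go_of_interest, ]

# Collapse the remaining GO IDs of each probe into one comma-separated string.
go_per_probe <- vapply(split(keep$GO, keep$PROBEID),
                       function(ids) paste(unique(ids), collapse = ", "),
                       character(1))

# One row per probe; probes without a GO ID of interest get NA.
genes_wide <- unique(genes_long[, c("PROBEID", "SYMBOL")])
genes_wide$GO <- unname(go_per_probe[genes_wide$PROBEID])

# Optionally drop probes that carry none of the wanted GO IDs.
genes_selected <- genes_wide[!is.na(genes_wide$GO), ]
print(genes_selected)
```

The resulting one-row-per-probe table could then take the place of the `!duplicated()` step before the annotation is assigned to `fData()`, provided its rows are put in the same order as the features.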