Question: ALL dataset, correct syntax using the exprs call?
0
gravatar for detroit.drive
7 months ago by
detroit.drive0 wrote:

Hello - I'm new to the bioconductor suite and still don't quite understand how to invoke the exprs class.  For example, let's say I'm trying to find genes within the ALL dataset with the most and the least significant effect on age.  I've been able to write snippets to find the min/max exprs (expression level) within the set...

> min(exprs(ALL))
[1] 1.984919
> max(exprs(ALL))
[1] 14.12657

...but cannot seem to write a snippet that maps that min/max value to the actual gene?

As for assessing how age effects exprs, this is tricky.  Each patient obviously has an age (which is a column) and also an expression level (which as I understand is a function) that maps back to a specific gene.  As a start - I thought I'd just plot them but again I'm having trouble with the syntax, I've tried the following from what I've read on the web but to no avail...

> plot(exprs,age(ALL))

and...

> plot(exprs~age(ALL))

Anyone see a glaring problem?

 

A concept that I cannot quite grasp is from the above command I can see all the column names, there are 21 total. However, when I perform the following command...

> dim(exprs(ALL))
[1] 12625   128

...I can see that I have 12625 genes (rows) and 128 patients (columns). Why don't I see 128 column names with colnames(pData(ALL)) command?

 

 

biobase bioconductor all exprs • 264 views
ADD COMMENTlink modified 7 months ago by Martin Morgan ♦♦ 23k • written 7 months ago by detroit.drive0
Answer: ALL dataset, correct syntax using the exprs call?
1
gravatar for Martin Morgan
7 months ago by
Martin Morgan ♦♦ 23k
United States
Martin Morgan ♦♦ 23k wrote:

The ALL data is at it's heart a MATRIX of 12625 features (genes) x 128 sample expression values. The pData describe each sample, so pData(ALL) is a data frame of 128 samples x 21 attributes measures on those samples. If one wanted to plot the relationship between expression of a single gene, and the age of the sample, across all samples, you could get the age

age <- pData(ALL)$age  # shortcut: ALL$age

the expression of a single feature (row), e.g., the gene in row 8435)

expression <- exprs(ALL)[8435,]

and plot these, e.g.,

plot(expression ~ age)

or perhaps

plot(expression ~ ALL$sex)

The feature is

> rownames(ALL)[8435]
[1] "38355_at"

Since

> annotation(ALL)
[1] "hgu95av2"

We can attach this package and map the affymetrix ID to a more familiar gene symbol or other identifier

> library(hgu95av2.db)
> mapIds(hgu95av2.db::hgu95av2.db, rownames(ALL)[8435], "SYMBOL", "PROBEID")
'select()' returned 1:1 mapping between keys and columns
38355_at 
 "DDX3Y" 
> mapIds(hgu95av2.db::hgu95av2.db, rownames(ALL)[8435], "GENENAME", "PROBEID")
'select()' returned 1:1 mapping between keys and columns
                       38355_at 
"DEAD-box helicase 3, Y-linked" 
> mapIds(hgu95av2.db::hgu95av2.db, rownames(ALL)[8435], "ENTREZID", "PROBEID")
'select()' returned 1:1 mapping between keys and columns
38355_at 
  "8653"

Simple mathematical calculations like 'min' and 'max' won't provide information on 'significant effect on age', for which you would need to fit a linear model using, e.g., the limma package. The limma vignette, available on the page just linked) is extremely thorough.

ADD COMMENTlink written 7 months ago by Martin Morgan ♦♦ 23k

Thanks Martin.   I think my issue revolves around writing the proper syntax to call the required pData (data.frame) and exprs (matrix) elements.  I think I'll need to develop a loop to help me plot the distribution of p-values for gene effect on age (for all 12625 genes).

ADD REPLYlink written 7 months ago by detroit.drive0
1

Trying to understand 12000 graphs does not sound like the right thing to do at all. Instead you'll want to regress expression on age for each gene, and develop a 'top table' ranking from most to least significant. Good luck!

ADD REPLYlink written 7 months ago by Martin Morgan ♦♦ 23k

Hi Martin and other Bioconductors,

I've updated my packages to include limma and developed the following model:

> ourData <- exprs(ALL)
> ourDataFactor <- ourDataFactor <- pData(ALL)$age 

> design <-model.matrix(~ ourDataFactor)
> fit <- lmFit(ourData, design)
> fit <- ebayes(fit)
> topTable(fit)

However, I keep getting the following error for the lmFit step (also not showing as a function with help(lmFit).

> fit <- lmFit(ourData, design)​

Error in lmFit(ourData, design) : 

  row dimension of design doesn't match column dimension of data object

Why can't I complete this step?

ADD REPLYlink modified 7 months ago • written 7 months ago by detroit.drive0
1

lmFit is in the limma package, so you need to say

library(limma)
fit <- lmFit(ourData, design)
...

or

fit <- limma::lmFit(ourData, design)

probably the former is better for interactive use.

ADD REPLYlink written 7 months ago by Martin Morgan ♦♦ 23k

Martin, I found this webpage you put together a few years ago (https://www.bioconductor.org/help/course-materials/2016/BiocIntro-May/A3_Statistical_Analysis.html).  I'm wondering if its possible to adapt your code so it builds a linear model for age effect on gene expression in ALL.  You have the following:

(fit <- lm(age ~ sex, pdata))

I'd like to switch sex with exprs(ALL).

ADD REPLYlink written 7 months ago by detroit.drive0
1

You're asking a different question, so ask a new top-level post 'building a linear model' or some such, after of course reading the excellent limma manual.

ADD REPLYlink written 7 months ago by Martin Morgan ♦♦ 23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 445 users visited in the last hour