Help with differential gene expression using Agilent Microarray data (from GEO database)
1
0
Entering edit mode
r02eb20 • 0
@d524ab81
Last seen 11 months ago
United Kingdom

Hi, I am quite new to R and I have been trying to use a dataset from the GEO database (GEO104438) that is interesting to my PhD work. The file that I have been able to download and use in R, is a table with ID references and then the normalised signal intensity values for all of the donors (and cell types). I have put a screenshot of the table for reference.

Data File as seen on my computer

I am hoping to try look at whether there are any genes that are differentially expressed depending on what cell type they are (as they are both derived from a common precursor). So for each donor I have 2 different cell types. Due to this, I was initially trying to tell R that my data is paired but I am stuck.

I have been reading online and trying to use the limma guide but I am completely confused as all of the examples I have seen online are not in a similar format to this dataset.

I would appreciate any help!

limma Paireddata DifferentialExpression • 1.2k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 5 hours ago
United States

I would use GEOquery to get the data.


> dat <- getGEO("GSE104438")[[1]]

## now you have to get rid of some extra cruft
> names(fData(dat))
 [1] "ID"                   "SPOT_ID"              "CONTROL_TYPE"        
 [4] "REFSEQ"               "GB_ACC"               "GENE"                
 [7] "GENE_SYMBOL"          "GENE_NAME"            "UNIGENE_ID"          
[10] "ENSEMBL_ID"           "TIGR_ID"              "ACCESSION_STRING"    
[13] "CHROMOSOMAL_LOCATION" "CYTOBAND"             "DESCRIPTION"         
[16] "GO_ID"                "SEQUENCE"    

## that's too much
> fData(dat) <- fData(dat)[,c(1,6:8)]
## now parse the phenotypic data. It's usually a mess
> pData(dat)$title
 [1] "ATDC Donor 1"    "ATDC Donor 2"    "ATDC Donor 3"    "ATDC Donor 4"   
 [5] "ATDC Donor 5"    "ATDC Donor 6"    "MoDC Donor 1"    "MoDC Donor 2"   
 [9] "MoDC Donor 3"    "MoDC Donor 4"    "MoDC Donor 5"    "MoDC Donor 6"   
[13] "MoMacro Donor 1" "MoMacro Donor 2" "MoMacro Donor 3" "MoMacro Donor 4"
[17] "MoMacro Donor 5" "MoMacro Donor 6"

## that will work
> samps <- as.data.frame(do.call(rbind, lapply(strsplit(pData(dat)$title, " "), "[", c(1,3))))
> names(samps) <- c("Cell","Subject")
> samps$Subject <- factor(samps$Subject)
> samps
      Cell Subject
1     ATDC       1
2     ATDC       2
3     ATDC       3
4     ATDC       4
5     ATDC       5
6     ATDC       6
7     MoDC       1
8     MoDC       2
9     MoDC       3
10    MoDC       4
11    MoDC       5
12    MoDC       6
13 MoMacro       1
14 MoMacro       2
15 MoMacro       3
16 MoMacro       4
17 MoMacro       5
18 MoMacro       6

## Much better

## now it's straightforward
> design <- model.matrix(~0+Cell + Subject, samps)
> contrast <- makeContrasts(ATDC - MoDC, ATDC - MoMacro, MoDC - MoMacro, levels = design)
> fit <- eBayes(contrasts.fit(lmFit(dat, design), contrast))

That's all pretty condensed code. I leave it to you to read the limma User's Guide to understand why I did what I did, and to play around with the code and data to understand what each line of code is doing.

ADD COMMENT
0
Entering edit mode

Thank you so much!

Also, I didn't realise it was easier to use GEOquery rather than actually reading the data file in - I will definitely remember that in future.

ADD REPLY
0
Entering edit mode

You can also use GEOquery to get the raw data if available. Unfortunately, what the submitter uploaded as 'raw' data in this instance is not actually the raw data, so it's a moot point.

ADD REPLY
0
Entering edit mode

So I have been working on this very slowly (its a side project), and working through understanding the code.

I am now at the line where we are making contrasts. When I run this code it is coming up with an error message saying "Error in eval(ej, envir = levelsenv) : object 'ATDC' not found".

I have been doing some reading and I think it is because my column name is Cell, and then ATDC/MoDC/MoMacro are under this. But I am unsure how to fix this.

Using the limma users guide, I tried running the following code but that didnt seem to help either:

samples$Cell <- factor(samples$Cell, levels = c("ATDC", "MoDC", "MoMacro"))
ADD REPLY
0
Entering edit mode

Your goal when asking questions should be to provide all the information possible about what you were doing when you got the error. Telling us what you did that didn't work is not useful. Ideally you would run your code and then post enough of the code that others could replicate, like what I did above, as well as the error message, and probably the output from running traceback() right after the error.

ADD REPLY
0
Entering edit mode

Hi James, as I mentioned that I had used the code you previously provided to help (found above) I didn't think to re-print the code. I have now provided it below:

library(dplyr)
library(tidyverse)
library(GEOquery)
library(BiocManager)
library(ggplot2)
library(DESeq2)
library(limma) 

GSEdata <- getGEO("GSE104438")[[1]]
GSEdata
view(GSEdata)

names(fData(GSEdata))
colnames(GSEdata)
summary(GSEdata)

fData(GSEdata)
fData(GSEdata) <- fData(GSEdata)[,c(1, 6:8)]

pData(GSEdata) 
pData(GSEdata)$title # Accesses the data under the variable, title

samples <- as.data.frame(do.call(rbind, lapply(strsplit(pData(GSEdata)$title, " "), "[", c(1,3))))
names(samples) <- c("Cell", "Subject")
samples$Subject <- factor(samples$Subject)
samples

design <- model.matrix(~0 + Cell + Subject, samples)
contrast <- makeContrasts(ATDC - MoDC, ATDC - MoMacro, MoDC - MoMacro, levels = design) # Error message here!

**Error Message**: Error in eval(ej, envir = levelsenv) : object 'ATDC' not found

As I receive the error message at the point of code where we try do the contrasts, I am unable to go any further. As I mentioned, I tried reading and trying some other lines of code but these are not working (so I will not repeat them so as to not cause any further confusion).

As per your suggestion, I ran the traceback code too, which I have put below:

traceback()
3: eval(ej, envir = levelsenv)
2: eval(ej, envir = levelsenv)
1: makeContrasts(ATDC - MoDC, ATDC - MoMacro, MoDC - MoMacro, levels = design)

As I am still getting the hang of R, I would appreciate any help!

ADD REPLY
0
Entering edit mode

Ah, my fault. Look at the colnames of your design. You will need to do one extra step

colnames(design) <- gsub("Cell", "", colnames(design))

And now the column names of your design should be OK. But check!

ADD REPLY

Login before adding your answer.

Traffic: 626 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6