Question: Using Limma to normalize data sets
mahm wrote, 6 months ago:

I want to read data from the CEL files of GSE53454 and GSE76896, both from the same platform. Could someone suggest the steps to follow?

I had a look at the documentation (limmaUsersGuide()), but could only find examples for array types other than Affymetrix. Could someone point me to a tutorial on how to process Affymetrix data?
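A minimal sketch of the overall route for two series on one platform, assuming each GSExxxx_RAW.tar has already been unpacked into its own directory (the directory names below are hypothetical). It gathers the CEL files from both directories into one targets-style table, reads them into a single AffyBatch with affy's ReadAffy(), and normalizes all arrays together with affy's rma(); gcrma(), which comes up later in this thread, takes the same AffyBatch and could be used instead. The result is an ExpressionSet, which limma's lmFit() accepts directly. The Series column only records where each file came from, for example so it could be included in a later limma design.

```r
# Sketch only: hypothetical directories holding the untarred RAW archives.
cel_dirs <- c(GSE53454 = "Data/GSE53454_RAW",
              GSE76896 = "Data/GSE76896_RAW")

# Collect CEL files (plain or gzipped) from every directory into one table,
# recording which series each file came from.
collect_cel_files <- function(dirs) {
  per_series <- lapply(names(dirs), function(gse) {
    files <- list.files(dirs[[gse]], pattern = "\\.cel(\\.gz)?$",
                        ignore.case = TRUE, full.names = TRUE)
    data.frame(FileName = files, Series = rep(gse, length(files)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per_series)
}

# Read all arrays into one AffyBatch and RMA-normalize them together.
# Requires the Bioconductor package affy; ReadAffy() expects all arrays
# to be of the same chip type.
normalize_jointly <- function(targets) {
  affy_batch <- affy::ReadAffy(filenames = targets$FileName)
  affy::rma(affy_batch)
}

# Usage (not run here):
# targets <- collect_cel_files(cel_dirs)
# table(targets$Series)            # arrays contributed by each series
# eset <- normalize_jointly(targets)
```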

Tags: affy, limma
Answer: Using Limma to normalize data sets
thokall (Swedish Museum of Natural History) wrote, 6 months ago:

The limma manual has information on importing and analysing Affymetrix data. The second example in section 3.2 contains some basic info. If you are struggling with something more specific, please include the code you have tried so far, as that makes it easier to help out.

Many thanks for the response. I had gone through that part of section 3.2, which is illustrated for two-color arrays. I first want to load the CEL files of a one-color array. I'm sorry, I don't have any code yet; I have only worked with the GEOquery parsing package. For instance, GEOquery has the getGEO('GSExxxx') command to fetch the GSE file from the database automatically. How do we get started here? I also couldn't really understand what "targets.txt" (in the example in section 3.2) is.

I'm a beginner, so please excuse the naive questions.
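Since GEOquery is already familiar: the same package also provides getGEOSuppFiles(), which downloads a series' supplementary files (for these series, the GSExxxx_RAW.tar archive) rather than the parsed record that getGEO() returns. A minimal sketch, assuming GEOquery is installed and a network connection is available; the helper name and the "Data" base directory are hypothetical. With makeDirectory = TRUE the files land in a subdirectory named after the accession.

```r
# Sketch: fetch a series' supplementary files (including GSExxxx_RAW.tar)
# and unpack the archive. Requires the Bioconductor package GEOquery.
fetch_raw_cels <- function(gse, base_dir = "Data") {
  dir.create(base_dir, showWarnings = FALSE, recursive = TRUE)
  # Downloads into base_dir/<gse>/
  GEOquery::getGEOSuppFiles(gse, makeDirectory = TRUE, baseDir = base_dir)
  tar_file <- file.path(base_dir, gse, paste0(gse, "_RAW.tar"))
  raw_dir <- file.path(base_dir, paste0(gse, "_RAW"))
  utils::untar(tar_file, exdir = raw_dir)
  raw_dir  # directory now holding the per-sample .CEL.gz files
}

# Usage (large download; not run here):
# dirs <- vapply(c("GSE53454", "GSE76896"), fetch_raw_cels, character(1))
```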

Reply written 6 months ago by mahm.

The second example uses the ReadAffy function, which imports Affymetrix data. The function needs a vector of file names to read (in the example, this information is stored in targets$FileName). You can therefore download the files you are interested in (the .CEL files), create a vector with these file names, and then import your data with the ReadAffy function as follows:

library(affy)
Affydata <- ReadAffy(filenames = fileNameVector)
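On the "targets.txt" question: in the limma guide's examples it is a tab-delimited text file with one row per array and a FileName column, which limma's readTargets() reads into a data frame. A minimal sketch with hypothetical file names and a hypothetical Condition column: it writes such a file and reads it back with base R's read.delim() (limma::readTargets() would give a comparable data frame). targets$FileName is then the character vector to pass to ReadAffy(filenames = ...).

```r
# Sketch of a limma-style targets file: one row per array, a FileName
# column plus any sample annotation. All values here are hypothetical.
write_targets <- function(path) {
  targets <- data.frame(
    FileName  = c("sample1.CEL.gz", "sample2.CEL.gz", "sample3.CEL.gz"),
    Condition = c("Control", "Control", "Treated"),
    stringsAsFactors = FALSE
  )
  write.table(targets, path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(path)
}

targets_path <- file.path(tempdir(), "targets.txt")
write_targets(targets_path)

# Read it back as a data frame with character columns.
targets <- read.delim(targets_path, stringsAsFactors = FALSE)

# The FileName column (plus the directory) is what ReadAffy() needs:
# AffyData <- affy::ReadAffy(filenames = file.path("Data/GSE53454_RAW", targets$FileName))
```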

Reply written 6 months ago by thokall.

I have downloaded the RAW.tar file, which contains the .cdf.gz file and a .CEL.gz file for each sample.

Can I give the GSExxxx_RAW.tar file as the name in the file-name vector? There are actually more than 50 .CEL files.

Secondly, I tried:

filename <- c(".../data/GSE76896_RAW.tar",".../data/GSE53454_RAW.tar")
> Affydata <- ReadAffy(filename)
Error: file names must be specified using a character vector, not a ‘list’

But the vector is of character type.
> is.character(filename)
[1] TRUE

Have I missed something?

Reply written 6 months ago by mahm.

You need to untar the downloaded archive so that you can see the files it contains. Also, the command given in the limma manual supplies the character vector to the filenames argument (sorry if I confused you earlier).
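Unpacking can be done from within R with base R's utils::untar(). A minimal sketch, assuming the archive sits at the hypothetical path Data/GSE53454_RAW.tar and should be unpacked into Data/GSE53454_RAW; the helper name is hypothetical. It returns the unpacked CEL files with their full paths, which is the form ReadAffy(filenames = ...) can use from any working directory.

```r
# Unpack a GEO RAW archive and return the full paths of the CEL files in it.
untar_cel_archive <- function(tar_file, exdir) {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  utils::untar(tar_file, exdir = exdir)
  list.files(exdir, pattern = "\\.cel(\\.gz)?$", ignore.case = TRUE,
             full.names = TRUE)
}

# Usage (hypothetical paths):
# cel_files <- untar_cel_archive("Data/GSE53454_RAW.tar", "Data/GSE53454_RAW")
# AffyData  <- affy::ReadAffy(filenames = cel_files)
```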

If your files are in "Data/GSE53454_RAW", you can list all gzipped CEL files in that directory and then import them with the ReadAffy function.

Try this:

downloadedAffyFiles <- list.files(path = "Data/GSE53454_RAW", pattern = "CEL.gz$")
AffyData <- ReadAffy(filenames = downloadedAffyFiles)

Reply written 6 months ago by thokall.

I tried what you suggested, and the following error appears :(

> downloadedAffyFiles <- list.files(path = "../Data/GSE53454_RAW", pattern = "CEL.gz$")
> AffyData <- ReadAffy(filenames = downloadedAffyFiles)
Error: the following are not valid files:
    GSM1293805_10_4_Control_0h.CEL.gz
   GSM1293806_10_4_Control_12h.CEL.gz
   GSM1293807_10_4_Control_1h.CEL.gz
   GSM1293808_10_4_Control_24h.CEL.gz
   GSM1293809_10_4_Control_2h.CEL.gz
   GSM1293810_10_4_Control_36h.CEL.gz
   GSM1293811_10_4_Control_48h.CEL.gz
   GSM1293812_10_4_Control_4h.CEL.gz
   GSM1293813_10_4_Control_60h.CEL.gz
   GSM1293814_10_4_Control_72h.CEL.gz
   GSM1293815_10_4_Control_84h.CEL.gz
   GSM1293816_10_4_Control_8h.CEL.gz
   GSM1293817_10_4_Control_96h.CEL.gz
   GSM1293818_10_4_Cytok_04h.CEL.gz
   GSM1293819_10_4_Cytok_12h.CEL.gz
   GSM1293820_10_4_Cytok_1h.CEL.gz
   GSM1293821_10_4_Cytok_24h.CEL.gz
   GSM1293822_10_4_Cytok_2h.CEL.gz
   GSM1293823_10_4_Cytok_36h.CEL.gz
   GSM1293824_10_4_Cytok_48h.CEL.gz
   GSM1293825_10_4_Cytok_60h.CEL.gz
   GSM1293826_10_4_Cytok_72h.CEL.gz
   GSM1293827_10_4_Cytok_84h.CEL.gz
   GSM1293828_10_4_Cytok_96h.CEL.gz
   GSM1293829_19_10_Control_0h.CEL.gz
   GSM1293830_19_10_Control_108h.CE

Reply written 6 months ago by mahm.

In the help page of ReadAffy (opened with ?ReadAffy), look for the option 'compress':

compress: are the CEL files compressed?

Thus (assuming the last file in your list, GSM1293830_19_10_Control_108h.CE, only shows a wrong extension because of an incomplete copy/paste):

AffyData <- ReadAffy(filenames = downloadedAffyFiles, compress=TRUE)
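When ReadAffy() reports files as "not valid", the two causes raised in this thread can be checked directly: whether each path exists as given (relative to the working directory) and whether the file really is gzip-compressed. A small base-R sketch; the helper name is hypothetical, and the compression check relies on the gzip magic bytes 0x1f 0x8b at the start of every gzip file.

```r
# For each CEL path, report whether it exists as given and whether it is gzip-compressed.
diagnose_cel_files <- function(files) {
  exists_here <- file.exists(files)
  is_gzip <- vapply(files, function(f) {
    if (!file.exists(f)) return(NA)
    con <- file(f, "rb")
    on.exit(close(con))
    magic <- readBin(con, what = "raw", n = 2)
    length(magic) == 2 && magic[1] == as.raw(0x1f) && magic[2] == as.raw(0x8b)
  }, logical(1))
  data.frame(file = files, exists = exists_here, gzip = unname(is_gzip),
             stringsAsFactors = FALSE)
}

# Usage:
# diagnose_cel_files(downloadedAffyFiles)
```

Bare file names returned by list.files() without full.names = TRUE show exists = FALSE unless the working directory is the directory that holds the CEL files.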
Reply written 6 months ago by Guido Hooiveld.

Check that your working directory contains the files of interest, or modify your code to use the complete path:

downloadedAffyFiles <- list.files("~/Downloads/GSE53454_RAW/", pattern = "CEL.gz", full.names=TRUE)
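The same full-path listing can be wrapped so that a wrong directory fails early with a clear message instead of inside ReadAffy(). A minimal sketch; the helper name is hypothetical and the directory is whatever holds the unpacked CEL files.

```r
# List CEL files with full paths; stop early if the directory is missing,
# contains no CEL files, or any file is not readable.
list_cel_files <- function(dir) {
  dir <- path.expand(dir)  # handle paths such as "~/Downloads/..."
  if (!dir.exists(dir)) stop("Directory not found: ", dir)
  files <- list.files(dir, pattern = "\\.cel(\\.gz)?$", ignore.case = TRUE,
                      full.names = TRUE)
  if (length(files) == 0) stop("No CEL files found in ", dir)
  unreadable <- files[file.access(files, mode = 4) != 0]
  if (length(unreadable) > 0) {
    stop("Cannot read: ", paste(unreadable, collapse = ", "))
  }
  files
}

# Usage:
# downloadedAffyFiles <- list_cel_files("~/Downloads/GSE53454_RAW/")
# AffyData <- affy::ReadAffy(filenames = downloadedAffyFiles)
```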
Reply written 6 months ago by thokall.

The files are in the current working directory. Also, GSM1293830_19_10_Control_108h.CE was only cut off in the terminal; the file in the directory has the correct extension. Now I get a status that reads "Adjusting for non-specific binding.Killed". Is something wrong?

AffyBatch object
size of arrays=1164x1164 features (56 kb)
cdf=HG-U133_Plus_2 (54675 affyids)
number of samples=90
number of genes=54675
annotation=hgu133plus2
notes=
Warning messages:
1: replacing previous import ‘AnnotationDbi::tail’ by ‘utils::tail’ when loading ‘hgu133plus2cdf’
2: replacing previous import ‘AnnotationDbi::head’ by ‘utils::head’ when loading ‘hgu133plus2cdf’
Adjusting for optical effect..........................................................................................Done.
Computing affinitiesLoading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

.Done.
Adjusting for non-specific binding.Killed

I'm running the following:

library(gcrma)
library(limma)
downloadedAffyFiles <- list.files(path = "../Data/GSE53454_RAW/", pattern = "CEL.gz$",full.names=TRUE)
AffyData <- ReadAffy(filenames = downloadedAffyFiles)
AffyData
eset <- gcrma(AffyData)
eset
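"Killed" with no R error message usually means the operating system terminated the R process, commonly because it ran out of memory; the later "Cannot allocate memory" warnings in this thread point the same way, and gcrma() on 90 HG-U133_Plus_2 arrays needs a lot of RAM. One lighter option, sketched below under that assumption, is affy's justRMA(), which computes RMA expression values directly from the CEL files without first building an AffyBatch. Note that this swaps GCRMA for plain RMA; the gcrma package also offers justGCRMA() along the same lines. The helper name is hypothetical.

```r
# Lower-memory alternative: compute RMA directly from the CEL files
# without holding an AffyBatch in memory. This is RMA, not GCRMA.
# Requires the Bioconductor package affy.
rma_from_files <- function(cel_files) {
  if (length(cel_files) == 0) stop("No CEL files supplied")
  affy::justRMA(filenames = cel_files)
}

# Usage (same listing as above):
# downloadedAffyFiles <- list.files(path = "../Data/GSE53454_RAW/",
#                                   pattern = "CEL.gz$", full.names = TRUE)
# eset <- rma_from_files(downloadedAffyFiles)
# dim(Biobase::exprs(eset))  # probesets x arrays
```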
Reply written 6 months ago by mahm.

I have tried your code and do not get this issue. Below is the output of your code block on my machine. Since I cannot reproduce your problem, it is hard to know what is going on, but I suggest you double-check that you have the latest version of Bioconductor and up-to-date versions of all the packages you use.

> AffyData <- ReadAffy(filenames = downloadedAffyFiles)
> eset <- gcrma(tt)
Adjusting for optical effect.........................................Done.
Computing affinities[1] "Checking to see if your internet connection works..."
installing the source package 'hgu133plus2probe'

trying URL 'https://bioconductor.org/packages/3.7/data/annotation/src/contrib/hgu133plus2probe_2.18.0.tar.gz'
Content type 'application/x-gzip' length 8505171 bytes (8.1 MB)
==================================================
downloaded 8.1 MB

* installing *source* package 'hgu133plus2probe' ...
** R
** data
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (hgu133plus2probe)

The downloaded source packages are in
    '/private/var/folders/2t/bbthtm7j4tb5xdqls3yt61_r0000gn/T/RtmpVq2fpC/downloaded_packages'
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

.Done.
Adjusting for non-specific binding..........................................................................................Done.
Normalizing
Calculating Expression
>
Reply written 6 months ago by thokall.

I have updated to R version 3.5.1 (2018-07-02) and Bioconductor 3.7. Now I get this error:

eset <- gcrma(AffyData)
Adjusting for optical effect..........................................................................................Done.
Computing affinities[1] "Checking to see if your internet connection works..."
trying URL 'https://bioconductor.org/packages/3.7/data/annotation/src/contrib/hgu133plus2probe_2.18.0.tar.gz'
Content type 'application/x-gzip' length 8505171 bytes (8.1 MB)
==================================================
downloaded 8.1 MB


The downloaded source packages are in
    ‘/tmp/RtmpnOAh1t/downloaded_packages’
Error in (function (package, help, pos = 2, lib.loc = NULL, character.only = FALSE,  :
  there is no package called ‘hgu133plus2probe’
In addition: Warning messages:
1: In system2(cmd0, args, env = env, stdout = outfile, stderr = outfile,  :
  system call failed: Cannot allocate memory
2: In system2(cmd0, args, env = env, stdout = outfile, stderr = outfile,  :
  error in running command
3: In install.packages(probepackage, lib = lib, repos = biocinstallRepos(),  :
  installation of package ‘hgu133plus2probe’ had non-zero exit status

Could you please help?

Reply written 6 months ago by mahm.

Since I cannot reproduce your problems, I can't be of much help here, beyond suggesting that you install the failing packages on your own and then use the gcrma vignette to see how to use the package efficiently.
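The warnings above show the automatic install of hgu133plus2probe failing with "Cannot allocate memory" while the arrays were already loaded, so one option is to install the package in a fresh R session before reading any CEL files. A minimal sketch, assuming the current BiocManager installer (the Bioconductor 3.7 setup in this thread used biocLite() from BiocInstaller instead); the helper name is hypothetical.

```r
# Install a Bioconductor package only if it is not already available.
# Run this in a fresh R session, before loading any arrays, so the
# installation does not compete with the data for memory.
ensure_bioc_package <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) return(TRUE)
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg, update = FALSE)
  requireNamespace(pkg, quietly = TRUE)
}

# Usage:
# ensure_bioc_package("hgu133plus2probe")
# then restart R, read the CEL files, and run gcrma() as before.
```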

Good luck!

Reply written 6 months ago by thokall.

Fixed it!

Works perfectly.

Thanks a lot for the tremendous support!

Reply written 6 months ago by mahm.
Powered by Biostar version 16.09