Dear List,
I am a complete beginner with edgeR analyses.
When reading through the User's Guide and the edgeR homepage, I did not
find any examples of importing data.
My RNA-seq data can be divided into two groups, each of which can be
divided into two subgroups. Each subgroup has 8 replicates.
I am writing to ask whether someone can give me a small example of
data that can be read into edgeR.
I would appreciate your help a lot!
Thanks
Li
Hi! What you need to do is prepare a set of files, one file per sample,
with two columns: gene identifier (e.g. RefSeq ID) and tag count. Remember
that with edgeR you always need to start from tag counts.
NM_01224 134
NM_86659 23
NM_34567 800
...
and so on
Let's assume that you have four files named A.txt, B.txt, C.txt and D.txt.
In R, you need to prepare a simple vector of file names:
library(edgeR)
files <- c("A.txt","B.txt","C.txt","D.txt")
DG <- readDGE(files,header=FALSE)
You can then proceed with the analysis. Any subdivision into groups can be
done at the DGEList level or beyond.
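As a minimal sketch of the steps above, the following writes four tiny count files and reads them back with readDGE(). The gene IDs, the counts, the temporary directory and the group labels (two groups, each with two subgroups) are made up for illustration; only readDGE() and its `path`, `group` and `header` arguments come from edgeR.

```r
library(edgeR)

# Write four tiny two-column files (gene ID, count) to a temporary
# directory. IDs and counts are hypothetical, for illustration only.
tmp <- tempdir()
genes <- c("NM_01224", "NM_86659", "NM_34567")
files <- c("A.txt", "B.txt", "C.txt", "D.txt")
set.seed(1)
for (f in files) {
  write.table(data.frame(genes, rpois(3, 100)), file.path(tmp, f),
              sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Hypothetical labels: two groups, each split into two subgroups
group <- factor(c("G1.a", "G1.b", "G2.a", "G2.b"))

# One call reads all files and merges them into a single DGEList
DG <- readDGE(files, path = tmp, header = FALSE, group = group)

dim(DG$counts)   # genes x samples
DG$samples       # includes group and lib.size for each sample
```

With real data, `files` would simply list one file per sample, and `group` would have one entry per file in the same order.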
HTH
Alessandro
-----------------------------------------------------
Alessandro Guffanti - Bioinformatics, Genomnia srl
Via Nerviano, 31 - 20020 Lainate, Milano, Italy
Ph: +39-0293305.702 Fax: +39-0293305.777
http://www.genomnia.com
"If you can dream it, you can do it" (Walt Disney)
-----Original Message-----
From: "Wang, Li" <li.wang@ttu.edu>
To: "bioconductor@r-project.org" <bioconductor@r-project.org>
Date: Wed, 11 Apr 2012 13:42:18 -0500
Subject: [BioC] edgeR reading data
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Dear Li,
It seems to me that there are four case studies in the edgeR User's Guide
that give fully worked examples of reading data into R and into edgeR.
Perhaps you might explain where you're trying to import the data from,
i.e., what format the data are in now.
In his reply, Alessandro Guffanti has explained a good way to read data
in, and there are others.
Best wishes
Gordon
Dear Gordon
Thanks very much for your reply.
My data are now in txt format. They are separate files, each
representing a sample. Each file has two columns, one for
gene name, the other for expression value (total exon reads, no
transformation).
I am thinking of using the readDGE function as suggested in the manual. I
assumed that the function can read only one file at a time, so
I ran readDGE several times.
Now I do not know how to combine these reads into one table.
Also, I did not give any information about library size. How can it be
computed from the counts?
I appreciate your help a lot!
Best wishes
Li
________________________________________
From: Gordon K Smyth [smyth@wehi.EDU.AU]
Sent: Thursday, April 12, 2012 7:06 PM
To: Wang, Li
Cc: Bioconductor mailing list
Subject: edgeR reading data
Hi,
On Fri, Apr 13, 2012 at 12:35 PM, Wang, Li <li.wang at ttu.edu> wrote:
> Dear Gordon
>
> Thanks very much for your reply.
> My data are now in txt format. They are separate files, each
> representing a sample. In each file, I specify two columns, one for
> gene name, the other for expression value (total exon reads, no
> transformation).
> I am thinking of the readDGE function as suggested in the manual. I
> assume that in the function, each time only one file can be read. Then
> I had to run readDGE a couple of times.
> And then I do not know how to combine these reads into one table.
If all of the rows are in the same order, I can imagine doing
something simple like:
R> dat <- lapply(file.paths, read.table, ...[[more stuff]])
## Each element has two columns (gene id and count); you might pick off
## the second and cbind
R> cnts <- do.call(cbind, lapply(dat, '[[', 2))
If the rows aren't in the same order, you'll want to keep the gene ids
and counts together (in 2-column data.frames), then use `merge` or
something similar to recursively build an 'uber' count table by keying
on the gene/bin/whatever ids.
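A rough sketch of the `merge` route, assuming the per-sample tables are already in memory as two-column data.frames. The sample names A, B, C, the gene ids and the counts are invented for illustration, and treating a gene missing from a file as a zero count is an assumption that may not suit every data set.

```r
# Hypothetical per-sample tables whose gene ids are in different orders
s1 <- data.frame(gene = c("g1", "g2", "g3"), count = c(10, 0, 5))
s2 <- data.frame(gene = c("g3", "g1", "g2"), count = c(7, 12, 1))
s3 <- data.frame(gene = c("g2", "g4", "g1"), count = c(3, 9, 8))
dat <- list(A = s1, B = s2, C = s3)

# Rename each count column to its sample name so merged columns stay distinct
dat <- Map(function(d, nm) setNames(d, c("gene", nm)), dat, names(dat))

# Merge the tables one after another, keyed on the gene id;
# all = TRUE keeps genes that are missing from some files
merged <- Reduce(function(x, y) merge(x, y, by = "gene", all = TRUE), dat)

# Turn the result into a count matrix with gene ids as row names
cnts <- as.matrix(merged[, -1])
rownames(cnts) <- as.character(merged$gene)

# Assumption: a gene absent from a file is treated as a zero count
cnts[is.na(cnts)] <- 0
cnts
```

If every file lists the same genes in the same order, the simpler `cbind` approach above gives the same matrix with less work.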
> Also I did not give any information about library size. How could it
> be computed from the counts?
Once you have a matrix (or data.frame) of counts, isn't this simply a
call to `colSums`?
Alternatively, if you want to use all aligned reads, you can pick that
off easily from the third column in a call to `samtools idxstats
YOURBAMFILE`
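Both options can be sketched in a few lines. The count matrix and the idxstats text below are invented for illustration; the column names given to the idxstats table are my own labels, with the third column holding the number of mapped reads per reference sequence.

```r
# Hypothetical count matrix: rows are genes, columns are samples
cnts <- matrix(c(10, 0, 5,
                 12, 1, 7),
               nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("A", "B")))

# Library size as the total count per sample
lib.size <- colSums(cnts)
lib.size

# Alternative: total aligned reads from `samtools idxstats` output.
# The text below stands in for the real output of one BAM file.
idx.text <- "chr1\t1000\t120\t3\nchr2\t800\t80\t1\n*\t0\t0\t15"
idx <- read.delim(text = idx.text, header = FALSE,
                  col.names = c("ref", "length", "mapped", "unmapped"))
aligned <- sum(idx$mapped)
aligned
```

The two numbers generally differ, because the first counts only reads assigned to genes while the second counts every aligned read.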
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Dear Li,
Have you read the documentation page for readDGE()? The whole purpose of
readDGE() is to combine the files into one table for you. It also
computes the library sizes for you automatically. It seems to me that the
documentation and User's Guide tell you these things plainly enough.
If the documentation isn't enough, could you please re-read Alessandro
Guffanti's email to you. He has explained even more clearly how readDGE()
reads multiple files at a time.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
Dear Gordon and list,
I am trying to do some analyses similar to the last case study in the
edgeR User's Guide and have run into some errors:
1. > keep <- rowSums(cpm(x)>0) >= 4
Error in inherits(x, "data.frame") : could not find function "cpm"
2. > els <- y$samples$lib.size * y$samples$norm.factors
> aug.count <- 2*ncol(bals)*els/sum(els)
> logCPM <- log2( (t(bals)+aug.count))
> plotMDS(logCPM, main="logFC BCV distance")
Error in sort.int((x[, i] - x[, j])^2, partial = topindex) :
index -495 outside bounds
3. > design <- model.matrix(~group)
> logFC <- predFC(y,design)
Error: could not find function "predFC"
4. > y <- estimateGLMTrendedDisp(y, design)
Loading required package: splines
> y <- estimateGLMTagwiseDisp(y, design)
> plotBCV(y)
Error: could not find function "plotBCV"
5. > cpm(y)[top,order(y$samples$group)]
Error: could not find function "cpm"
Most of them are "could not find function" errors. The relevant
packages (edgeR, limma and qvalue) are all installed and loaded,
so I am curious to know why the functions cannot be found.
I would appreciate your reply very much!
Best wishes
Li
Dear Li,
Please follow the posting guide:
http://www.bioconductor.org/help/mailing-list/posting-guide/
and provide output from sessionInfo(). My guess is that you are reading
the User's Guide for edgeR that is part of Bioconductor 2.10, released a
few weeks ago, but actually using a much older version of R and edgeR.
The function cpm(), for example, was introduced with Bioconductor Release
2.9 more than six months ago:
http://www.bioconductor.org/news/bioc_2_9_release/
so the Bioconductor software you are using must be Release 2.8 or
earlier.
Best wishes
Gordon
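One way to check this locally, sketched here on the assumption that edgeR may or may not be installed, is to print the R and edgeR versions and ask whether the installed edgeR exports the functions named in the errors. The vector `wanted` is just an illustrative list taken from the error messages above.

```r
# Report the R version in use
print(R.version.string)

# Functions that the error messages say could not be found
wanted <- c("cpm", "predFC", "plotBCV")

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Version of the installed edgeR package
  print(packageVersion("edgeR"))
  # TRUE where the installed edgeR exports the function
  available <- wanted %in% getNamespaceExports("edgeR")
  print(setNames(available, wanted))
} else {
  message("edgeR is not installed in this R library")
}

# Full session details, as requested by the posting guide
sessionInfo()
```

Any FALSE entries mean the installed edgeR does not export that function under that name, which would be consistent with a version mismatch against the User's Guide being read.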