edgeR reading data

Wang, Li ▴ 180

Dear List,

I am a complete beginner with edgeR. Reading through the User's Guide and the edgeR homepage, I did not find any examples of importing data. My RNA-seq data can be divided into two groups, each of which can be divided into two subgroups, and each subgroup has 8 replicates. Could someone give me a small example of data that can be read into edgeR?

I would appreciate your help a lot!

Thanks
Li

updated 13.6 years ago by Gordon Smyth • written 13.6 years ago by Wang, Li

alessandro.guffanti@genomnia.com ▴ 330

Hi!

What you need to do is prepare a set of files, one file per sample, each with two columns: gene identifier (e.g. RefSeq ID) and tag count. Remember that with edgeR you always need to start from tag counts:

    NM_01224  134
    NM_86659  23
    NM_34567  800
    ...

and so on. Let's assume that you have four files named A.txt, B.txt, C.txt and D.txt. In R, you put the file names in a simple character vector and pass it to readDGE():

    library(edgeR)
    files <- c("A.txt", "B.txt", "C.txt", "D.txt")
    DG <- readDGE(files, header=FALSE)

You can then proceed with the analysis. Any subdivision into groups can be done on the resulting DGEList object or later.

HTH
Alessandro

Alessandro Guffanti - Bioinformatics, Genomnia srl

written 13.6 years ago by alessandro.guffanti@genomnia.com
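
To make the readDGE() route from the answer above concrete, here is a minimal sketch. Apart from the three example rows quoted in the answer, the toy counts, the temporary directory, the two-level `group` labels and the `design_labels` factor are all invented for illustration. The `path` and `group` arguments are used as described in the readDGE() help page, and the edgeR step is skipped if the package is not installed.

```r
# Sketch only: file contents and labels below are made up for illustration.
tmp <- file.path(tempdir(), "edger_counts_demo")
dir.create(tmp, showWarnings = FALSE)

# One two-column file per sample: gene id, tag count (no header line).
toy <- list(
  A.txt = c(NM_01224 = 134, NM_86659 = 23,  NM_34567 = 800),
  B.txt = c(NM_01224 = 120, NM_86659 = 30,  NM_34567 = 760),
  C.txt = c(NM_01224 = 15,  NM_86659 = 210, NM_34567 = 790),
  D.txt = c(NM_01224 = 22,  NM_86659 = 190, NM_34567 = 810)
)
for (f in names(toy)) {
  write.table(data.frame(names(toy[[f]]), toy[[f]]),
              file = file.path(tmp, f), sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

files <- names(toy)
group <- factor(c("g1", "g1", "g2", "g2"))  # hypothetical grouping of the 4 files

# edgeR is a Bioconductor package; skip the call if it is not installed.
if (requireNamespace("edgeR", quietly = TRUE)) {
  DG <- edgeR::readDGE(files, path = tmp, group = group, header = FALSE)
  print(DG$counts)   # one row per gene, one column per file
  print(DG$samples)  # group labels and library sizes (column totals)
}

# For the design in the question (2 groups x 2 subgroups x 8 replicates),
# one label per sample could be built like this (names are hypothetical):
main_grp <- rep(c("G1", "G2"), each = 16)
sub_grp  <- rep(rep(c("s1", "s2"), each = 8), times = 2)
design_labels <- interaction(main_grp, sub_grp, sep = "_")
print(table(design_labels))
```

With real data, `files` would hold the 32 sample file names and `design_labels` (in matching order) could be passed as `group`. The combined table ends up in `DG$counts` and the per-sample totals in `DG$samples$lib.size`, so no separate merging step should be needed.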

Gordon Smyth 53k

WEHI, Melbourne, Australia

Dear Li,

It seems to me that there are four case studies in the edgeR User's Guide that give fully worked examples of reading data into R and into edgeR. Perhaps you might explain where you're trying to import the data from, i.e., what format the data are in now.

In his reply, Alessandro Guffanti has explained a good way to read data in, and there are others.

Best wishes
Gordon

written 13.6 years ago by Gordon Smyth

Dear Gordon

Thanks very much for your reply. My data are now in txt format. They are separate files, each representing a sample. Each file has two columns, one for the gene name and the other for the expression value (total exon reads, no transformation).

I am thinking of using the readDGE function as suggested in the manual. I assume that the function can only read one file at a time, so I would need to call readDGE several times, and then I do not know how to combine these reads into one table.

Also, I did not give any information about library size. How could it be computed from the counts?

I appreciate your help a lot!

Best wishes
Li

written 13.6 years ago by Wang, Li

Hi,

On Fri, Apr 13, 2012 at 12:35 PM, Wang, Li <li.wang at ttu.edu> wrote:
> I am thinking of the readDGE function as suggested in the manual. I assume that in the function, each time only one file can be read. Then I need to run readDGE a couple of times.
> And then I do not know how to combine these reads into one table.

If all of the rows are in the same order, I can imagine doing something simple like:

    R> dat <- lapply(file.paths, read.table, ...[[more stuff]])
    ## Each element has two columns (gene id and count); you might pick off the second and cbind
    R> cnts <- do.call(cbind, lapply(dat, '[[', 2))

If the rows aren't in the same order, you'll want to keep the gene ids and counts together (in 2 column data.frames), then use `merge` or something similar to recursively build an 'uber' count table by keying on the gene/bin/whatever id's.

> Also I did not give any information about library size. How could it be computed from the counts?

Once you have a matrix (or data.frame) of counts, isn't this simply a call to `colSums`? Alternatively, if you want to use all aligned reads, you can pick that off easily from the third column in a call to `samtools idxstats YOURBAMFILE`

-steve

Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

written 13.6 years ago by Steve Lianoglou
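
The merge-based approach for files whose rows are not in the same order might look like the sketch below. The three small data frames and their gene names are hypothetical stand-ins for tables read with read.table(). Treating a gene that is absent from a file as a zero count is an assumption, not something stated in the thread.

```r
# Hypothetical per-sample tables; gene order and gene sets differ on purpose.
s1 <- data.frame(gene = c("geneA", "geneB", "geneC"), count = c(10, 0, 5))
s2 <- data.frame(gene = c("geneC", "geneA", "geneD"), count = c(7, 12, 3))
s3 <- data.frame(gene = c("geneB", "geneD", "geneA"), count = c(4, 1, 9))
dat <- list(s1 = s1, s2 = s2, s3 = s3)

# Rename each count column to its sample name so merge() does not
# produce clashing "count.x" / "count.y" columns.
dat <- Map(function(d, nm) setNames(d, c("gene", nm)), dat, names(dat))

# Fold the list into one table keyed on the gene id. all = TRUE keeps
# genes that are missing from some samples; they come back as NA.
merged <- Reduce(function(a, b) merge(a, b, by = "gene", all = TRUE), dat)

# Turn the merged table into a numeric matrix with gene ids as row names.
cnts <- as.matrix(merged[, -1])
rownames(cnts) <- as.character(merged$gene)
cnts[is.na(cnts)] <- 0   # assumption: an absent gene means zero reads

# Library size per sample is the column total of the count matrix.
lib.size <- colSums(cnts)
print(cnts)
print(lib.size)
```

A matrix like `cnts` can then be handed to edgeR, for example through `DGEList(counts = cnts)`, although readDGE() does the same combining directly from the files.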

Dear Li,

Have you read the documentation page for readDGE()? The whole purpose of readDGE() is to combine the files into one table for you. It also computes the library sizes for you automatically. It seems to me that the documentation and User's Guide tell you these things plainly enough.

If the documentation isn't enough, could you please re-read Alessandro Guffanti's email to you. He has explained even more clearly how readDGE() reads multiple files at a time.

Best wishes
Gordon

Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

written 13.6 years ago by Gordon Smyth

Dear Gordon and list

I am trying to do some analyses similar to the last case study in the edgeR User's Guide and have run into some errors:

1.
    keep <- rowSums(cpm(x)>0) >= 4
    Error in inherits(x, "data.frame") : could not find function "cpm"

2.
    > els <- y$samples$lib.size * y$samples$norm.factors
    > aug.count <- 2*ncol(bals)*els/sum(els)
    > logCPM <- log2( (t(bals)+aug.count))
    > plotMDS(logCPM, main="logFC BCV distance")
    Error in sort.int((x[, i] - x[, j])^2, partial = topindex) :
      index -495 outside bounds

3.
    > design <- model.matrix(~group)
    > logFC <- predFC(y,design)
    Error: could not find function "predFC"

4.
    > y <- estimateGLMTrendedDisp(y, design)
    Loading required package: splines
    > y <- estimateGLMTagwiseDisp(y, design)
    > plotBCV(y)
    Error: could not find function "plotBCV"

5.
    > cpm(y)[top,order(y$samples$group)]
    Error: could not find function "cpm"

Most of them are of the form "could not find function ***". The related packages (edgeR, limma and qvalue) are all installed and loaded, so I am very curious to know why R cannot find these functions.

I will appreciate your reply very much!

Best wishes
Li

written 13.6 years ago by Wang, Li

Dear Li,

Please follow the posting guide:

http://www.bioconductor.org/help/mailing-list/posting-guide/

and provide output from sessionInfo().

My guess is that you are reading the User's Guide for edgeR that is part of Bioconductor 2.10, released a few weeks ago, but actually using a much older version of R and edgeR. The function cpm(), for example, was introduced with Bioconductor Release 2.9 more than six months ago:

http://www.bioconductor.org/news/bioc_2_9_release/

so the Bioconductor software you are using must be Release 2.8 or earlier.

Best wishes
Gordon

written 13.6 years ago by Gordon Smyth
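
One way to check for the version mismatch suggested above is sketched here. `pkg_exports` is a hypothetical helper written for this example, not an edgeR or Bioconductor function, and it relies only on base R calls.

```r
# Report the R version and the versions of attached packages; this is the
# output the posting guide asks for.
print(sessionInfo())

# Hypothetical helper (not part of edgeR or Bioconductor): returns TRUE if
# the installed copy of package `pkg` exports an object named `fn`.
pkg_exports <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  fn %in% getNamespaceExports(pkg)
}

# Check the functions from the error messages in the thread.
for (fn in c("cpm", "predFC", "plotBCV")) {
  cat(sprintf("edgeR exports %-8s: %s\n", fn, pkg_exports("edgeR", fn)))
}

# Show which edgeR version is actually installed, if any.
if (requireNamespace("edgeR", quietly = TRUE)) {
  cat("Installed edgeR version:", as.character(packageVersion("edgeR")), "\n")
}
```

If a function used in the User's Guide is reported as not exported, the installed edgeR is probably older than the guide being read, and updating R and Bioconductor would be the likely fix.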
