Help With RNA-seq

0

Entering edit mode

Tina Asante Boahene ▴ 100

@tina-asante-boahene-5065

Last seen 11.5 years ago

Hi all, I am conducting some analysis using the Marioni et al data. However, I am having a bit of trouble using my data to conduct the analysis based on the baySeq package. And I was wondering if you could stir me in the right direction. I have already used edgeR to find the library sizes for the ten libraries I have for my data as well as for the groups (Liver and Kidney) as stated below. library(baySeq) library(edgeR) library(limma) library(snow) cl <- makeCluster(4, "SOCK") ##Calculating normalization factors## D=MA.subsetA$M head(D) names(D) dim(D) g <- gsub("R[1-2]L[1-8]", "", colnames(D)) d <- DGEList(counts = D, group = substr(colnames(D), 5, 30)) d$samples names(d) dim(d) I will like to know how to model my code in order to produce the MA plot for count data This is what I have, however it runs with the wrong response. Can someone help me fix this please. CD <- new("countData", data = as.matrix(MA.subsetA$M), libsizes = as.integer(d$samples$lib.size)) plotMA.CD(CD, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", 100), rep("black", 900))) How can I get it to recognise the "groups" as "g" (Library and Kidney) This is the output for the groups [1] "Kidney" "Liver" "Kidney" "Liver" "Liver" "Kidney" "Liver" "Kidney" "Liver" "Kidney" thank you. Kind Regards Tina

Normalization edgeR baySeq Normalization edgeR baySeq • 1.8k views

ADD COMMENT • link updated 14.1 years ago by Valerie Obenchain ★ 6.8k • written 14.1 years ago by Tina Asante Boahene ▴ 100

0

Entering edit mode

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 4.1 years ago

United States

Hi Tina, It's difficult to help without knowing what your data look like or what error message you are seeing. Both pieces of information would be helpful. For starters I think you need to provide 'replicate' and 'groups' arguments when you create your new "countData" object. Depending on what order your data are in you need something like, groups <- list(NDE = c(1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,2,1,2,1,2,1)) replicates <- c("Kidney", "Liver", "Kidney", "Liver", "Liver', "Kidney", "Liver", "Kidney", "Liver", "Kidney") Then create your "countData" with these variables, CD <- new("countData", data = as.matrix(MA.subsetA$M), libsizes = as.integer(d$samples$lib.size), replicates = replicates, groups = groups) Now look at the CD object and make sure the columns are labeled as they should be and the other slot values make sense. The MA plot call would look something like, plotMA.CD(CD, samplesA = "Kidney", samplesB = "Liver") The author used the red and black colors for the vignette plot because there was a known structure to the data; the first 100 counts showed differential expression and the last 900 did not. You probably have a different situation in your data so using the same color scheme may not make sense. Valerie On 01/23/2012 06:13 AM, Tina Asante Boahene wrote: > Hi all, > > I am conducting some analysis using the Marioni et al data. > > However, I am having a bit of trouble using my data to conduct the analysis based on the baySeq package. > > And I was wondering if you could stir me in the right direction. > > I have already used edgeR to find the library sizes for the ten libraries I have for my data as well as for the groups (Liver and Kidney) as stated below. > > > library(baySeq) > library(edgeR) > library(limma) > library(snow) > > cl<- makeCluster(4, "SOCK") > > > ##Calculating normalization factors## > D=MA.subsetA$M > head(D) > names(D) > dim(D) > > g<- gsub("R[1-2]L[1-8]", "", colnames(D)) > d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) > d$samples > names(d) > dim(d) > > > I will like to know how to model my code in order to produce the MA plot for count data > > > This is what I have, however it runs with the wrong response. > > Can someone help me fix this please. > > CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = as.integer(d$samples$lib.size)) > > plotMA.CD(CD, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", > 100), rep("black", 900))) > > > How can I get it to recognise the "groups" as "g" (Library and Kidney) > > This is the output for the groups [1] "Kidney" "Liver" "Kidney" "Liver" "Liver" "Kidney" "Liver" "Kidney" "Liver" "Kidney" > > thank you. > > > > > > > Kind Regards > > Tina > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 14.1 years ago Valerie Obenchain ★ 6.8k

0

Entering edit mode

Hi all, I am still having problems with bayseq Having followed the pdf document associated with it and also tailoring it to the Marioni et al data I am using, it seems that the code has been running for over two days without any results. I am wondering this code be down to my code. I have therefore attached my code to this email hoping that someone can help me solve this problem, thank you. library(baySeq) library(edgeR) library(limma) library(snow) cl <- makeCluster(4, "SOCK") ##Calculating normalization factors## D=MA.subsetA$M head(D) names(D) dim(D) g <- gsub("R[1-2]L[1-8]", "", colnames(D)) d <- DGEList(counts = D, group = substr(colnames(D), 5, 30)) d$samples names(d) dim(d) CD <- new("countData", data = as.matrix(MA.subsetA$M), libsizes = as.integer(d$samples$lib.size), replicates = g) groups(CD) <- list(rep(1, ncol(CD)), g) CD at libsizes <- getLibsizes(CD) plotMA.CD(CD, samplesA = c(1,3,6,8,10), samplesB = c(2,4,5,7,9), col = c(rep("red", 100), rep("black", 900))) ## Optionally adding annotation details to the @annotation slot of the countData object. ## CD at annotation <- data.frame(name = paste("gene", 1:1000, sep = "_")) ### Poisson-Gamma Approach ### CDP.Poi <- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl = cl) CDP.Poi at priors ## This takes time ### CDPost.Poi <- getLikelihoods.Pois(CDP.Poi, pET = "BIC", cl = cl) CDPost.Poi at estProps CDPost.Poi at posteriors[1:10, ] ## A list of the posterior likelihoods each model for the first 10 genes ## CDPost.Poi at posteriors[101:110, ] ## A list of the posterior likelihoods each model for the genes from 101 to 110 ## ### Negative-Binomial Approach ### CDP.NBML <- getPriors.NB(CD, samplesize = 1000, estimation = "QL", cl = cl) CDPost.NBML <- getLikelihoods.NB(CDP.NBML, pET = 'BIC', cl = cl) CDPost.NBML at estProps CDPost.NBML at posteriors[1:10,] CDPost.NBML at posteriors[101:110,] Kind Regards Tina ________________________________________ From: Valerie Obenchain [vobencha@fhcrc.org] Sent: 25 January 2012 18:14 To: Tina Asante Boahene Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Help With RNA-seq Hi Tina, It's difficult to help without knowing what your data look like or what error message you are seeing. Both pieces of information would be helpful. For starters I think you need to provide 'replicate' and 'groups' arguments when you create your new "countData" object. Depending on what order your data are in you need something like, groups <- list(NDE = c(1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,2,1,2,1,2,1)) replicates <- c("Kidney", "Liver", "Kidney", "Liver", "Liver', "Kidney", "Liver", "Kidney", "Liver", "Kidney") Then create your "countData" with these variables, CD <- new("countData", data = as.matrix(MA.subsetA$M), libsizes = as.integer(d$samples$lib.size), replicates = replicates, groups = groups) Now look at the CD object and make sure the columns are labeled as they should be and the other slot values make sense. The MA plot call would look something like, plotMA.CD(CD, samplesA = "Kidney", samplesB = "Liver") The author used the red and black colors for the vignette plot because there was a known structure to the data; the first 100 counts showed differential expression and the last 900 did not. You probably have a different situation in your data so using the same color scheme may not make sense. Valerie On 01/23/2012 06:13 AM, Tina Asante Boahene wrote: > Hi all, > > I am conducting some analysis using the Marioni et al data. > > However, I am having a bit of trouble using my data to conduct the analysis based on the baySeq package. > > And I was wondering if you could stir me in the right direction. > > I have already used edgeR to find the library sizes for the ten libraries I have for my data as well as for the groups (Liver and Kidney) as stated below. > > > library(baySeq) > library(edgeR) > library(limma) > library(snow) > > cl<- makeCluster(4, "SOCK") > > > ##Calculating normalization factors## > D=MA.subsetA$M > head(D) > names(D) > dim(D) > > g<- gsub("R[1-2]L[1-8]", "", colnames(D)) > d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) > d$samples > names(d) > dim(d) > > > I will like to know how to model my code in order to produce the MA plot for count data > > > This is what I have, however it runs with the wrong response. > > Can someone help me fix this please. > > CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = as.integer(d$samples$lib.size)) > > plotMA.CD(CD, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", > 100), rep("black", 900))) > > > How can I get it to recognise the "groups" as "g" (Library and Kidney) > > This is the output for the groups [1] "Kidney" "Liver" "Kidney" "Liver" "Liver" "Kidney" "Liver" "Kidney" "Liver" "Kidney" > > thank you. > > > > > > > Kind Regards > > Tina > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 14.1 years ago Tina Asante Boahene ▴ 100

0

Entering edit mode

On 01/29/12 07:49, Tina Asante Boahene wrote: > Hi all, > > I am still having problems with bayseq > > Having followed the pdf document associated with it and also tailoring it to the Marioni et al data I am using, it seems that the code has been running for over two days without any results. > > I am wondering this code be down to my code. > > I have therefore attached my code to this email hoping that someone can help me solve this problem, thank you. > > > library(baySeq) > library(edgeR) > library(limma) > library(snow) > > cl<- makeCluster(4, "SOCK") > > > ##Calculating normalization factors## > D=MA.subsetA$M > head(D) > names(D) > dim(D) Please provide the output of these (i.e., head, names, dim). > > g<- gsub("R[1-2]L[1-8]", "", colnames(D)) > d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) > d$samples > names(d) > dim(d) These values would be helpful too. > > > CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = > as.integer(d$samples$lib.size), replicates = g) > groups(CD)<- list(rep(1, ncol(CD)), g) > > CD at libsizes<- getLibsizes(CD) > > plotMA.CD(CD, samplesA = c(1,3,6,8,10), samplesB = c(2,4,5,7,9), col = c(rep("red", > 100), rep("black", 900))) Did this work? Your original question was about plotMA.CD not recognizing your groups. Does the plot work for you now? > > ## Optionally adding annotation details to the @annotation slot of the countData object. ## > CD at annotation<- data.frame(name = paste("gene", 1:1000, sep = "_")) > > > > > ### Poisson-Gamma Approach ### > > CDP.Poi<- getPriors.Pois(CD, samplesize = 2, takemean = TRUE, cl = cl) > > CDP.Poi at priors ## This takes time ### Is the call to getPriors.Pois() that has been running for over 2 days? If not, please specify which function call is taking so long. Did the rest of the code below work for you? Valerie > > CDPost.Poi<- getLikelihoods.Pois(CDP.Poi, pET = "BIC", cl = cl) > CDPost.Poi at estProps > > CDPost.Poi at posteriors[1:10, ] ## A list of the posterior likelihoods each model for the first 10 genes ## > CDPost.Poi at posteriors[101:110, ] ## A list of the posterior likelihoods each model for the genes from 101 to 110 ## > > > ### Negative-Binomial Approach ### > > CDP.NBML<- getPriors.NB(CD, samplesize = 1000, estimation = "QL", cl = cl) > > CDPost.NBML<- getLikelihoods.NB(CDP.NBML, pET = 'BIC', cl = cl) > > CDPost.NBML at estProps > > CDPost.NBML at posteriors[1:10,] > > CDPost.NBML at posteriors[101:110,] > > > Kind Regards > > Tina > ________________________________________ > From: Valerie Obenchain [vobencha at fhcrc.org] > Sent: 25 January 2012 18:14 > To: Tina Asante Boahene > Cc: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] Help With RNA-seq > > Hi Tina, > > It's difficult to help without knowing what your data look like or what > error message you are seeing. Both pieces of information would be helpful. > > For starters I think you need to provide 'replicate' and 'groups' > arguments when you create your new "countData" object. Depending on what > order your data are in you need something like, > > groups<- list(NDE = c(1,1,1,1,1,1,1,1,1,1), DE = > c(1,2,1,2,2,1,2,1,2,1)) > replicates<- c("Kidney", "Liver", "Kidney", "Liver", "Liver', > "Kidney", "Liver", "Kidney", "Liver", "Kidney") > > Then create your "countData" with these variables, > > CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = > as.integer(d$samples$lib.size), > replicates = replicates, groups = groups) > > Now look at the CD object and make sure the columns are labeled as they > should be and the other slot values make sense. The MA plot call would > look something like, > > plotMA.CD(CD, samplesA = "Kidney", samplesB = "Liver") > > The author used the red and black colors for the vignette plot because > there was a known structure to the data; the first 100 counts showed > differential expression and the last 900 did not. You probably have a > different situation in your data so using the same color scheme may not > make sense. > > Valerie > > > On 01/23/2012 06:13 AM, Tina Asante Boahene wrote: >> Hi all, >> >> I am conducting some analysis using the Marioni et al data. >> >> However, I am having a bit of trouble using my data to conduct the analysis based on the baySeq package. >> >> And I was wondering if you could stir me in the right direction. >> >> I have already used edgeR to find the library sizes for the ten libraries I have for my data as well as for the groups (Liver and Kidney) as stated below. >> >> >> library(baySeq) >> library(edgeR) >> library(limma) >> library(snow) >> >> cl<- makeCluster(4, "SOCK") >> >> >> ##Calculating normalization factors## >> D=MA.subsetA$M >> head(D) >> names(D) >> dim(D) >> >> g<- gsub("R[1-2]L[1-8]", "", colnames(D)) >> d<- DGEList(counts = D, group = substr(colnames(D), 5, 30)) >> d$samples >> names(d) >> dim(d) >> >> >> I will like to know how to model my code in order to produce the MA plot for count data >> >> >> This is what I have, however it runs with the wrong response. >> >> Can someone help me fix this please. >> >> CD<- new("countData", data = as.matrix(MA.subsetA$M), libsizes = as.integer(d$samples$lib.size)) >> >> plotMA.CD(CD, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", >> 100), rep("black", 900))) >> >> >> How can I get it to recognise the "groups" as "g" (Library and Kidney) >> >> This is the output for the groups [1] "Kidney" "Liver" "Kidney" "Liver" "Liver" "Kidney" "Liver" "Kidney" "Liver" "Kidney" >> >> thank you. >> >> >> >> >> >> >> Kind Regards >> >> Tina >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 14.1 years ago Valerie Obenchain ★ 6.8k

Login before adding your answer.