Question: StringTie + Ballgown: handling biological replicates
bhawley1991 wrote (20 months ago):

Hi all,

I've been trying to analyse an RNA-seq dataset, and I decided to try the newer HISAT2 > StringTie > Ballgown approach instead of TopHat2 > Cufflinks > CummeRbund.

I'm having real trouble working out how to handle my biological replicates, as there doesn't seem to be much documentation or discussion of these newer tools. It seems most people would use Cuffnorm, and it's easy to see why: you can very easily specify which files are the replicates of each sample. I'm sure there's a way to do this in Ballgown, but I'm far too inexperienced to spot it, so any help would be fantastic.

Thanks in advance.

Alyssa Frazee (San Francisco, CA, USA) wrote (20 months ago):

Ballgown handles biological replicates. The idea is to run StringTie separately on each replicate (either biological or technical) using the -B option (for "ballgown"), arranging the output directories as described here: https://github.com/alyssafrazee/ballgown#loading-data-into-r. This gives you a separate output directory for each replicate. When the data are loaded into R from there, ballgown and its associated statistical tests (in stattest) assume only that each sample (each separate output directory) is independent of the others. (So the samples can be either a set of technical replicates from one biological sample or a set of biological replicates.)
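A minimal sketch of the loading step, assuming the ballgown Bioconductor package is installed. It uses the example data shipped in ballgown's extdata directory, which follows the same one-directory-per-sample layout; the StringTie command in the comment and the file names `merged.gtf` and `sample01.bam` are hypothetical placeholders.

```r
# Each replicate is assumed to have been quantified separately with StringTie,
# e.g. (hypothetical file names):
#   stringtie -e -B -G merged.gtf -o ballgown/sample01/sample01.gtf sample01.bam
# so that every replicate ends up in its own directory under one parent folder.
library(ballgown)

# ballgown's bundled example data uses the same layout: extdata/sample01 ... sample20.
data_directory <- system.file("extdata", package = "ballgown")

# Every subdirectory whose name matches samplePattern is read in as one sample.
bg <- ballgown(dataDir = data_directory, samplePattern = "sample", meas = "all")

# One sample name per output directory; stattest treats these as independent.
print(sampleNames(bg))
```

For your own data, point `dataDir` at the parent folder holding your StringTie directories and set `samplePattern` to a string that every directory name contains.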

If you have both biological and technical replicates, one way to handle this with ballgown is to read in the data as you normally would (one directory per biological or technical replicate), but include a column in pData denoting the biological replicate ID. You could then combine expression values across technical replicates (e.g. by averaging) to get a data set with one sample per biological replicate, and use that data set with the stattest function.
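A minimal base-R sketch of that combining step. It assumes an expression matrix with one column per StringTie output directory (for example, what `texpr(bg, "FPKM")` returns) and a pData data frame whose rows are in the same order as those columns. The helper `average_tech_reps`, the column names `bio_rep` and `group`, and the toy numbers are all hypothetical.

```r
# Average technical replicates so that each biological replicate becomes one sample.
# expr:    numeric matrix, features in rows, one column per sequenced library
# bio_rep: biological replicate ID of each column of expr
average_tech_reps <- function(expr, bio_rep) {
  stopifnot(ncol(expr) == length(bio_rep))
  reps <- unique(bio_rep)
  # rowMeans over the columns belonging to each biological replicate
  means <- lapply(reps, function(r) rowMeans(expr[, bio_rep == r, drop = FALSE]))
  out <- do.call(cbind, means)
  dimnames(out) <- list(rownames(expr), reps)
  out
}

# Toy example (made-up numbers): 2 transcripts, 2 biological reps x 2 technical reps
expr <- matrix(c(1, 3, 10, 20,
                 2, 4, 30, 40),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("tx1", "tx2"), c("s1", "s2", "s3", "s4")))
pd <- data.frame(id = colnames(expr),
                 bio_rep = c("bioA", "bioA", "bioB", "bioB"),
                 group = c("wt", "wt", "mut", "mut"),
                 stringsAsFactors = FALSE)

bio_expr <- average_tech_reps(expr, pd$bio_rep)

# One pData row per biological replicate, in the same order as bio_expr's columns.
pd_bio <- pd[!duplicated(pd$bio_rep), c("bio_rep", "group")]
rownames(pd_bio) <- NULL
print(bio_expr)

# Possible next step (not run; check ?stattest for the exact arguments): pass the
# averaged, un-logged matrix via gowntable, with pd_bio as its pData, e.g.
#   stattest(gowntable = bio_expr, pData = pd_bio,
#            feature = "transcript", covariate = "group")
```

Averaging happens after loading and before testing, so the per-library directories stay untouched and only the matrix handed to stattest changes.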

 


linda.boshans replied (8 months ago):

Hi Alyssa,

I have a similar question to what was posted here, except that I have 6 biological replicates (2 samples, 3 replicates each) and 4 technical replicates per biological replicate (for a total of 24). I have denoted the replicates in pData as you described. How do I go about combining the expression values and getting the average expression? And at what step of the analysis should I do that?

Thanks.

jnpitt wrote (19 months ago):

Alyssa, could you please demonstrate how you would add the biological replicate ID to the built-in extdata, e.g. to treat the 20 provided samples as 10 independent biological replicates from each of 2 different treatments, and then use stattest to look for statistically significant changes between the 2 treatments?

jnpitt wrote (19 months ago):

Just to answer my own question, from the ballgown docs:

pData(bg) = data.frame(id=sampleNames(bg), group=rep(c(1,0), each=10))

 

Here group= assigns each sample to either group 1 or group 0 (the first 10 samples to group 1, the last 10 to group 0); subsequent stattest calls with covariate='group' then compare groups 0 and 1.
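A sketch of the full comparison asked about above, assuming ballgown is installed and using its bundled example data. The choice of `feature = "transcript"`, `meas = "FPKM"` and `getFC = TRUE` is ours; the ballgown docs also show exon-level tests (e.g. `feature = "exon"`, `meas = "rcount"`). The group labels are arbitrary for this example data, so the results are not biologically meaningful.

```r
library(ballgown)

# Load the 20 example samples shipped with ballgown.
data_directory <- system.file("extdata", package = "ballgown")
bg <- ballgown(dataDir = data_directory, samplePattern = "sample", meas = "all")

# First 10 samples -> group 1, last 10 -> group 0 (order follows sampleNames(bg)).
pData(bg) <- data.frame(id = sampleNames(bg), group = rep(c(1, 0), each = 10))

# Test every transcript for a difference in FPKM between the two groups.
results <- stattest(bg, feature = "transcript", meas = "FPKM",
                    covariate = "group", getFC = TRUE)

# Most significant transcripts first.
head(results[order(results$qval), ])
```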

 

 

Alyssa Frazee (San Francisco, CA, USA) wrote (18 months ago):

Yep, the above is the correct answer. You can edit pData directly. Each column of the data frame is a covariate and each row is a sample; the group each sample belongs to should be denoted by a covariate (column) exactly as you wrote. 
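To make the one-row-per-sample, one-column-per-covariate layout concrete, here is a hedged base-R sketch of a pData table for the 2 conditions x 3 biological x 4 technical replicate design described earlier in the thread. It assumes 24 StringTie directories named sample01 to sample24, ordered so that the 4 technical replicates of each biological replicate are adjacent; the directory names and the covariate names `condition`, `bio_rep` and `tech_rep` are hypothetical.

```r
# Hypothetical sample names, in the order ballgown would list them.
ids <- sprintf("sample%02d", 1:24)

pd <- data.frame(
  id        = ids,                                     # should match sampleNames(bg)
  condition = rep(c("wildtype", "mutant"), each = 12), # 2 conditions x 12 libraries
  bio_rep   = rep(sprintf("bio%d", 1:6), each = 4),    # 3 biological reps per condition
  tech_rep  = rep(1:4, times = 6),                     # 4 technical reps per biological rep
  stringsAsFactors = FALSE
)

# Sanity check: each biological replicate belongs to exactly one condition.
stopifnot(all(tapply(pd$condition, pd$bio_rep, function(x) length(unique(x))) == 1))

# With a loaded ballgown object the table would then be attached with:
#   stopifnot(identical(pd$id, sampleNames(bg)))
#   pData(bg) <- pd
print(table(pd$condition, pd$bio_rep))
```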

jnpitt wrote (18 months ago):

Another thing that wasn't clear is that ballgown also requires the sample IDs to be unique. For example, this samples vector

filelist <- c("/data/wildtype/sample1", "/data/wildtype/sample2", "/data/wildtype/sample3", "/data/mutant/sample1", "/data/mutant/sample2", "/data/mutant/sample3")

when loaded into ballgown like this:

bg = ballgown(samples = filelist, meas = 'all')

will NOT be treated as independent samples, because the directory names sample1 to sample3 are repeated in both conditions. Renaming the directories so that every name is unique fixes this:

filelist <-c("/data/wildtype/sample1", "/data/wildtype/sample2","/data/wildtype/sample3", "/data/mutant/sample4","/data/mutant/sample5", "/data/mutant/sample6") 
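A small base-R check along these lines can catch the problem before loading. It assumes, as the example above suggests, that ballgown derives each sample name from the last component of the directory path; the helper `duplicated_sample_dirs` is hypothetical.

```r
# Return the directory basenames that occur more than once in a sample list.
duplicated_sample_dirs <- function(paths) {
  ids <- basename(paths)
  unique(ids[duplicated(ids)])
}

clashing <- c("/data/wildtype/sample1", "/data/wildtype/sample2", "/data/wildtype/sample3",
              "/data/mutant/sample1", "/data/mutant/sample2", "/data/mutant/sample3")
renamed  <- c("/data/wildtype/sample1", "/data/wildtype/sample2", "/data/wildtype/sample3",
              "/data/mutant/sample4", "/data/mutant/sample5", "/data/mutant/sample6")

print(duplicated_sample_dirs(clashing))  # names that would collide
print(duplicated_sample_dirs(renamed))   # character(0): all names unique

# Guard to run before ballgown(samples = filelist, meas = 'all'):
#   stopifnot(length(duplicated_sample_dirs(filelist)) == 0)
```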

 

 

 
