Handling both technical and biological replicates in Ballgown
@lindaboshans-12526
Last seen 7.7 years ago

Hello,

I am using the HISAT2 - StringTie - Ballgown pipeline that was published in Nature Protocols in 2016. When I load my Ballgown data into R, I have 4 lane technical replicates per biological sample, and 3 biological samples per condition (2 conditions total). This leads to 24 "samples", as Ballgown calls them. I have denoted which samples are tech_reps and biol_reps using the pData function. However, I'd like to collapse the technical replicates by averaging expression so that I'm left with 6 samples, 3 biological replicates per condition. I am not familiar with working with S4 objects. I was able to get the average expression by using rowMeans(subset(bg@expr$trans, select = c(my columns))). I averaged all technical replicates for each biological replicate, and then took a subset of that to eliminate the tech_rep columns. However, when I fed that bg object into stattest, I got the following error:

Error in `[.data.frame`(x, r, vars, drop = drop) : 
  undefined columns selected

Which leads me to believe I am doing this incorrectly. Any help would be greatly appreciated!! 
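
One way to get from 24 columns to 6 without editing the slots of the ballgown object by hand is to average on a plain expression matrix and keep a matching 6-row phenotype table. The sketch below runs on simulated values in base R only; `fpkm`, `pd` and `collapse_tech_reps` are made-up names and the sample IDs are illustrative. In a real analysis the matrix would presumably come from something like `texpr(bg, meas = "FPKM")`, which is an assumption to check against the ballgown documentation.

```r
# Simulated stand-in for a transcripts x samples FPKM matrix (24 columns).
set.seed(1)
bio_rep <- rep(c("D1", "D2", "D3", "V1", "V2", "V3"), each = 4)
ids     <- paste0(bio_rep, "_0", 1:4)
fpkm <- matrix(rexp(5 * 24), nrow = 5,
               dimnames = list(paste0("tx", 1:5), ids))

# Phenotype table, one row per lane, in the same order as the columns.
pd <- data.frame(id        = ids,
                 condition = rep(c("Dlx2", "Vector"), each = 12),
                 replicate = bio_rep,
                 stringsAsFactors = FALSE)

# Hypothetical helper: average the lanes of each biological replicate.
collapse_tech_reps <- function(expr, groups) {
  stopifnot(ncol(expr) == length(groups))
  lv  <- unique(groups)
  out <- do.call(cbind, lapply(lv, function(g) {
    rowMeans(expr[, groups == g, drop = FALSE])
  }))
  colnames(out) <- lv
  out
}

fpkm_bio <- collapse_tech_reps(fpkm, pd$replicate)

# One phenotype row per biological replicate, same order as the columns.
pd_bio <- unique(pd[, c("replicate", "condition")])
rownames(pd_bio) <- NULL

dim(fpkm_bio)
pd_bio
```

If the `gowntable` route of `stattest()` behaves as its help page describes, a call along the lines of `stattest(gowntable = fpkm_bio, pData = pd_bio, feature = "transcript", covariate = "condition", getFC = TRUE)` would then compare the two conditions on 6 samples. Keeping the matrix and the phenotype table in step is the point here: a mismatch between expression columns and phenotype rows is one plausible source of an "undefined columns selected" error.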

Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.8 years ago
United States
Hello, I think the way to handle this would be to indicate the technical reps in the model matrix. If you include them as factor terms in the matrix then you will get a similar result to if you averaged them before analyzing. Hope that helps! Jeff

Hi Jeff,

Thank you for your response, but I don't quite understand. Could you please elaborate? How do I include the technical replicates as factors and how would that address the replicate issue? Do I assign the tech reps as a separate column in pData? 

I have the following pData table:

      id condition replicate
1  D1_01      Dlx2        D1
2  D1_02      Dlx2        D1
3  D1_03      Dlx2        D1
4  D1_04      Dlx2        D1
5  D2_01      Dlx2        D2
6  D2_02      Dlx2        D2
7  D2_03      Dlx2        D2
8  D2_04      Dlx2        D2
9  D3_01      Dlx2        D3
10 D3_02      Dlx2        D3
11 D3_03      Dlx2        D3
12 D3_04      Dlx2        D3
13 V1_01    Vector        V1
14 V1_02    Vector        V1
15 V1_03    Vector        V1
16 V1_04    Vector        V1
17 V2_01    Vector        V2
18 V2_02    Vector        V2
19 V2_03    Vector        V2
20 V2_04    Vector        V2
21 V4_01    Vector        V4
22 V4_02    Vector        V4
23 V4_03    Vector        V4
24 V4_04    Vector        V4

So when I run stattest:

results_transcripts = stattest(bg_filt, feature="transcript", covariate="condition", adjustvars = "replicate", getFC=TRUE, meas="FPKM")

I get the following error:

Coefficients not estimable: replicateV4 
Error in solve.default(t(mod) %*% mod) : 
  system is computationally singular: reciprocal condition number = 5.58641e-27

This makes me believe I am doing something incorrectly. I've been through ALL the topics that address tech reps in ballgown, and they all lead back to denoting a replicate column in pData(bg), which I have already done.
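
The singular-matrix error can be reproduced without ballgown. Assuming `stattest()` builds its design from `covariate` plus `adjustvars` (the `t(mod) %*% mod` in the error message suggests so), the sketch below rebuilds the pData table above in base R and compares the number of design columns with the rank of the design matrix. The variable names are illustrative.

```r
# Rebuild the phenotype table shown above.
replicate <- rep(c("D1", "D2", "D3", "V1", "V2", "V4"), each = 4)
pd <- data.frame(
  id        = paste0(replicate, "_0", 1:4),
  condition = rep(c("Dlx2", "Vector"), each = 12),
  replicate = replicate
)

# Design with condition plus the biological-replicate factor.
mod <- model.matrix(~ condition + replicate, data = pd)

colnames(mod)
ncol(mod)      # 7: intercept, conditionVector, and 5 replicate columns
qr(mod)$rank   # 6: one column is a linear combination of the others
```

Every replicate label belongs to exactly one condition, so the `conditionVector` column equals the sum of the V1, V2 and V4 replicate columns. The model has one more coefficient than the data can determine, which matches the "Coefficients not estimable: replicateV4" message.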


Probably a bit late, but just for future reference. The `adjustvars` option takes a vector, not a string. If you change it to

adjustvars=c("replicate")

It should work.
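
For anyone testing this suggestion: in R a quoted string is already a character vector of length one, so the two spellings can be compared directly. A quick base-R check (the variable names are arbitrary):

```r
a <- "replicate"
b <- c("replicate")

identical(a, b)   # TRUE
length(a)         # 1
is.vector(a)      # TRUE
```

Since the two forms are the same object, any change in behaviour after switching between them would have to come from something else in the call.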


Hi Linda, I am facing the same error that you describe here. The error is:

Coefficients not estimable: technicalreplicateNBM9 
Error in solve.default(t(mod) %*% mod) : 
  system is computationally singular: reciprocal condition number = 1.51496e-27
Calls: stattest -> f.pvalue -> solve -> solve.default
In addition: Warning message:
Partial NA coefficients for 47686 probe(s) 
Execution halted

Can you please tell me how you solved it?

Thanks,

Sandeep

Alyssa Frazee ▴ 210
@alyssa-frazee-6710
Last seen 4.1 years ago
San Francisco, CA, USA

The "system is computationally singular" error generally means that either one of the variables (in either "covariate" or "adjustvars") has the same value for every sample, or that two variables that are multiples of each other are in that list. Hope this helps!
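
Both situations can be checked on the phenotype table before calling `stattest`. The sketch below is base R only; `check_design_vars` is a hypothetical helper, not a ballgown function, and the toy table is invented for illustration. It reports variables with a single value and then tests whether the remaining ones give a full-rank design.

```r
# Hypothetical helper: flag variables that cannot be estimated.
check_design_vars <- function(pd, vars) {
  n_values <- vapply(vars, function(v) length(unique(pd[[v]])), integer(1))
  constant <- vars[n_values < 2]          # same value for every sample
  usable   <- setdiff(vars, constant)
  full_rank <- NA
  if (length(usable) > 0) {
    mod <- model.matrix(reformulate(usable), data = pd)
    full_rank <- qr(mod)$rank == ncol(mod)
  }
  list(constant = constant, full_rank = full_rank)
}

# Toy phenotype table (invented values).
pd <- data.frame(
  condition = rep(c("A", "B"), each = 4),
  batch     = rep(c("x", "y"), times = 4),  # crossed with condition
  lane      = "L1",                         # same value everywhere
  dose      = rep(c(1, 2), each = 4)        # tracks condition exactly
)

check_design_vars(pd, c("condition", "batch"))  # nothing constant, full rank
check_design_vars(pd, c("condition", "lane"))   # lane is constant
check_design_vars(pd, c("condition", "dose"))   # not full rank
```

The constant variables are dropped before building the design because `model.matrix` refuses a factor with only one level.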

jnorth • 0
@jnorth-14485
Last seen 7.1 years ago

Hi Alyssa,

Thank you for all your feedback and help over the years. 

How do you exclude genes that have this property, if that is possible? I am having difficulty finding the lines of code to achieve this.

Kind regards, Julian

Alyssa Frazee ▴ 210
@alyssa-frazee-6710
Last seen 4.1 years ago
San Francisco, CA, USA

Hey Julian, the issue isn't that genes need to be excluded. If you get the "system is computationally singular" error, it means that one of your "adjustvars" is perfectly correlated with either another adjustvar or the "covariate" variable. You can see this in the pData: if there's any combination of adjustvars + covariate where there's only one example of that combination in your pData, you'll need to either gather more data or define a different set of adjustment variables.
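
The pData check described here can be done with a cross-tabulation. The sketch below uses the replicate and condition labels from the pData table posted earlier in the thread, rebuilt by hand in base R; `tab` and `nested` are illustrative names.

```r
# Labels from the pData table earlier in the thread.
replicate <- rep(c("D1", "D2", "D3", "V1", "V2", "V4"), each = 4)
condition <- rep(c("Dlx2", "Vector"), each = 12)

# Rows are replicate labels, columns are conditions, cells are sample counts.
tab <- table(replicate, condition)
tab

# A replicate label is nested in condition if it has samples in only one column.
nested <- rowSums(tab > 0) == 1
all(nested)   # TRUE
```

Every replicate label has samples in only one condition, so `replicate` determines `condition` completely and the two cannot both enter the model as ordinary adjustment terms.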
