Search
Question: baySeq issues and questions
0
gravatar for lordbleys
3.0 years ago by
lordbleys0
European Union
lordbleys0 wrote:

Hello,

 

I am trying to analyze a dataset of bacterial community sequences. The experimental design is the following : 6 patients were each sampled at 4 sequential time points. I am looking for differential abundance in OTUs, and on paper baySeq sounds like the perfect tool for the job. But is it really possible to use it on 4 repeated measures ?

 

The following question is much more practical : I have been trying to use baySeq on the provided training dataset, but each time I run into errors :

whn running the script

pairCD <- new("countData", data = list(pairData[,1:4], pairData[,5:8]),
                 replicates = c(1,1,2,2),
                 groups = list(NDE = c(1,1,1,1), DE = c(1,1,2,2)),
              densityFunction = bbDensity)

I have an error message saying

"Error in .nextMethod(.Object, ...) : object 'bbDensity' not found"

I feel like I am missing something really obvious, but I can´t see what...

Any suggestions ?

Cheers !

Sébastien

ADD COMMENTlink modified 2.8 years ago • written 3.0 years ago by lordbleys0
1
gravatar for Thomas J Hardcastle
3.0 years ago by
United Kingdom
Thomas J Hardcastle170 wrote:
I have an error message saying

"Error in .nextMethod(.Object, ...) : object 'bbDensity' not found"

I feel like I am missing something really obvious, but I can´t see what...

 

You're using an old version of baySeq. Update to 2.0.50 from Bioconductor and this should work.

This will also allow you to use a densityFunction called mdDensity - this is an extension to the beta-binomial analysis of paired data that allows multiple matched samples to be analysed.

Example code; here cdat is a list, each element of which is a matrix corresponding to all the data for a single timepoint. Columns are matched; i.e., cdat[[1]][,1] refers to the same patient as cdat[[2]][,1].

library(baySeq)

cl <- makeCluster(24)

cD <- new("countData", data = cdat[1:10], replicates = c(1,1,1,1,2,2,2,2), groups =\
 list(NDE = c(1,1,1,1,1,1,1,1), DE = c(1,1,1,1,2,2,2,2)))

densityFunction(cD) <- mdDensity
libsizes(cD) <- getLibsizes(cD)

cD <- getPriors(cD, cl = cl)

cD <- getLikelihoods(cD, cl = cl)

 

This works reasonably well, though if the number of timepoints is too great than the empirical prior estimation becomes too sparse. There will be an improved version released to the Bioconductor development thread shortly which overcomes this problem.

Best wishes,

 

Tom Hardcastle

 

-- 
Dr. Thomas J. Hardcastle
Senior Research Associate

Department of Plant Sciences
University of Cambridge
Downing Street
Cambridge, CB2 3EA
United Kingdom

 

ADD COMMENTlink written 3.0 years ago by Thomas J Hardcastle170
0
gravatar for lordbleys
3.0 years ago by
lordbleys0
European Union
lordbleys0 wrote:

Thank you very much, I´ll try that right away !

Cheers !

Sébastien

ADD COMMENTlink written 3.0 years ago by lordbleys0
0
gravatar for lordbleys
2.9 years ago by
lordbleys0
European Union
lordbleys0 wrote:

Hello again and happy new year everyone,

 

After some unforseen complications, I am back on the dataset analysis using baySeq. However I am still getting errors. I think it comes from the way I input the data, but I tried several combinations without finding the correct one.

 

As previously, my dataset is composed of 24 samples : 6 patients sampled at 4 time points (d0, d2, d5, d60) for OTU abundance.

All the patients are in the same treatment group, but we expect differences between d0/d60 and d2/d5 (meaning d2 and d5 should cluster together and differently from d0 and d60).

 

Here is my code so far :

cl <- makeCluster(24)

otus <-  read.table(file='otus_relat_abund.txt',header=T,as.is=T,sep='\t')

#create timepoint matrixes
d00 <- data.matrix(otus[, 2:7])
d02 <- data.matrix(otus[, 8:13])
d05 <- data.matrix(otus[, 14:19])
d60 <- data.matrix(otus[, 20:25])

Note: each different column of the matrixes correspond to the same patient and have the same name (p01, p02 -> p06) : col 1 of matrix d00, d02, d05 and d60 refers to patient 1 .

cdat <- list(d00,d02,d05,d60)

cD <- new("countData", data = cdat[1:6], replicates = c(1,1,1,1,1,1), groups = list(NDE = c(1,1,1,1,1,1), DE = c(1,1,2,2,1,1)))

Here I am pretty sure the data are not presented in the correct way. But I don't see what combination is correct. I want to indicate that all samples are from the same treatment group (hence the replicates = c(1,1,1,1,1,1) ), but I would also like to indicate that matrixes p00 and p60 is a pair, and matrixes p02 and p05 is the second pair.

If I run this script with these values I can continue but when I arrive to the command:

cD <- getPriors(cD, cl = cl)

I get the error : "24 nodes produced errors; first error: function cannot be evaluated at initial parameters".

Any hint on how to solve this problem would be greatly appreciated.

 

Seb

ADD COMMENTlink written 2.9 years ago by lordbleys0
0
gravatar for Thomas J Hardcastle
2.8 years ago by
United Kingdom
Thomas J Hardcastle170 wrote:

Looking at this, the groups structure does seem to be wrong for what you want; 

groups = list(NDE = c(1,1,1,1,1,1), DE = c(1,1,2,2,1,1)))

indicates that the two models under consideration are one in which all samples behave identically, and one in which the third and fourth samples behave differently to the first, second, fifth and sixth.

You're also missing a libsizes declaration (use the getLibsizes function, if you don't have an alternative way of filling this slot).

However, you want to find differences between the timepoints, with no differences between individuals. There are a couple of ways to do this in baySeq.

The simplest way is to ignore any patient specific effects and treat each timepoint as a separate replicate group. (This is the correct approach for destructive sampling of samples at different timepoints, which is often the case). Then your code would look like:

cD <- new("countData", data = otus[,2:25], replicates = rep(1:4, each = 6), 
groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1), 
DE = c(1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1)))

and then proceed as usual with a negative binomial distribution (the default). Alternatively, you could use the 'allModels' function and consensus priors approach to check diverse patterns behaviour over timepoints; see section 3.4 of the preprint http://biorxiv.org/content/early/2014/11/28/011890.

The second way is to try and incorporate patient specific effects into a multinomial-dirichlet distribution. Here the code would look more like what you already have, though the groups slot changes:

cD <- new("countData", data = cdat, replicates = c(1,1,1,1,1,1), groups = list(NDE = c(1,1,1,1,1,1))

libsizes(cD) <- getLibsizes(cD)

densityFunction(cD) <- mdDensity # use multinomial-dirichlet distribution

cD <- getPriors(cD, cl = cl)

cD <- getLikelihoods(cD, nullData = TRUE, cl = cl)

The 'nullData = TRUE' option here is the bit that will treat separately those genes that show equivalent expression across timepoints, leaving those with high likelihood in the NDE group those that do not.

topCounts(cD, group = "NDE")

However, your model suggests that you believe there are two possible patterns to describe the data; one in which we have equivalent expression across all timepoints, and one in which "d2 and d5 should cluster together and differently from d0 and d60". To specifically model this, you'd need to modify the distributional assumptions in the densityFunction slot of the cD object. This is possible, but may be overkill - let me know if you'd like to see a discussion of this.

Cheers,

Tom

 

ADD COMMENTlink written 2.8 years ago by Thomas J Hardcastle170
0
gravatar for lordbleys
2.8 years ago by
lordbleys0
European Union
lordbleys0 wrote:

Thank you Thomas,

 

I tried ignoring patient effect and treating each timepoint as a separate replicate group. However my patients are so different at baseline (d0 and d60) from one another that I end up looking at background noise with no significant effect. That's why I want to have a patient effect taken into account. That's the whole point of this study : we designed a longitudinal study in order to compare each patient with himself at several times during infection and baseline, in order to reduce the noise coming from the individual differences. And I am having a lot of difficulties finding a way to analyse simultaneously the variations during infection (d2 + d5 vs d0 + d60) and the patient effect. The model I tried to use was an attempt to test my null hypothesis: there are no differences in OTU abundance between d0/d60 and d2/d5.

 

I tried the second solution you suggested. However I still run into the same error after the script: 

cD <- getPriors(cD, cl = cl)
4 nodes produced errors; first error: function cannot be evaluated at initial parameters

And then if I try to continue despite that I run into the next error:

> cD <- getLikelihoods(cD, nullData=TRUE, cl = cl)
Error in weights[[ii]] : subscript out of bounds

 

Any suggestion on what could be wrong ?

 

Thanks !

Seb

ADD COMMENTlink written 2.8 years ago by lordbleys0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 264 users visited in the last hour