detecting oscillating genes from RNA-seq data
2
0
Entering edit mode
mali salmon ▴ 370
@mali-salmon-4532
Last seen 6.2 years ago
Israel
Hello List I have RNA-seq data from different time points and I would like to find oscillating genes. I thought of using the "cycle" package (which is based on Fourier score) , but I'm not sure what values to use: FPKM or DESeq/edgeR normalized values. Any suggestion what would be more appropriate? Thanks Mali [[alternative HTML version deleted]]
• 1.7k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 6 days ago
United States
hi Mali, You would certainly want to normalize each sample based on library size, which happens in all methods, although the DESeq and edgeR methods for library size normalization use a robust estimator, while the 'sum of mapped reads' is is strongly influenced by the features with highest counts. I can't tell from the documentation if the standardise() function in the cycle package is creating mean 0, variance 1 along the rows or columns, but this would be relevant to your question. If you were calculating simple gene-gene distances and looking for genes which cluster together, it would be important to center the log-scale counts along the genes, as you would probably like to find genes which rise and fall together regardless of the base level. (see the sweep() function in base R) Another factor is the dependence of the variance of log-transformed counts on the mean. From a quick read of the vignette, the model in cycle package has normally-distributed error with a variance that depends on time, but not explicitly on the mean. You might try first using one of the transformations described in the DESeq2 vignette and then working with the variance stabilized values. Mike On Tue, Apr 1, 2014 at 8:33 AM, mali salmon <shalmom1@gmail.com> wrote: > Hello List > I have RNA-seq data from different time points and I would like to find > oscillating genes. > I thought of using the "cycle" package (which is based on Fourier score) , > but I'm not sure what values to use: FPKM or DESeq/edgeR normalized values. > Any suggestion what would be more appropriate? > Thanks > Mali > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 9 hours ago
WEHI, Melbourne, Australia
Dear Mali, It is easy to do such an analysis directly in edgeR. Suppose for example that you are looking for a daily cycle, and that you have at least half a dozen time points spanning more than a full day. The method is to create basis vectors for a periodic expression pattern. Suppose that 'time' is time in hours: s1 <- sin(2*pi*time/24) c1 <- cos(2*pi*time/24) design <- model.matrix(~s1+c1) This will give a design matrix with three columns: one for the intercept and two for the periodic signal. To test for a periodic cycle: fit <- glmFit(y,design) lrt <- glmLRT(fit,coef=2:3) If you have lots of time points and you want to make a more complex curve you can add a harmonic: s2 <- sin(4*pi*time/24) c2 <- cos(4*pi*time/24) design <- model.matrix(~s1+c1+s2+c2) There are now four coefficients parametrizing the periodic curve. (These can be viewed as representing amplitudes and phase shifts for the two harmonics.) I think you would seldom want more than two harmonics for RNA-seq data. This method can easily be adapted to find cell-cycle genes and so on. Best wishes Gordon > Date: Tue, 1 Apr 2014 14:33:27 +0200 > From: mali salmon <shalmom1 at="" gmail.com=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] detecting oscillating genes from RNA-seq data > > Hello List > I have RNA-seq data from different time points and I would like to find > oscillating genes. > I thought of using the "cycle" package (which is based on Fourier score) , > but I'm not sure what values to use: FPKM or DESeq/edgeR normalized values. > Any suggestion what would be more appropriate? > Thanks > Mali ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
Dear Prof. Smith Thank you so much for the explanation of how to do the analysis. If you don't mind I'll describe the design of our experiment just to be sure that I'm not doing any mistake. We work on a sea animal, and we are interested with day/night oscillating genes (24 hours cycle) and with tidal genes (12.4 h cycle). In the first week we have sacrified animals every 4 hours across 48 h, and then repeated the same procedure the week after. So in total we have 24 samples, 4 "replicates" of 6 time points (2am,6am,10am,2pm,6pm,10pm) >From your mail I understand that time is a vector of time points in hours, but I'm not sure how to define it, it can be: 1) time <- rep(seq(2,46,4),2) 2) time <- seq(2,94,4) 3) time <- rep(seq(2,22,4),4) I saw this matter for the cos vector. I suppose that for detecting tidal genes I just have to repeat the analysis with 12.4 cycle. Thanks again Mali On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Dear Mali, > > It is easy to do such an analysis directly in edgeR. > > Suppose for example that you are looking for a daily cycle, and that you > have at least half a dozen time points spanning more than a full day. The > method is to create basis vectors for a periodic expression pattern. > Suppose that 'time' is time in hours: > > s1 <- sin(2*pi*time/24) > c1 <- cos(2*pi*time/24) > design <- model.matrix(~s1+c1) > > This will give a design matrix with three columns: one for the intercept > and two for the periodic signal. To test for a periodic cycle: > > fit <- glmFit(y,design) > lrt <- glmLRT(fit,coef=2:3) > > If you have lots of time points and you want to make a more complex curve > you can add a harmonic: > > s2 <- sin(4*pi*time/24) > c2 <- cos(4*pi*time/24) > design <- model.matrix(~s1+c1+s2+c2) > > There are now four coefficients parametrizing the periodic curve. (These > can be viewed as representing amplitudes and phase shifts for the two > harmonics.) I think you would seldom want more than two harmonics for > RNA-seq data. > > This method can easily be adapted to find cell-cycle genes and so on. > > Best wishes > Gordon > > > Date: Tue, 1 Apr 2014 14:33:27 +0200 >> From: mali salmon <shalmom1@gmail.com> >> To: "bioconductor@r-project.org" <bioconductor@r-project.org> >> Subject: [BioC] detecting oscillating genes from RNA-seq data >> >> Hello List >> I have RNA-seq data from different time points and I would like to find >> oscillating genes. >> I thought of using the "cycle" package (which is based on Fourier score) , >> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized >> values. >> Any suggestion what would be more appropriate? >> Thanks >> Mali >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
On Thu, 3 Apr 2014, mali salmon wrote: > Dear Prof. Smith > Thank you so much for the explanation of how to do the analysis. > If you don't mind I'll describe the design of our experiment just to be > sure that I'm not doing any mistake. > We work on a sea animal, and we are interested with day/night oscillating > genes (24 hours cycle) and with tidal genes (12.4 h cycle). > In the first week we have sacrified animals every 4 hours across 48 h, and > then repeated the same procedure the week after. > So in total we have 24 samples, 4 "replicates" of 6 time points > (2am,6am,10am,2pm,6pm,10pm) > From your mail I understand that time is a vector of time points in hours, > but I'm not sure how to define it, it can be: > 1) time <- rep(seq(2,46,4),2) > 2) time <- seq(2,94,4) > 3) time <- rep(seq(2,22,4),4) Makes no difference. All three choices will give exactly the same results. > I saw this matter for the cos vector. > I suppose that for detecting tidal genes I just have to repeat the analysis > with 12.4 cycle. You need to carefully identify the tidal stage applicable to each of the 24 samples. Clearly the tidal phase will have moved in the second week. For example you might code 'tide' to be between 0 and 1, with 0/1 for high tide and 0.5 for low tide. Then: tides <- sin(2*pi*tide) tidec <- cos(2*pi*tide) Then you need to fit both day/night and tidal cycles to your data at the same time: ~s1+c1+tides+tidec It may be difficult to distinguish a tidal cycle from a 12-hour cycle. You might try to check check this by: ~s1+c1+s2+c2+tides+tidec Gordon > Thanks again > Mali > > > On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> Dear Mali, >> >> It is easy to do such an analysis directly in edgeR. >> >> Suppose for example that you are looking for a daily cycle, and that you >> have at least half a dozen time points spanning more than a full day. The >> method is to create basis vectors for a periodic expression pattern. >> Suppose that 'time' is time in hours: >> >> s1 <- sin(2*pi*time/24) >> c1 <- cos(2*pi*time/24) >> design <- model.matrix(~s1+c1) >> >> This will give a design matrix with three columns: one for the intercept >> and two for the periodic signal. To test for a periodic cycle: >> >> fit <- glmFit(y,design) >> lrt <- glmLRT(fit,coef=2:3) >> >> If you have lots of time points and you want to make a more complex curve >> you can add a harmonic: >> >> s2 <- sin(4*pi*time/24) >> c2 <- cos(4*pi*time/24) >> design <- model.matrix(~s1+c1+s2+c2) >> >> There are now four coefficients parametrizing the periodic curve. (These >> can be viewed as representing amplitudes and phase shifts for the two >> harmonics.) I think you would seldom want more than two harmonics for >> RNA-seq data. >> >> This method can easily be adapted to find cell-cycle genes and so on. >> >> Best wishes >> Gordon >> >> >> Date: Tue, 1 Apr 2014 14:33:27 +0200 >>> From: mali salmon <shalmom1 at="" gmail.com=""> >>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>> Subject: [BioC] detecting oscillating genes from RNA-seq data >>> >>> Hello List >>> I have RNA-seq data from different time points and I would like to find >>> oscillating genes. >>> I thought of using the "cycle" package (which is based on Fourier score) , >>> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized >>> values. >>> Any suggestion what would be more appropriate? >>> Thanks >>> Mali >>> >> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY
0
Entering edit mode
Thank you so much Prof. Smyth for your reply. I have been trying to follow your suggestions, and encountered an error when trying to run estimateGLMTagwiseDisp. Below is the code I used and the design matrix. Many thanks Mali >library(edgeR) >x <- read.delim("count.matrix", row.names=1, stringsAsFactors=FALSE) >x<-round(x) // since I have RSEM values >y <- DGEList(counts=x) >keep <- rowSums(cpm(y)>1) >= 4 >y <- y[keep,] >y$samples$lib.size <- colSums(y$counts) >y <- calcNormFactors(y) >time <- rep(seq(2,22,4),4) >s1 <- sin(2*pi*time/24) >c1 <- cos(2*pi*time/24) >tide<-c(1,0,0.5,1,0,0.5,0.5,1,0,0.5,1,0.5,1,0,0.5,1,0.5,0.5,0.5,1,0,0 .5,1,0) *// I used 1 for high tide, 0 for low tide, and 0.5 in-between (rising or falling)* >tides <- sin(2*pi*tide) >tidec <- cos(2*pi*tide) >design <- model.matrix(~s1+c1+tides+tidec) >design (Intercept) s1 c1 tides tidec 1 1 0.5 8.660254e-01 -2.449294e-16 1 2 1 1.0 6.123234e-17 0.000000e+00 1 3 1 0.5 -8.660254e-01 1.224647e-16 -1 4 1 -0.5 -8.660254e-01 -2.449294e-16 1 5 1 -1.0 -1.836970e-16 0.000000e+00 1 6 1 -0.5 8.660254e-01 1.224647e-16 -1 7 1 0.5 8.660254e-01 1.224647e-16 -1 8 1 1.0 6.123234e-17 -2.449294e-16 1 9 1 0.5 -8.660254e-01 0.000000e+00 1 10 1 -0.5 -8.660254e-01 1.224647e-16 -1 11 1 -1.0 -1.836970e-16 -2.449294e-16 1 12 1 -0.5 8.660254e-01 1.224647e-16 -1 13 1 0.5 8.660254e-01 -2.449294e-16 1 14 1 1.0 6.123234e-17 0.000000e+00 1 15 1 0.5 -8.660254e-01 1.224647e-16 -1 16 1 -0.5 -8.660254e-01 -2.449294e-16 1 17 1 -1.0 -1.836970e-16 1.224647e-16 -1 18 1 -0.5 8.660254e-01 1.224647e-16 -1 19 1 0.5 8.660254e-01 1.224647e-16 -1 20 1 1.0 6.123234e-17 -2.449294e-16 1 21 1 0.5 -8.660254e-01 0.000000e+00 1 22 1 -0.5 -8.660254e-01 1.224647e-16 -1 23 1 -1.0 -1.836970e-16 -2.449294e-16 1 24 1 -0.5 8.660254e-01 0.000000e+00 1 y <- estimateGLMCommonDisp(y,design) y <- estimateGLMTrendedDisp(y,design) y <- estimateGLMTagwiseDisp(y,design) *Error in dispCoxReidInterpolateTagwise(y, design, offset = offset, dispersion, : * * design matrix must be full column rank* > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] edgeR_3.4.2 limma_3.18.12 loaded via a namespace (and not attached): [1] tools_3.0.2 On Fri, Apr 4, 2014 at 2:19 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > > On Thu, 3 Apr 2014, mali salmon wrote: > > Dear Prof. Smith >> Thank you so much for the explanation of how to do the analysis. >> If you don't mind I'll describe the design of our experiment just to be >> sure that I'm not doing any mistake. >> We work on a sea animal, and we are interested with day/night oscillating >> genes (24 hours cycle) and with tidal genes (12.4 h cycle). >> In the first week we have sacrified animals every 4 hours across 48 h, and >> then repeated the same procedure the week after. >> So in total we have 24 samples, 4 "replicates" of 6 time points >> (2am,6am,10am,2pm,6pm,10pm) >> From your mail I understand that time is a vector of time points in hours, >> but I'm not sure how to define it, it can be: >> 1) time <- rep(seq(2,46,4),2) >> 2) time <- seq(2,94,4) >> 3) time <- rep(seq(2,22,4),4) >> > > Makes no difference. All three choices will give exactly the same results. > > > I saw this matter for the cos vector. >> I suppose that for detecting tidal genes I just have to repeat the >> analysis >> with 12.4 cycle. >> > > You need to carefully identify the tidal stage applicable to each of the > 24 samples. Clearly the tidal phase will have moved in the second week. > For example you might code 'tide' to be between 0 and 1, with 0/1 for high > tide and 0.5 for low tide. Then: > > tides <- sin(2*pi*tide) > tidec <- cos(2*pi*tide) > > Then you need to fit both day/night and tidal cycles to your data at the > same time: > > ~s1+c1+tides+tidec > > It may be difficult to distinguish a tidal cycle from a 12-hour cycle. You > might try to check check this by: > > ~s1+c1+s2+c2+tides+tidec > > Gordon > > > Thanks again >> Mali >> >> >> On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth@wehi.edu.au> wrote: >> >> Dear Mali, >>> >>> It is easy to do such an analysis directly in edgeR. >>> >>> Suppose for example that you are looking for a daily cycle, and that you >>> have at least half a dozen time points spanning more than a full day. >>> The >>> method is to create basis vectors for a periodic expression pattern. >>> Suppose that 'time' is time in hours: >>> >>> s1 <- sin(2*pi*time/24) >>> c1 <- cos(2*pi*time/24) >>> design <- model.matrix(~s1+c1) >>> >>> This will give a design matrix with three columns: one for the intercept >>> and two for the periodic signal. To test for a periodic cycle: >>> >>> fit <- glmFit(y,design) >>> lrt <- glmLRT(fit,coef=2:3) >>> >>> If you have lots of time points and you want to make a more complex curve >>> you can add a harmonic: >>> >>> s2 <- sin(4*pi*time/24) >>> c2 <- cos(4*pi*time/24) >>> design <- model.matrix(~s1+c1+s2+c2) >>> >>> There are now four coefficients parametrizing the periodic curve. (These >>> can be viewed as representing amplitudes and phase shifts for the two >>> harmonics.) I think you would seldom want more than two harmonics for >>> RNA-seq data. >>> >>> This method can easily be adapted to find cell-cycle genes and so on. >>> >>> Best wishes >>> Gordon >>> >>> >>> Date: Tue, 1 Apr 2014 14:33:27 +0200 >>> >>>> From: mali salmon <shalmom1@gmail.com> >>>> To: "bioconductor@r-project.org" <bioconductor@r-project.org> >>>> Subject: [BioC] detecting oscillating genes from RNA-seq data >>>> >>>> Hello List >>>> I have RNA-seq data from different time points and I would like to find >>>> oscillating genes. >>>> I thought of using the "cycle" package (which is based on Fourier >>>> score) , >>>> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized >>>> values. >>>> Any suggestion what would be more appropriate? >>>> Thanks >>>> Mali >>>> >>>> >>> ______________________________________________________________________ >>> The information in this email is confidential and intended solely for the >>> addressee. >>> You must not disclose, forward, print or use it without the permission of >>> the sender. >>> ______________________________________________________________________ >>> >>> >> > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:10}}
ADD REPLY
0
Entering edit mode
On Sun, 6 Apr 2014, mali salmon wrote: > Thank you so much Prof. Smyth for your reply. > I have been trying to follow your suggestions, and encountered an error > when trying to run estimateGLMTagwiseDisp. > > Below is the code I used and the design matrix. > > Many thanks > Mali > >> library(edgeR) >> x <- read.delim("count.matrix", row.names=1, stringsAsFactors=FALSE) >> x<-round(x) // since I have RSEM values >> y <- DGEList(counts=x) >> keep <- rowSums(cpm(y)>1) >= 4 >> y <- y[keep,] >> y$samples$lib.size <- colSums(y$counts) >> y <- calcNormFactors(y) >> time <- rep(seq(2,22,4),4) >> s1 <- sin(2*pi*time/24) >> c1 <- cos(2*pi*time/24) >> tide<-c(1,0,0.5,1,0,0.5,0.5,1,0,0.5,1,0.5,1,0,0.5,1,0.5,0.5,0.5,1,0 ,0.5,1,0) *// > I used 1 for high tide, 0 for low tide, and 0.5 in-between (rising or > falling)* No, this isn't going to work, for several reasons. First, you have equated low and high tide as the same value, because a periodic function returns to the same value at 1 as it had at 0. You need to do instead what I suggested, which was to use 0.5 for low tide and 0 or 1 (makes no difference which) for high tide. Second, you can't simply put every in-between stage to the same value. This is far, far too rough. The rising and falling stages are not the same. The successive values for 'tide' have to be at intervals of 4hr/12.4hr = 0.323 for observations that are four hours apart. Third, you have coded the second week as being in exactly the same tidal phase as for the first week at each clock time. This cannot be true, and is again far too rough. If the tidal cycle has a period of 12.4 hours, then there must be a gradual phase shift in the tidal cycle by about 0.8/12.4 each day. You have to code the tidal values more precisely, and you have to determine accurately the phase change of the tidal cycle between the first week and the second. For the analysis to have any hope of working you have to be able to interpolate all the tidal values accurately to at least one or two decimal places. If you can't do it as accurately as that, then it won't be worth doing it all. It may also be good idea to consult a mathematician or statistician at your university to help you interpret the results. Good luck Gordon >> tides <- sin(2*pi*tide) >> tidec <- cos(2*pi*tide) >> design <- model.matrix(~s1+c1+tides+tidec) > >> design > (Intercept) s1 c1 tides tidec > 1 1 0.5 8.660254e-01 -2.449294e-16 1 > 2 1 1.0 6.123234e-17 0.000000e+00 1 > 3 1 0.5 -8.660254e-01 1.224647e-16 -1 > 4 1 -0.5 -8.660254e-01 -2.449294e-16 1 > 5 1 -1.0 -1.836970e-16 0.000000e+00 1 > 6 1 -0.5 8.660254e-01 1.224647e-16 -1 > 7 1 0.5 8.660254e-01 1.224647e-16 -1 > 8 1 1.0 6.123234e-17 -2.449294e-16 1 > 9 1 0.5 -8.660254e-01 0.000000e+00 1 > 10 1 -0.5 -8.660254e-01 1.224647e-16 -1 > 11 1 -1.0 -1.836970e-16 -2.449294e-16 1 > 12 1 -0.5 8.660254e-01 1.224647e-16 -1 > 13 1 0.5 8.660254e-01 -2.449294e-16 1 > 14 1 1.0 6.123234e-17 0.000000e+00 1 > 15 1 0.5 -8.660254e-01 1.224647e-16 -1 > 16 1 -0.5 -8.660254e-01 -2.449294e-16 1 > 17 1 -1.0 -1.836970e-16 1.224647e-16 -1 > 18 1 -0.5 8.660254e-01 1.224647e-16 -1 > 19 1 0.5 8.660254e-01 1.224647e-16 -1 > 20 1 1.0 6.123234e-17 -2.449294e-16 1 > 21 1 0.5 -8.660254e-01 0.000000e+00 1 > 22 1 -0.5 -8.660254e-01 1.224647e-16 -1 > 23 1 -1.0 -1.836970e-16 -2.449294e-16 1 > 24 1 -0.5 8.660254e-01 0.000000e+00 1 > > > y <- estimateGLMCommonDisp(y,design) > y <- estimateGLMTrendedDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > > *Error in dispCoxReidInterpolateTagwise(y, design, offset = offset, > dispersion, : * > * design matrix must be full column rank* > > >> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] edgeR_3.4.2 limma_3.18.12 > > loaded via a namespace (and not attached): > [1] tools_3.0.2 > > > On Fri, Apr 4, 2014 at 2:19 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: > >> >> On Thu, 3 Apr 2014, mali salmon wrote: >> >> Dear Prof. Smith >>> Thank you so much for the explanation of how to do the analysis. >>> If you don't mind I'll describe the design of our experiment just to be >>> sure that I'm not doing any mistake. >>> We work on a sea animal, and we are interested with day/night oscillating >>> genes (24 hours cycle) and with tidal genes (12.4 h cycle). >>> In the first week we have sacrified animals every 4 hours across 48 h, and >>> then repeated the same procedure the week after. >>> So in total we have 24 samples, 4 "replicates" of 6 time points >>> (2am,6am,10am,2pm,6pm,10pm) >>> From your mail I understand that time is a vector of time points in hours, >>> but I'm not sure how to define it, it can be: >>> 1) time <- rep(seq(2,46,4),2) >>> 2) time <- seq(2,94,4) >>> 3) time <- rep(seq(2,22,4),4) >>> >> >> Makes no difference. All three choices will give exactly the same results. >> >> >> I saw this matter for the cos vector. >>> I suppose that for detecting tidal genes I just have to repeat the >>> analysis >>> with 12.4 cycle. >>> >> >> You need to carefully identify the tidal stage applicable to each of the >> 24 samples. Clearly the tidal phase will have moved in the second week. >> For example you might code 'tide' to be between 0 and 1, with 0/1 for high >> tide and 0.5 for low tide. Then: >> >> tides <- sin(2*pi*tide) >> tidec <- cos(2*pi*tide) >> >> Then you need to fit both day/night and tidal cycles to your data at the >> same time: >> >> ~s1+c1+tides+tidec >> >> It may be difficult to distinguish a tidal cycle from a 12-hour cycle. You >> might try to check check this by: >> >> ~s1+c1+s2+c2+tides+tidec >> >> Gordon >> >> >> Thanks again >>> Mali >>> >>> >>> On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote: >>> >>> Dear Mali, >>>> >>>> It is easy to do such an analysis directly in edgeR. >>>> >>>> Suppose for example that you are looking for a daily cycle, and that you >>>> have at least half a dozen time points spanning more than a full day. >>>> The >>>> method is to create basis vectors for a periodic expression pattern. >>>> Suppose that 'time' is time in hours: >>>> >>>> s1 <- sin(2*pi*time/24) >>>> c1 <- cos(2*pi*time/24) >>>> design <- model.matrix(~s1+c1) >>>> >>>> This will give a design matrix with three columns: one for the intercept >>>> and two for the periodic signal. To test for a periodic cycle: >>>> >>>> fit <- glmFit(y,design) >>>> lrt <- glmLRT(fit,coef=2:3) >>>> >>>> If you have lots of time points and you want to make a more complex curve >>>> you can add a harmonic: >>>> >>>> s2 <- sin(4*pi*time/24) >>>> c2 <- cos(4*pi*time/24) >>>> design <- model.matrix(~s1+c1+s2+c2) >>>> >>>> There are now four coefficients parametrizing the periodic curve. (These >>>> can be viewed as representing amplitudes and phase shifts for the two >>>> harmonics.) I think you would seldom want more than two harmonics for >>>> RNA-seq data. >>>> >>>> This method can easily be adapted to find cell-cycle genes and so on. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> >>>> Date: Tue, 1 Apr 2014 14:33:27 +0200 >>>> >>>>> From: mali salmon <shalmom1 at="" gmail.com=""> >>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] detecting oscillating genes from RNA-seq data >>>>> >>>>> Hello List >>>>> I have RNA-seq data from different time points and I would like to find >>>>> oscillating genes. >>>>> I thought of using the "cycle" package (which is based on Fourier >>>>> score) , >>>>> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized >>>>> values. >>>>> Any suggestion what would be more appropriate? >>>>> Thanks >>>>> Mali >>>>> >>>>> >>>> ______________________________________________________________________ >>>> The information in this email is confidential and intended solely for the >>>> addressee. >>>> You must not disclose, forward, print or use it without the permission of >>>> the sender. >>>> ______________________________________________________________________ >>>> >>>> >>> >> ______________________________________________________________________ >> The information in this email is confidential and intended solely for the >> addressee. >> You must not disclose, forward, print or use it without the permission of >> the sender. >> ______________________________________________________________________ >> > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY

Login before adding your answer.

Traffic: 461 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6