Jill Pleasance
Hi
I am trying to analyse NGS data from a public dataset (GEO), specifically
datasets with only one sample per time point. The raw (somewhat processed)
data are 3 samples at different time points. As described in the paper, the
read counts at exon, splice-junction, transcript and gene levels were
summarized and normalized to relative abundance in Fragments Per Kilobase of
exon model per Million mapped fragments (FPKM) in order to compare
transcription levels among samples.
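As a rough illustration of the FPKM normalization described above, here is a minimal R sketch. The function `fpkm` and its arguments are hypothetical (not from any package), and it assumes you already have per-gene read counts, total exon lengths in base pairs and the total number of mapped reads (for single-end data, fragments are simply reads).

```r
# Hypothetical helper, not from any package:
# FPKM = count / (exon length in kb) / (mapped reads in millions)
fpkm <- function(counts, exon_length_bp, total_mapped = sum(counts)) {
  per_kb <- counts / (exon_length_bp / 1e3)  # reads per kilobase of exon model
  per_kb / (total_mapped / 1e6)              # ... per million mapped reads
}

# Toy numbers for illustration only
counts <- c(geneA = 500, geneB = 100, geneC = 0)
exon_length_bp <- c(geneA = 2000, geneB = 500, geneC = 1000)
fpkm(counts, exon_length_bp, total_mapped = 1e6)
# geneA 250, geneB 200, geneC 0
```

Defaulting `total_mapped` to `sum(counts)` is only a convenience; with real data it should be the library's total number of mapped reads.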
The authors then identified differentially expressed transcripts using the
MA-plot-based random sampling method implemented in the DEGseq package in
Bioconductor (http://bioconductor.org/packages/2.5/bioc/html/DEGseq.html).
The transcripts were further filtered at > 2-fold change and a minimum read
count of 50 in either condition.
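The final filtering step (more than 2-fold change and at least 50 reads in either condition) can be expressed on its own. The sketch below uses a hypothetical helper, `keep_de`, and covers only that filter; it does not reproduce DEGseq's MA-plot-based random sampling test. It assumes `expr1`/`expr2` are normalized values (e.g. FPKM) and `count1`/`count2` are raw read counts for the same transcripts.

```r
# Hypothetical helper reproducing only the final filter described above:
# keep transcripts with more than min_fc-fold change (in either direction)
# and at least min_count reads in at least one of the two conditions.
keep_de <- function(expr1, expr2, count1, count2, min_fc = 2, min_count = 50) {
  fc <- expr1 / expr2                      # Inf or NaN when expr2 is 0
  big_change <- fc > min_fc | fc < 1 / min_fc
  big_change[is.na(big_change)] <- FALSE   # 0/0 gives NaN: no evidence of change
  enough_reads <- pmax(count1, count2) >= min_count
  big_change & enough_reads
}

# Toy FPKM values and raw counts for four transcripts (made-up numbers)
keep_de(expr1 = c(10, 10, 10, 0), expr2 = c(2, 8, 1, 0),
        count1 = c(100, 100, 20, 0), count2 = c(20, 80, 2, 0))
# returns TRUE FALSE FALSE FALSE
```

The third transcript changes 10-fold but is dropped because neither condition reaches 50 reads.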
I have read through some of your posts in which Gordon suggested using a
simple Excel formula to compute fold changes when you don't have replicates:
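    # y1, y2: vectors of counts for the same genes, one vector per sample
    lib.size1 <- sum(y1)   # total count (library size) of sample 1
    lib.size2 <- sum(y2)   # total count (library size) of sample 2
    # log2 fold change of sample 1 over sample 2 after scaling by library size;
    # adding 0.5 keeps zero counts from producing -Inf or Inf
    logFC <- log2((y1+0.5)/(lib.size1+0.5)/(y2+0.5)*(lib.size2+0.5))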
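To see how this formula behaves, including for a gene with zero reads, here is a toy example with made-up counts for four genes. Both libraries sum to 1000, so the library-size terms cancel and the result is just log2((y1 + 0.5) / (y2 + 0.5)).

```r
# Made-up counts for the same four genes in two samples
y1 <- c(geneA = 100, geneB = 0, geneC = 50, geneD = 850)
y2 <- c(geneA = 25, geneB = 10, geneC = 50, geneD = 915)
lib.size1 <- sum(y1)  # 1000
lib.size2 <- sum(y2)  # 1000
logFC <- log2((y1 + 0.5) / (lib.size1 + 0.5) / (y2 + 0.5) * (lib.size2 + 0.5))
round(logFC, 2)
```

geneB, which has no reads in sample 1, gets a finite log fold change of about -4.4 (log2(0.5 / 10.5)) rather than -Inf, and geneC, with equal counts, gets 0.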
Is this something I could apply to the current analysis? I have 3 files (one
for each sample), each with gene IDs and counts; if a gene is not listed in a
sample's file, I assume its count is zero. Do you have any suggestions as to
what to do with these zero counts?
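A minimal base-R sketch of combining the three per-sample files into one table, treating genes missing from a file as zero counts, and then applying the same offset formula. The column names `gene_id` and `count`, the time-point labels `t1` to `t3`, and the small inline data frames standing in for the three files are all assumptions; with real files you would read them in with something like `read.delim()`.

```r
# Stand-ins for the three per-sample files (hypothetical data):
# each lists only the genes that received reads in that sample.
s1 <- data.frame(gene_id = c("g1", "g2", "g3"), count = c(120, 30, 5))
s2 <- data.frame(gene_id = c("g1", "g3"), count = c(80, 15))
s3 <- data.frame(gene_id = c("g2", "g3", "g4"), count = c(40, 10, 7))

# Rename each count column after its time point so the columns stay distinct.
tables <- Map(function(df, nm) setNames(df, c("gene_id", nm)),
              list(s1, s2, s3), c("t1", "t2", "t3"))

# Outer join on gene ID: every gene seen in any sample gets a row,
# with NA where a sample's file did not list it.
merged <- Reduce(function(a, b) merge(a, b, by = "gene_id", all = TRUE), tables)

# A gene missing from a sample's file had no reads there, so use 0.
merged[is.na(merged)] <- 0

counts <- as.matrix(merged[, c("t1", "t2", "t3")])
rownames(counts) <- merged$gene_id
lib.size <- colSums(counts)

# log2 fold change of time point 2 over time point 1 with the same 0.5 offset,
# which keeps the zero counts finite.
logFC_t2_vs_t1 <- log2((counts[, "t2"] + 0.5) / (lib.size[["t2"]] + 0.5) /
                       ((counts[, "t1"] + 0.5) / (lib.size[["t1"]] + 0.5)))
```

Note that without replicates this gives fold changes only: there is no estimate of biological variability, so no p-values.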
At the moment I am trying to avoid learning how to write scripts until I see
whether this analysis works; obviously, when I come to more complicated public
data with replicates, I will have to invest some time in learning Bioconductor!
Many thanks
JILL