Guest User
@guest-user-4897
Hi
I am trying to analyse public NGS data from GEO, specifically
datasets with only one sample per time point. The raw (somewhat
processed) data consist of 3 samples at different time points, where
"The read count at exon, splice-junction, transcript and gene levels
were summarized and normalized to relative abundance in Fragments Per
Kilobase of exon model per Million (FPKM) in order to compare
transcription level among samples."
The authors of this paper then state: "The differentially expressed
transcripts were identified using M-A based random sampling method
implemented in DEGseq package in BioConductor
(http://bioconductor.org/packages/2.5/bioc/html/DEGseq.html). The
transcripts were further filtered at > 2-fold change and a minimum
read count of 50 in either condition."
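The filtering step quoted above could be sketched roughly as follows. This only illustrates the fold-change and read-count filter, not the DEGseq test itself. The counts are invented, the fold change is taken from raw counts without any normalisation, and "in either condition" is assumed to mean at least 50 reads in at least one of the two samples.

```r
# Hypothetical counts for four genes in two conditions (not real data)
y1 <- c(geneA = 120, geneB = 10, geneC = 35, geneD = 60)
y2 <- c(geneA = 50,  geneB = 45, geneC = 30, geneD = 70)

# Plain log2 ratio of the raw counts (assumption: no normalisation applied)
logFC <- log2(y1 / y2)

# Keep genes with more than a 2-fold change in either direction
# AND at least 50 reads in at least one condition (assumed reading of the text)
keep <- abs(logFC) > 1 & (y1 >= 50 | y2 >= 50)

print(names(which(keep)))
```

Here geneB changes more than 2-fold but is dropped because neither count reaches 50.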
I have read through some of your posts where Gordon suggested using a
simple excel formula to obtain fold changes when you don't have
replicates:
lib.size1 <- sum(y1)
lib.size2 <- sum(y2)
logFC <- log2((y1+0.5)/(lib.size1+0.5)/(y2+0.5)*(lib.size2+0.5))
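As a minimal sketch of how that formula behaves, here it is applied to two small count vectors. The gene names and counts are made-up placeholders, chosen so that both library sizes come to 1000.

```r
# Hypothetical counts for four genes in two samples (not real data)
y1 <- c(geneA = 120, geneB = 0,  geneC = 35, geneD = 845)
y2 <- c(geneA = 60,  geneB = 12, geneC = 35, geneD = 893)

# Library size = total count in each sample
lib.size1 <- sum(y1)
lib.size2 <- sum(y2)

# Log2 fold change of sample 1 over sample 2, scaled by library size.
# The 0.5 offset keeps the ratio finite when a count is zero.
logFC <- log2((y1 + 0.5) / (lib.size1 + 0.5) / (y2 + 0.5) * (lib.size2 + 0.5))

print(round(logFC, 2))
```

Note that geneB, which has a zero count in the first sample, still gets a finite (negative) log fold change because of the offset.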
Is this something I could apply to the current analysis? I have 3
files, each with gene IDs and counts (one file per sample). If a gene
is not listed in a sample file, I assume its count is zero.
Would you have any suggestions as to what to do with these zero-count
genes?
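To make the zero-count assumption concrete, here is a rough sketch of combining three per-sample tables into one count matrix, filling in zero wherever a gene is missing from a sample. The data frames, column names (`gene`, `count`) and sample labels (`t1`, `t2`, `t3`) are hypothetical stand-ins for the real files, which would first have to be read in, for example with `read.delim()`.

```r
# Hypothetical per-sample tables standing in for the three files
s1 <- data.frame(gene = c("geneA", "geneB", "geneC"), count = c(120, 4, 35),
                 stringsAsFactors = FALSE)
s2 <- data.frame(gene = c("geneA", "geneC", "geneD"), count = c(60, 35, 9),
                 stringsAsFactors = FALSE)
s3 <- data.frame(gene = c("geneB", "geneD"), count = c(7, 15),
                 stringsAsFactors = FALSE)

samples <- list(t1 = s1, t2 = s2, t3 = s3)

# Union of all gene IDs seen in any sample
all.genes <- sort(unique(unlist(lapply(samples, function(s) s$gene))))

# Look up each gene in each sample; genes absent from a sample become 0
counts <- sapply(samples, function(s) {
  x <- s$count[match(all.genes, s$gene)]
  x[is.na(x)] <- 0
  x
})
rownames(counts) <- all.genes

print(counts)
```

Each column of `counts` can then be used as `y1`, `y2` and so on in the fold-change formula.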
I am trying to avoid learning to write scripts for the moment, until I
see whether this analysis works. Obviously, when I come to more
complicated public data with replicates, I will have to invest some
time in learning Bioconductor!
Many thanks
JILL
--
Sent via the guest posting facility at bioconductor.org.