Kenlee Nakasugi
Hi,
I was hoping someone could give me some general advice on using
edgeR for some sRNA datasets I have received.
I have 3 sRNA datasets, and I have calculated all abundances (just
read counts) of every sequence in each dataset. Unfortunately, there
are no replicates.
The goal is to find specific sRNA sequences that are higher in
abundance in dataset1 and dataset2 compared to dataset3. As there are
no replicates, I understand that no statistical analysis can be done
with any confidence, so I just want to get a first 'general' indication
of which sequences may be higher in abundance in datasets 1 and 2, and
then follow up with other experiments.
I have already generated a subset of 'common' sRNA sequences that are
present in datasets 1, 2 and 3, along with their counts. The original
library sizes differ between the three, and there will be a high level
of duplicate sequences because these are sRNA sequences. I have two
questions:
1. Should I let edgeR calculate the library sizes as the column sums of
the read counts, or should I supply the actual library size of each
dataset, prior to normalization? Because I am working on just the
'common' subset of sRNA sequences between the datasets, highly abundant
sRNA sequences unique to one dataset are missing from the subset, and
these may have skewed the distribution of sRNA abundances within that
dataset.
2. What dispersion value should I use? These are plant sRNA sequences,
so if someone can suggest a number from experience, I will go from
there.
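To make the two questions concrete, here is a minimal sketch of what I have in mind. The count values, the sequence names, the full library sizes and the BCV of 0.2 are all made-up placeholders, not my real data or a recommended setting. The first part uses base R only and shows how the choice of library size changes the counts-per-million values; the second part runs only if edgeR is installed and passes a fixed dispersion to `exactTest`.

```r
# Toy counts for three hypothetical sRNA sequences common to all datasets
counts <- matrix(c(120, 300, 15,    # dataset1
                    80, 250, 10,    # dataset2
                    20, 240, 12),   # dataset3
                 nrow = 3,
                 dimnames = list(c("sRNA_a", "sRNA_b", "sRNA_c"),
                                 c("dataset1", "dataset2", "dataset3")))

# Option A: library size = column sums of the 'common' subset
subset_lib_size <- colSums(counts)

# Option B: the actual sequencing depth of each dataset (hypothetical numbers)
full_lib_size <- c(dataset1 = 2.0e6, dataset2 = 1.5e6, dataset3 = 1.8e6)

# Counts per million under each choice (each column divided by its library size)
cpm_subset <- t(t(counts) / subset_lib_size) * 1e6
cpm_full   <- t(t(counts) / full_lib_size) * 1e6

if (requireNamespace("edgeR", quietly = TRUE)) {
  group <- factor(colnames(counts))
  # lib.size is supplied explicitly here; leaving it out would use the column sums
  y <- edgeR::DGEList(counts = counts, lib.size = full_lib_size, group = group)
  # On real data, y <- edgeR::calcNormFactors(y) would normally follow here

  bcv <- 0.2  # placeholder guess at the biological coefficient of variation
  # pair = c(reference, comparison): positive logFC means higher in dataset1
  et <- edgeR::exactTest(y, pair = c("dataset3", "dataset1"),
                         dispersion = bcv^2)
  print(edgeR::topTags(et))
}
```

With option A every column of `cpm_subset` sums to one million by construction, so any abundant sequences missing from the subset cannot influence the scaling, which is exactly the situation I am unsure about in question 1.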
Apart from this, are there any other issues I need to be concerned
about when analyzing such data in edgeR?
Any advice greatly appreciated!
Best regards,
Ken
---
School of Molecular Biosciences
University of Sydney