Hello sir/madam,
I am working on LUAD data obtained from FireBrowse. Its a RNSseqV2 RSEM normalized data of gene expression
(file name: LUAD.rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.data, renamed this file to
LUAD_RNA_seq)
I want to find Differential gene expression between Stages of tumor samples, i.e. in between stage I and Stage IV samples. but i'm not getting any differentially expressed genes.
My questions are,
1) How to select samples specific to stage of tumor? referring to the code below i could extract samples specific to stage I and stage IV but i'm not sure if that is the right method to do so.
2) referring to my R code what is the wrong step i'm following?
3) If my complete analysis to find the DEG is wrong, please tell me what steps I'm suppose to follow or the R packages for the FireBrowse data analysis? or any tutorial which i can follow to get the differentially expression genes.
(if there is no wrong in the analysis please let me know that too.)
Below is the R code i have used to find out the DEG,
rna <- read.table("RNA/LUAD_RNA_seq.txt",nrows = 20533, header = T, row.names = 1, sep = "\t")
rem <- function(x){
x <- as.matrix(x)
x<- t(apply(x,1,as.numeric))
r <- as.numeric(apply(x,1,function(i)sum(i == 0)))
remove <- which(r > dim(x)[2]*0.5)
return(remove)
}
remove <- rem(rna)
rna <- rna[-remove,]
dim(rna)
x <- t(apply(rna,1,as.numeric))
dim(x)
#removing probes without symbol
row.has.na <- apply(x, 1, function(x){any(is.na(x))})
sum(row.has.na)
x <- x[!row.has.na,]
dim(x)
# Filter lowly expressed genes
expr <- x[rowSums(x)>0,] # Remove genes with 0 RPKM in all samples
lowexp <- expr <= 0.1
nsamples <- ncol(expr)
nsamples
lowexpr <- rowSums(lowexp) > (0.95 * nsamples)
log2expr <- log2(expr[!lowexpr,]+1)
head(log2expr)
colnames(rna)
colnames(log2expr)<- colnames(rna)
head(log2expr)
colnames(log2expr) <- gsub('\\.','-',substr(colnames(log2expr),1,28))
#read files of tumor stage
library(xlsx)
Tumor_stage <- read.xlsx(file = "Clinical_merged_picked/Tumor_stage_data.xlsx", sheetIndex = 1)
colnames(Tumor_stage)
Tumor_stage$Stage_I_Barcode <- toupper(Tumor_stage$Stage_I_Barcode)
Tumor_stage$Stage_IV_Barcode <- toupper(Tumor_stage$Stage_IV_Barcode)
#select only tumor samples
Tumor_Samples <- subset(log2expr, select = substr(colnames(log2expr),14,14) == '0')
dim(Tumor_Samples)
#subset the stages (I and IV) of tumor samples from Tumor_samples
Stage_I <- subset(Tumor_Samples, select = substr(colnames(Tumor_Samples),1,12) %in% Tumor_stage$Stage_I_Barcode)
dim(Stage_I)
Stage_IV <- subset(Tumor_Samples, select = substr(colnames(Tumor_Samples),1,12) %in% Tumor_stage$Stage_IV_Barcode)
dim(Stage_IV)
#remove duplicate colnames
Stage_I <- Stage_I[, !duplicated(colnames(Stage_I))]
Stage_IV <- Stage_IV[, !duplicated(colnames(Stage_IV))]
stage_I_IV <- cbind(Stage_I, Stage_IV)
dim(stage_I_IV)
head(stage_I_IV)
#design matrix
groups <- c(rep("Stage_I",277), rep("STage_IV",26))
group <- factor(groups)
group
#design matrix
design <- model.matrix(~ 0 + group)
design
colnames(design)<- c("Stage_I","Stage_IV" )
tail(design)
dim(design)
library(limma)
library(edgeR)
y <- DGEList(stage_I_IV, remove.zeros=T)
v <- voom(y, design)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(I_IV = Stage_I - Stage_IV, levels = design)
cont.matrix
contrast.fit <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(contrast.fit)
fit2$nsamples <- ncol(stage_I_IV)
fit2
fit2$nsamples
summa.fit <- decideTests(fit2)
summary(summa.fit)
#this is how i get when i check for the DEG
I_IV
Down 0
NotSig 17762
Up 0
deg <- topTable(fit2, coef = "I_IV", p.value = 0.05, lfc = 2, number = nrow(v$E))
The topTable result is zero rows and zero columns
Your answers would really help me to clear my doubts,
Thank you all,
Ankita.
