MBMethPred introduction

MBMethPred is a user-friendly package for the accurate prediction of medulloblastoma subgroups from DNA methylation beta values. It incorporates seven machine learning models: Random Forest, K-Nearest Neighbors, Support Vector Machine, Linear Discriminant Analysis, Extreme Gradient Boosting, Naive Bayes, and a neural network model designed specifically for medulloblastoma data. The package provides streamlined workflows for data preprocessing, feature selection, model training, cross-validation, and prediction. This vignette explains each function with examples and the resulting output. MBMethPred was tested on an Ubuntu machine with an Intel Core i5-6200U processor and 16 GB of RAM.

Input file for prediction

ReadMethylFile reads a file of DNA methylation beta values so that it can be used as new data for prediction by any of the models. The input must be a CSV or TSV file. To run the example, uncomment the following lines.

Usage

# set.seed(1234)
# fac <- ncol(Data1)
# NewData <- sample(data.frame(t(Data1[,-fac])),10)
# NewData <- cbind(rownames(NewData), NewData)
# colnames(NewData)[1] <- "ID"
# write.csv(NewData, "NewData.csv", quote = FALSE, row.names = FALSE)
# methyl <- ReadMethylFile(File = "NewData.csv")

This function takes a single argument, File. The first column of the file must contain the CpG methylation probe IDs, which start with the characters "cg" followed by a number (e.g., cg100091). The remaining columns are samples containing methylation beta values. Every column must have a name.
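Before calling ReadMethylFile, it can help to check that a data frame follows the layout described above. The sketch below uses a hypothetical helper, check_methyl_input(), which is not part of MBMethPred; it only encodes the stated rules (named columns, "cg" probe IDs in the first column, numeric beta values in the remaining columns) plus the assumption that beta values lie between 0 and 1.

```r
# Hypothetical helper (not part of MBMethPred): returns a character vector of
# problems found in a methylation data frame; character(0) means none found.
check_methyl_input <- function(df) {
  problems <- character(0)
  # Every column must have a non-empty name
  if (is.null(names(df)) || any(is.na(names(df)) | names(df) == "")) {
    problems <- c(problems, "all columns must be named")
  }
  # First column: CpG probe IDs, i.e. "cg" followed by digits (e.g. cg100091)
  ids <- as.character(df[[1]])
  if (!all(grepl("^cg[0-9]+$", ids))) {
    problems <- c(problems, "first column must hold probe IDs such as cg100091")
  }
  # Remaining columns: numeric beta values (assumed to lie in [0, 1])
  betas <- df[, -1, drop = FALSE]
  if (!all(vapply(betas, is.numeric, logical(1)))) {
    problems <- c(problems, "sample columns must be numeric")
  } else if (any(unlist(betas) < 0 | unlist(betas) > 1, na.rm = TRUE)) {
    problems <- c(problems, "beta values must lie in [0, 1]")
  }
  problems
}

good <- data.frame(ID = c("cg100091", "cg200145"),
                   S1 = c(0.12, 0.87), S2 = c(0.45, 0.90))
check_methyl_input(good)   # character(0): no problems found
```

Collecting all problems rather than stopping at the first one lets a user fix a malformed file in a single pass.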

Box plot

The BoxPlot function draws a box plot of DNA methylation beta values or of any other data frame.

Usage


data <- Data2[1:20,]
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
BoxPlot(File = data, Projname = NULL)


This function has two arguments: File (the data frame to plot) and Projname.

t-SNE 3D plot

The TSNEPlot function draws a 3D t-SNE plot of a DNA methylation dataset, grouping the samples with K-means clustering. It has two arguments: File (any matrix) and NCluster (the number of clusters for K-means clustering).
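The two steps named above, reducing the samples to three dimensions and grouping them with K-means, can be sketched in base R. This is an illustration under stated assumptions, not TSNEPlot's code: PCA via prcomp() stands in for t-SNE so that no extra package is needed, the data are simulated, and it is assumed (not confirmed) that the clustering is run on the embedded coordinates.

```r
set.seed(1234)
# Simulated data: 40 samples (rows) x 100 features in four shifted groups
groups <- rep(1:4, each = 10)
X <- matrix(rnorm(40 * 100), nrow = 40) + groups * 2

# Dimensionality reduction to 3 coordinates per sample
# (PCA here as a stand-in for t-SNE)
coords <- prcomp(X)$x[, 1:3]

# K-means with NCluster = 4 assigns every sample to one of four clusters;
# these assignments are what a 3D plot would use to colour the points
km <- kmeans(coords, centers = 4, nstart = 25)
table(cluster = km$cluster, group = groups)
```

NCluster plays the role of `centers` here: it fixes how many groups K-means must produce, whether or not the data contain that many natural clusters.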

Usage

data <- data.frame(t(Data2[1:100,]))
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
TSNEPlot(File = data, NCluster = 4)

An interactive rgl window opens with a 3D projection of the t-SNE result. To save the plot as a PNG file, uncomment and run the following line.

# rgl.snapshot('tsne3d.png', fmt = 'png')

Input file for similarity network fusion (SNF)

The ReadSNFData function reads a file (any matrix in CSV or TSV format) so that it can be passed to the similarity network fusion (SNF) function from the SNFtool package. To run the example, uncomment the following lines.
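As a rough illustration of what reading such a file involves, the sketch below uses a hypothetical function, read_snf_matrix(), which is not ReadSNFData itself. It assumes the layout used in the example (a header row, an ID column first, numeric values elsewhere) and picks the separator from the file extension.

```r
# Hypothetical reader (not MBMethPred's ReadSNFData): loads a CSV or TSV file
# whose first column holds row IDs and returns a numeric matrix.
read_snf_matrix <- function(path) {
  sep <- if (grepl("\\.tsv$", path, ignore.case = TRUE)) "\t" else ","
  df <- read.table(path, header = TRUE, sep = sep, check.names = FALSE,
                   stringsAsFactors = FALSE)
  m <- as.matrix(df[, -1, drop = FALSE])   # drop the ID column
  rownames(m) <- df[[1]]                   # keep the IDs as row names
  storage.mode(m) <- "numeric"
  m
}

# Round trip through a temporary CSV file
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(ID = c("g1", "g2"), S1 = c(1.5, 2), S2 = c(0.3, 4))
write.csv(toy, tmp, row.names = FALSE)
read_snf_matrix(tmp)
```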

Usage

# data(Data2) # Methylation
# Data2 <- cbind(rownames(Data2), Data2)
# colnames(Data2)[1] <- "ID"
# write.csv(Data2, "Data2.csv", row.names = FALSE)
# Data2 <- ReadSNFData(File = "Data2.csv")

Similarity network fusion (SNF)

SimilarityNetworkFusion runs SNF (from the SNFtool package) on a list of data matrices and returns the resulting cluster assignments as subgroup labels.

Usage

data(RLabels) # Real labels
data(Data2) # Methylation
data(Data3) # Gene expression
snf <- SimilarityNetworkFusion(Files = list(Data2, Data3),
                               NNeighbors = 13,
                               Sigma = 0.75,
                               NClusters = 4,
                               CLabels = c("Group4", "SHH", "WNT", "Group3"),
                               RLabels = RLabels,
                               Niterations = 60)


snf
#>  [1] SHH    Group3 Group4 Group4 Group4 SHH    SHH    Group3 Group4 SHH   
#> [11] WNT    SHH    SHH    WNT    SHH    WNT    Group3 Group3 Group3 Group4
#> [21] Group4 Group3 Group3 Group3 Group4 Group4 Group4 Group3 Group3 SHH   
#> [31] SHH    SHH    SHH    SHH    Group4 Group3 SHH    Group4 Group4 Group3
#> [41] Group4 Group4 WNT    Group3 Group4 Group4 Group4 Group4 SHH    Group4
#> Levels: Group4 SHH WNT Group3

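The arguments used in this example are Files (a list of data matrices to fuse), NNeighbors (the number of nearest neighbors), Sigma (the hyperparameter of the similarity kernel), NClusters (the number of clusters), CLabels (the subgroup labels assigned to the clusters), RLabels (the real labels), and Niterations (the number of SNF iterations).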
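To make the roles of NNeighbors, Sigma, Niterations, and NClusters more concrete, here is a simplified base-R sketch of the SNF procedure: build a similarity network per data type, let each network repeatedly diffuse the average of the others through its K strongest links, average the results, and cluster the fused network spectrally. It is not the SNFtool code that SimilarityNetworkFusion calls. The kernel and normalization details are simplified and may differ from SNFtool, the mapping of K, sigma, n_iter, and k to the package arguments is an assumption, and every function name here is hypothetical.

```r
# Each element of `views` is a samples x features matrix (same samples, same order).
affinity_sketch <- function(X, K, sigma) {
  D <- as.matrix(dist(X))                                         # Euclidean distances
  knn_mean <- apply(D, 1, function(d) mean(sort(d)[2:(K + 1)]))   # skip self (distance 0)
  eps <- (outer(knn_mean, knn_mean, "+") + D) / 3
  exp(-D^2 / (sigma * eps))                                       # scaled exponential similarity
}

# Full kernel: off-diagonal row sums are 1/2, diagonal fixed at 1/2, then symmetrised
normalize_full <- function(W) {
  diag(W) <- 0
  P <- W / (2 * rowSums(W))
  diag(P) <- 0.5
  (P + t(P)) / 2
}

# Sparse kernel: keep each sample's K strongest similarities, rows sum to 1
sparse_kernel <- function(P, K) {
  S <- t(apply(P, 1, function(p) {
    keep <- order(p, decreasing = TRUE)[seq_len(K)]
    out <- numeric(length(p))
    out[keep] <- p[keep]
    out
  }))
  S / rowSums(S)
}

snf_sketch <- function(views, K = 13, sigma = 0.75, n_iter = 20) {
  stopifnot(length(views) >= 2)
  P <- lapply(views, function(X) normalize_full(affinity_sketch(X, K, sigma)))
  S <- lapply(P, sparse_kernel, K = K)
  for (it in seq_len(n_iter)) {
    # All views are updated from the previous iteration's networks
    P <- lapply(seq_along(P), function(v) {
      others <- Reduce(`+`, P[-v]) / (length(P) - 1)
      normalize_full(S[[v]] %*% others %*% t(S[[v]]))
    })
  }
  Reduce(`+`, P) / length(P)   # fused similarity network
}

# Spectral clustering of the fused network (normalised Laplacian + k-means)
spectral_sketch <- function(W, k) {
  d <- rowSums(W)
  L <- diag(nrow(W)) - W / sqrt(outer(d, d))
  U <- eigen(L, symmetric = TRUE)$vectors
  U <- U[, (ncol(U) - k + 1):ncol(U), drop = FALSE]  # k smallest eigenvalues
  U <- U / sqrt(rowSums(U^2))
  kmeans(U, centers = k, nstart = 25)$cluster
}

# Two simulated data types measured on the same 30 samples in 3 groups
set.seed(1)
grp <- rep(1:3, each = 10)
view1 <- matrix(rnorm(30 * 5), 30) + grp * 3
view2 <- matrix(rnorm(30 * 8), 30) + grp * 3
W <- snf_sketch(list(view1, view2), K = 8, sigma = 0.5, n_iter = 10)
table(cluster = spectral_sketch(W, 3), group = grp)
```

A larger K spreads information along more links per sample, while more iterations push the networks closer to agreement; both trade local detail for consensus.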

Support vector machine model

SupportVectorMachineModel trains a support vector machine model to classify medulloblastoma subgroups using DNA methylation beta values (Illumina Infinium HumanMethylation450). If new data are provided, the trained model is then used to predict their subgroups.

Model metrics, including accuracy, precision, sensitivity, F1-score, specificity, and AUC_average, can be calculated for the test dataset with the ModelMetrics function, which averages these metrics over the confusion matrices produced by the ConfusionMatrix function for the cross-validation folds.
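The per-class metrics can be derived from a single confusion matrix by treating each subgroup as "this class versus the rest". The sketch below uses a hypothetical helper, per_class_metrics(), with the standard one-vs-rest definitions; these formulas are assumed rather than taken from ModelMetrics' source. The matrix is the first SVM fold printed in the output below.

```r
# One-vs-rest metrics from a confusion matrix (rows = true, columns = predicted)
per_class_metrics <- function(cm) {
  n  <- sum(cm)
  tp <- diag(cm)
  fp <- colSums(cm) - tp   # predicted as the class but belonging elsewhere
  fn <- rowSums(cm) - tp   # belonging to the class but predicted elsewhere
  tn <- n - tp - fp - fn
  precision   <- tp / (tp + fp)
  sensitivity <- tp / (tp + fn)
  data.frame(Accuracy    = (tp + tn) / n,
             Precision   = precision,
             Sensitivity = sensitivity,
             F1_Score    = 2 * precision * sensitivity / (precision + sensitivity),
             Specificity = tn / (tn + fp),
             row.names   = rownames(cm))
}

subgroups <- c("Group3", "SHH", "WNT", "Group4")
cm1 <- matrix(c(26,  0,  0,  0,
                 0, 41,  0,  0,
                 1,  0, 37,  0,
                 1,  0,  0, 59),
              nrow = 4, byrow = TRUE,
              dimnames = list(y_true = subgroups, y_pred = subgroups))
round(per_class_metrics(cm1), 3)
```

For this single fold, Group3 precision is 26 / 28, or about 0.929; the 0.932 reported by ModelMetrics is the average over all ten folds.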

Predictions for the new data can be retrieved with the NewDataPredictionResult function, which reports, for each sample, the most frequent (modal) prediction across the cross-validation folds.
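Taking the mode across folds is a majority vote. Here is a minimal sketch with a hypothetical helper, modal_prediction(), that takes a samples-by-folds matrix of predicted labels. In this sketch ties go to the label that sorts first alphabetically, which may differ from how NewDataPredictionResult breaks ties.

```r
# Majority vote across folds: `predictions` is a samples x folds character matrix
modal_prediction <- function(predictions) {
  apply(predictions, 1, function(p) {
    counts <- table(p)                # label frequencies, sorted by label
    names(counts)[which.max(counts)]  # first label with the highest count
  })
}

preds <- rbind(S1 = c("WNT", "WNT", "SHH"),
               S2 = c("Group3", "Group4", "Group3"))
modal_prediction(preds)   # returns c(S1 = "WNT", S2 = "Group3")
```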

Usage

set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"

svm <- SupportVectorMachineModel(SplitRatio = 0.8, 
                                 CV = 10, 
                                 NCores = 1, 
                                 NewData = NewData)
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         1   0  37      0
#>   Group4      1   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     28   0   0      0
#>   SHH         0  42   0      0
#>   WNT         1   0  36      0
#>   Group4      1   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     27   0   0      0
#>   SHH         0  43   0      0
#>   WNT         1   0  35      0
#>   Group4      1   0   0     60
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  39   0      0
#>   WNT         0   0  38      0
#>   Group4      1   0   0     57
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  42   0      0
#>   WNT         1   0  37      0
#>   Group4      1   0   0     56
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     27   0   0      0
#>   SHH         0  40   0      0
#>   WNT         1   0  37      0
#>   Group4      2   0   0     52
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     24   0   0      0
#>   SHH         0  40   0      0
#>   WNT         1   0  38      0
#>   Group4      1   0   0     55
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      0
#>   SHH         0  42   0      0
#>   WNT         1   0  34      0
#>   Group4      0   0   0     61
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      1
#>   SHH         0  43   0      0
#>   WNT         1   0  39      0
#>   Group4      1   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  42   0      0
#>   WNT         1   0  38      0
#>   Group4      1   0   0     57
ModelMetrics(Model = svm)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         1   0  37      0
#>   Group4      1   0   0     59
#> 
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.988     0.932       0.996    0.963       0.986       0.985
#> SHH       1.000     1.000       1.000    1.000       1.000       0.985
#> WNT       0.995     1.000       0.976    0.988       1.000       0.985
#> Group4    0.993     0.998       0.983    0.990       0.999       0.985
NewDataPredictionResult(Model = svm)
#>            Subgroup
#> GSM2261711   Group3
#> X78             WNT
#> GSM2261640   Group4
#> GSM2261575   Group4
#> X135            WNT
#> GSM2262184   Group3
#> GSM2261613   Group3
#> X130            WNT
#> GSM2261922   Group4
#> GSM2261980   Group3

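The arguments used in this example are SplitRatio (the proportion of samples used for training), CV (the number of cross-validation folds), NCores (the number of CPU cores to use), and NewData (the new samples to classify, prepared as shown above).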

K nearest neighbor model

The KNearestNeighborModel is a function to train a K nearest neighbor model to classify medulloblastoma subgroups using DNA methylation beta values.

Usage

set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"

knn <- KNearestNeighborModel(SplitRatio = 0.8, 
                             CV = 10, 
                             K = 3, 
                             NCores = 1, 
                             NewData = NewData)
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      1
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      1   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     27   0   0      1
#>   SHH         0  42   0      0
#>   WNT         0   0  37      0
#>   Group4      1   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      1
#>   SHH         0  43   0      0
#>   WNT         0   0  36      0
#>   Group4      0   0   0     61
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  38   1      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     58
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      1
#>   SHH         0  42   0      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     57
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     27   0   0      0
#>   SHH         0  40   0      0
#>   WNT         0   0  38      0
#>   Group4      1   0   0     53
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     23   0   0      1
#>   SHH         0  40   0      0
#>   WNT         0   0  39      0
#>   Group4      1   0   0     55
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     24   0   0      1
#>   SHH         0  42   0      0
#>   WNT         0   0  35      0
#>   Group4      1   0   0     60
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  43   0      0
#>   WNT         0   0  40      0
#>   Group4      0   0   0     60
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      1
#>   SHH         0  42   0      0
#>   WNT         0   0  39      0
#>   Group4      0   0   0     58
ModelMetrics(Model = knn)
#> $ConfusionMatrix
#>         knnclass_pred
#>          Group3 SHH WNT Group4
#>   Group3     25   0   0      1
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      1   0   0     59
#> 
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.993     0.981       0.973    0.977       0.996       0.985
#> SHH       0.999     1.000       0.997    0.999       1.000       0.985
#> WNT       0.999     0.997       1.000    0.999       0.999       0.985
#> Group4    0.993     0.988       0.991    0.990       0.993       0.985
NewDataPredictionResult(Model = knn)
#>            Subgroup
#> GSM2261711   Group3
#> X78             WNT
#> GSM2261640   Group4
#> GSM2261575   Group4
#> X135            WNT
#> GSM2262184   Group3
#> GSM2261613   Group3
#> X130            WNT
#> GSM2261922   Group4
#> GSM2261980   Group3

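The arguments used in this example are the same as for SupportVectorMachineModel (SplitRatio, CV, NCores, and NewData), plus K, the number of nearest neighbors.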
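To illustrate what K controls, here is a minimal k-nearest neighbor classifier in base R: each test sample receives the majority label among its K closest training samples by Euclidean distance. It is a sketch of the technique only, not the implementation behind KNearestNeighborModel, and knn_predict() is a hypothetical name.

```r
# Minimal k-NN classifier: Euclidean distance, majority vote among K neighbours
knn_predict <- function(train, labels, test, K = 3) {
  apply(test, 1, function(x) {
    d <- sqrt(colSums((t(train) - x)^2))       # distance from x to every training sample
    nearest <- labels[order(d)[seq_len(K)]]    # labels of the K closest samples
    counts <- table(nearest)
    names(counts)[which.max(counts)]
  })
}

train  <- rbind(c(0.1, 0.2), c(0.2, 0.1), c(0.9, 0.8), c(0.8, 0.9))
labels <- c("SHH", "SHH", "WNT", "WNT")
test   <- rbind(c(0.15, 0.15), c(0.85, 0.85))
knn_predict(train, labels, test, K = 3)   # returns c("SHH", "WNT")
```

With K = 3, each test point is outvoted by its two same-group neighbors even though the third neighbor belongs to the other subgroup; an odd K avoids ties between two classes.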

Random forest model

The RandomForestModel is a function to train a random forest model to classify medulloblastoma subgroups using DNA methylation beta values.

Usage

set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"

rf <- RandomForestModel(SplitRatio = 0.8, 
                        CV = 10, 
                        NTree = 100, 
                        NCores = 1, 
                        NewData = NewData)
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     60
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     28   0   0      0
#>   SHH         0  42   0      0
#>   WNT         0   0  37      0
#>   Group4      0   0   0     60
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     27   0   0      0
#>   SHH         0  43   0      0
#>   WNT         0   0  36      0
#>   Group4      0   0   0     61
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  39   0      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     58
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  42   0      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     57
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      1
#>   SHH         0  40   0      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     54
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     24   0   0      0
#>   SHH         0  40   0      0
#>   WNT         0   0  39      0
#>   Group4      0   0   0     56
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      0
#>   SHH         0  42   0      0
#>   WNT         0   0  35      0
#>   Group4      0   0   0     61
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  43   0      0
#>   WNT         0   0  40      0
#>   Group4      0   0   0     60
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  42   0      0
#>   WNT         0   0  39      0
#>   Group4      0   0   0     58
ModelMetrics(Model = rf)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     60
#> 
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.999     1.000       0.996    0.998       1.000       0.998
#> SHH       1.000     1.000       1.000    1.000       1.000       0.998
#> WNT       1.000     1.000       1.000    1.000       1.000       0.998
#> Group4    0.999     0.998       1.000    0.999       0.999       0.998
NewDataPredictionResult(Model = rf)
#>            Subgroup
#> GSM2261711   Group3
#> X78             WNT
#> GSM2261640   Group4
#> GSM2261575   Group4
#> X135            WNT
#> GSM2262184   Group3
#> GSM2261613   Group3
#> X130            WNT
#> GSM2261922   Group4
#> GSM2261980   Group3

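The arguments used in this example are the same as for SupportVectorMachineModel (SplitRatio, CV, NCores, and NewData), plus NTree, the number of trees in the forest.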

XGBoost model

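XGBoostModel trains an XGBoost model to classify medulloblastoma subgroups using DNA methylation beta values. In the per-fold output below, the classes appear as integer codes 0 to 3; comparing the first fold with the ConfusionMatrix returned by ModelMetrics shows that 0, 1, 2, and 3 correspond to Group3, Group4, SHH, and WNT.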

Usage

set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"

xgboost <- XGBoostModel(SplitRatio = 0.8, 
                        CV = 10, 
                        NCores = 1, 
                        NewData = NewData)
#> [1]	train-mlogloss:0.390594 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.177861 
#> [3]	train-mlogloss:0.087035 
#> [4]	train-mlogloss:0.043112 
#> [5]	train-mlogloss:0.022536 
#> [6]	train-mlogloss:0.012486 
#> [7]	train-mlogloss:0.007278 
#> [8]	train-mlogloss:0.004395 
#> [9]	train-mlogloss:0.002879 
#> [10]	train-mlogloss:0.002457 
#>       y_pred
#> y_true  0  1  2  3
#>      0 24  1  0  1
#>      1  2 58  0  0
#>      2  0  0 41  0
#>      3  3  0  0 35
#> 
#> [1]	train-mlogloss:0.388419 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.177664 
#> [3]	train-mlogloss:0.085746 
#> [4]	train-mlogloss:0.043333 
#> [5]	train-mlogloss:0.022637 
#> [6]	train-mlogloss:0.012444 
#> [7]	train-mlogloss:0.007140 
#> [8]	train-mlogloss:0.004413 
#> [9]	train-mlogloss:0.002823 
#> [10]	train-mlogloss:0.002431 
#>       y_pred
#> y_true  0  1  2  3
#>      0 28  0  0  0
#>      1  0 60  0  0
#>      2  0  1 41  0
#>      3  3  0  0 34
#> 
#> [1]	train-mlogloss:0.388072 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.176992 
#> [3]	train-mlogloss:0.085394 
#> [4]	train-mlogloss:0.043119 
#> [5]	train-mlogloss:0.022323 
#> [6]	train-mlogloss:0.012245 
#> [7]	train-mlogloss:0.006953 
#> [8]	train-mlogloss:0.004304 
#> [9]	train-mlogloss:0.002808 
#> [10]	train-mlogloss:0.002544 
#>       y_pred
#> y_true  0  1  2  3
#>      0 27  0  0  0
#>      1  0 61  0  0
#>      2  0  1 42  0
#>      3  3  0  0 33
#> 
#> [1]	train-mlogloss:0.386945 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.175823 
#> [3]	train-mlogloss:0.085418 
#> [4]	train-mlogloss:0.042969 
#> [5]	train-mlogloss:0.022146 
#> [6]	train-mlogloss:0.012049 
#> [7]	train-mlogloss:0.006975 
#> [8]	train-mlogloss:0.004246 
#> [9]	train-mlogloss:0.002766 
#> [10]	train-mlogloss:0.002319 
#>       y_pred
#> y_true  0  1  2  3
#>      0 25  1  0  0
#>      1  0 58  0  0
#>      2  0  0 39  0
#>      3  1  0  0 37
#> 
#> [1]	train-mlogloss:0.387957 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.177210 
#> [3]	train-mlogloss:0.085601 
#> [4]	train-mlogloss:0.043317 
#> [5]	train-mlogloss:0.022903 
#> [6]	train-mlogloss:0.012530 
#> [7]	train-mlogloss:0.007282 
#> [8]	train-mlogloss:0.004478 
#> [9]	train-mlogloss:0.002934 
#> [10]	train-mlogloss:0.002514 
#>       y_pred
#> y_true  0  1  2  3
#>      0 26  0  0  0
#>      1  2 55  0  0
#>      2  0  0 42  0
#>      3  2  0  0 36
#> 
#> [1]	train-mlogloss:0.390082 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.177320 
#> [3]	train-mlogloss:0.085780 
#> [4]	train-mlogloss:0.043300 
#> [5]	train-mlogloss:0.022592 
#> [6]	train-mlogloss:0.012513 
#> [7]	train-mlogloss:0.007264 
#> [8]	train-mlogloss:0.004434 
#> [9]	train-mlogloss:0.002923 
#> [10]	train-mlogloss:0.002552 
#>       y_pred
#> y_true  0  1  2  3
#>      0 27  0  0  0
#>      1  1 53  0  0
#>      2  0  0 39  1
#>      3  3  0  0 35
#> 
#> [1]	train-mlogloss:0.391327 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.178573 
#> [3]	train-mlogloss:0.086585 
#> [4]	train-mlogloss:0.043456 
#> [5]	train-mlogloss:0.022623 
#> [6]	train-mlogloss:0.012235 
#> [7]	train-mlogloss:0.007101 
#> [8]	train-mlogloss:0.004310 
#> [9]	train-mlogloss:0.002876 
#> [10]	train-mlogloss:0.002484 
#>       y_pred
#> y_true  0  1  2  3
#>      0 24  0  0  0
#>      1  0 56  0  0
#>      2  0  0 40  0
#>      3  3  0  0 36
#> 
#> [1]	train-mlogloss:0.387270 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.176343 
#> [3]	train-mlogloss:0.085810 
#> [4]	train-mlogloss:0.042979 
#> [5]	train-mlogloss:0.022061 
#> [6]	train-mlogloss:0.011726 
#> [7]	train-mlogloss:0.006691 
#> [8]	train-mlogloss:0.004024 
#> [9]	train-mlogloss:0.002633 
#> [10]	train-mlogloss:0.002421 
#>       y_pred
#> y_true  0  1  2  3
#>      0 25  0  0  0
#>      1  0 61  0  0
#>      2  0  0 41  1
#>      3  2  0  0 33
#> 
#> [1]	train-mlogloss:0.385785 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.175053 
#> [3]	train-mlogloss:0.084670 
#> [4]	train-mlogloss:0.042764 
#> [5]	train-mlogloss:0.022229 
#> [6]	train-mlogloss:0.011936 
#> [7]	train-mlogloss:0.006921 
#> [8]	train-mlogloss:0.004281 
#> [9]	train-mlogloss:0.002846 
#> [10]	train-mlogloss:0.002425 
#>       y_pred
#> y_true  0  1  2  3
#>      0 25  1  0  0
#>      1  0 60  0  0
#>      2  0  0 43  0
#>      3  2  0  0 38
#> 
#> [1]	train-mlogloss:0.388743 
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#> 
#> [2]	train-mlogloss:0.176686 
#> [3]	train-mlogloss:0.086097 
#> [4]	train-mlogloss:0.043017 
#> [5]	train-mlogloss:0.022624 
#> [6]	train-mlogloss:0.012143 
#> [7]	train-mlogloss:0.007023 
#> [8]	train-mlogloss:0.004200 
#> [9]	train-mlogloss:0.002729 
#> [10]	train-mlogloss:0.002506 
#>       y_pred
#> y_true  0  1  2  3
#>      0 25  1  0  0
#>      1  0 58  0  0
#>      2  0  0 42  0
#>      3  3  0  0 36
ModelMetrics(Model = xgboost)
#> $ConfusionMatrix
#>         y_pred
#> y_truth  Group3 Group4 SHH WNT
#>   Group3     24      1   0   1
#>   Group4      2     58   0   0
#>   SHH         0      0  41   0
#>   WNT         3      0   0  35
#> 
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.979     0.896       0.981    0.936       0.978       0.968
#> Group4    0.993     0.990       0.991    0.991       0.994       0.968
#> SHH       0.998     1.000       0.990    0.995       1.000       0.968
#> WNT       0.983     0.992       0.934    0.962       0.998       0.968
NewDataPredictionResult(Model = xgboost)
#>            Subgroup
#> GSM2261711   Group3
#> X78             WNT
#> GSM2261640   Group4
#> GSM2261575   Group4
#> X135            WNT
#> GSM2262184   Group3
#> GSM2261613   Group3
#> X130            WNT
#> GSM2261922   Group4
#> GSM2261980   Group3

The arguments used in this example are the same as for SupportVectorMachineModel: SplitRatio, CV, NCores, and NewData.
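XGBoost's multi-class objectives expect integer class labels starting at 0, which is why the per-fold output above shows codes rather than subgroup names. A common way to produce such codes in R is from a factor, whose levels sort alphabetically; the sketch below shows that conversion and the reverse mapping. It assumes, rather than confirms, that XGBoostModel encodes its labels this way.

```r
# Factor levels sort alphabetically: Group3, Group4, SHH, WNT
subgroup <- factor(c("WNT", "Group3", "SHH", "Group4", "Group3"))
levels(subgroup)

# 0-based integer codes as required by XGBoost's multi-class objectives
codes <- as.integer(subgroup) - 1   # 3 0 2 1 0

# Map 0-based codes (e.g. predicted classes) back to subgroup names
levels(subgroup)[codes + 1]
```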

Linear discriminant analysis model

The LinearDiscriminantAnalysisModel is a function to train a linear discriminant analysis model to classify medulloblastoma subgroups using DNA methylation beta values.

Usage

set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"

lda <- LinearDiscriminantAnalysisModel(SplitRatio = 0.8, 
                                       CV = 10, 
                                       NCores = 1, 
                                       NewData = NewData)
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     22   1   0      3
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      4   1   0     55
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      2
#>   SHH         0  42   0      0
#>   WNT         1   0  36      0
#>   Group4      6   0   0     54
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     22   0   1      4
#>   SHH         0  43   0      0
#>   WNT         1   0  35      0
#>   Group4      7   0   1     53
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     24   0   1      1
#>   SHH         0  38   0      1
#>   WNT         0   0  38      0
#>   Group4      9   0   0     49
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      1
#>   SHH         0  42   0      0
#>   WNT         0   0  37      1
#>   Group4      2   0   0     55
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     24   0   0      3
#>   SHH         1  38   0      1
#>   WNT         0   0  38      0
#>   Group4      3   0   0     51
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     22   0   0      2
#>   SHH         1  39   0      0
#>   WNT         1   0  38      0
#>   Group4      6   0   0     50
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     23   0   0      2
#>   SHH         0  41   0      1
#>   WNT         1   0  34      0
#>   Group4     11   0   1     49
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     22   1   0      3
#>   SHH         0  42   0      1
#>   WNT         1   0  39      0
#>   Group4      6   0   0     54
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     22   1   1      2
#>   SHH         0  42   0      0
#>   WNT         1   0  38      0
#>   Group4      6   0   0     52
ModelMetrics(Model = lda)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     22   1   0      3
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      4   1   0     55
#> 
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.941     0.778       0.889    0.828       0.951        0.91
#> SHH       0.994     0.991       0.985    0.988       0.997        0.91
#> WNT       0.993     0.986       0.981    0.984       0.996        0.91
#> Group4    0.945     0.949       0.893    0.920       0.973        0.91
NewDataPredictionResult(Model = lda)
#>            Subgroup
#> GSM2261711   Group3
#> X78             WNT
#> GSM2261640   Group4
#> GSM2261575   Group4
#> X135            WNT
#> GSM2262184   Group3
#> GSM2261613   Group3
#> X130            WNT
#> GSM2261922   Group4
#> GSM2261980   Group3

The arguments used in this example are the same as for SupportVectorMachineModel: SplitRatio, CV, NCores, and NewData.

Naive Bayes model

The NaiveBayesModel is a function to train a Naive Bayes model to classify medulloblastoma subgroups using DNA methylation beta values.

Usage

set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"

nb <- NaiveBayesModel(SplitRatio = 0.8, 
                      CV = 10, 
                      Threshold = 0.8, 
                      NCores = 1, 
                      NewData = NewData)
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         3   0  35      0
#>   Group4      1   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     28   0   0      0
#>   SHH         0  42   0      0
#>   WNT         3   0  34      0
#>   Group4      1   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     27   0   0      0
#>   SHH         0  43   0      0
#>   WNT         3   0  33      0
#>   Group4      2   0   0     59
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  39   0      0
#>   WNT         1   0  37      0
#>   Group4      2   0   0     56
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  42   0      0
#>   WNT         2   0  36      0
#>   Group4      2   0   0     55
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     27   0   0      0
#>   SHH         0  40   0      0
#>   WNT         3   0  35      0
#>   Group4      2   0   0     52
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     24   0   0      0
#>   SHH         0  40   0      0
#>   WNT         3   0  36      0
#>   Group4      2   0   0     54
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     25   0   0      0
#>   SHH         0  42   0      0
#>   WNT         3   0  32      0
#>   Group4      1   0   0     60
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  43   0      0
#>   WNT         3   0  37      0
#>   Group4      2   0   0     58
#> 
#>         y_pred
#> y_true   Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  42   0      0
#>   WNT         3   0  36      0
#>   Group4      1   0   0     57
ModelMetrics(Model = nb)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         3   0  35      0
#>   Group4      1   0   0     59
#> 
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.974     0.859       1.000    0.924       0.969       0.971
#> SHH       1.000     1.000       1.000    1.000       1.000       0.971
#> WNT       0.984     1.000       0.928    0.963       1.000       0.971
#> Group4    0.990     1.000       0.972    0.986       1.000       0.971
NewDataPredictionResult(Model = nb)
#>            Subgroup
#> GSM2261711   Group3
#> X78             WNT
#> GSM2261640   Group4
#> GSM2261575   Group4
#> X135            WNT
#> GSM2262184   Group3
#> GSM2261613   Group3
#> X130            WNT
#> GSM2261922   Group4
#> GSM2261980   Group3

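The arguments used in this example are the same as for SupportVectorMachineModel (SplitRatio, CV, NCores, and NewData), plus Threshold, which is set to 0.8 here.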

Artificial neural network model

NeuralNetworkModel trains an artificial neural network model to classify medulloblastoma subgroups using DNA methylation beta values. To run the example, uncomment the following lines. The first time you run this function, set InstallTensorFlow = TRUE; this automatically installs Python and the TensorFlow library (version 2.10, CPU build) in a virtual environment. On later runs, set InstallTensorFlow = FALSE.

Usage

# set.seed(1234)
# fac <- ncol(Data1)
# NewData <- sample(data.frame(t(Data1[,-fac])),10)
# NewData <- cbind(rownames(NewData), NewData)
# colnames(NewData)[1] <- "ID"
# ann <- NeuralNetworkModel(Epochs = 100, 
#                           NewData = NewData,
#                           InstallTensorFlow = TRUE)
# ModelMetrics(Model = ann)
# NewDataPredictionResult(Model = ann)

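The arguments used in this example are Epochs (the number of training epochs), NewData (the new samples to classify), and InstallTensorFlow (whether to install Python and TensorFlow before training).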