clevr implements functions for evaluating link prediction and clustering algorithms in R. It includes efficient implementations of common performance measures, such as pairwise precision and recall, the adjusted Rand index, and the variation of information.
While the current focus is on supervised (a.k.a. external) performance measures, unsupervised (internal) measures are also in scope for future releases.
You can install the latest release from CRAN by entering:
install.packages("clevr")
The development version can be installed from GitHub using devtools:
# install.packages("devtools")
devtools::install_github("cleanzr/clevr")
Several functions are included for transforming between different clustering representations.
library(clevr)
# A clustering of four records represented as a membership vector
pred_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)
# Represent as a set of record pairs that appear in the same cluster
pred_pairs <- membership_to_pairs(pred_membership)
print(pred_pairs)
#> [,1] [,2]
#> [1,] "Record1" "Record2"
#> [2,] "Record1" "Record3"
#> [3,] "Record2" "Record3"
# Represent as a list of record clusters
pred_clusters <- membership_to_clusters(pred_membership)
print(pred_clusters)
#> $`1`
#> [1] "Record1" "Record2" "Record3"
#>
#> $`2`
#> [1] "Record4"
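To show what these conversions do, here is a minimal base R sketch. The helpers to_clusters_sketch and to_pairs_sketch are hypothetical and are not clevr's implementation. They assume a named membership vector whose names are record IDs and whose values are cluster IDs.

```r
# Hypothetical helpers (not part of clevr) illustrating the conversions in base R.
# Assumption: names are record IDs and values are cluster IDs.
membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)

# Membership vector -> list of clusters: group record names by cluster ID.
to_clusters_sketch <- function(membership) {
  split(names(membership), membership)
}

# Membership vector -> two-column matrix of within-cluster record pairs.
to_pairs_sketch <- function(membership) {
  clusters <- to_clusters_sketch(membership)
  # Only clusters with at least two records contribute pairs.
  multi <- clusters[lengths(clusters) >= 2]
  if (length(multi) == 0) return(matrix(character(0), ncol = 2))
  # combn() enumerates each unordered pair once; t() puts pairs in rows.
  do.call(rbind, unname(lapply(multi, function(recs) t(combn(recs, 2)))))
}

clusters <- to_clusters_sketch(membership)
pairs <- to_pairs_sketch(membership)
print(pairs)
```

A cluster of k records yields k(k - 1)/2 pairs. This is why the pairs representation can grow much larger than the membership vector for big clusters.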
Performance measures are available for evaluating linked pairs:
true_pairs <- rbind(c("Record1", "Record2"), c("Record3", "Record4"))
pr <- precision_pairs(true_pairs, pred_pairs)
print(pr)
#> [1] 0.3333333
re <- recall_pairs(true_pairs, pred_pairs)
print(re)
#> [1] 0.5
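The base R sketch below illustrates where these numbers come from. Precision is the fraction of predicted pairs that are true pairs (1 of 3), and recall is the fraction of true pairs that are predicted (1 of 2). The helper names are hypothetical. The sketch assumes pairs are unordered and ignores duplicate rows. It is not necessarily how clevr handles these cases.

```r
# Hypothetical helpers (not part of clevr) computing pairwise precision and
# recall in base R. Assumption: pairs are unordered, so (a, b) equals (b, a).

# Turn each row of a two-column matrix into an order-independent string key,
# dropping duplicates. Assumes record IDs never contain a tab character.
pair_keys <- function(pairs) {
  unique(apply(pairs, 1, function(p) paste(sort(p), collapse = "\t")))
}

# Number of predicted pairs that also appear among the true pairs.
n_correct_pairs <- function(true_pairs, pred_pairs) {
  sum(pair_keys(pred_pairs) %in% pair_keys(true_pairs))
}

precision_pairs_sketch <- function(true_pairs, pred_pairs) {
  n_correct_pairs(true_pairs, pred_pairs) / length(pair_keys(pred_pairs))
}

recall_pairs_sketch <- function(true_pairs, pred_pairs) {
  n_correct_pairs(true_pairs, pred_pairs) / length(pair_keys(true_pairs))
}

true_pairs <- rbind(c("Record1", "Record2"), c("Record3", "Record4"))
pred_pairs <- rbind(c("Record1", "Record2"), c("Record1", "Record3"),
                    c("Record2", "Record3"))
pr <- precision_pairs_sketch(true_pairs, pred_pairs)  # 1 of 3 predicted pairs is correct
re <- recall_pairs_sketch(true_pairs, pred_pairs)     # 1 of 2 true pairs is found
```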
and for evaluating clusterings:
true_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 2, "Record4" = 2)
ari <- adj_rand_index(true_membership, pred_membership)
print(ari)
#> [1] 0
vi <- variation_info(true_membership, pred_membership)
print(vi)
#> [1] 0.8239592
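Both clustering measures can be computed from the contingency table of the two clusterings. The following base R sketch uses hypothetical helper names and is not clevr's implementation. It applies the standard adjusted Rand index formula and the identity VI = 2 H(true, pred) - H(true) - H(pred) with natural logarithms. Natural logarithms reproduce the values printed above for this example.

```r
# Hypothetical helpers (not part of clevr): ARI and VI from a contingency table.
contingency <- function(true_membership, pred_membership) {
  # Align records by name so the two vectors may be given in different orders.
  pred_membership <- pred_membership[names(true_membership)]
  table(true_membership, pred_membership)
}

ari_sketch <- function(true_membership, pred_membership) {
  ct <- contingency(true_membership, pred_membership)
  n <- sum(ct)
  sum_ij <- sum(choose(ct, 2))            # pairs co-clustered in both
  sum_a <- sum(choose(rowSums(ct), 2))    # pairs co-clustered in true
  sum_b <- sum(choose(colSums(ct), 2))    # pairs co-clustered in pred
  expected <- sum_a * sum_b / choose(n, 2)
  max_index <- (sum_a + sum_b) / 2
  # Undefined (0/0) when both clusterings are all singletons or one cluster.
  (sum_ij - expected) / (max_index - expected)
}

vi_sketch <- function(true_membership, pred_membership) {
  p <- contingency(true_membership, pred_membership) / length(true_membership)
  entropy <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  # VI = H(true) + H(pred) - 2 I(true; pred) = 2 H(true, pred) - H(true) - H(pred)
  2 * entropy(p) - entropy(rowSums(p)) - entropy(colSums(p))
}

true_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 2, "Record4" = 2)
pred_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)
ari <- ari_sketch(true_membership, pred_membership)  # 0
vi <- vi_sketch(true_membership, pred_membership)    # about 0.824
```

Here the observed count of co-clustered pairs (1) equals its expected value under chance (1), so the adjusted Rand index is exactly 0.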