Last update before CRAN submission.

set.seed prior to generate_settings_matrix instead.

estimate_nclust_given_graph() occasionally yielded incorrect estimates of the number of clusters because of improper scaling in metasnf v0.7.0. The scaling should now be correct.

mc_manhattan_plot() with a data list containing duplicate feature names

mc_manhattan_plot() parameter rep_solution replaced with the more accurate name extended_solutions_matrix (a solutions matrix with _pval columns)

SNFtool::estimateNumberOfClustersGivenGraph() could occasionally error out while calculating eigenvectors (eigengap heuristic) for a Laplacian with floating point values that were too small. The adapted function estimate_nclust_given_graph() slightly scales up the Laplacian to reduce the risk of encountering this error (presumably without any change to the resulting estimate of the number of clusters)

get_matrix_order has arguments allowing users to control which distance metric and agglomerative hierarchical clustering methods are used to sort matrices

get_complete_uids quickly pulls the UIDs of observations with complete data from a list of dataframes

extend_solutions doesn’t crash on multi-feature target lists

generate_data_list()

remove_missing parameter for generate_data_list allowing subjects with incomplete data to remain in the data list

lp_solutions_matrix error message when the training set is not a subset of the full data list

generate_data_list list elements are now named after their components

merge_data_lists functionality to horizontally merge data lists

extend_solutions() will no longer crash when a data_list has the UID column in a non-first position.

generate_data_list() enforces the UID column to be in the first position of each dataframe.

auto_plot() will automatically generate bar and/or jitter plots showing how features in a data_list/target_list are distributed across a single cluster solution

shiny_annotator() function can be used to identify indices of meta clusters within an adjusted_rand_index_heatmap
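
Several entries in this changelog deal with meta clusters in an ordered ARI heatmap: shiny_annotator() finds them interactively, the split_vector parameter slices the heatmap, and get_representative_solutions picks one solution per meta cluster. The base R sketch below illustrates these ideas. It is not metasnf's implementation. The helpers split_to_labels() and pick_representatives() are hypothetical. The sketch assumes that each element of split_vector marks the first row of a new meta cluster, and that the "max-ARI" representative is the solution with the highest mean ARI to the other members of its meta cluster.

```r
# Hypothetical helper (not part of metasnf): turn meta-cluster boundaries into labels.
# Assumes each element of split_vector is the first row of a new meta cluster.
split_to_labels <- function(n, split_vector) {
  findInterval(seq_len(n), sort(split_vector)) + 1L
}

# Hypothetical helper (not part of metasnf): within each meta cluster, return the
# row index of the solution with the highest mean ARI to the other members.
# Ties go to the earlier solution because which.max() returns the first maximum.
pick_representatives <- function(aris, split_vector) {
  labels <- split_to_labels(nrow(aris), split_vector)
  groups <- split(seq_len(nrow(aris)), labels)
  vapply(groups, function(idx) {
    if (length(idx) == 1L) return(idx)
    sub <- aris[idx, idx, drop = FALSE]
    diag(sub) <- NA                      # ignore each solution's ARI with itself
    idx[which.max(rowMeans(sub, na.rm = TRUE))]
  }, integer(1))
}

# Toy ARI matrix for five solutions already ordered into two meta clusters
aris <- matrix(c(
  1.0, 0.9, 0.7, 0.1, 0.2,
  0.9, 1.0, 0.8, 0.2, 0.1,
  0.7, 0.8, 1.0, 0.1, 0.1,
  0.1, 0.2, 0.1, 1.0, 0.6,
  0.2, 0.1, 0.1, 0.6, 1.0
), nrow = 5, byrow = TRUE)

split_to_labels(5, split_vector = 4)          # 1 1 1 2 2
pick_representatives(aris, split_vector = 4)  # solutions 2 and 4
```

In this toy matrix, solution 2 has the highest mean ARI (0.85) within the first meta cluster. The two-member second meta cluster is a tie, so the earlier solution is returned.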

adjusted_rand_index_heatmap() now has a split_vector parameter that will slice a heatmap into meta clusters

rename_dl() can be used to rename features in a data_list

manhattan_plot has been split into var_manhattan_plot (key variable against all variables), esm_manhattan_plot (cluster solutions in an extended solutions matrix against all variables), and mc_manhattan_plot (like esm_manhattan_plot, but at the meta-cluster level)

get_representative_solutions extracts max-ARI solutions from an extended solutions matrix based on a split_vector containing meta cluster boundaries

batch_nmi calculates NMI scores (see https://branchlab.github.io/metasnf/articles/nmi_scores.html)

extend_solutions will only calculate p-value summary measures (min/max/mean) for a data_list passed in through the target_list parameter, but will also accept and calculate p-values for a data_list passed in through the data_list parameter

adjusted_rand_index_heatmap and assoc_pval_heatmap have updated parameters to improve ease of use and flexibility (including easier colour control)

get_clustered_subs has been removed (it does the same thing as get_cluster_df)

get_cluster_pval has been deprecated in favour of calc_assoc_pval
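
calc_assoc_pval returns a p-value for the association between a cluster solution and a feature. The statistical tests it uses are not described here, so the following base R sketch is hypothetical. assoc_pval_sketch() assumes a Kruskal-Wallis test for numeric features and a chi-squared test of independence for categorical ones, and these choices may differ from metasnf's.

```r
# Hypothetical sketch (not metasnf's calc_assoc_pval()); the choice of tests
# below is an assumption made for illustration.
assoc_pval_sketch <- function(clusters, feature) {
  clusters <- factor(clusters)
  if (is.numeric(feature)) {
    # Kruskal-Wallis rank test: does the feature's distribution differ by cluster?
    kruskal.test(feature, clusters)$p.value
  } else {
    # Chi-squared test of independence between cluster and category; the
    # small-expected-count warning is suppressed for brevity
    suppressWarnings(chisq.test(table(clusters, feature))$p.value)
  }
}

clusters <- rep(1:3, each = 10)
numeric_feature <- seq_len(30)                # perfectly ordered by cluster
categorical_feature <- rep(c("a", "b"), 15)   # evenly spread across clusters
assoc_pval_sketch(clusters, numeric_feature)      # very small
assoc_pval_sketch(clusters, categorical_feature)  # 1
```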

generate_data_list() and its corresponding functions

remove_signal has been renamed to linear_adjust to better reflect its function

summarize_distance_metrics_list has been shortened to summarize_dml

correlation_pval_heatmap has been renamed to assoc_pval_heatmap

calc_om_aris has been renamed to calc_aris
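
calc_aris computes adjusted Rand indices (ARIs) between cluster solutions. For reference, the ARI of two label vectors can be computed in base R from their contingency table using Hubert and Arabie's formula. adjusted_rand_index() and pairwise_aris() below are hypothetical helpers for illustration, not metasnf's functions.

```r
# Hypothetical helper (not metasnf's code): adjusted Rand index of two
# label vectors, using Hubert and Arabie's contingency-table formula.
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)
  pairs <- function(x) x * (x - 1) / 2            # "x choose 2", elementwise
  sum_ij <- sum(pairs(tab))                       # pairs grouped together in both
  sum_a <- sum(pairs(rowSums(tab)))               # pairs grouped together in a
  sum_b <- sum(pairs(colSums(tab)))               # pairs grouped together in b
  expected <- sum_a * sum_b / pairs(length(a))    # chance-expected value of sum_ij
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Hypothetical helper: symmetric matrix of pairwise ARIs between solutions
pairwise_aris <- function(solutions) {
  n <- length(solutions)
  out <- diag(1, n)
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      out[i, j] <- out[j, i] <- adjusted_rand_index(solutions[[i]], solutions[[j]])
    }
  }
  out
}

example_solutions <- list(
  a = c(1, 1, 2, 2, 3, 3),
  b = c(2, 2, 3, 3, 1, 1),   # same partition as a, relabelled
  c = c(1, 1, 1, 2, 2, 2)
)
pairwise_aris(example_solutions)
```

Because the ARI depends only on which observations are grouped together, relabelled but otherwise identical solutions (a and b above) score exactly 1.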

extend_solutions p-value calculation warnings are now suppressed

_pval instead of a mix of p_val, pval, and p.

pval_select, p_val_select, top_oms_per_cluster, check_subj_orders_for_lp, get_p, chi_sq_pval,

pval_summaries, which would calculate min/max/mean p-values, has been replaced with summarize_pvals
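
summarize_pvals computes min/max/mean p-value summaries. A minimal base R sketch of that kind of summary follows. It assumes a data frame whose p-value columns end in _pval, which is the suffix convention noted in this changelog. summarize_pvals_sketch() and the example column names are hypothetical and do not describe metasnf's actual function.

```r
# Hypothetical sketch (not metasnf's summarize_pvals()): per-row min, mean and
# max over every column whose name ends in "_pval"; NA p-values are skipped.
summarize_pvals_sketch <- function(df) {
  pvals <- as.matrix(df[, grepl("_pval$", names(df)), drop = FALSE])
  data.frame(
    min_pval = apply(pvals, 1, min, na.rm = TRUE),
    mean_pval = rowMeans(pvals, na.rm = TRUE),
    max_pval = apply(pvals, 1, max, na.rm = TRUE)
  )
}

# Toy table: one row per cluster solution, with hypothetical feature columns
solutions_df <- data.frame(
  solution = 1:2,
  age_pval = c(0.01, 0.20),
  sex_pval = c(0.50, 0.40),
  income_pval = c(0.03, NA)
)
summarize_pvals_sketch(solutions_df)
```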

train_test_assign now provides results as a named list of subject vectors instead of a data.frame. The keep_split function has been removed accordingly.

sort_subjects parameter added to generate_data_list to allow sorting of subjects in the data_list

extend_solutions can now also be parallelized (see ?extend_solutions)

remove_signal function has a sig_digs parameter that can be used to restrict how many significant figures are returned in the resulting residuals

calc_om_aris is now MUCH faster after removing excessive calls to as.numeric and enabling parallel processing with future.apply. Thanks for the idea, Alper.

extend_solutions to better handle extreme p-values (e.g. infinity)

p_val_select with pval_select, which can also return negative-log p-values

generate_data_list correctly errors when components are only partially named (resolves https://github.com/BRANCHlab/metasnf/issues/10)

lp_row function has been replaced by lp_solutions_matrix. The new function is order agnostic: full data lists can be constructed without any restriction on how training and testing set subjects are sorted. Subjects present in the provided solutions matrix to propagate are assumed to be the training subjects.

calc_om_aris now has a progress parameter. When set to TRUE and used in conjunction with progressr::with_progress(), a progress bar is shown for the calculations. Learn more with ?calc_om_aris.

grepl instead of grep used in extend_solutions to reduce errors when no chi-squared warning occurs

keep_split will preserve observations that were assigned a split but were not present in the dataframe being split. Instead of being removed, those observations will have NA values.

fraction_clustered_together crashing when a cluster was assigned to only a single observation

fraction_clustered_together not running due to a bracket typo when evaluating the length of the data_list

correlation_pval_heatmap function can have significance stars disabled with the significance_stars parameter

estimateNumberOfClustersGivenGraph has been used up to this point without specifying a parameter for NUMC, so SNFtool's default search range of 2 to 5 clusters applied. Consequently, final similarity matrices clustered with the default methods (spectral clustering based on eigen-gap or rotation cost heuristics) could not yield more than 5 clusters. The default functions have been updated to span 2 to 10 clusters. Users will likely see different clustering results because of this change. To replicate the behaviour of default spectral clustering prior to v0.3.0, users should add the following code before the batch_snf command:

clust_algs_list <- generate_clust_algs_list(
"spectral_eigen" = spectral_eigen_classic,
"spectral_rot" = spectral_rot_classic
)
# Adapt below as necessary
solutions_matrix <- batch_snf(
data_list,
settings_matrix,
clust_algs_list = clust_algs_list
)
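
The NUMC change matters because the eigengap heuristic only searches the cluster counts it is given. Under the textbook definition, it picks the k in NUMC with the largest gap between the k-th and (k+1)-th smallest eigenvalues of the normalized graph Laplacian. The base R sketch below illustrates that definition. It is not SNFtool's or metasnf's code, and estimate_k_eigengap() and block_similarity() are hypothetical names.

```r
# Hypothetical sketch (not SNFtool's or metasnf's code) of the eigengap heuristic:
# choose k in NUMC maximizing the gap between the k-th and (k + 1)-th smallest
# eigenvalues of the symmetric normalized Laplacian of a similarity matrix W.
estimate_k_eigengap <- function(W, NUMC = 2:10) {
  W <- (W + t(W)) / 2                            # enforce symmetry
  d_inv_sqrt <- diag(1 / sqrt(rowSums(W)))       # D^(-1/2)
  L <- diag(nrow(W)) - d_inv_sqrt %*% W %*% d_inv_sqrt
  ev <- sort(eigen(L, symmetric = TRUE, only.values = TRUE)$values)
  gaps <- diff(ev)                               # gaps[k] = ev[k + 1] - ev[k]
  NUMC <- NUMC[NUMC < nrow(W)]                   # k needs a (k + 1)-th eigenvalue
  NUMC[which.max(gaps[NUMC])]
}

# Toy similarity matrix: n_blocks groups of strongly similar observations
block_similarity <- function(n_blocks, block_size, between = 0.01) {
  n <- n_blocks * block_size
  W <- matrix(between, n, n)
  for (b in seq_len(n_blocks)) {
    idx <- (b - 1) * block_size + seq_len(block_size)
    W[idx, idx] <- 1
  }
  W
}

W7 <- block_similarity(n_blocks = 7, block_size = 3)
estimate_k_eigengap(W7)               # searches 2 to 10 clusters: 7
estimate_k_eigengap(W7, NUMC = 2:5)   # a 2:5 range can never return 7
```

With seven well-separated blocks, the largest eigengap sits at k = 7. A search capped at 5 therefore returns an undersized cluster count, which is the behaviour the NUMC change addresses.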

fisher_exact_pval function to avoid the “FEXACT” error (as in https://github.com/Lagkouvardos/Rhea/issues/17). The impact on results is expected to be negligible.

remove_signal() enables correcting a data_list linearly for confounders / unwanted signal. A vignette is available: https://branchlab.github.io/metasnf/articles/confounders.html.

batch_snf() has a new parameter, automatic_standard_normalize, to replace the default numeric distance measures (Euclidean) with standard normalized variants.

NEWS.md file to track changes to the package.
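
The automatic_standard_normalize option of batch_snf() swaps plain Euclidean distance for a standard normalized variant. This sketch assumes that standard normalization means z-scoring each feature before computing Euclidean distances. The helper standard_normalized_distance() is hypothetical and is not metasnf's code. With that assumption, base R can show the effect: unlike plain Euclidean distance, the normalized distance no longer depends on the units each feature is measured in.

```r
# Hypothetical sketch (not metasnf's code): "standard normalized" Euclidean
# distance, assumed here to mean z-scoring each feature before dist().
standard_normalized_distance <- function(x) {
  as.matrix(dist(scale(x)))   # scale() centres each column and divides by its SD
}

set.seed(42)
features <- cbind(
  height_m = rnorm(6, mean = 1.7, sd = 0.1),
  income = rnorm(6, mean = 50000, sd = 8000)
)
rescaled <- features
rescaled[, "income"] <- rescaled[, "income"] / 1000   # same data, in thousands

# Plain Euclidean distances change with the units; the normalized ones do not
isTRUE(all.equal(as.matrix(dist(features)), as.matrix(dist(rescaled))))   # FALSE
isTRUE(all.equal(standard_normalized_distance(features),
                 standard_normalized_distance(rescaled)))                 # TRUE
```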