- create_diagnostic_plots() renamed to plot_mcmc_diagnostics(), and its parameter res renamed to dpi, for naming consistency.

Major Feature: Added the optional opt_subsample parameter to key optimization functions, enabling efficient parameter optimization on large datasets while maintaining final embedding quality. Parameter optimization remains reliable with subsampling because likelihoods computed on samples of the same size are comparable, which still allows the optimal parameter values to be chosen.
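Subsampling a sparse dissimilarity matrix can leave some points with no measured dissimilarity to the rest, which is why subsamples are validated for connectivity. The base-R sketch below illustrates that idea under stated assumptions. The helpers is_connected() and draw_connected_subsample() are hypothetical and are not the implementations of check_matrix_connectivity() or subsample_dissimilarity_matrix(). NA is assumed to mark a missing measurement, and simple retrying stands in for the package's adaptive size adjustment.

```r
# Hypothetical helpers (not topolow's own check_matrix_connectivity() or
# subsample_dissimilarity_matrix()). NA marks a missing dissimilarity.

# TRUE if every point can reach every other point through measured pairs.
is_connected <- function(d) {
  n <- nrow(d)
  if (n <= 1) return(TRUE)
  measured <- !is.na(d)
  adj <- measured | t(measured)   # treat the graph as undirected
  diag(adj) <- FALSE
  visited <- logical(n)
  visited[1] <- TRUE
  frontier <- 1L
  # Breadth-first search starting from point 1
  while (length(frontier) > 0) {
    reachable <- colSums(adj[frontier, , drop = FALSE]) > 0
    frontier <- which(reachable & !visited)
    visited[frontier] <- TRUE
  }
  all(visited)
}

# Draw `size` random points; retry until the induced submatrix is connected.
draw_connected_subsample <- function(d, size, max_tries = 50) {
  for (attempt in seq_len(max_tries)) {
    idx <- sort(sample.int(nrow(d), size))
    sub <- d[idx, idx, drop = FALSE]
    if (is_connected(sub)) return(sub)
  }
  stop("no connected subsample found; try a larger size")
}

# Example: points 1-3 form a chain, point 4 has no measurements
m <- matrix(c( 0,  1, NA, NA,
               1,  0,  2, NA,
              NA,  2,  0, NA,
              NA, NA, NA,  0), nrow = 4, byrow = TRUE)
is_connected(m)             # FALSE
is_connected(m[1:3, 1:3])   # TRUE
```

Because likelihoods are only comparable between samples of the same size, this sketch keeps the subsample size fixed and only varies which points are drawn.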
New Functions:
- check_matrix_connectivity(): Validates that a dissimilarity matrix forms a connected graph.
- subsample_dissimilarity_matrix(): Creates random subsamples with automatic connectivity validation and adaptive size adjustment.
- sanity_check_subsample(): Validates subsample suitability for cross-validation.
- prune_sparse_matrix(): Prunes sparse dissimilarity matrices to a well-connected subset.

Enhanced Functions:
- initial_parameter_optimization(): Now accepts the opt_subsample parameter.
- run_adaptive_sampling(): Now accepts the opt_subsample parameter.
- adaptive_MC_sampling(): Now accepts the opt_subsample parameter (internal).
- Euclidify(): Now accepts the opt_subsample parameter.

When opt_subsample is specified:
The opt_subsample parameter is optional (default: NULL = use full data).
- opt_subsample = NULL (use the full data)
- opt_subsample = 200-500
- opt_subsample >= folds for reliable cross-validation

#' The function performs these steps in an epoch-based evolutionary strategy:
#' 1. Initialization: Starts with the user-provided parameter ranges.
#' 2. Epoch Loop: For each epoch:
#' a. Generates num_samples parameter sets using Latin Hypercube Sampling (LHS) within the current parameter ranges.
#' b. If opt_subsample is specified, each evaluation uses a random subsample.
#' c. Evaluates parameter sets via cross-validation (in parallel batches).
#' d. Range Update (after all but the final epoch):
#' - Sorts results by NLL and keeps the top 50%.
#' - Updates parameter ranges for the next epoch based on survivors:
#' New Min = 0.75 * Min(Survivors), New Max = 1.25 * Max(Survivors).
#' - This allows the search to drift and zoom in on optimal regions.
#' 3. Finalization: Automatically log-transforms the results from the final epoch
#' for direct use with adaptive sampling.
#' @param epochs Integer. Number of optimization epochs. In each epoch, parameters are sampled,
#' evaluated, and the best 50% are used to refine the search space for the next epoch.
#' Default: 3.
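The epoch loop described above can be sketched in base R as follows. This is an illustrative sketch, not the package's code. lhs_sample(), update_ranges() and run_epochs() are hypothetical helpers, and nll_fn is a toy stand-in for the cross-validated NLL. Parameter values are assumed positive, so that 0.75 × min and 1.25 × max widen the survivors' span. Evaluation is sequential, and the final log-transform step is omitted.

```r
# Hypothetical sketch of the epoch-based range update; not topolow's code.

# Latin hypercube sample: each parameter range is cut into n equal strata and
# every stratum is hit exactly once, in an independently shuffled order.
lhs_sample <- function(n, ranges) {
  cols <- vapply(ranges, function(r) {
    u <- (sample.int(n) - runif(n)) / n
    r[["min"]] + u * (r[["max"]] - r[["min"]])
  }, numeric(n))
  as.data.frame(matrix(cols, nrow = n, dimnames = list(NULL, names(ranges))))
}

# Keep the best half by NLL, then set each range to
# [0.75 * min(survivors), 1.25 * max(survivors)].
update_ranges <- function(results, param_names, keep_frac = 0.5) {
  ordered <- results[order(results$NLL), , drop = FALSE]
  n_keep <- max(1, floor(nrow(ordered) * keep_frac))
  survivors <- ordered[seq_len(n_keep), , drop = FALSE]
  ranges <- lapply(param_names, function(p) {
    c(min = 0.75 * min(survivors[[p]]), max = 1.25 * max(survivors[[p]]))
  })
  names(ranges) <- param_names
  ranges
}

# nll_fn takes a named list of parameter values and returns an NLL.
run_epochs <- function(nll_fn, ranges, num_samples = 20, epochs = 3) {
  for (epoch in seq_len(epochs)) {
    samples <- lhs_sample(num_samples, ranges)
    samples$NLL <- apply(samples[names(ranges)], 1,
                         function(row) nll_fn(as.list(row)))
    # Update ranges after every epoch except the last
    if (epoch < epochs) ranges <- update_ranges(samples, names(ranges))
  }
  samples   # results of the final epoch
}

# Example with a toy objective whose optimum is at a = 3
set.seed(42)
final <- run_epochs(function(p) (p$a - 3)^2, list(a = c(min = 1, max = 10)))
final[which.min(final$NLL), ]
```

Because the new bounds are derived from the survivors rather than clipped to the previous range, the search can move beyond the initial ranges as well as narrow down.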
Major Performance Enhancement: The core optimization loop in euclidean_embedding() has been rewritten in C++ using Rcpp and RcppArmadillo, providing significant speedups on large datasets. All for loops in euclidean_embedding() have been replaced with vectorized operations.
New Algorithm: Negative Sampling (not used)
- n_negative_samples (default: 5) controls the trade-off between approximation quality and speed.

New Parameters for euclidean_embedding():
- n_negative_samples: Number of negative samples per edge endpoint (default: 5). Higher values better approximate the original O(N²) algorithm but increase computation time.
- convergence_check_freq: How often, in iterations, to check for convergence (default: 10). Lower values give more precise stopping but add overhead.

Implementation Details:
- Mersenne Twister random number generator (std::mt19937) for stochastic edge ordering, which is critical for escaping local optima.

Return Value Enhancement: The convergence field in the returned topolow object now includes:
- achieved: Boolean indicating whether convergence was reached
- error: Final MAE on active constraints
- final_k: Final spring constant value after cooling

Dependencies: Added RcppArmadillo to LinkingTo (compile-time only; no runtime dependency added).
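The following sketch illustrates the trade-off that n_negative_samples controls. It approximates a per-point sum over all other points (the O(N²) part of an all-pairs update) by summing over a few randomly drawn points and rescaling the result. It assumes negatives are drawn uniformly without replacement from all other points, and pair_term() is a hypothetical per-pair contribution. The sampling scheme used in the C++ code may differ.

```r
# Hypothetical helper, not topolow's C++ implementation.
# Estimates sum_{j != i} pair_term(i, j) from a random subset of points.
approx_pair_sum <- function(i, n_points, pair_term, n_negative_samples = 5) {
  others <- setdiff(seq_len(n_points), i)
  if (length(others) == 0) return(0)
  k <- min(n_negative_samples, length(others))
  # Index into `others` instead of calling sample(others, k), which misbehaves
  # when `others` has length 1
  picked <- others[sample.int(length(others), k)]
  # Rescaling by (N - 1) / k makes the estimate unbiased for the full sum
  (length(others) / k) * sum(vapply(picked, function(j) pair_term(i, j), numeric(1)))
}

# Example with a toy per-pair term
pair_term <- function(i, j) abs(i - j)
set.seed(1)
approx_pair_sum(1, 5, pair_term, n_negative_samples = 4)  # 10, the exact sum
approx_pair_sum(1, 5, pair_term, n_negative_samples = 2)  # noisy estimate
```

When n_negative_samples reaches N - 1, the estimate equals the exact sum. Smaller values do less work per point at the cost of a noisier estimate.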
| Dataset Size | Sparsity | R (v2.0) | C++ (v2.1) | Speedup |
|--------------|----------|----------|------------|---------|
| 100 points   | 50%      | ~2s      | ~0.3s      | ~7×     |
| 500 points   | 80%      | ~45s     | ~4s        | ~11×    |
| 1000 points  | 95%      | ~180s    | ~12s       | ~15×    |
Benchmarks used 1000 iterations in 3 dimensions. Actual speedup varies with data characteristics.
- euclidean_embedding() will work without modification
- Euclidify(), parameter optimization, etc.)

Added figures to the vignette.
Added the wizard function Euclidify(), which automatically runs the entire workflow needed to produce the main output.
create_topolow_map() is now deprecated in favor of euclidean_embedding(). The old function will be removed in version 3.0.0.
- distance_matrix --> dissimilarity_matrix
- create_topolow_map() --> euclidean_embedding()

- initial_parameter_optimization(): Parameter distance_matrix renamed to dissimilarity_matrix
  - Replace distance_matrix = your_matrix with dissimilarity_matrix = your_matrix
- run_adaptive_sampling(): Parameter distance_matrix renamed to dissimilarity_matrix
  - Replace distance_matrix = your_matrix with dissimilarity_matrix = your_matrix
- adaptive_MC_sampling():
  - Parameter distance_matrix renamed to dissimilarity_matrix
  - Removed batch_size from adaptive_MC_sampling(); its value had no effect on the computation anyway
  - Removed num_parallel_jobs from run_adaptive_sampling(); set max_cores to define the number of cores and parallel jobs
  - Replace distance_matrix = your_matrix with dissimilarity_matrix = your_matrix and remove batch_size arguments
- create_cv_folds(): Parameter names and return structure changed
  - truth_matrix --> dissimilarity_matrix, no_noise_truth --> ground_truth_matrix
  - Returns named elements ($truth, $train) instead of indexed elements
  - Change result[[1]][[1]] to result[[1]]$truth and result[[1]][[2]] to result[[1]]$train
- The take_log parameter in clean_data() is deprecated
- analyze_network_structure(): Parameter distance_matrix renamed to dissimilarity_matrix for consistency with other functions
- calculate_diagnostics(): Return class changed from topolow_amcs_diagnostics to topolow_diagnostics for naming consistency
- plot_network_structure(): Removed aesthetic_config and layout_config parameters
  - width, height, dpi parameters
- scatterplot_fitted_vs_true(): Parameter names updated for consistency
  - distance_matrix --> dissimilarity_matrix, p_dist_mat --> p_dissimilarity_mat
  - Default of save_plot changed from TRUE to FALSE
  - Uses linewidth instead of the deprecated size
- error_calculator_comparison(): Parameter names changed for consistency
  - p_dist_mat --> predicted_dissimilarities
  - truth_matrix --> true_dissimilarities
  - input_matrix --> input_dissimilarities (now optional, defaults to NULL)
- calculate_prediction_interval(): Parameter names changed for consistency
  - distance_matrix --> dissimilarity_matrix
  - p_dist_mat --> predicted_dissimilarity_matrix
- long_to_matrix was renamed to titers_list_to_matrix, since it is specific to viral titer data processing.
- process_antigenic_data now accepts a data frame as input instead of a file path.
- In process_antigenic_data, is_titer became is_similarity for clarity to a broader audience, and the id_prefix parameter was removed.
- euclidean_embedding() function with enhanced performance and features:
- Updated the parameter_sensitivity function to use modern ggplot2 syntax
- Replaced the size parameter with linewidth in plots
- create_cv_folds()
- input_dissimilarities parameter is now optional in error_calculator_comparison()
- initial_parameter_optimization saves/returns the parameters in log scale, consistent with other functions
- create_topolow_map() is deprecated and issues a warning
- create_topolow_map() will be removed

To update your code:
```r
# Old (deprecated):
result <- create_topolow_map(
  distance_matrix = my_matrix,
  # ... other parameters
)

# New (recommended):
result <- euclidean_embedding(
  dissimilarity_matrix = my_matrix,  # parameter name changed
  # ... other parameters (unchanged)
)
```
\value documentation describing the output's class, structure, and meaning.

\dontrun{} wrappers have been removed.

5. Slower examples are now wrapped in \donttest{} as appropriate.