Towards a Measure of Algorithm Similarity
2510.27063v1
cs.LG, cs.AI, cs.CL, cs.IT, cs.SE, math.IT, 68Qxx, 03Dxx, 90C29, I.2.6; F.4.1; D.2.4
2025-11-04
Authors:
Shairoz Sohail, Taher Ali
Abstract
Given two algorithms for the same problem, can we determine whether they are
meaningfully different? In full generality, the question is undecidable, and
empirically it is muddied by competing notions of similarity. Yet, in many
applications (such as clone detection or program synthesis) a pragmatic and
consistent similarity metric is necessary. We review existing equivalence and
similarity notions and introduce EMOC, an
Evaluation-Memory-Operations-Complexity framework that embeds algorithm
implementations into a feature space suitable for downstream tasks. We compile
PACD, a curated dataset of verified Python implementations across three
problems, and show that EMOC features support clustering and classification of
algorithm types, detection of near-duplicates, and quantification of diversity
in LLM-generated programs. Code, data, and utilities for computing EMOC
embeddings are released to facilitate reproducibility and future work on
algorithm similarity.
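
To make the idea of embedding implementations into a feature space concrete, the following is a minimal sketch, not the authors' EMOC feature definitions, which the abstract does not specify. It assumes one toy proxy per axis: agreement with a reference on test inputs (evaluation), peak traced allocation (memory), executed source lines (operations), and an empirical growth exponent between two input sizes (complexity). The names `toy_features`, `count_line_events`, and `peak_bytes` are hypothetical and do not come from the released code.

```python
import math
import sys
import tracemalloc


def bubble_sort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def count_line_events(fn, arg):
    """Count executed Python source lines (a rough proxy for operations)."""
    count = 0

    def tracer(frame, event, _arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer  # keep tracing inside nested calls

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        fn(arg)
    finally:
        sys.settrace(previous)
    return count


def peak_bytes(fn, arg):
    """Peak bytes allocated by Python objects while fn(arg) runs."""
    tracemalloc.start()
    try:
        fn(arg)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def toy_features(fn, reference=sorted, sizes=(64, 128)):
    """Return [evaluation, memory, operations, complexity] toy proxies."""
    # Reversed lists: deterministic inputs (assumption made for this sketch).
    inputs = [list(range(n, 0, -1)) for n in sizes]
    agree = sum(fn(x) == reference(x) for x in inputs) / len(inputs)
    steps = [count_line_events(fn, x) for x in inputs]
    # Slope of log(steps) against log(n) between the two sizes.
    growth = math.log(steps[1] / steps[0]) / math.log(sizes[1] / sizes[0])
    return [agree, float(peak_bytes(fn, inputs[-1])), float(steps[-1]), growth]
```

Calling `toy_features(bubble_sort)` and `toy_features(merge_sort)` gives two vectors that match on the evaluation coordinate but differ on the operations and growth coordinates. That is the kind of separation a clustering or near-duplicate detector could exploit, although the real features would need scaling and far more careful measurement.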