Duplication Isn't Always an Anti-Pattern

Duplication isn't always bad. It's often rational. I wrote an academic paper explaining why, and offering a solution:

https://invisible.college/toomim/toomim-linked-editing.pdf

> Abstractions can be costly, and it is often in a programmer’s best interest to leave code duplicated instead. Specifically, we have identified the following general costs of abstraction that lead programmers to duplicate code (supported by a literature survey, programmer interviews, and our own analysis). These costs apply to any abstraction mechanism based on named, parameterized definitions and uses, regardless of the language.

> 1. *Too much work to create.* In order to create a new programming abstraction from duplicated code, the programmer has to analyze the clones’ similarities and differences, research their uses in the context of the program, and design a name and sequence of named parameters that account for present and future instantiations and represent a meaningful “design concept” in the system. This research and reasoning is thought-intensive and time-consuming.

> 2. *Too much overhead after creation.* Each new programming abstraction adds textual and cognitive overhead: the abstraction’s interface must be declared, maintained, and kept consistent, and the program logic (now decoupled) must be traced through additional interfaces and locations to be understood and managed. In a case study, Balazinska et. al reported that the removal of clones from the JDK source code actually increased its overall size [4].

> 3. *Too hard to change.* It is hard to modify the structure of highly-abstracted code. Doing so requires changing abstraction definitions and all of their uses, and often necessitates re-ordering inheritance hierarchies and other restructuring, requiring a new round of testing to ensure correctness. Programmers may duplicate code instead of restructuring existing abstractions, or in order to reduce the risk of restructuring in the future.

> 4. *Too hard to understand.* Some instances of duplicated code are particularly difficult to abstract cleanly, e.g. because they have a complex set of differences to parameterize or do not represent a clear design concept in the system. Furthermore, abstractions themselves are cognitively difficult. To quote Green & Blackwell: “Thinking in abstract terms is difficult: it comes late in children, it comes late to adults as they learn a new domain of knowledge, and it comes late within any given discipline.” [20]

> 5. *Impossible to express.* A language might not support direct abstraction of some types of clones: for instance those differing only by types (float vs. double) or keywords (if vs. while) in Java. Or, organizational issues may prevent refactoring: the code may be fragile, “frozen”, private, performance-critical, affect a standardized interface, or introduce illegal binary couplings between modules [41].

> Programmers are stuck between a rock and hard place. Traditional abstractions can be too costly, causing rational programmers to duplicate code instead—but such code is viscous and prone to inconsistencies. Programmers need a flexible, lightweight tool to complement their other options.