Set theory is incredibly intuitive and has many practical applications in software engineering. In fact, any professional programmer without an understanding is at a disadvantage. Unfortunately, many in the industry relegate it to the purview of mathematicians. This is understandable because most material on the subject delineates set theory with first order logic as a basis for math. The good news is that it doesn’t have to be this way. As this series demonstrates, it is accessible to anyone regardless of background.
The three articles in this series aim to introduce set theory, expound upon set operations, and demonstrate the learnings using JavaScript (ES6). The goal is to provide the reader with actionable knowledge to improve his/her software skills without a surfeit of superfluous details. This first installment describes the theory in order to provide a firm foundation for future practical application.
NOTE: All code samples are written in ES6 and are therefore not likely to execute directly in a browser. The best option is to use Node or transpile the excerpts using either Babel or TypeScript. The working code is available on GitHub along with execution instructions.
What is Set Theory
The inception of set theory dates back to the nineteenth century with Georg Cantor. On the surface, it’s brilliantly simple. A set is simply a collection of unordered objects. In mathematical parlance, objects contained in a set are known as members or elements. An element can be literally anything, including another set. Sets are typically depicted as objects inside curly braces and are denoted by capital letters. For instance, \(A= \{ 1,2,3 \}\) is the mathematical representation of the set \(A\) with the members \(1\), \(2\), and \(3\). Set membership is signified as: \(1 \in A\). Figure One – Sets illustrates these symbols.
Set theory relies on FOPL (First Order Predicate Logic) to construct sets. Expanding on the definition above, sets are a collection of objects that satisfy a predicate. A predicate is a function that accepts a single argument and returns a Boolean (true
or false
) value. For instance, the set of all dogs has the predicate \(IsDog(n)\). In other words, elements of a set share some arbitrary property. FOPL is fascinating, but not particularly relevant to this article. A general acumen of predicates is sufficient for comprehension of this material. A cursory web search for First Order Logic will present sufficient resources for the curious reader.
Set Mapping
There are a few interesting operations that can be performed on sets, most of which are covered in the next installment. However, mapping from one set to another is germane to a foundational understanding of set theory. A set is transformed, or mapped, into another related set via the use of a function.
A mathematical function is analogous to a software function with added constraints. They are similar in that they accept an input and return an output. The difference is that a mathematical function can only accept a single input, must return an output, are determinate, and side effects are impermissible. Sources often refer to functions as relations between sets because they map a member of a set to member of another set. While mathematical functions are relevant to the understanding of set theory, programmers need not be particularly concerned with this concept. The significant notion is that of a function in general, which should be apparent to most software professionals. As an aside, further understanding of mathematical functions is particularly useful for other programming concepts.
Mapping works by applying a function to each member of a set and placing the output into another set. Figure Two – Set Mapping illustrates the concept. This is particularly applicable to programming, so understanding is imperative.
Given the information above, the impetus of the map method of arrays in JavaScript (ES6) is obvious. Arrays are a convenient analog to sets. See the code sample below.
const wholeNumbers = [1, 2, 3];
const evenNumbers = wholeNumbers.map(n => n * 2);
The above isn’t exactly a realistic scenario: generating an array of doubled numbers isn’t auspicious. A more real world use of the map method is to modify complex objects. See the code below.
const people = [{id: 1, name: "Ada Lovelace"}, {id:2, name: "Charles Babbage"}];
const names = people.map(p => p.name);
Map is exceedingly suitable for many use cases. Understanding set theory elucidates its utility.
Warning
As a fair warning, the remainder of this post provides a prospectus of the areas of set theory that aren’t directly applicable to everyday programming activities. Although intriguing, the uninterested reader should feel free to skip to the conclusion.
To Infinity and Beyond
The conception of sets isn’t exactly revolutionary. Kindergarten pedagogy teaches children to categorize objects into sets. It’s simple and intuitive. The innovation is revealed by examining sets of infinite size.
Conceptually, there are two methods for comparing the sizes of sets. The first is to enumerate the members and compare the resulting counts. This is blindingly obvious; however, it has a substantial flaw. It isn’t possible to calculate the number of members in an infinite set. As a second option, Cantor postulated that if it is possible to create a function that maps the first set to the second set without skipping members, then the sets must be of equal size.
The canonical example is to compare the set of natural numbers (whole numbers excluding zero) to the set of even natural numbers. Figure Three – Counting Sets demonstrates the concept. Although it’s not exactly intuitive, and is often controversial, this establishes that the two infinite sets are equally sized. This might lead one to believe infinity is simply infinity. However, it’s a bit more abstruse.
Consider the set of real numbers (natural, rational, irrational, and transcendental) between one and two. Think back to the number lines that are an inexorable part of preparatory education and envision a set encompassing all numbers on the line between one and two. Regardless of the placement of two distinct points on the line, it is possible to find a smaller number between them. The interesting thing about this infinite set is that it is not possible to create a function that maps the set of natural numbers to this set without skipping members. This implies that although both sets are infinite, the set of real numbers between one and two is actually larger than the set of all natural numbers. Cantor verified this in a beautifully elegant proof known as Cantors Diagonalization.
While theoretically straightforward, the notion of multiple sizes of infinity is a bit vexatious. John von Neumann once said, “in mathematics, you don't understand things. You just get used to them.". This concept holds true to his conjecture. The good news is that the notion of different sizes of infinity is only applicable in the most esoteric areas of computer science. The majority of programmers need not concern themselves with it.
Don’t be Naïve
Set theory took the mathematical world by storm with its simplicity and elegance. Many foundational theories are built on the cornerstone of set theory. However, it contains a substantial flaw which could have spelled doom except that mathematicians couldn’t deny its utility. Therefore, it split into two separate theories known as naïve and axiomatic set theory. It’s similar to how general and special relativity exist simultaneously.
Naïve set theory is sufficient for many applications. In fact, it is adequate for almost all software engineering use cases. Axiomatic set theory does apply to some esoteric areas of computability and logic. However, it is far removed from the greatest majority of programming tasks.
As for axiomatic set theory, it is an extension of the original theory that introduces several axioms that address flaws. The underlying issue with naïve set theory is that a paradox can arise when defining predicates. The most popular demonstration of the defect is Russell’s Paradox. Succinctly stated: does the set of all sets that do not include themselves include itself? If the answer is yes, then the definition is contradictory because it does contain itself. If the answer is no, then the predicate is likewise inconsistent because it cannot contain all sets that do not contain themselves. Don’t worry if this seems perplexing, it often requires reflection.
The finer points of axiomatic set theory are beyond the scope of this article. However, the intrigued reader should perform a web search for Zermelo–Fraenkel set theory to learn more. Regardless of its applicability to programming, it’s quite captivating.
Conclusion
The most pertinent programming related concepts detailed in this post are sets and set mapping. A set is simply a collection of objects. Set mapping is applying a function to each member of a set to produce a related set. The following pieces in this series expound on how these concepts are applicable.
Set theory is surprisingly simple yet it reveals some mystifying truths such as the fact that there are multiple sizes of infinity. There are essentially two branches of set theory: naïve and axiomatic. Naïve set theory is sufficient for the majority of software engineering applications.
Make sure to come back for the next article. With the foundational concepts out of the way, the post delves into set operations which provide valuable mental models for programmers. These are concepts that will improve your development abilities.