Set Theory

Sets, Relations and Logic

Yet a bit more Introduction to Foundational Mathematics

Michael Prantl

01 Oct 2020 — 19 min read

I know I said, I would like to conclude the topic of Formal Logic in Mathematics — despite there remaining a lot to be said. Turns out that there not only remains a lot of very interesting things to be said. But also, that if we let ourselves in on this angle of looking at things, we are being offered a wonderfully coherent perspective at concepts and ideas that will resurface later in Maths over and over again.

Each mathematical statement can be reexpressed as a logic formula — a syntactic form of their inner logic structure. Objects in statements are defined in terms of their properties. Statements then describe whether any given object has that property or not. This may sound trivial enough, but statements and the properties they describe can become arbitrarily complex. Sets provide a neat notation for grouping objects according to their properties and keeping track of which object has or does what. Set notation lends itself very nicely for writing proofs — you find more information on proofs in my last article Hard Proofs and Logic — and expressing definitions of properties of objects. And it builds very generously on the logic notation introduced in my first article Formal Truth and Logic.

Set Theory is a deep field, invaluable to express fundamental ideas in Mathematics. One article is wildly insufficient to provide a clean introduction to Set Theory by gradually constructing it from the ground up. But, because it is so invaluable as a syntactic vehicle, I would like to introduce enough notation and operations on sets to get ourselves started. The objective is to provide sufficient fodder to reexpress definitions and arguments in terms of logic expressions and acquaint you with relations on sets.

Sets and Set Notation

Sets are collections of objects. They list the objects that they contain, where each element is contained exactly once. Sets allow to group objects neatly into distinct categories according to specific properties. Crucial is not the arrangement of objects within the set, but the elementhood of objects. Consequently, the main interaction between objects and sets is the element-of operator “∈”. We use the symbol “≔” to define a new set and assign it to a name.

There are at least three ways to construct a set (s. [1] p. 28):

If the number of elements in the set is finite and manageable, it may be possible to simply list all the set elements exhaustively:
(1) S≔ { Alice, Bob, Link, Zelda },
could be the set S of all siblings of a person. The number of elements in the set is most certainly finite and in many instances can be counted on two hands.
Else, if the elements in the list follow an obvious pattern, then regardless of whether the number of elements contained in the set is finite or not, you may just list enough elements until the pattern emerges clearly and add an ellipsis:
(2) E ≔ { 0, 2, 4, 6, … },
where E is the set of all even natural numbers. The pattern for how the set is generated, should be obvious enough from the elements that are listed. But given the lack of explicitness, this notation still leaves room for ambiguity.
The most concise way to define a set, is to write out the criterion that objects need to fulfill to be considered an element of the set:
(3) E ≔ { x ∈ ℕ | x is even },
where E is the same set as defined in (2), but this time it is defined unambiguously. In fact, we are defining E here in terms of another set, the set of the natural numbers ℕ. Every element x in ℕ is also an element of E, if and only if x makes the statement “x is even” come true.

The statement to the right of the vertical bar in (3) should be seen as a condition, a test that is applied to every canditate object. The variable x is instantiated successively with each candidate object. If the object makes the statement true, it passes the elementhood test. If P(x) is the statement “x is even”, then E is the set of all elements, which makes the statement P(x) true. We have defined the set Ein terms the property P(x) and we call E the truth set of P(x). We write: { x | P(x) } (s. [1] p. 31).

This should be reminiscent of how we write predicates in many programming languages. In Python for example, we would write the predicate “x is even” as:

Or given the set of natural numbers N, we would create the set E with the succinct set comprehension syntax:

Operations on Sets

We have learnt sets can be defined as truth sets of objects, which test positively on a given predicate. As seen in (3), sets can also be defined in terms of another set. One can take a set as a starting point and filter all elements by the predicate to generate a new set.

Set operations combine two sets in a certain way to define a new set. This new set can also be regarded as the truth set of a specific predicate where elements of both original sets are tested.

Unions

Definition: The union M ∪ N of two sets M and N is the set, which contains all objects that are either elements of M or elements of N or elements of both M and N (s. [2] p. 34).

Fig. 1 The union of two sets are all the elements in each set including the elements shared by both of them.

The union of two sets is therefore defined as

(4) M ∪ N ≔ { x| x ∈ M ∨ x ∈ N },

where M ∪ N is a set itself. Thus, elements of a union can be elements of both, either M or N. As mentioned, the union can be regarded as the truth set of the predicate to the right of the vertical bar.

But, what if M and N overlap in elements? The set of natural numbers ℕ and the set of integers ℤ have many many elements in common. Because sets only define the existence of an element in the set, every element only appears once within the set. Duplicate objects are coerced to the same element during construction of the set. If M ≔ { 0, 2, 5, 6, 10 } and N ≔ { 2, 3, 4, 5, 10 }, then the union M ∪ N = { 0, 2, 3, 4, 5, 6, 10 }.

Intersections

Definition: The intersection M ∩ N of two sets M and N is the set, which contains all objects that are elements of both M and N (s. [2] p. 34).

Fig. 2 The intersection of two sets are only those elements that appear in both sets.

The intersection of two sets is therefore defined as

(5) M ∩ N ≔ { x| x ∈ M ∧ x ∈ N },

where M ∩ N is again a set itself. Note, how elements that are only contained in one of the sets M or N, are not elements of the intersection M ∩ N. Suppose again M ≔ { 0, 2, 5, 6, 10 } and N ≔ { 2, 3, 4, 5, 10 }. The intersection M ∩ N is { 2, 5,10 }.

Difference

Definition: The difference M ∖ N of two sets M and N is the set, which contains all elements of M, which are not elements of N (s. [2] p. 34).

The wording in this definition may be a bit more confusing by itself alone. The logic notation of the definition should hopefully make it clearer: The difference of two sets is

(6) M ∖ N ≔ { x| x ∈ M ∧ x ∉ N }.

Again, M ∖ N defines a new set, but this time the relation is not symmetric between the two sets used for the construction. Starting with set M, all elements of M which are simultaneously elements of N are removed. One could say M ∖ Nis the set M without the intersection M ∩ N. Suppose one more time M ≔ { 0, 2, 5, 6, 10 } and N ≔ { 2, 3, 4, 5, 10 }. The difference M ∖ N is all elements of Mwithout the elements of N: { 0, 6, 10 }.

Equivalence between Sets

The construction of sets does not imply an intrinsic ordering. The sets defined as

(7) A ≔ { 0, 1, 2, 3, 4, 5 },

(8) B ≔ { 5, 4, 3, 2, 1, 0 }, and

(9) C ≔ { 0, 2, 4, 1, 3, 5 }

do not have any different meaning. All three notations define precisely the same set. But how can one be sure that the sets are equal?

Equivalence is one of those concepts that relies very heavily on our intuition, but which is not that simple to accurately pin down, given that equality is defined on so many different types of objects. Luckily, for sets it is defined straight forwardly as the special case of a more general relationship between two sets: The subset.

Subsets

Definition: A set M is a subset of another set N, if each element in M is also an element in N (s. [2] p. 33).

For example the set E defined in (3) is a subset of the natural numbers ℕ, because every element in E is also and element of ℕ. The reverse is not true. There are many elements in ℕ, which are not elements of E. So, subsets define a potentially (but not necessarily) assymetric relationship between two sets

(10) M ⊆ N ⇔ ∀x(x ∈ M ⇒ x ∈ N).

In other words: If x is in M, then x is certainly in N, too. Suppose M ≔ { 0, 2, 5, 10 } and N ≔ { 0, 1, 2, …, 10}, i.e. all integers from 0 to 10. We can see that all elements in M are also contained in N. Thus, per our definition M ⊆ N.

Equality

Definition: Two sets M and N are equal if each element in M is also an element of N and if each element in N is also an element of M (s. [2] p. 33).

Roughly speaking, two sets are defined as equal, if they contain the exact same elements. Admittedly, for the sets defined in (7), (8), and (9) it is easy enough to eyeball the three sets and conclude that they are indeed all the same. But the definition of equality gives a more precise method to test for it. If you paid close attention, then you have noticed that this definition leverages the same wording as the definition of subsets. The difference is that two sets M and N are equal if and only if M is a subset of N and if N is a subset of M. Hence, in logic notation

(11) M = N ⇔ M ⊆ N ∧ N⊆ M.

Given what we already know about subsets, we can drop (10) into (11):

(12) M = N ⇔ ∀x(x ∈ M ⇒ x ∈ N) ∧ ∀y(y ∈ N ⇒ y ∈ M).

If we would like to test the hypothesis that the set in (7) and (8) are equal, we could use the equivalence (12). For example, we could identify M with A and Nwith B, and start instantiating the variables x and y with elements of A and Brespectively:

(13) (0∈ A⇒ 0∈ B)∧ (5∈ B⇒ 5∈ A) is true,

(14) (1∈ A⇒ 1∈ B)∧ (4∈ B⇒ 4∈ A) is true, too,

(15) (1∈ A⇒ 1∈ B)∧ (1∈ B⇒ 1∈ A) as well, …

Even if the number of tests in an example like this would grow large very quickly, it would probably still be feasible to prove the equality of the two sets by listing all instantiations of the test exhaustively. For infinite sets, we would be better advised to take a closer look at the properties of the objects involved, and how we could make clever use of them to logically derive the elementhood for these objects in the set.

Relations on Sets

We can use sets and predicates to describe properties of objects. But how can we describe relationships between objects? We need another form of notation to describe two different objects being related in some sort to one another. Those two objects are explicitly allowed to be elements of different sets.

Definition: Suppose M and N are two non-empty sets. The Carthesian product M × Nof M and N is the set of all ordered pairs (m, n) with m ∈ M and n ∈ N (s. [1] p. 174; [2] p. 35).

That the pair (m, n) is ordered is an important aspect. The position of each element in the pair defines which role each variable takes in the truth set defined by the Cartesian product

(16) M × N ≔ { (m, n) | m ∈ M ∧ n ∈ N }.

Why is it called a product? The cardinality of the set M × N — the number of elements in it — is the number of possible first elements of the pair to choose from set M times the number of possible elements from set N with which to pair it up.

You may be familiar with Carthesian products from… function graphs! A Carthesian coordinate system is a graphic visualization of all the pairs (x, f(x)) of an input value x of a function f and its output value f(x). The pair is regarded as a pair of coordinates in the plane, which specifies their location. That is why — regardless of whether we are talking about function graphs or any other relation — if (m, n) is an arbitrary ordered pair we call m the first coordinate and n the second coordinate (s. [1] p. 174).

The Carthesian product M × N is the set of all possible pairs that you can get by choosing your first coordinate from set M and your second coordinate from set N. A relation is a subset of M × N.

Definition: Suppose M and N are sets. A binary relation R from M to N is a subset of their Carthesian product R ⊆ M × N (s. [1] p. 182; s. [5] p. 59).

If a pair (m, n) is an element of the relation R, we say “m is related to n”. We also write mRn. mRn is entirely equivalent to saying (m, n) ∈ R (s. [1] p. 191; s. [5] ibid.). It is entirely possible for M and N to refer to the same set. If M = N, then we sometimes also write M² instead of M × M.

The definition of relations is very generic and allows for a lot of freedom to use relations as a concept to define relationships between sets. That is: Relations can be entirely unstructured, meaning that the only way to properly define the entire relation, one has to list exhaustively any pair (m, n) for which mRn. Or they can be defined as the truth set of a predicate given as a closed formula.

Ex. 1 Functions (s. [1] p. 229; s. [2] p. 37).

Let M and N be sets and f be a relation with f ⊆ M × N. Suppose that

(i) for all m ∈ M, there exists an n ∈ N, such that (m, n) ∈ f, and

(ii) that for (m, n) ∈ f and (m, n') ∈ f it follows that n = n'.

Then we call f a function from M to N and write f: M → N.

A function is a subset of the Carthesian product of two sets. It is therefore a special type of relation. It is a relation where the first coordinate is paired always with the same second coordinate. Functions are often defined with a closed formula, which provides a means to derive the definite second coordinate for any given first coordinate. For x ∈ M and y ∈ Nwe write f(x) = y. The truth set defining f is then

(17) f ≔ { (x, y) ∈ M × N | y = f(x) }.

If for example M = N = ℝ and f(x) = ½ ∙ x + 2, then we can write function f as the set { (x, y) ∈ ℝ² | y = ½ ∙ x + 2 }.

You see now clearly, why the order here matters: It is easy to concieve of a relation, which is directional, i.e. where (m, n) ∈ R and (n, m) ∈ R mean different things and are not necessarily both true at the same time.

Properties of Relations

In fact, suppose mRn and nRm were both simultaneously true, with R ⊆ M × N for any m ∈ M and any n ∈ N. Then we give this special property of the relation R a name: We say “R is symmetric”.

Symmetry

When we say a relation is symmetric, we mean that we can change the statement mRn to nRm, without making the statement false. The order of the coordinates is here not relevant for the interpretation of the statement. The relation is so to say bidirectional.

Definition: A relation R on a set M is called symmetric, if for all x, y ∈ M, if x is related to y, then y is also related to x under R (s. [1] p. 194; s. [5] p. 60).

Rewritten in logic notation: If R is symmetric, then

(18) ∀x ∈ M ∀y ∈ M(xRy ⇒ yRx).

Antisymmetry

Antisymmetry restricts the symmetry between elements of a set to only specific elements. In fact, it means that if two objects are related symmetrically under a relation R, then those two objects must be the same object.

Definition: A relation R on a set M is called antisymmetric, if for all x, y ∈ M, if x is related to y and y is related to x under R, then it follows that x = y (s. [1] p. 200; s. [5] p. 60).

Written as a logic formula: If R is antisymmetric, then

(19) ∀x ∈ M ∀y ∈ M((xRy ∧ yRx) ⇒ x = y).

Antisymmetry is not the inverse of the symmetric property. For very specific corner cases, it is possible for a relation to be symmetric and antisymmetric at the same time.

Reflexivity

Definition: A relation R on a set M is called reflexive, if for all x ∈ M, x is related to xunder R (s. [1] p. 194; s. [5] p. 60).

So, if the relation R is reflexive, it means that every element x∈ M is related to itself under R. In other words: If R is reflexive, then

(20) ∀x ∈ M(xRx).

Transitivity

Definition: A relation R on a set M is called transitive, if for all x, y, z ∈ M, if x is related to y under R and y is related to z, then it follows that x is related to z (s. [1] p. 194; s. [5] p. 60).

Transitivity means, that for any two pairs of a relation, where the second coordinate of one pair (x, y) equals the first coordinate of the other pair (y, z), then the relation is also defined on the outer coordinates (x, z). If R is transitive, then

(21) ∀x ∈ M ∀y ∈ M ∀z ∈ M((xRy ∧ yRz) ⇒ xRz).

Ex. 2 Graphs (s. [5] p. 65f).

Graphs are a collection of nodes — or vertices — and edges. They are versatile and at first glance there is a lot of information required to describe the graph. But once we understand how to represent a graph mathematically, we see that the abstraction of graphs hides in other seemingly unrelated problems, if we would only rephrase the problem in terms of this abstraction. Let us take the graph in Fig. 4 as an example.

Fig. 4 A simple graph with two components.

There is a set of labeled vertices V ≔ { A, B, C, D, E } connected by a set of edges. We write the graph G as G = (V, E), with V being the mentioned set of vertices and E a relation defined on these vertices. The relation in this case is not given by a closed formula. It is a listing of all edges E ⊆ V² with E ≔ { (B, C), (B, D), (B, E), (C, D), (D, E) }.

The graph in Fig. 4 is an undirected graph, meaning that the edges (B, C) and (C, B)refer to the same edge. In a directed graph the direction of the edge would matter.

Ordering Relations

We have used the symbol R so far to stand for an unspecified relation and M to stand for an arbitrary set. When we write xRy or mRn, you may have been reminded of the notation we use for binary operations, like x ≤ y or m ⊆ n. That is relations generalize beautifully over many operations and statements. With the above properties, we can group certain important types of relations. One such type of relation is the ordering relation.

Partial Ordering

As mentioned earlier, sets do not imply an ordering. Two sets are equal if they contain the same elements, regardless of the order in which the elements appear. But that does not mean that there is no ordering relation defined on the elements of the set. An ordering is a relation between these set elements.

Definition: If a relation R is reflexive, transitive, and antisymmetric, then R is called a partial order (s. [1] p. 200f; s. [5] p. 63ff).

We can replace the symbol R with any other symbol and define a relation on a set using this symbol. If the relation fulfills the three criteria in the definition, it is a partial order. With orderings, we often use the symbol “≤” to stand for an ordering relation and P to stand for the set, on which the relation is defined. We write (P, ≤) to mean the partial order defined on the set P.

Total Ordering

A partial order is deliberately lax about one requirement, one might have implicitly assumed to be given. Say (P, ≤) were a partial order, then (P, ≤) is emphatically not required for any two x, y ∈ P that x and y are in any way related to one another. What that means, is that neither x ≤ y, nor y ≤ x. A partial order, where for any two elements x, y ∈ P it is always either x ≤ y or y ≤ x is a total order.

Definition: A partial order R defined on a set P is called a total order, if for any m, n ∈ P, it is either mRn or nRm (s. [1] p. 200f; s. [5] p. 63ff).

In a total order thus, we will not be able to find a pair of elements, for which no relation is defined. A total order is by all means a partial order which fulfills the additional property, that

(22) ∀x ∈ P∀y ∈ P(xRy ∨ yRx).

The direction of the relation is not important, as long as it is defined at all. Examples for total orders are (ℕ, ≤) or (ℝ, ≤). Both relations are antisymmetric, reflexive, transitive, and defined for each pair of elements, so they tick all the requirements of a total order (s. [5] p. 63).

Ex. 3 Lexicographic Order (s. [5] p. 65).

Lexicographic order extends an existing ordering over the symbols of an alphabet to words in the alphabet. The words can be of arbitrary length, but their length may be considered when determining whether one word is smaller than the other under the given lexicographic order.

Let (Σ, ≤) be a total ordering on the alphabet Σ and n ∈ ℕ. Then w ∈ Σⁿ with w = (w₁, w₂, …, wₙ) is called a word over Σ. The set of all words over Σ with arbitrary length is Σ*. Suppose w, w’ ∈ Σ* with w ∈ Σᵐ and w’ ∈ Σⁿ. The lexicographic ordering “≼” on Σ* is the total ordering

(23) w ≼ w’:⇔∀i∃k(1 ≤ i ≤ k): wᵢ = w’ᵢ ∧ ((m ≤ n) ∨ (wₖ⁺₁ < w’ₖ⁺₁)).

Let us unpack that. The predicate is constructed of three atoms:

(i) The words w and w’ are equal until the k-th symbol,

(ii) the length m of word w is shorter than the length n of word w’, or

(iii) the k+1-th symbol of word w is smaller than that of word w’.

In other words: One word is considered lexicographically lesser than the other if either the first symbol at which the two words differ is less than the symbol at that position in the other word, or if the word has all the same symbols as the other word but is shorter in length. This should intuitively align with our everyday understanding of lexicographic order. Let Σ be the English alphabet and Σ* the set of all English words. Then

(24) “road” ≼ “roadtrip”,

(25) “sets” ≼ “sits”, and

(26) “intuition” ≼ “knowledge”.

Conclusion

Sets allow to define “intrinsic” properties of objects. The fact that they are an element of a specific set means that we can make a statement about this object, reasoning about how this object behaves under certain circumstances. Sets are widely applied in the definition of fundamental axioms, which are used to derive more everyday mathematic concepts. Relations extend this notion of properties to how objects of one or more sets interact with each other. Their generality makes them applicable to define algebraic functions, binary operations, and graphs under the same syntactic framework.

The power of this notational toolset is wonderfully demonstrated in the construction of the natural number set from the Peano Axioms: Starting with a neutral element and a function, we construct a set which resembles the set of natural numbers. In two successive videos, Gabe Perez-Giz from PBS Infinite Series has comprehensibly introduced the Peano Axioms and how to use them to construct the natural numbers.

We are equipped now with the toolset to derive the logic structure of definitions as logic formulas. This way we can treat mathematical statements and theorems as logic equations. That will be helpful in untangling more complex arguments.

References

[1] D. J. Velleman: How to Prove It: A Structured Approach. 3rd ed., Cambridge University Press, Cambridge, 2019.

[2] L. Unger: Mathematische Grundlagen: Grundlagen. Kurseinheit 1, Volume 1, FernUniversität, Hagen, 2019.

[3] L. Unger: Mathematische Grundlagen: Etwas Integralrechnung und viel Logik. Kursheinheit 7, Volume 7, FernUniversität, Hagen, 2019.

[4] U. Kastens, H. K. Büning: Modellierung: Grundlagen und Formale Methoden. 4. Auflage, Carl Hanser Verlag GmbH Co KG, 2018.

[5] W. Hochstättler, et al.: Algorithmische Mathematik: Graphen. Kurseinheit 2, Volume 2, FernUniversität, Hagen, 2020.