First 5 minutes of hell
Definition of the Embedding:
- place vertices of on vertices of and route edges of along paths(!) in (edge from is mapped on some path (multiple edges) of )
- how to measure the quality of the embedding?
- maximal load of the host node (how many vertices are mapped to a single -vertex)
- maximal dilation (how long the longest edge-route is)
- maximal edge congestion (how many -edge routes pass through a single -edge)
and are quasiisometric if we can embed one to the other with constant overhead
- one can simulate the other with a constant slowdown (one parallel step in one takes a constant number of parallel steps in the other)
- they are computationally equivalent if they can simulate the other with constant slowdown
- lemma: if they are quasiisometric, then they are computationally equivalent (not the other way!)
Theorem: meshes and tori are quasiisometric and therefore computationally equivalent.
- trivially, mesh is a subset of the torus
- load = 1, dilation = 1, congestion = 1
- the other way: with Cartesian product decomposition, embedding the 1D torus onto the 1D mesh (where load = 1 and dilation = congestion = 2) and building it back again (using the orthogonality and Cartesian product)
- the cost of “faking” the wrap-around edges must be constant
- zig-zag method combining the even and odd edges
The ordinary butterfly and the wrapped butterfly are quasiisometric.
- trivially, we can just merge the end-vertices of the ordinary butterfly (load = 2, dilation = 1) and get the cycles of the wrapped butterfly
- the other way: a complicated constructive proof
Conceptual foundation
Quasiisometry and computational equivalence
Quasiisometry is a static, graph-theoretic equivalence between two interconnection networks based on the existence of mutual embeddings with bounded quality measures.
Definition 2 (Quasiisometric and computationally equivalent networks)
- and are quasiisometric if and both exist with constant embedding measures (load, expansion, dilation, congestion all bounded by constants independent of graph size).
- simulates with slowdown if one parallel step on can be simulated in parallel steps on .
- and are computationally equivalent networks if each can simulate the other with constant slowdown.
Lemma 3
Quasiisometric and are computationally equivalent, but not vice versa.
Why quasiisometry matters
Static embedding measures cannot capture dynamic behaviour of a parallel algorithm on the host network (e.g. a large dilation is not a problem if the corresponding route is used only scarcely). Quasiisometry is however robust with respect to any dynamic behaviour: if and are quasiisometric, then the asymptotic behaviour of any parallel algorithm differs by at most a constant multiplicative factor between the two networks. This is the strongest static argument for treating two topologies as “the same” for the purpose of parallel computation.
Result 1: Meshes and tori are quasiisometric
Theorem 6
Let be the -dimensional mesh and the -dimensional torus of identical dimensions. Then and are quasiisometric and therefore computationally equivalent.
Proof - the easy direction
trivially (the torus has all the mesh edges plus the wraparound edges), so simulates with no slowdown. The identity embedding has , , .
Proof - the hard direction: with ,
The proof uses Cartesian product decomposition, exploiting the orthogonality of meshes and tori:
- Decompose and .
- Embed each 1-D factor with and .
- Apply the Cartesian product to combine the per-dimension embeddings.
The 1-D zig-zag embedding
Number the vertices . Place the torus vertices in the mesh in the zig-zag order: i.e. hop over by 2 going right along the even-indexed vertices, then return on the alternate (odd-indexed) vertices. Every torus edge then maps to a mesh path of length at most 2, and every mesh link carries at most 2 torus-edge paths. The argument is identical for even and odd .
Why orthogonality is key
In orthogonal topologies, embedding moves along one dimension have no effect on the others. The Cartesian product preserves the per-dimension bounds: the overall embedding still has , , , regardless of the number of dimensions.
Consequence for MPI
This is the theoretical justification for the MPI_Cart_Create design decision that lets the programmer freely declare each dimension as either a 1-D mesh or a 1-D torus: computationally, the choice does not matter, only constant factors.
Result 2: Ordinary and wrapped butterflies are quasiisometric
Lemma 11
The ordinary butterfly and the wrapped butterfly are quasiisometric.
Proof - the easy direction:
This direction is trivial: merge the terminal vertices of each row of to obtain the row-cycles of . This yields and .
Proof - the hard direction: with ,
This case is more involved than the mesh-torus case because the butterfly is not orthogonal: the column index of a butterfly fixes which hypercubic dimension is exercised at that stage, so permuting columns simultaneously permutes the hypercube dimensions. Each row of is a 1-D torus , and the cylinder structure means the first and last drawn columns are physically identical.
Step 1: Canonical path in
By vertex symmetry of , it does not matter which path we pick. Choose path from to traversing all stages, with bits of the row address inverted in the order .
After embedding into , every edge of must have dilation at most 3, even though the embedded walk has to detour through the last column of and return.
Step 2: Why the idempotent mapping fails
The naive idempotent mapping maps each column of to the same-indexed column of . The last edge of would then have dilation (i.e. logarithmic in the number of rows), because we must visit ‘s last column and return to column . This concentrates the entire wraparound cost into a single edge of length , violating the constant-dilation goal of quasiisometry. The fix has to spread that cost out so every step carries at most a constant share.
Step 3: Reformulating as a walk problem in
We need a walk from to in such that:
- we visit each column exactly once,
- column is just transient,
- the distance between two neighbours on the walk is at most .
This is analogous to the zig-zag with , but with two complications: we embed into instead of , and bit inversions are tightly coupled to column moves (not independent like in orthogonal topologies). Both endpoint column numbers and row addresses are fixed in advance.
Step 4: The bit-permutation construction
Each valid walk corresponds to a specific permutation of the bits in the row address giving the order in which they are inverted.
For even , the two equivalent permutations are:
- (a)
- (b)
For odd , the two equivalent permutations are:
- (a)
- (b)
Both possibilities are equivalent in terms of dilations: each yields exactly 1 edge of dilation 3, 1 edge of dilation 1, and the rest have dilation 2. For the running example , the chosen permutation is .
Step 5: Using row-symmetry of
The key trick: instead of fighting the rigid column-to-dimension correspondence of butterflies, we exploit row-symmetry to relabel the source graph. A systematic permutation of bits in row addresses of is an automorphism (a bijection preserving adjacency), so we get the same , just drawn differently. We rename all vertices of using this automorphism, so that idempotent column mapping into the destination is now valid.
Step 6: The final embedding
For with permutation :
- column 1 of permuted maps to column 2 of ,
- column 2 of permuted maps to column 3 of ,
- column 3 of permuted maps to column 1 of .
The per-edge dilations achieved along the embedded walk are lengths in , all bounded by the target constant independently of . The construction yields and .
Lessons learned about butterfly symmetry
The constructive proof reveals two structural lemmas about butterflies.
Lemma 12
has automorphisms given by the permutations of bits in row addresses (with the standard layout).
Lemma 13
has automorphisms given by the permutations of bits in row addresses.
Said otherwise, is not vertex-symmetric, but row-symmetric: for any two rows and of , there is an automorphism sending to . The row-symmetry is inherited from the hypercube on which the butterfly is built: the hypercube’s dimension symmetry survives into the butterfly structure as a freedom to permute rows.
Comparison of the two results
Both results establish quasiisometry between a “sparser” topology (mesh, ordinary butterfly) and its “richer” counterpart (torus, wrapped butterfly), with the easy direction being trivial inclusion and the hard direction requiring a constructive embedding that spreads a wraparound cost evenly across edges.
The mesh-torus proof is simple because meshes and tori are orthogonal: per-dimension embeddings combine cleanly via the Cartesian product. The butterfly proof is more involved because butterflies are not orthogonal: column index and hypercube dimension are tied together, so we cannot independently permute dimensions. The workaround is to exploit the rich row-symmetry of to relabel vertices first, making idempotent column mapping valid.
In both cases the achieved bounds are , , constant - small enough to be considered “constant overhead” for parallel computation.
Potential exam questions
Given the lecturer’s proof-heavy, definition-precise style, expect questions like:
- Define quasiisometric networks. State Lemma 3 and explain why the converse (computational equivalence quasiisometry) does not hold.
- Prove Theorem 6: meshes and tori of the same dimensions are quasiisometric. Give the explicit construction of with .
- Why is the Cartesian product decomposition usable here? What property of meshes and tori makes the per-dimension argument compose?
- State Lemma 11 and prove the easy direction . What are the embedding measures?
- Sketch the proof of the hard direction with and . Why does the idempotent mapping fail, and what would its dilation be?
- For , give the bit-inversion permutation used in the construction and explain the corresponding mapping of columns of the permuted into columns of .
- State the two equivalent bit-inversion permutations for general even and odd . How many edges of each dilation value does each permutation produce?
- Why is the butterfly proof more difficult than the mesh-torus proof? What structural property is missing in butterflies?
- State Lemma 12 and Lemma 13. What does ” is row-symmetric but not vertex-symmetric” mean precisely, and how was this used in the embedding construction?
- Compare and contrast the two quasiisometry proofs in this lecture. What is the common high-level strategy, and where does each proof exploit a specific structural feature of the topologies involved?