Minimum XOR Path in a Grid

Problem LinkSolution Link

Intuition #

This problem is the third question in the biweekly LeetCode contest 179, and I was unable to solve it. This is a Grid Path Dynamic Programming problem, but with a twist that breaks the classical Dynamic Programming approach, because the cost of the function is XOR, not addition/subtraction.

During the contest, I “solved” (more like attempted) the problem in 3 ways.

Top-down/Bottom-up Dynamic Programming storing a single value per cell; however, since we are dealing with XOR, storing a single value does not work.

For example: If two paths reach (x-1, y) with XOR values 3 and 7, and grid[i][j] = 5:

The path with the larger XOR (7) produced the smaller result. The minimum XOR at a cell does not guarantee the minimum XOR at the next cell. (Optimal substructure is broken).

Backtracking/Brute Force

Correct idea, but the number of paths is exponential. Too slow.

dp[y][x][]int{} Out of Memory

This was actually the right direction, but I was not able to optimize it during the contest.

Approach: Storing All XOR Values #

The backtracking approach tracks for each cell all the possible XOR values it can have, and we can leverage this thinking by using dp[y][x][]int{}, but that will result in an Out of Memory runtime error. What is making the []int{} big, and why is it causing an Out of Memory runtime error?

We were storing too many values in []int{}, and some of them were duplicates, but XORing 2 equal numbers with another number will result in the same value, which means we do not actually care about duplicate values; which means we can use a hash map (set) to make the memory manageable.

So we need to:

R is the length of grid, and C is the length of grid[0].

Implementation #

 1func minCost(grid [][]int) int {
 2	R, C := len(grid), len(grid[0])
 3	dp := make([][]map[int]bool, R)
 4	for i := range grid {
 5		dp[i] = make([]map[int]bool, C)
 6		for j := range C {
 7			dp[i][j] = make(map[int]bool)
 8		}
 9	}
10	dp[0][0] = map[int]bool{grid[0][0]: true}
11
12	for i := range grid {
13		for j := range grid[i] {
14			if i > 0 {
15				for previous, _ := range dp[i-1][j] {
16					dp[i][j][grid[i][j]^previous] = true
17				}
18			}
19
20			if j > 0 {
21				for previous, _ := range dp[i][j-1] {
22					dp[i][j][grid[i][j]^previous] = true
23				}
24			}
25		}
26	}
27
28	result := math.MaxInt
29	for res, _ := range dp[R-1][C-1] {
30		result = min(result, res)
31	}
32
33	return result
34}

Optimization 1: Using a fixed array of size 1024 #

Since grid values are bounded by the constraint: 0 <= grid[i][j] <= 1023, and since 1023 is a 10-bit number. What does the XOR of 10-bit numbers always produce?

Correct, 10-bit numbers.

We can then replace the HashMap with a boolean array, where array[i] indicates whether i is a valid number to XOR with.

So we need to:

Implementation #

 1func minCost(grid [][]int) int {
 2	R, C := len(grid), len(grid[0])
 3	dp := make([][][1024]bool, R)
 4	for i := range grid {
 5		dp[i] = make([][1024]bool, C)
 6	}
 7	dp[0][0][grid[0][0]] = true
 8
 9	for i := range grid {
10		for j := range grid[i] {
11			if i > 0 {
12				for previous, ok := range dp[i-1][j] {
13					if ok {
14						dp[i][j][grid[i][j]^previous] = true
15					}
16				}
17			}
18
19			if j > 0 {
20				for previous, ok := range dp[i][j-1] {
21					if ok {
22						dp[i][j][grid[i][j]^previous] = true
23					}
24				}
25			}
26		}
27	}
28
29	result := math.MaxInt
30	for num, ok := range dp[R-1][C-1] {
31		if ok {
32			result = min(result, num)
33		}
34	}
35
36	return result
37}

Optimization 2: Rolling Array #

Each cell in the grid depends on the cell above it and the cell to its left, if present. That means if we are at row 2, we do not need or care about row 0 from now on.

We can then save space by using only 2 rows, not R rows.

So we need to:

Implementation #

 1func minCost(grid [][]int) int {
 2	R, C := len(grid), len(grid[0])
 3
 4	// Initializing row 0 of the grid
 5	// as prev and updating the XOR
 6	// values since it does not need
 7	// a top row.
 8	prev := make([][1024]bool, C)
 9	prev[0][grid[0][0]] = true
10	for i := 1; i < C; i++ {
11		for previous, ok := range prev[i-1] {
12			if ok {
13				prev[i][previous^grid[0][i]] = true
14			}
15		}
16	}
17
18	curr := make([][1024]bool, C)
19	for i := 1; i < R; i++ {
20		for j := 0; j < C; j++ {
21			for previous, ok := range prev[j] {
22				if ok {
23					curr[j][previous^grid[i][j]] = true
24				}
25			}
26
27			if j > 0 {
28				for previous, ok := range curr[j-1] {
29					if ok {
30						curr[j][previous^grid[i][j]] = true
31					}
32				}
33			}
34		}
35
36		prev = curr
37		curr = make([][1024]bool, C)
38	}
39
40	result := math.MaxInt
41	for num, ok := range prev[C-1] {
42		if ok {
43			result = min(result, num)
44		}
45	}
46
47	return result
48}

Optimization 3: Word-level Parallelism #

With [1024]bool, each element is 1 byte. To find set values, you loop 1024 times, touching 1024 separate bytes.

With [16]uint64, 64 booleans are packed into one 64-bit integer. The key speedup is:

if word == 0 { skip all 64 values instantly }

A single comparison skips 64 values. That’s word-level parallelism, processing 64 bits with one CPU instruction instead of 64 separate checks.

That means instead of [1024]bool we can use an array of uint64 of size 16 covering all the numbers from 0 to 1023.

Each unsigned number in the array will hold 64 numbers starting from 0 until 63, then from 64 until 127, and so on until 1023.

If v is the value we want to set in the array, we can set it using:

bits[v/64] |= 1 << (v % 64)

And to retrieve each number, we can use a for loop with an offset and a counter with bit shifting.

 1// Example
 2bits := uint64(1 << 7) // Setting the number 7 in uint64
 3bits |= uint64(1 << 4) // Setting the number 4 in unint64
 4counter := 0
 5for bits != 0 {
 6	if bits&1 != 0 {
 7		fmt.Println(counter) // 4 then 7
 8	}
 9	counter++
10	bits >>= 1
11}

So we need to:

Implementation #

 1func minCost(grid [][]int) int {
 2	R, C := len(grid), len(grid[0])
 3
 4	prev := make([][16]uint64, C)
 5	// You can ommit the | in |=,
 6	// since there's always excatly 1
 7	// distinct XOR value per cell,
 8	// because there's exactly one forced path.
 9	prev[0][grid[0][0]/64] |= 1 << (grid[0][0] % 64)
10	for i := 1; i < C; i++ {
11		updateNextUsingXOR(&prev[i-1], &prev[i], grid[0][i])
12	}
13
14	curr := make([][16]uint64, C)
15	for i := 1; i < R; i++ {
16		for j := 0; j < C; j++ {
17			// Above
18			updateNextUsingXOR(&prev[j], &curr[j], grid[i][j])
19
20			// Left
21			if j > 0 {
22				updateNextUsingXOR(&curr[j-1], &curr[j], grid[i][j])
23			}
24		}
25
26		prev = curr
27		curr = make([][16]uint64, C)
28	}
29
30	result := math.MaxInt
31	for offset, bits := range prev[C-1] {
32		counter := 0
33		for bits != 0 {
34			if bits&1 != 0 {
35				value := counter + (offset * 64)
36				result = min(result, value)
37			}
38
39			bits >>= 1
40			counter++
41		}
42	}
43
44	return result
45}
46
47func updateNextUsingXOR(original, toUpdate *[16]uint64, currentNumber int) {
48	for offset, bits := range original {
49		counter := 0
50
51		// Early stop
52		// We can use bits & (1 << v) != 0
53		// for v  in [1, 64], but by using
54		// counter + bits != 0 we can early
55		// stop if there's only one bit set
56		// in bits.
57		for bits != 0 {
58			if bits&1 != 0 {
59				previous := counter + (offset * 64)
60				previous ^= currentNumber
61
62				toUpdate[previous/64] |= 1 << (previous % 64)
63			}
64
65			bits >>= 1
66			counter++
67		}
68	}
69}

Complexity #