Dynamo and the Cosmic Grid

Author : Udynamo

Required Knowledge : Dynamic Programming, Convex Hull Trick, Alien's Trick

Time Complexity : $\mathcal{O}(N \log (MaxPenalty))$

Editorialist : Udynamo

Approach:

First, let's analyze the teleport score formula. Let's expand the math for a teleport from $(r_1, c_1)$ to $(r_2, c_2)$ :

$\text{Score} = (\Delta r + \Delta c)^2 - 2 \cdot \Delta r \cdot \Delta c + A[r_2] + B[c_2]$

Expand the squared term:

$\text{Score} = \Delta r^2 + 2 \cdot \Delta r \cdot \Delta c + \Delta c^2 - 2 \cdot \Delta r \cdot \Delta c + A[r_2] + B[c_2]$

The $2 \cdot \Delta r \cdot \Delta c$ terms cancel out perfectly, leaving us with:

$\text{Score} = \Delta r^2 + A[r_2] + \Delta c^2 + B[c_2]$
$\text{Score} = (r_2 - r_1)^2 + A[r_2] + (c_2 - c_1)^2 + B[c_2]$

Notice that the row scores and column scores are now completely independent. This means we don't need to solve a complex 2D grid problem! We can solve a 1D problem for the rows, solve the exact same 1D problem for the columns using dynamic programming, and simply add the two minimum scores together.

Let's formulate the 1D problem: What is the minimum score to reach index $i$ in exactly $k$ teleports? Let $dp[k][i]$ be the minimum score to reach index $i$ using exactly $k$ teleports.

$dp[k][i] = \min_{j < i} (dp[k-1][j] + (i - j)^2 + A[i])$

Calculating this naively takes $\mathcal{O}(N)$ transitions for each of the $N$ states, over $K$ steps, resulting in an $\mathcal{O}(N^2 \cdot K)$ time complexity. This is far too slow for $N, K \le 10^5$ . We need to optimize.

Optimization 1: Convex Hull Trick

Let's expand the distance squared term in our transition: $(i - j)^2 = i^2 - 2ij + j^2$ . Substituting this back into the DP:

$dp[k][i] = A[i] + i^2 + \min_{j < i} (\underbrace{-2j}_{m} \cdot \underbrace{i}_{x} + \underbrace{j^2 + dp[k-1][j]}_{c})$

This is exactly the equation of a line: $y = mx + c$ . Since $j$ is strictly increasing, the slopes $m = -2j$ are strictly decreasing. Furthermore, our query points $x = i$ are strictly increasing. Because both the slopes and the query points are monotonic, we can maintain the lower envelope of these lines using the Convex Hull Trick (CHT) with a Monotone Queue (or a Li Chao Tree) in $\mathcal{O}(1)$ amortized time per state.

This reduces the complexity to $\mathcal{O}(N \cdot K)$ , but since $K$ can be up to $N$ , the worst-case time is still $\mathcal{O}(N^2)$ .

Optimization 2: Alien's Trick / WQS Binary Search

To eliminate the $K$ dimension entirely, we can use WQS Binary Search (also known as Alien's Trick). Let $f(k)$ be the minimal score using exactly $k$ teleports. For Alien's Trick to work, the objective sequence $f(k)$ must be convex: the score saved by adding an extra teleport strictly diminishes as $k$ increases, which is mathematically expressed as $f(k-1) - f(k) \ge f(k) - f(k+1)$ .

Our score is $C(j, i) = (i - j)^2 + A[i]$ . If we test the Quadrangle Inequality for $a < b \le c < d$ :

$C(a, c) + C(b, d) \le C(a, d) + C(b, c)$

Expanding this algebraically, the landing scores ( $A[c]$ and $A[d]$ ) perfectly cancel out, leaving us with:

$2(a - b)(d - c) < 0$

Since $a < b$ and $c < d$ , this product is always strictly negative. This proves that our score kernel satisfies the Monge property.

While the Monge property strictly proves that DP decision points are monotonic (which justifies standard DP optimizations like Divide and Conquer), it does not automatically prove $f(k)$ is convex on its own. However, Alien's Trick is a separate technique that applies because the count-vs-penalty behavior under Lagrangian relaxation behaves nicely here. The algebraic cancellation of $A[i]$ shows that the landing scores act purely as independent node weights, preventing them from creating "weird dips" in the curve. In practice, the convexity of $f(k)$ for this distance-squared penalty is a known property that can be safely verified via stress testing.

Assuming $f(k)$ is convex, we can drop the $K$ dimension by introducing a linear penalty $\lambda$ for every teleport taken:

$dp[i] = \min_{j < i} (dp[j] + (i - j)^2 + A[i] + \lambda)$
High $\lambda$ (penalty): The DP takes fewer teleports to avoid the tax.
Low $\lambda$ (subsidy): The DP takes more teleports to collect the reward.

Let $v(\lambda)$ be the minimal score and $c(\lambda)$ be the number of teleports used to achieve it. Crucially, to handle collinear points on the convex hull, we must implement consistent tie-breaking. If multiple transitions yield the same minimal penalized score, we should prefer the transition that maximizes the teleport count $c(\lambda)$ (by storing pairs of (value, count)).

We compute these in $\mathcal{O}(N)$ using our CHT DP. We binary search over $\lambda$ to find the optimal $\lambda_{opt}$ where $c(\lambda_{opt}) \ge K$ . Because we broke ties by maximizing the count, our final recovery formula is safely $v(\lambda_{opt}) - \lambda_{opt} K$ , which perfectly recovers the exact answer even if $c(\lambda_{opt}) > K$ due to collinearity.

Because calculating $v(\lambda)$ and $c(\lambda)$ with the CHT dynamic programming solution will take $\mathcal{O}(N)$ time, the total time complexity is $\mathcal{O}(N \log(\text{MaxPenalty}))$ per testcase, which easily passes!

Setter's Code:

//Author : Udynamo
#include <bits/stdc++.h>
#define int long long
using namespace std;
 
const int INF = 1e18;
 
struct CHT {
    struct Line {
        int m, b, ind;
    };
 
    deque<Line> hull;
 
    bool bad(Line l1, Line l2, Line l3) {
        return (l3.b - l1.b) * (l1.m - l2.m) 
            <= (l2.b - l1.b) * (l1.m - l3.m);
    }
 
    void add(int m, int b, int ind) {
        Line L = {m, b, ind};
        while (hull.size() >= 2 && bad(hull[hull.size() - 2], hull.back(), L))
            hull.pop_back();
        hull.push_back(L);
    }
 
    int value(Line L, int x) {
        return L.m * x + L.b;
    }
 
    pair<int, int> query(int x) {
        if (hull.empty()) return {INF, 0}; 
        while (hull.size() >= 2 && value(hull[0], x) >= value(hull[1], x))
            hull.pop_front();
        return {value(hull[0], x), hull[0].ind};
    }
};
 
pair<int, int> solve_lambda(int lmb, vector<int>& cost){
    int n = cost.size();
    vector<pair<int, int>> dp(n); 
    
    dp[0] = {0, 0}; 
    
    CHT cht;
    cht.add(0, dp[0].first, 0); 
    
    for(int i = 1; i < n; i++){
        auto mn = cht.query(i);
        dp[i] = {mn.first + cost[i] + (i * i) + lmb, dp[mn.second].second + 1};
        cht.add(-2 * i, dp[i].first + (i * i), i);
    }
    return dp[n - 1];
}
 
int solve_1D(int N, int K, vector<int>& cost){
    int l = -2e10, r = 2e10, ans = 0; 
    
    while(l <= r){
        int mid = l + (r - l) / 2; 
        
        auto [val, cnt] = solve_lambda(mid, cost);
        if(cnt >= K){
            ans = mid;
            l = mid + 1;
        }else{
            r = mid - 1;
        }
    }
    auto [val, cnt] = solve_lambda(ans, cost);
    return val - ans * K;
}
 
void UNsolve(){
 
    int N, K;
    cin >> N >> K;
 
    vector<int> A(N), B(N);
    for (int i = 0; i < N; ++i) cin >> A[i];
    for (int i = 0; i < N; ++i) cin >> B[i];
 
    int dpA = solve_1D(N, K, A);
    int dpB = solve_1D(N, K, B);
 
    cout << dpA + dpB << endl;
}
 
signed main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int t;
    cin >> t;
    while(t--){
        UNsolve();
    }
 
    return 0;
}

Tester's Code:

#Author: nagajahnavidann1 >_<
#code well!
import sys
import time
from collections import deque

MOD = 10**9 + 7
INF = 10**18


def fun(p, x):
    return p[0] * x + p[1]

def cmp(l1, l2, l3):
    return (l2[1] - l1[1]) * (l1[0] - l3[0]) >= (l3[1] - l1[1]) * (l1[0] - l2[0])


def rn(lam, arr, n):
    dp = [0] * (n + 1)
    stp = [0] * (n + 1)
    cht = deque()
    id_dq = deque()

    dp[1] = 0
    stp[1] = 0

    cht.append((-2, 1))
    id_dq.append(1)

    for i in range(2, n + 1):
        while len(cht) >= 2 and fun(cht[0], i) >= fun(cht[1], i):
            cht.popleft()
            id_dq.popleft()

        j = id_dq[0]
        dp[i] = arr[i - 1] + i * i + lam + fun(cht[0], i)
        stp[i] = stp[j] + 1

        cur = (-2 * i, dp[i] + i * i)

        while len(cht) >= 2 and cmp(cht[-2], cht[-1], cur):
            cht.pop()
            id_dq.pop()

        cht.append(cur)
        id_dq.append(i)

    return dp[n], stp[n]

def main():
    # Fast I/O
    input_data = sys.stdin.read().split()
    if not input_data:
        return
    
    t = int(input_data[0])
    idx = 1
    
    start_time = time.perf_counter()
    
    for _ in range(t):
        n = int(input_data[idx])
        k = int(input_data[idx+1])
        idx += 2
        
        a = [int(x) for x in input_data[idx : idx + n]]
        idx += n
        
        b = [int(x) for x in input_data[idx : idx + n]]
        idx += n
        
        ans = 0
        
        h = 2 * 10**12
        l = -2 * 10**12
        bp = 0
        
        while l <= h:
            lam = l + (h - l) // 2
            res_dp, res_stp = rn(lam, a, n)
            
            if res_stp >= k:
                l = lam + 1
                bp = lam
            else:
                h = lam - 1
                
        res_dp, res_stp = rn(bp, a, n)
        ans += res_dp - bp * k
        
        h = 2 * 10**12
        l = -2 * 10**12
        bp = h
        
        while l <= h:
            lam = l + (h - l) // 2
            res_dp, res_stp = rn(lam, b, n)
            
            if res_stp >= k:
                l = lam + 1
                bp = lam
            else:
                h = lam - 1
                
        res_dp, res_stp = rn(bp, b, n)
        ans += res_dp - bp * k
        
        print(ans)
        
    end_time = time.perf_counter()
    sys.stderr.write(f"Time Taken : {end_time - start_time:.6f}\n")

if __name__ == '__main__':
    main()

Editorial for Problem E

Dynamo and the Cosmic Grid

Approach:

Setter's Code:

Tester's Code: