Author : Udynamo
Required Knowledge : Dynamic Programming, Convex Hull Trick, Alien's Trick
Time Complexity :
Editorialist : Udynamo
First, let's analyze the teleport score formula. Let's expand the math for a teleport from to :
Expand the squared term:
The terms cancel out perfectly, leaving us with:
Notice that the row scores and column scores are now completely independent. This means we don't need to solve a complex 2D grid problem! We can solve a 1D problem for the rows, solve the exact same 1D problem for the columns using dynamic programming, and simply add the two minimum scores together.
Let's formulate the 1D problem: What is the minimum score to reach index in exactly teleports?
Let be the minimum score to reach index using exactly teleports.
Calculating this naively takes transitions for each of the states, over steps, resulting in an time complexity. This is far too slow for . We need to optimize.
Optimization 1: Convex Hull Trick
Let's expand the distance squared term in our transition: .
Substituting this back into the DP:
This is exactly the equation of a line: .
Since is strictly increasing, the slopes are strictly decreasing. Furthermore, our query points are strictly increasing. Because both the slopes and the query points are monotonic, we can maintain the lower envelope of these lines using the Convex Hull Trick (CHT) with a Monotone Queue (or a Li Chao Tree) in amortized time per state.
This reduces the complexity to , but since can be up to , the worst-case time is still .
Optimization 2: Alien's Trick / WQS Binary Search
To eliminate the dimension entirely, we can use WQS Binary Search (also known as Alien's Trick). Let be the minimal score using exactly teleports. For Alien's Trick to work, the objective sequence must be convex: the score saved by adding an extra teleport strictly diminishes as increases, which is mathematically expressed as .
Our score is . If we test the Quadrangle Inequality for :
Expanding this algebraically, the landing scores ( and ) perfectly cancel out, leaving us with:
Since and , this product is always strictly negative. This proves that our score kernel satisfies the Monge property.
While the Monge property strictly proves that DP decision points are monotonic (which justifies standard DP optimizations like Divide and Conquer), it does not automatically prove is convex on its own. However, Alien's Trick is a separate technique that applies because the count-vs-penalty behavior under Lagrangian relaxation behaves nicely here. The algebraic cancellation of shows that the landing scores act purely as independent node weights, preventing them from creating "weird dips" in the curve. In practice, the convexity of for this distance-squared penalty is a known property that can be safely verified via stress testing.
Assuming is convex, we can drop the dimension by introducing a linear penalty for every teleport taken:
High (penalty): The DP takes fewer teleports to avoid the tax.
Low (subsidy): The DP takes more teleports to collect the reward.
Let be the minimal score and be the number of teleports used to achieve it. Crucially, to handle collinear points on the convex hull, we must implement consistent tie-breaking. If multiple transitions yield the same minimal penalized score, we should prefer the transition that maximizes the teleport count (by storing pairs of (value, count)).
We compute these in using our CHT DP. We binary search over to find the optimal where . Because we broke ties by maximizing the count, our final recovery formula is safely , which perfectly recovers the exact answer even if due to collinearity.
Because calculating and with the CHT dynamic programming solution will take time, the total time complexity is per testcase, which easily passes!
//Author : Udynamo
#include <bits/stdc++.h>
#define int long long
using namespace std;
const int INF = 1e18;
struct CHT {
struct Line {
int m, b, ind;
};
deque<Line> hull;
bool bad(Line l1, Line l2, Line l3) {
return (l3.b - l1.b) * (l1.m - l2.m)
<= (l2.b - l1.b) * (l1.m - l3.m);
}
void add(int m, int b, int ind) {
Line L = {m, b, ind};
while (hull.size() >= 2 && bad(hull[hull.size() - 2], hull.back(), L))
hull.pop_back();
hull.push_back(L);
}
int value(Line L, int x) {
return L.m * x + L.b;
}
pair<int, int> query(int x) {
if (hull.empty()) return {INF, 0};
while (hull.size() >= 2 && value(hull[0], x) >= value(hull[1], x))
hull.pop_front();
return {value(hull[0], x), hull[0].ind};
}
};
pair<int, int> solve_lambda(int lmb, vector<int>& cost){
int n = cost.size();
vector<pair<int, int>> dp(n);
dp[0] = {0, 0};
CHT cht;
cht.add(0, dp[0].first, 0);
for(int i = 1; i < n; i++){
auto mn = cht.query(i);
dp[i] = {mn.first + cost[i] + (i * i) + lmb, dp[mn.second].second + 1};
cht.add(-2 * i, dp[i].first + (i * i), i);
}
return dp[n - 1];
}
int solve_1D(int N, int K, vector<int>& cost){
int l = -2e10, r = 2e10, ans = 0;
while(l <= r){
int mid = l + (r - l) / 2;
auto [val, cnt] = solve_lambda(mid, cost);
if(cnt >= K){
ans = mid;
l = mid + 1;
}else{
r = mid - 1;
}
}
auto [val, cnt] = solve_lambda(ans, cost);
return val - ans * K;
}
void UNsolve(){
int N, K;
cin >> N >> K;
vector<int> A(N), B(N);
for (int i = 0; i < N; ++i) cin >> A[i];
for (int i = 0; i < N; ++i) cin >> B[i];
int dpA = solve_1D(N, K, A);
int dpB = solve_1D(N, K, B);
cout << dpA + dpB << endl;
}
signed main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int t;
cin >> t;
while(t--){
UNsolve();
}
return 0;
}
#Author: nagajahnavidann1 >_<
#code well!
import sys
import time
from collections import deque
MOD = 10**9 + 7
INF = 10**18
def fun(p, x):
return p[0] * x + p[1]
def cmp(l1, l2, l3):
return (l2[1] - l1[1]) * (l1[0] - l3[0]) >= (l3[1] - l1[1]) * (l1[0] - l2[0])
def rn(lam, arr, n):
dp = [0] * (n + 1)
stp = [0] * (n + 1)
cht = deque()
id_dq = deque()
dp[1] = 0
stp[1] = 0
cht.append((-2, 1))
id_dq.append(1)
for i in range(2, n + 1):
while len(cht) >= 2 and fun(cht[0], i) >= fun(cht[1], i):
cht.popleft()
id_dq.popleft()
j = id_dq[0]
dp[i] = arr[i - 1] + i * i + lam + fun(cht[0], i)
stp[i] = stp[j] + 1
cur = (-2 * i, dp[i] + i * i)
while len(cht) >= 2 and cmp(cht[-2], cht[-1], cur):
cht.pop()
id_dq.pop()
cht.append(cur)
id_dq.append(i)
return dp[n], stp[n]
def main():
# Fast I/O
input_data = sys.stdin.read().split()
if not input_data:
return
t = int(input_data[0])
idx = 1
start_time = time.perf_counter()
for _ in range(t):
n = int(input_data[idx])
k = int(input_data[idx+1])
idx += 2
a = [int(x) for x in input_data[idx : idx + n]]
idx += n
b = [int(x) for x in input_data[idx : idx + n]]
idx += n
ans = 0
h = 2 * 10**12
l = -2 * 10**12
bp = 0
while l <= h:
lam = l + (h - l) // 2
res_dp, res_stp = rn(lam, a, n)
if res_stp >= k:
l = lam + 1
bp = lam
else:
h = lam - 1
res_dp, res_stp = rn(bp, a, n)
ans += res_dp - bp * k
h = 2 * 10**12
l = -2 * 10**12
bp = h
while l <= h:
lam = l + (h - l) // 2
res_dp, res_stp = rn(lam, b, n)
if res_stp >= k:
l = lam + 1
bp = lam
else:
h = lam - 1
res_dp, res_stp = rn(bp, b, n)
ans += res_dp - bp * k
print(ans)
end_time = time.perf_counter()
sys.stderr.write(f"Time Taken : {end_time - start_time:.6f}\n")
if __name__ == '__main__':
main()