Skip to main content

Root-Finding Algorithms - Bisection, Newton-Raphson, and ML Applications

Reading time: ~18 minutes | Level: Numerical Methods → Optimization

You want to find the learning rate where the gradient norm equals a target value. You want to find the temperature parameter where the KL divergence between two distributions equals a budget. You implement layer normalization and need to find the scale that makes the output variance equal 1.

These are all root-finding problems: find xx such that f(x)=0f(x) = 0 (or equivalently, f(x)=cf(x) = c after shifting). Root-finding algorithms are the workhorses of optimization, calibration, and constraint satisfaction in production ML systems.

What You Will Learn

  • Bisection method: guaranteed convergence, linear rate
  • Newton-Raphson method: quadratic convergence, fragility
  • Secant method: superlinear convergence without derivatives
  • Convergence analysis: linear, superlinear, quadratic
  • Fixed-point iterations and their connection to ML optimizers
  • ML applications: learning rate search, quantile finding, calibration

Part 1 - The Bisection Method

The idea

Given f(a)<0<f(b)f(a) < 0 < f(b) (sign change), by the intermediate value theorem there is a root in [a,b][a, b]. Bisection repeatedly halves the interval:

  1. Compute c=(a+b)/2c = (a + b)/2
  2. If f(c)=0f(c) = 0: done
  3. If f(a)f(c)<0f(a) \cdot f(c) < 0: root is in [a,c][a, c] → set b=cb = c
  4. Else: root is in [c,b][c, b] → set a=ca = c
  5. Repeat until ba<tolerance|b - a| < \text{tolerance}

Convergence: Linear - error halves each step. After nn iterations, error (ba)/2n\leq (b-a)/2^n.

import numpy as np
from typing import Callable, Optional

def bisection(
f: Callable,
a: float,
b: float,
tol: float = 1e-10,
maxiter: int = 100
) -> tuple:
"""
Bisection root-finding method.
Guaranteed to converge if f(a) and f(b) have opposite signs.

Returns: (root, n_iterations, converged)
"""
if f(a) * f(b) > 0:
raise ValueError(f"f(a)={f(a):.4f} and f(b)={f(b):.4f} must have opposite signs")

history = []

for n in range(maxiter):
c = (a + b) / 2.0
fc = f(c)
history.append({'c': c, 'f(c)': fc, 'interval_width': b - a})

if abs(fc) < tol or (b - a) / 2 < tol:
return c, n + 1, True, history

if f(a) * fc < 0:
b = c # Root is in left half
else:
a = c # Root is in right half

return (a + b) / 2.0, maxiter, False, history

# Example: find √2 as root of f(x) = x² - 2
f = lambda x: x**2 - 2.0
root, n_iters, converged, hist = bisection(f, 1.0, 2.0, tol=1e-12)
print(f"Root: {root:.12f}")
print(f"True: {np.sqrt(2):.12f}")
print(f"Iterations: {n_iters}")
print(f"Error: {abs(root - np.sqrt(2)):.2e}")

# Show convergence rate (should halve each iteration)
for i, step in enumerate(hist[:8]):
print(f"Iter {i+1}: c = {step['c']:.8f}, |f(c)| = {abs(step['f(c)']):.2e}")

Key property: Bisection always converges for continuous ff with a sign-bracketed interval. It is the method of last resort when you need guaranteed convergence and other methods fail.

Part 2 - Newton-Raphson Method

The idea

Newton-Raphson uses both f(x)f(x) and f(x)f'(x) to take a better step. The linear approximation of ff near xkx_k:

f(x)f(xk)+f(xk)(xxk)=0f(x) \approx f(x_k) + f'(x_k)(x - x_k) = 0

Solving for xx:

xk+1=xkf(xk)f(xk)x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}

Convergence analysis

For a simple root xx^* (where f(x)=0f(x^*) = 0 and f(x)0f'(x^*) \neq 0), Newton-Raphson has quadratic convergence:

xk+1xCxkx2|x_{k+1} - x^*| \leq C \cdot |x_k - x^*|^2

This means: once you are close to the root, the number of correct decimal digits doubles each iteration. Going from 4 correct digits to 8 to 16 - convergence is extremely fast.

import numpy as np

def newton_raphson(
f: Callable,
df: Callable,
x0: float,
tol: float = 1e-12,
maxiter: int = 50
) -> tuple:
"""
Newton-Raphson root-finding.
Requires f and its derivative df.

Quadratic convergence near the root.
Can diverge if starting far from root or if f'(x) ≈ 0.
"""
x = x0
history = []

for n in range(maxiter):
fx = f(x)
history.append({'x': x, 'f(x)': fx})

if abs(fx) < tol:
return x, n + 1, True, history

dfx = df(x)
if abs(dfx) < 1e-14:
raise ValueError(f"Near-zero derivative at x={x:.6f} - Newton-Raphson failed")

x = x - fx / dfx # Newton step

return x, maxiter, False, history

# Example: f(x) = x² - 2, f'(x) = 2x
f = lambda x: x**2 - 2.0
df = lambda x: 2.0 * x

root, n_iters, converged, hist = newton_raphson(f, df, x0=1.5, tol=1e-14)
print(f"Root: {root:.14f}")
print(f"Iterations: {n_iters}") # Typically 4-6 for this problem

# Observe quadratic convergence
true_root = np.sqrt(2)
print("\nQuadratic convergence:")
for i, step in enumerate(hist):
err = abs(step['x'] - true_root)
print(f"Iter {i}: error = {err:.2e}")
# Errors: 1e-1 → 1e-2 → 1e-4 → 1e-8 → 1e-16 (digits double!)

Newton-Raphson in multiple dimensions

For systems F(x)=0F(x) = 0 where F:RnRnF: \mathbb{R}^n \to \mathbb{R}^n, Newton's method uses the Jacobian:

xk+1=xkJF(xk)1F(xk)x_{k+1} = x_k - J_F(x_k)^{-1} F(x_k)

Or equivalently (avoid explicit inverse): solve JF(xk)δ=F(xk)J_F(x_k) \delta = -F(x_k), then xk+1=xk+δx_{k+1} = x_k + \delta.

def newton_system(
F: Callable,
J_F: Callable,
x0: np.ndarray,
tol: float = 1e-10,
maxiter: int = 50
) -> np.ndarray:
"""
Newton's method for nonlinear systems F(x) = 0.
J_F: Jacobian function x -> J (n x n matrix)
"""
x = x0.copy()

for _ in range(maxiter):
Fx = F(x)
if np.linalg.norm(Fx) < tol:
break

J = J_F(x)
# Solve J @ delta = -F(x) - do NOT use J_inv!
delta = np.linalg.solve(J, -Fx)
x = x + delta

return x

When Newton-Raphson fails

# Failure mode 1: Oscillation (cycle)
f_cycle = lambda x: x**3 - 2*x + 2
df_cycle = lambda x: 3*x**2 - 2
# Starting at x=0: Newton step goes to x=1, then back to x=0 - infinite loop!

# Failure mode 2: Divergence (wrong basin of attraction)
f_div = lambda x: np.arctan(x)
df_div = lambda x: 1.0 / (1 + x**2)
# For x0 far from origin: converges to wrong root or diverges

# Safeguard: Combine Newton with bisection (Brent's method)
from scipy.optimize import brentq, newton

# scipy.optimize.brentq: guaranteed convergence + fast near root
# Best of both worlds for 1D problems
root = brentq(lambda x: x**2 - 2, 1.0, 2.0, xtol=1e-12)
print(f"Brent's method: {root:.12f}")

Part 3 - Secant Method

The secant method approximates f(x)f'(x) using two previous iterates:

f(xk)f(xk)f(xk1)xkxk1f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}

Substituting into the Newton step:

xk+1=xkf(xk)xkxk1f(xk)f(xk1)x_{k+1} = x_k - f(x_k) \cdot \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}

No derivative needed. Requires two initial points x0,x1x_0, x_1.

Convergence: Superlinear - convergence order 1.618\approx 1.618 (the golden ratio). Faster than bisection, slower than Newton.

def secant_method(
f: Callable,
x0: float,
x1: float,
tol: float = 1e-12,
maxiter: int = 50
) -> tuple:
"""
Secant method - derivative-free, superlinear convergence.
Requires two starting points but no analytical derivative.
"""
history = [(x0, f(x0)), (x1, f(x1))]

for n in range(maxiter):
f0, f1 = f(x0), f(x1)

if abs(f1 - f0) < 1e-14:
break # Near-parallel secant line - numerical issue

# Secant step
x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
history.append((x2, f(x2)))

if abs(x2 - x1) < tol or abs(f(x2)) < tol:
return x2, n + 1, True, history

x0, x1 = x1, x2

return x1, maxiter, False, history

# Example: find root of f(x) = cos(x) - x (the Dottie number ≈ 0.7391)
f = lambda x: np.cos(x) - x
root, n_iters, converged, _ = secant_method(f, 0.0, 1.0)
print(f"Root (Dottie number): {root:.10f}") # ≈ 0.7390851332
print(f"Iterations: {n_iters}") # ~8-10 iterations

Part 4 - Convergence Order Comparison

Convergence orders:

Method | Order | Evals/step | When to use
─────────────────┼───────┼────────────┼─────────────────────────
Bisection | 1.0 | 1 | Guaranteed convergence, no derivative
Newton-Raphson | 2.0 | 1 | Fast, have derivative, near root
Secant | 1.618 | 1 | Fast, no derivative
Brent's method | 1.618 | 1 | Guaranteed + fast (bracket + secant)
Halley's method | 3.0 | 1 | Have f, f', f'' - very fast

Error reduction per iteration:
Bisection: |e_{k+1}| = 0.5 × |e_k|
Secant: |e_{k+1}| = C × |e_k|^1.618
Newton: |e_{k+1}| = C × |e_k|^2

After 10 iterations starting with error 0.1:
Bisection: 0.1 / 2^10 ≈ 10^-4
Secant: (0.1)^(1.618^10) ≈ 10^-11 ← practically at machine precision
Newton: (0.1)^(2^10) ≈ 10^-1024 ← beyond float range after 4-5 iterations

Part 5 - ML Applications

Finding the optimal learning rate

One practical application of bisection: finding the learning rate where the gradient norm crosses a threshold (learning rate finder algorithm):

import numpy as np

def learning_rate_finder(
model,
dataloader,
loss_fn,
target_loss_ratio: float = 3.0, # Stop when loss is 3x initial loss
min_lr: float = 1e-7,
max_lr: float = 10.0,
n_steps: int = 100
) -> float:
"""
Find the maximum safe learning rate using linear sweep + exponential schedule.
Uses root-finding concept: find LR where loss starts to diverge.
"""
from copy import deepcopy
import torch

initial_state = deepcopy(model.state_dict())

lrs = np.exp(np.linspace(np.log(min_lr), np.log(max_lr), n_steps))
losses = []
initial_loss = None

for step, lr in enumerate(lrs):
batch = next(iter(dataloader))

# Take one step at this learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
loss = loss_fn(model(batch['x']), batch['y'])

if initial_loss is None:
initial_loss = loss.item()

losses.append(loss.item())

if loss.item() > target_loss_ratio * initial_loss or np.isnan(loss.item()):
# Divergence detected - last stable LR is a few steps back
optimal_lr = lrs[max(0, step - 3)]
model.load_state_dict(initial_state)
return optimal_lr

optimizer.zero_grad()
loss.backward()
optimizer.step()

model.load_state_dict(initial_state)
return lrs[np.argmin(losses)] # LR at minimum loss


def find_temperature_by_bisection(
log_probs: np.ndarray,
target_entropy: float,
tol: float = 1e-6
) -> float:
"""
Find temperature T such that entropy of softmax(log_probs / T) = target_entropy.
Used in knowledge distillation and policy entropy regularization.

Entropy is monotonically increasing in T → root-finding is valid.
"""
def entropy_deficit(log_T: float) -> float:
T = np.exp(log_T)
# Compute softmax(log_probs / T)
scaled = log_probs / T
probs = np.exp(scaled - np.max(scaled))
probs /= probs.sum()
entropy = -np.sum(probs * np.log(probs + 1e-12))
return entropy - target_entropy

# Bisection in log-temperature space
log_T_root, _, _, _ = bisection(
entropy_deficit, a=-5.0, b=5.0, tol=tol
)
return np.exp(log_T_root)


def find_calibration_threshold(
probabilities: np.ndarray,
labels: np.ndarray,
target_fpr: float = 0.05
) -> float:
"""
Find decision threshold where False Positive Rate = target_fpr.
Classic root-finding application in classifier calibration.
"""
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(labels, probabilities)

# Find threshold where fpr ≈ target_fpr using bisection
def fpr_at_threshold(t: float) -> float:
preds = (probabilities >= t).astype(int)
fp = np.sum((preds == 1) & (labels == 0))
n = np.sum(labels == 0)
return fp / n - target_fpr # We want this = 0

threshold, _, _, _ = bisection(fpr_at_threshold, a=0.0, b=1.0, tol=1e-6)
return threshold

Fixed-point iterations and ML optimizers

Many ML algorithms can be viewed as fixed-point iterations:

xk+1=g(xk)x_{k+1} = g(x_k)

where we are finding a fixed point x=g(x)x^* = g(x^*) (equivalently, a root of f(x)=xg(x)f(x) = x - g(x)).

Banach fixed-point theorem: If gg is a contraction (g(x)g(y)Lxy\|g(x) - g(y)\| \leq L\|x - y\| with L<1L < 1), then the iteration converges to a unique fixed point.

import numpy as np

# Gradient descent as fixed-point iteration
# x_{k+1} = x_k - η∇f(x_k) = g(x_k)
# Fixed point: g(x*) = x* → η∇f(x*) = 0 → x* is a critical point

# RMSprop/Adam as preconditioned fixed-point iteration
# Fixed point: adapted step such that accumulated gradient statistics balance

def power_iteration(A: np.ndarray, n_iters: int = 100) -> tuple:
"""
Power iteration: fixed-point method to find the dominant eigenvector.
x_{k+1} = A x_k / ‖A x_k‖
Fixed point: eigenvector corresponding to largest eigenvalue.
"""
n = A.shape[0]
v = np.random.randn(n)
v = v / np.linalg.norm(v)

for _ in range(n_iters):
Av = A @ v
eigenvalue = v @ Av # Rayleigh quotient
v = Av / np.linalg.norm(Av)

return v, v @ A @ v # (eigenvector, eigenvalue)

# Example: find largest eigenvalue of a matrix
A = np.array([[4.0, 1.0, 0.5],
[1.0, 3.0, 0.2],
[0.5, 0.2, 2.0]])
v, lambda_max = power_iteration(A)
print(f"Largest eigenvalue: {lambda_max:.6f}")
print(f"numpy.linalg.eigvals: {np.max(np.linalg.eigvals(A)):.6f}")

Part 6 - SciPy Root-Finding Interface

from scipy.optimize import brentq, newton, bisect, root_scalar

# 1D root finding - use root_scalar for a unified interface
def find_root_demo():
f = lambda x: x**3 - 2*x - 5 # Root near x ≈ 2.0946

# Method 1: Brent's method (recommended for bracketed 1D)
result = root_scalar(f, method='brentq', bracket=[1, 3])
print(f"Brent's: {result.root:.10f} in {result.iterations} iters")

# Method 2: Newton-Raphson (requires derivative, faster if available)
df = lambda x: 3*x**2 - 2
result = root_scalar(f, method='newton', x0=2.0, fprime=df)
print(f"Newton: {result.root:.10f} in {result.iterations} iters")

# Method 3: Secant (no derivative needed)
result = root_scalar(f, method='secant', x0=1.5, x1=2.5)
print(f"Secant: {result.root:.10f} in {result.iterations} iters")

find_root_demo()

# Multi-dimensional systems
from scipy.optimize import fsolve, root

def solve_nonlinear_system():
"""
Solve the system:
f1(x, y) = x² + y² - 4 = 0 (circle of radius 2)
f2(x, y) = x - y = 0 (diagonal)
Solution: (√2, √2) and (-√2, -√2)
"""
def system(vars):
x, y = vars
return [x**2 + y**2 - 4, x - y]

def jacobian(vars):
x, y = vars
return [[2*x, 2*y],
[1, -1]]

# fsolve: classic Newton-Krylov based solver
sol = fsolve(system, x0=[1.0, 1.5], fprime=jacobian)
print(f"Solution: ({sol[0]:.6f}, {sol[1]:.6f})") # (1.41421, 1.41421)

solve_nonlinear_system()

Interview Questions

Q1: Compare bisection, Newton-Raphson, and secant methods. When would you use each?
MethodConvergence OrderDerivative NeededBracketingRecommendation
Bisection1 (linear)NoRequiredGuaranteed convergence, last resort
Newton-Raphson2 (quadratic)YesNoFastest when derivative available
Secant1.618 (superlinear)NoNoGood balance, no derivative needed
Brent's1.618 + bisectionNoRequiredBest practice for 1D problems

Use bisection when: You need guaranteed convergence and have a sign-bracketed interval. Tolerance requirements are modest (1-6 decimal places). Example: finding a decision threshold in a classifier.

Use Newton-Raphson when: You have an analytical derivative and a good starting guess near the root. Fast convergence is critical. Example: solving the Fisher information system in second-order optimization.

Use secant when: Derivative is unavailable or expensive, but you can evaluate ff at two nearby points. Example: bisecting in a smooth parameter space where finite differences are cheap.

In practice: scipy.optimize.root_scalar with method='brentq' (for bracketed problems) or 'newton' (for well-specified problems with derivatives). Brent's method gets the best of bisection (guaranteed convergence) and secant (fast near root).

Q2: Why does Newton-Raphson have quadratic convergence and what can break it?

Quadratic convergence proof sketch:

Let xx^* be the root: f(x)=0f(x^*) = 0. Newton step: xk+1=xkf(xk)/f(xk)x_{k+1} = x_k - f(x_k)/f'(x_k).

Error at step k+1k+1: ek+1=xk+1x=f(ξ)2f(xk)ek2e_{k+1} = x_{k+1} - x^* = \frac{f''(\xi)}{2 f'(x_k)} e_k^2

where ξ\xi is between xkx_k and xx^* (by Taylor expansion). For xkx_k near xx^*, f(xk)f(x)f'(x_k) \approx f'(x^*) which is bounded away from zero. So:

ek+1Cek2,C=f(x)2f(x)|e_{k+1}| \leq C |e_k|^2, \quad C = \frac{|f''(x^*)|}{2|f'(x^*)|}

The error squares each iteration - quadratic convergence.

What breaks it:

  1. Poor initial guess: Far from root, Newton step can overshoot, oscillate, or diverge. No convergence guarantee without bracket.

  2. Near-zero derivative: If f(xk)0f'(x_k) \approx 0 (near a local extremum), the step f/ff/f' is enormous → divergence. Protect with a maximum step size.

  3. Multiple roots: At a root where f(x)=f(x)=0f(x^*) = f'(x^*) = 0 (multiplicity > 1), Newton degrades to linear convergence, not quadratic.

  4. Discontinuous derivative: Quadratic convergence requires ff'' to be bounded. Functions with sharp kinks may not converge quadratically.

Safeguards: Use Brent's method (combines Newton/secant with bisection bracket), or add a maximum step size, or use line search to ensure each step reduces f(x)|f(x)|.

Q3: What is a fixed-point iteration and how does gradient descent fit this framework?

A fixed-point iteration finds xx^* satisfying x=g(x)x^* = g(x^*) by iterating xk+1=g(xk)x_{k+1} = g(x_k).

Any root-finding problem f(x)=0f(x) = 0 can be converted: define g(x)=xαf(x)g(x) = x - \alpha f(x) for some α0\alpha \neq 0. Then x=g(x)x^* = g(x^*) iff f(x)=0f(x^*) = 0.

Gradient descent as fixed-point iteration: Minimizing L(θ)L(\theta): θk+1=θkηL(θk)=g(θk)\theta_{k+1} = \theta_k - \eta \nabla L(\theta_k) = g(\theta_k)

Fixed point: g(θ)=θg(\theta^*) = \theta^* iff L(θ)=0\nabla L(\theta^*) = 0 - a critical point of LL.

Convergence condition (Banach fixed-point theorem): The iteration converges if and only if gg is a contraction in a neighborhood of xx^*: g(x)op<1\|g'(x)\|_{op} < 1 for xx near xx^*.

For gradient descent: g(θ)=IηHL(θ)g'(\theta) = I - \eta H_L(\theta) where HLH_L is the Hessian. The spectral radius condition ρ(IηH)<1\rho(I - \eta H) < 1 gives the stability condition:

0<η<2λmax(H)0 < \eta < \frac{2}{\lambda_{\max}(H)}

This is exactly the standard gradient descent convergence condition! The learning rate upper bound is determined by the largest eigenvalue of the Hessian.

Q4: How would you use root-finding to implement a temperature search for model calibration?

Problem: A neural network classifier produces logits zz and you want to find temperature TT such that the calibrated model is well-calibrated (expected calibration error is minimized, or some calibration metric equals a target).

Common formulation: Find TT such that the expected confidence equals the expected accuracy: E[maxksoftmax(z/T)k]=AccuracyE\left[\max_k \text{softmax}(z/T)_k\right] = \text{Accuracy}

Since confidence is monotonically decreasing in TT (higher temperature → more uniform distribution → lower confidence), this is a monotone equation amenable to bisection.

def temperature_scaling_calibration(
logits: np.ndarray, # (n_samples, n_classes)
labels: np.ndarray, # (n_samples,) integer labels
target_nll: float = None
) -> float:
"""
Find optimal temperature T to minimize NLL (negative log-likelihood).
Uses bisection on the derivative of NLL w.r.t. T.

NLL(T) is convex in log(T), so its derivative has exactly one root.
"""
def nll_at_temperature(log_T: float) -> float:
T = np.exp(log_T)
scaled_logits = logits / T
# Numerically stable log-softmax
log_probs = scaled_logits - np.log(np.sum(np.exp(
scaled_logits - scaled_logits.max(axis=1, keepdims=True)
), axis=1, keepdims=True)) - scaled_logits.max(axis=1, keepdims=True)
return -np.mean(log_probs[np.arange(len(labels)), labels])

def dnll_dlogT(log_T: float) -> float:
"""Derivative of NLL w.r.t. log(T) - root = optimal temperature."""
delta = 1e-5
return (nll_at_temperature(log_T + delta) - nll_at_temperature(log_T - delta)) / (2 * delta)

# Bisection to find log_T where dNLL/d(log T) = 0
optimal_log_T, _, _, _ = bisection(dnll_dlogT, a=-3.0, b=3.0)
return np.exp(optimal_log_T)

This is the core of temperature scaling - a simple post-hoc calibration method that achieves state-of-the-art calibration on many classification benchmarks by solving a single 1D root-finding problem.

Q5: What is Brent's method and why is it preferred over both bisection and Newton-Raphson?

Brent's method (also called Brent-Dekker) combines three techniques:

  1. Bisection: Guaranteed convergence when the other methods are not safe
  2. Secant method: Superlinear convergence when conditions are favorable
  3. Inverse quadratic interpolation: Even faster convergence with 3 points

How it works: At each step, Brent's method tries the secant/quadratic interpolation step. If that step falls within the current bracket and makes sufficient progress, it uses it. Otherwise, it falls back to bisection. The algorithm always maintains a valid bracket.

Why it's preferred:

  1. Guaranteed convergence: Always converges because bisection is the fallback - unlike pure Newton/secant which can diverge.

  2. Superlinear convergence: In practice converges nearly as fast as secant/Newton near the root.

  3. No derivative required: Uses only function evaluations.

  4. Robustly handles discontinuities and sharp features: Unlike Newton which requires smooth ff.

scipy implementation: scipy.optimize.brentq(f, a, b) - this is the recommended default for all 1D root-finding in Python. It is what scipy.optimize.root_scalar(method='brentq') uses.

When to use Newton over Brent: Only when you have an analytic derivative and a close initial guess. Newton's quadratic convergence can reach machine precision in 4-5 iterations while Brent needs 8-15. For truly tight tolerances (< 10⁻¹⁴) with smooth functions, Newton-Raphson's derivative-guided steps win.

Quick Reference

MethodConvergenceRequiresBest for
BisectionO(1/2n)O(1/2^n)Bracket, fGuaranteed convergence, coarse problems
Newton-RaphsonO(e2)O(\|e\|^2)f, f', good x₀Fast with derivative, smooth f
SecantO(e1.618)O(\|e\|^{1.618})f, two x₀Moderate, no derivative
Brent'sO(e1.618)O(\|e\|^{1.618}) + safeBracket, fBest general 1D choice
# Recommended: scipy.optimize.brentq for 1D problems
from scipy.optimize import brentq, root_scalar

# Bracketed (most reliable)
root = brentq(f, a, b, xtol=1e-12)

# With derivative (fastest for smooth f)
root = root_scalar(f, method='newton', x0=x_init, fprime=df).root

# System of equations
from scipy.optimize import fsolve
solution = fsolve(F_system, x0, fprime=J_system)

Next: Lesson 07: Sparse Matrix Methods →

:::tip 🎮 Interactive Playground

Visualize this concept: Try the Root-Finding Algorithms demo on the EngineersOfAI Playground - no code required.

:::

© 2026 EngineersOfAI. All rights reserved.