Lecture 08
Fundamental algorithms for scientific computing in Python
| Subpackage | Description | Subpackage | Description |
|---|---|---|---|
| `cluster` | Clustering algorithms | `odr` | Orthogonal distance regression |
| `constants` | Physical and mathematical constants | `optimize` | Optimization and root-finding routines |
| `fftpack` | Fast Fourier Transform routines | `signal` | Signal processing |
| `integrate` | Integration and ordinary differential equation solvers | `sparse` | Sparse matrices and associated routines |
| `interpolate` | Interpolation and smoothing splines | `spatial` | Spatial data structures and algorithms |
| `io` | Input and output | `special` | Special functions |
| `linalg` | Linear algebra | `stats` | Statistical distributions and functions |
| `ndimage` | N-dimensional image processing | | |
In an ideal world, NumPy would contain nothing but the array data type and the most basic operations: indexing, sorting, reshaping, basic elementwise functions, etc. All numerical code would reside in SciPy. However, one of NumPy’s important goals is compatibility, so NumPy tries to retain all features supported by either of its predecessors. Thus, NumPy contains some linear algebra functions and Fourier transforms, even though these more properly belong in SciPy. In any case, SciPy contains more fully-featured versions of the linear algebra modules, as well as many other numerical algorithms. If you are doing scientific computing with Python, you should probably install both NumPy and SciPy. Most new features belong in SciPy rather than NumPy.
Along with the centroids, kmeans() returns the distortion: the mean (non-squared) Euclidean distance between the observations passed and the centroids generated.
Once we have centroids from kmeans(), we can assign new points to clusters using vq() (vector quantization).
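A minimal sketch of this workflow (the two-cluster data here is illustrative, not from the lecture):

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.default_rng(1234)
# Two well-separated 2-d clusters of 100 points each (illustrative data)
obs = np.concatenate([
    rng.normal(0, 1, size=(100, 2)),
    rng.normal(5, 1, size=(100, 2)),
])

centroids, distortion = kmeans(obs, 2)   # fit k = 2 centroids
labels, dists = vq(obs, centroids)       # assign each point to its nearest centroid
```

`vq()` returns both the cluster label and the distance to the assigned centroid for each observation.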
For general numeric integration in 1D we use scipy.integrate.quad(), which takes as arguments: the function to be integrated and then the lower and upper bounds of the integral.
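For example, integrating \(\sin(x)\) over \([0, \pi]\), which is exactly 2 (an illustrative check, not from the lecture):

```python
import numpy as np
from scipy.integrate import quad

# quad() returns the integral estimate and an estimate of the absolute error
val, err = quad(np.sin, 0, np.pi)
```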
The PDF for a normal distribution is given by,
\[ f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left(-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2 \right) \]
We can check that we’ve implemented a valid pdf by integrating from \(-\infty\) to \(\infty\),
Truncating the normal to an interval \([a, b]\) requires a normalizing constant \(c\) so that the density still integrates to 1,
\[ f(x) = \begin{cases} \frac{c}{\sigma \sqrt{2 \pi}} \exp\left(-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2 \right), & \text{for } a \leq x \leq b \\ 0, & \text{otherwise.} \end{cases} \]
```python
def trunc_norm_pdf(x, μ=0, σ=1, a=-np.inf, b=np.inf):
    if b < a:
        raise ValueError("b must be greater than a")
    x = np.asarray(x)
    scalar_input = x.ndim == 0
    x = np.atleast_1d(x)
    full_pdf = (1/(σ * np.sqrt(2*np.pi))) * np.exp(-0.5 * ((x - μ)/σ)**2)
    full_pdf[(x < a) | (x > b)] = 0
    return full_pdf[0] if scalar_input else full_pdf
```

This first version zeroes the density outside \([a, b]\) but never renormalizes, so it does not integrate to 1 when the bounds are finite. Dividing by the probability mass of \([a, b]\), computed with quad(), fixes this:

```python
from scipy.integrate import quad

def trunc_norm_pdf(x, μ=0, σ=1, a=-np.inf, b=np.inf):
    if b < a:
        raise ValueError("b must be greater than a")
    x = np.asarray(x)
    scalar_input = x.ndim == 0
    x = np.atleast_1d(x)
    # Normalizing constant: 1 / P(a ≤ X ≤ b) under the untruncated normal;
    # norm_pdf() is the normal pdf defined earlier
    nc = 1 / quad(lambda x: norm_pdf(x, μ, σ), a, b)[0]
    full_pdf = nc * (1/(σ * np.sqrt(2*np.pi))) * np.exp(-0.5 * ((x - μ)/σ)**2)
    full_pdf[(x < a) | (x > b)] = 0
    return full_pdf[0] if scalar_input else full_pdf
```

The density of the multivariate normal distribution is,
\[ f(\mathbf{x}) = \det{(2\pi\Sigma)}^{-1/2} \exp{\left(-\frac{1}{2} (\mathbf{x}-\mu)^T \Sigma^{-1}(\mathbf{x}-\mu) \right)} \]
These are supported by dblquad() and tplquad() respectively (see nquad() for higher dimensions).
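A small illustrative example with dblquad(): \(\int_0^1 \int_0^1 xy \, dy \, dx = 1/4\). Note that the integrand takes its arguments as `f(y, x)`, with the inner variable first:

```python
from scipy.integrate import dblquad

# Inner integral is over y (bounds 0 to 1), outer over x (bounds 0 to 1)
val, err = dblquad(lambda y, x: x * y, 0, 1, 0, 1)
```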
brent
=====
Options
-------
maxiter : int
Maximum number of iterations to perform.
xtol : float
Relative error in solution `xopt` acceptable for convergence.
disp : int, optional
If non-zero, print messages.
``0`` : no message printing.
``1`` : non-convergence notification messages only.
``2`` : print a message on convergence too.
``3`` : print iteration results.
Notes
-----
Uses inverse parabolic interpolation when possible to speed up
convergence of golden section method.
bounded
=======
Options
-------
maxiter : int
Maximum number of iterations to perform.
disp : int, optional
If non-zero, print messages.
``0`` : no message printing.
``1`` : non-convergence notification messages only.
``2`` : print a message on convergence too.
``3`` : print iteration results.
xatol : float
Absolute error in solution `xopt` acceptable for convergence.
golden
======
Options
-------
xtol : float
Relative error in solution `xopt` acceptable for convergence.
maxiter : int
Maximum number of iterations to perform.
disp : int, optional
If non-zero, print messages.
``0`` : no message printing.
``1`` : non-convergence notification messages only.
``2`` : print a message on convergence too.
``3`` : print iteration results.
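A result object like the one below can be produced with minimize_scalar(); the objective here, \((x-5)^2 - 1\), is a guess chosen only to be consistent with the reported minimizer \(x = 5\) and minimum \(-1\):

```python
from scipy.optimize import minimize_scalar

# Default method is Brent; the quadratic has its minimum f(5) = -1
res = minimize_scalar(lambda x: (x - 5)**2 - 1)
```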
message:
Optimization terminated successfully;
The returned value satisfies the termination criteria
(using xtol = 1.48e-08 )
success: True
fun: -1.0
x: 5.0000000006185585
nit: 8
nfev: 11
Implements classes for most continuous and discrete distributions, each exposing a common set of methods:

- `rvs` - Random Variates
- `pdf` - Probability Density Function
- `cdf` - Cumulative Distribution Function
- `sf` - Survival Function (1-CDF)
- `ppf` - Percent Point Function (Inverse of CDF)
- `isf` - Inverse Survival Function (Inverse of SF)
- `stats` - Return mean, variance, (Fisher’s) skew, or (Fisher’s) kurtosis
- `moment` - Non-central moments of the distribution
Model parameters can be passed to any of the methods directly, or a distribution can be constructed using a specific set of parameters, which is known as freezing.
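A minimal sketch of both styles, using the normal distribution (parameter values are illustrative):

```python
from scipy.stats import norm

# Parameters passed per call ...
p1 = norm.pdf(0, loc=1, scale=2)

# ... or fixed once by "freezing" the distribution
frozen = norm(loc=1, scale=2)
p2 = frozen.pdf(0)
```

The frozen object carries its parameters, so every method (`pdf`, `cdf`, `rvs`, ...) can be called without repeating them.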
Maximum likelihood estimation is possible via the fit() method,
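The three values shown below are consistent with fitting a three-parameter distribution such as Student's t, which returns `(shape, loc, scale)`; a sketch under that assumption (the data-generating step is illustrative):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1234)
samples = t.rvs(df=3, size=1000, random_state=rng)  # illustrative data

# Maximum likelihood estimates of (df, loc, scale)
df_hat, loc_hat, scale_hat = t.fit(samples)
```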
(np.float64(2.589173185346138),
np.float64(-0.00011290008283903167),
np.float64(0.9676759000682381))
Provides implementations of many special mathematical functions commonly used in statistics, physics, and engineering.
| Function | Description |
|---|---|
| `gamma`, `gammaln` | Gamma function and its log |
| `beta`, `betaln` | Beta function and its log |
| `factorial`, `comb`, `perm` | Combinatorial functions |
| `erf`, `erfc` | Error function and complement |
| `expit`, `logit` | Logistic and inverse logistic |
| `softmax`, `log_softmax` | Softmax and log-softmax |
| `digamma`, `polygamma` | Digamma and polygamma functions |
| `bessel*` | Bessel functions (many variants) |
The error function \(\text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt\),
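A quick check of its limiting values (illustrative): erf is odd, zero at the origin, and saturates at \(\pm 1\),

```python
import numpy as np
from scipy.special import erf

vals = erf(np.array([-np.inf, 0.0, np.inf]))
```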
Both numpy.linalg and scipy.linalg provide linear algebra routines, but SciPy’s version is more comprehensive.
| Feature | `numpy.linalg` | `scipy.linalg` |
|---|---|---|
| Basic operations | `inv`, `solve`, `det`, `eig` | All of NumPy’s + more |
| Decompositions | SVD, QR, Cholesky | + LU, Schur, Hessenberg, polar |
| Matrix functions | Limited | `expm`, `logm`, `sqrtm`, `funm` |
| Specialized solvers | No | Banded, triangular, symmetric |
| LAPACK access | Partial | Full access via low-level routines |
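For instance, the matrix exponential is available in scipy.linalg but not numpy.linalg; a small sketch with an illustrative matrix:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])  # nilpotent: A @ A = 0
E = expm(A)  # equals I + A here, since all higher powers vanish
```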
For matrices with many zero entries, sparse representations are more memory-efficient and can be much faster: a dense array stores every element, while a sparse format stores only the nonzero entries.
| Format | Name | Best for |
|---|---|---|
| `csr_matrix` | Compressed Sparse Row | Row slicing, matrix-vector products |
| `csc_matrix` | Compressed Sparse Column | Column slicing, arithmetic |
| `coo_matrix` | Coordinate | Building sparse matrices incrementally |
| `dia_matrix` | Diagonal | Diagonal/banded matrices |
| `lil_matrix` | List of Lists | Building matrices, row-based modifications |
scipy.sparse.linalg provides solvers optimized for sparse matrices.
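A minimal sketch using spsolve() on a tridiagonal system (the system itself is illustrative):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

# 5x5 tridiagonal matrix in CSR format; scalars broadcast along each diagonal
n = 5
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x = spsolve(A, b)  # solves A x = b without ever forming the dense matrix
```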
General rule - use sparse when density < 10% and matrix is large.
Sta 663 - Spring 2025