Introduction
SciPy is an enormous Python library for scientific computing. This article will explain how to get started with SciPy, survey what the library has to offer, and give some examples of how to use it for common tasks.
First steps with SciPy
The SciPy download page has links to the SourceForge download sites for SciPy and NumPy. (SciPy depends on NumPy and so both packages must be installed in order to use SciPy.) The version of SciPy (and NumPy) must be compatible with your version of Python. At the time of this writing, SciPy is available for Python 2.6 and earlier. In particular, Python 3.0 is not yet supported.
IronPython cannot use SciPy directly because much of SciPy is implemented in C and at this time IronPython can only directly import modules implemented in Python. However, the Ironclad package enables IronPython to use SciPy. See Numerical computing in IronPython with Ironclad for details.
To start using SciPy, import the scipy package. By convention, the scipy package is often imported with the sp abbreviation for ease of use.
>>> import scipy as sp
There is some functionality at the root of the scipy hierarchy, but most functionality is located in sub-packages that must be imported separately. For example, the erf function is located in the special sub-package for special functions. To call the erf function, you first need to import the special sub-package.
>>> from scipy import special
>>> special.erf(2.0)
0.99532226501895271
Getting help
The SciPy documentation page has links to extensive documentation available in HTML, PDF, and CHM (HTML help) formats.
As with any Python package, you can also find help for SciPy objects using Python's help() function from the command line. However, sometimes help is unhelpful when it comes to SciPy. The function scipy.info() is analogous to the standard help() function but specialized to give better documentation for SciPy objects. When scipy.info() is given a string argument, it searches for matching objects. When scipy.info() is given an object, it returns documentation specific to that object. For example, if scipy was imported as sp, then:
>>> sp.info("gamma")
returns documentation on both the gamma probability distribution and the gamma function. But,
>>> sp.info(special.gamma)
only returns documentation for the gamma function.
Library overview
The following table lists the sub-packages of scipy along with a brief description of each. The next section will give examples using some of the more common sub-packages.
Sub-package | Description |
cluster | Clustering algorithms |
constants | Mathematical and physical constants |
fftpack | Fourier transforms |
integrate | Numerical integration |
interpolate | Interpolation |
io | Input and output |
linalg | Linear algebra |
maxentropy | Maximum entropy models |
misc | Miscellaneous |
ndimage | Multi-dimensional image processing |
odr | Orthogonal distance regression |
optimize | Optimization |
signal | Signal processing |
sparse | Sparse matrices |
spatial | Spatial algorithms and data structures |
special | Special functions |
stats | Statistical functions |
stsci | Image processing |
weave | C/C++ integration |
Examples
SciPy is huge. The draft SciPy Reference Guide is currently 632 pages. This article will only illustrate a tiny sampling of what you can do with SciPy, focusing on some of the more common applications.
Special functions
The special sub-package contains mathematical functions beyond those included in the standard Python math module. The most commonly used special function is probably the gamma function, Γ(x). The following example shows how to access it from SciPy.
>>> from scipy.special import gamma
>>> gamma(0.5)
1.7724538509055159
SciPy contains a dozen functions related to the gamma function. For example, there is a separate function gammaln just to return the logarithm of the gamma function. This may seem redundant, but it is very practical. Since the gamma function grows very quickly, it can easily overflow, and so the logarithm of the gamma function is often more useful than the gamma function itself. Some of the other gamma-related functions in SciPy include the incomplete gamma function gammainc, the beta function beta, and the logarithmic derivative of the gamma function psi. Gamma functions are far from the only special functions included in SciPy. All the most commonly used special functions are included: Bessel functions, Fresnel integrals, etc.
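The overflow problem is easy to reproduce. The following is a minimal sketch, assuming ordinary double-precision floats; the arguments 200.0 and 198.0 are arbitrary values chosen for illustration. It compares gamma with gammaln and then uses logarithms to compute a ratio of two gamma values that could not be formed directly.

```python
import math
from scipy.special import gamma, gammaln

x = 200.0  # arbitrary large argument chosen for illustration

direct = gamma(x)       # too large for a double, so this overflows to inf
log_value = gammaln(x)  # logarithm of the gamma function, no overflow

print(direct)     # inf
print(log_value)  # roughly 857.93

# The ratio gamma(200) / gamma(198) equals 199 * 198 = 39402, but dividing
# the two overflowed values directly would not give a meaningful result.
# Subtracting logarithms and exponentiating avoids the overflow.
ratio = math.exp(gammaln(200.0) - gammaln(198.0))
print(ratio)      # close to 39402
```

Working on the log scale and exponentiating only at the end is a common pattern whenever ratios of large gamma values appear, for example in binomial coefficients.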
Some of the functions in special would not be classified as “special” in a mathematical sense. These are elementary functions that are included because they present numerical difficulties. For example, scipy.special contains a function log1p that evaluates log(1 + x). This function may seem useless at first: why not just use math.log(1 + x) instead of log1p? The problem is that in applications, you often need to evaluate log(1 + x) for small values of x. If x is sufficiently small (e.g., less than 10^-16), then 1 + x will equal 1 to machine precision, and so math.log(1 + x) will simply return log(1), which equals 0. The function log1p evaluates log(1 + x) indirectly without first computing 1 + x.
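The loss of precision described above can be seen in a few lines. This is a small sketch, assuming double-precision floats; the value 1e-20 is an arbitrary choice that lies well below machine epsilon.

```python
import math
from scipy.special import log1p

x = 1e-20  # far below double-precision machine epsilon (about 2.2e-16)

naive = math.log(1 + x)  # 1 + x rounds to exactly 1.0, so this is log(1.0)
accurate = log1p(x)      # evaluates log(1 + x) without forming 1 + x

print(naive)     # 0.0
print(accurate)  # 1e-20
```

For small x, log(1 + x) is approximately x, so the second result is the one that carries the information.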
Constants
The constants sub-package contains a wide variety of physical constants. The following code will display a few common constants:
>>> from scipy import constants
>>> constants.c # speed of light
299792458.0
>>> constants.h # Planck’s constant
6.6260693000000002e-034
>>> constants.N_A # Avogadro’s number
6.0221415000000003e+023
The dictionary physical_constants has descriptive strings as keys. The values are triples containing a constant's value, its units of measurement, and the uncertainty in the value. For example, the dictionary gives this information on the mass of an electron.
>>> constants.physical_constants["electron mass"]
(9.1093825999999998e-031, 'kg', 1.5999999999999999e-037)
In addition to physical constants, constants contains information on units. For example, constants.nautical_mile equals 1852, the number of meters in a nautical mile. And, in case you ever wondered, constants.troy_ounce equals 0.0311034768, the number of kilograms in a troy ounce. There is also support for SI and binary unit prefixes. For example, the SI prefix constants.kilo equals 10^3 = 1000.0, and the binary prefix constants.kibi equals 2^10 = 1024.
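Because the unit constants are plain numbers expressed in SI units, conversions reduce to multiplication and division. The sketch below illustrates the idea; the distance and buffer size are made-up values, and the variable names are only for illustration.

```python
from scipy import constants

distance_nmi = 10.0  # hypothetical distance in nautical miles

# nautical_mile is the number of meters in one nautical mile
distance_m = distance_nmi * constants.nautical_mile
# dividing by the SI prefix kilo converts meters to kilometers
distance_km = distance_m / constants.kilo

# kibi is the binary prefix 2**10, so this is the number of bytes in 64 KiB
buffer_bytes = int(64 * constants.kibi)

print(distance_km)   # 18.52
print(buffer_bytes)  # 65536
```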
Integration
The integrate sub-package contains several routines for numerical integration. The most commonly used routine is quad (named for “quadrature”, an old-fashioned name for integration). The first argument to quad is a function of one variable to integrate. For simple functions, it is convenient to use lambda to define the function inline, though of course integrands can be defined elsewhere using def. The quad function returns a pair: the value of the integral and an estimate of the error in the value. The following code integrates cos(e^x) between the limits of 2 and 3.
>>> import math
>>> from scipy import integrate
>>> integrate.quad(lambda x: math.cos(math.exp(x)), 2, 3)
(-0.063708480528704675, 2.4175070627010321e-014)
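The same integral can be written with an integrand defined using def and with the returned pair unpacked into two variables. This is a minimal sketch; the names integrand, value, and abs_error are chosen here for illustration.

```python
import math
from scipy import integrate

def integrand(x):
    """The integrand cos(e^x) used in the example above."""
    return math.cos(math.exp(x))

# quad returns a pair: the estimate of the integral and an error estimate
value, abs_error = integrate.quad(integrand, 2, 3)

print(value)      # approximately -0.0637
print(abs_error)  # a small number; the exact figure may vary by platform
```

Unpacking the pair makes it easy to check the error estimate before relying on the computed value.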
To specify infinite limits of integration in quad, use the constant scipy.inf for ∞, as in the following example:
>>> from scipy import inf
>>> integrate.quad(lambda x: math.exp(-x*x), -inf, inf)
(1.7724538509055159, 1.4202636780944923e-008)
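The result above can be checked against the known closed form: the integral of exp(-x^2) over the real line is the square root of π, which is also the value of Γ(0.5) shown earlier. The sketch below makes that comparison. It uses math.inf from the standard library for the infinite limits, which is an assumption of this example rather than the constant named in the text.

```python
import math
from scipy import integrate

# Integrate exp(-x^2) over the whole real line
value, abs_error = integrate.quad(
    lambda x: math.exp(-x * x), -math.inf, math.inf
)

exact = math.sqrt(math.pi)  # known closed-form value of this integral

print(value)
print(exact)
print(abs(value - exact))  # should be tiny compared with the value itself
```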
The integrate sub-package contains other integration routines, such as dblquad for double integrals and tplquad for triple integrals. It also contains odeint for numerically solving systems of ordinary differential equations.
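As a rough illustration of two of these routines, the sketch below evaluates a double integral with dblquad and solves a simple differential equation with odeint. The integrand, the equation, and all variable names are made up for the example; both problems were chosen because their exact answers are known.

```python
import math
import numpy as np
from scipy import integrate

# Double integral of x*y over the rectangle 0 <= x <= 1, 0 <= y <= 2.
# dblquad expects the integrand as f(y, x); the inner (y) limits are
# passed here as functions of x.
double_value, double_error = integrate.dblquad(
    lambda y, x: x * y,  # integrand
    0, 1,                # limits for x
    lambda x: 0,         # lower limit for y
    lambda x: 2,         # upper limit for y
)
print(double_value)  # close to 1.0, the exact value

# Solve dy/dt = -y with y(0) = 1; the exact solution is exp(-t).
def decay(y, t):
    return -y

t = np.linspace(0.0, 1.0, 11)
solution = integrate.odeint(decay, 1.0, t)  # one row per time point
print(solution[-1, 0], math.exp(-1.0))      # the two should agree closely
```

Note the argument order: dblquad calls the integrand with the inner variable first, and odeint calls the right-hand side as f(y, t).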
Probability and statistics
The stats sub-package contains a wealth of functions for probability and statistics. The library currently features 81 continuous distributions and 12 discrete distributions. The following example shows how to compute the probability that a normal (Gaussian) random variable with mean 0 and standard deviation 3 takes on a value less than 5. It also shows how to generate five random samples from the same distribution.
>>> from scipy.stats import norm
>>> norm.cdf(5, 0, 3)
0.9522096477271853
>>> norm.rvs(0, 3, size=5)
array([ 4.85229537, 3.0104119 , 1.13189841, 5.19688369, -2.97970912])
See Probability distributions in SciPy for more examples of working with probability distributions.
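When the same distribution is used repeatedly, it can be convenient to fix its parameters once. The sketch below assumes a reasonably recent SciPy, in which calling norm with loc and scale returns a "frozen" distribution object and rvs accepts a random_state seed; the variable names are only for illustration.

```python
from scipy.stats import norm

# Fix the parameters once: mean 0 and standard deviation 3
dist = norm(loc=0, scale=3)

p = dist.cdf(5)  # P(X < 5), the same quantity as norm.cdf(5, 0, 3)
q = dist.ppf(p)  # inverse CDF; recovers a value close to 5
samples = dist.rvs(size=5, random_state=0)  # seeded for reproducibility

print(p)              # approximately 0.9522
print(q)              # close to 5
print(samples.shape)  # (5,)
```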
The stats sub-package has simple functions such as std for computing the sample standard deviation of an array of numbers. It has more sophisticated functions such as glm for working with linear regression, analysis of variance, etc. It also contains functions for numerous statistical tests and common chores such as producing histograms.
Note that some statistical functionality is located outside the stats sub-package. For example, orthogonal distance regression support is contained in its own sub-package odr. Also, you might use the linalg sub-package in conjunction with stats.
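As one possible illustration of using linalg alongside stats, the sketch below fits a straight line to made-up data in two ways: by solving a least-squares problem with linalg.lstsq and by calling stats.linregress. The data and variable names are hypothetical, and the points lie exactly on y = 2x + 1 so that the expected answer is obvious.

```python
import numpy as np
from scipy import linalg, stats

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Least squares via linalg: find [slope, intercept] minimizing |A @ c - y|
A = np.column_stack([x, np.ones_like(x)])
coeffs, residues, rank, sing_vals = linalg.lstsq(A, y)
slope_la, intercept_la = coeffs

# The same fit via stats.linregress
fit = stats.linregress(x, y)

print(slope_la, intercept_la)    # close to 2.0 and 1.0
print(fit.slope, fit.intercept)  # close to 2.0 and 1.0
```

The linalg route generalizes directly to several predictors by adding columns to A, while linregress also reports statistical summaries such as the correlation coefficient.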
Further resources
The SciPy.org website has links to further documentation, including tutorials and cookbook examples.
The Mathesaurus is a sort of Rosetta stone for comparing mathematical environments such as SciPy, Matlab, R, etc. The resources on that site are useful even if you do not know one of the other packages and just want a cheat sheet for doing various tasks in Python (especially SciPy).
The IPython shell is a popular environment for working interactively with SciPy. Also, many SciPy users use matplotlib to create plots, either from the standard Python command line or from IPython.