
Getting started with the SciPy (Scientific Python) library

26 Jun 2014 · BSD license · 7 min read
How to get started using the SciPy library for scientific computing in Python.

Introduction

SciPy is an enormous Python library for scientific computing. This article will explain how to get started with SciPy, survey what the library has to offer, and give some examples of how to use it for common tasks.

First steps with SciPy

The SciPy download page has links to the SourceForge download sites for SciPy and NumPy. (SciPy depends on NumPy and so both packages must be installed in order to use SciPy.) The version of SciPy (and NumPy) must be compatible with your version of Python. At the time of this writing, SciPy is available for Python 2.6 and earlier. In particular, Python 3.0 is not yet supported.

IronPython cannot use SciPy directly because much of SciPy is implemented in C and at this time IronPython can only directly import modules implemented in Python. However, the Ironclad package enables IronPython to use SciPy. See Numerical computing in IronPython with Ironclad for details.

To start using SciPy, import the scipy package. By convention, the scipy package is often imported with the sp abbreviation for ease of use.

>>> import scipy as sp

There is some functionality at the root of the scipy hierarchy, but most functionality is located in sub-packages that must be imported separately. For example, the erf function is located in the special sub-package for special functions. To call the erf function, you need to first import the special sub-package.

>>> from scipy import special
>>> special.erf(2.0)
0.99532226501895271

Getting help

The SciPy documentation page has links to extensive documentation available in HTML, PDF, and CHM (HTML help) formats.

As with any Python package, you can also find help for SciPy objects using Python's help() function from the command line. However, sometimes help is unhelpful when it comes to SciPy. The function scipy.info() is analogous to the standard help() function but specialized to give better documentation for SciPy objects. When scipy.info() is given a string argument, it does a search for matching objects. When scipy.info() is given an object, it returns documentation specific to that object. For example, if scipy was imported as sp, then:

>>> sp.info("gamma")

returns documentation on both the gamma probability distribution and the gamma function. But,

>>> sp.info(special.gamma)

only returns documentation for the gamma function.

Library overview

The following table lists the sub-packages of scipy along with a brief description of each. The next section will give examples using some of the more common sub-packages.

Sub-package    Description
cluster        Clustering algorithms
constants      Mathematical and physical constants
fftpack        Fourier transforms
integrate      Numerical integration
interpolate    Interpolation
io             Input and output
linalg         Linear algebra
maxentropy     Maximum entropy models
misc           Miscellaneous
ndimage        Multi-dimensional image processing
odr            Orthogonal distance regression
optimize       Optimization
signal         Signal processing
sparse         Sparse matrices
spatial        Spatial algorithms and data structures
special        Special functions
stats          Statistical functions
stsci          Image processing
weave          C/C++ integration

Examples

SciPy is huge. The draft SciPy Reference Guide is currently 632 pages. This article will only illustrate a tiny sampling of what you can do with SciPy, focusing on some of the more common applications.

Special functions

The special sub-package contains mathematical functions beyond those included in the standard Python math package. The most commonly used special function is probably the gamma function, Γ(x). The following example shows how to access it from SciPy.

>>> from scipy.special import gamma
>>> gamma(0.5)
1.7724538509055159

SciPy contains a dozen functions related to the gamma function. For example, there is a separate function gammaln just to return the logarithm of the gamma function. This may seem redundant, but it is very practical. Since the gamma function grows very quickly, it can easily overflow, and so the logarithm of the gamma function is often more useful than the gamma function itself. Some of the other gamma-related functions in SciPy include the incomplete gamma function gammainc, the beta function beta, and the logarithmic derivative of the gamma function psi. Gamma functions are far from the only special functions included in SciPy. All the most commonly used special functions are included: Bessel functions, Fresnel integrals, etc.
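The overflow argument for gammaln can be seen in a small sketch. The argument 200 and the ratio being computed are illustrative choices, not taken from the article; the point is only that the logarithm stays finite where the gamma function itself does not.

```python
import math
from scipy.special import gamma, gammaln

# Gamma(200) = 199! is far larger than the largest double, so gamma overflows.
big = gamma(200.0)          # evaluates to inf
log_big = gammaln(200.0)    # the logarithm of the same quantity is finite

# A ratio of two huge gamma values, computed stably through logarithms.
# Mathematically, Gamma(200) / Gamma(198) = 199 * 198.
ratio = math.exp(gammaln(200.0) - gammaln(198.0))

print(big, log_big, ratio)
```

Working with differences of gammaln values and exponentiating only at the end is a common way to evaluate such ratios, which appear in binomial coefficients and probability densities.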

Some of the functions in special would not be classified as “special” in a mathematical sense. These are elementary functions that are included because they present numerical difficulties. For example, scipy.special contains a function log1p that evaluates log(1 + x). This function may seem useless at first: why not just use math.log(1 + x) instead of log1p? The problem is that in applications, you often need to evaluate log(1 + x) for small values of x. If x is sufficiently small (e.g., less than 10^-16), then 1 + x will equal 1 to machine precision, so math.log(1 + x) will simply return log(1), which equals 0. The function log1p evaluates log(1 + x) indirectly without first computing 1 + x.
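A short sketch makes the loss of precision concrete. The value x = 1e-20 is an arbitrary illustrative choice well below machine precision.

```python
import math
from scipy.special import log1p

x = 1e-20

# 1 + x rounds to exactly 1.0 in double precision, so the naive form gives 0.0.
naive = math.log(1 + x)

# log1p never forms 1 + x, so the result stays close to x,
# as expected from log(1 + x) ~ x for tiny x.
accurate = log1p(x)

print(naive, accurate)
```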

Constants

The constants sub-package contains a wide variety of physical constants. The following code will display a few common constants:

>>> from scipy import constants
>>> constants.c # speed of light
299792458.0
>>> constants.h # Planck's constant
6.6260693000000002e-034
>>> constants.N_A # Avogadro's number
6.0221415000000003e+023

The dictionary physical_constants has descriptive strings as keys. The values are triples containing a constant's value, its units of measurement, and the uncertainty in the value. For example, the dictionary gives this information on the mass of an electron.

>>> constants.physical_constants["electron mass"]
(9.1093825999999998e-031, 'kg', 1.5999999999999999e-037)
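Because each entry is a plain tuple, it can be unpacked directly. The sketch below also uses the accessor functions constants.value and constants.unit, which the article does not mention. The numeric values printed depend on the CODATA release bundled with the installed SciPy version, so they may differ slightly from the output shown above.

```python
from scipy import constants

# Unpack the (value, unit, uncertainty) triple for one constant.
value, unit, uncertainty = constants.physical_constants["electron mass"]

# Accessor functions that look up single fields by the same key.
same_value = constants.value("electron mass")
same_unit = constants.unit("electron mass")

print(value, unit, uncertainty)
```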

In addition to physical constants, constants contains information on units. For example, constants.nautical_mile equals 1852, the number of meters in a nautical mile. And, in case you ever wondered, constants.troy_ounce equals 0.0311034768, the number of kilograms in a troy ounce. There is also support for SI and binary unit prefixes. For example, the SI prefix constants.kilo equals 10^3 = 1000.0, and the binary prefix constants.kibi equals 2^10 = 1024.
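Since the unit constants are ordinary numbers expressed in SI base units, conversions reduce to multiplication and division. The quantities below (a 10 nautical mile distance, a 4 KiB buffer) are made-up illustrations.

```python
from scipy import constants

# Convert nautical miles to kilometers: go to meters first, then divide by kilo.
distance_nmi = 10.0
distance_km = distance_nmi * constants.nautical_mile / constants.kilo

# Binary prefixes work the same way: 4 KiB expressed in bytes.
buffer_bytes = 4 * constants.kibi

print(distance_km, buffer_bytes)
```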

Integration

The integrate sub-package contains several routines for numerical integration. The most commonly used routine is quad (named for “quadrature”, an old-fashioned name for integration). The first argument to quad is a function of one variable to integrate. For simple functions, it is convenient to use lambda to define the function inline, though of course integrands can be defined elsewhere using def. The quad function returns a pair: the value of the integral and an estimate of the error in the value. The following code integrates cos(e^x) between the limits of 2 and 3.

>>> import math
>>> from scipy import integrate
>>> integrate.quad(lambda x: math.cos(math.exp(x)), 2, 3)
(-0.063708480528704675, 2.4175070627010321e-014)
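For integrands defined with def, the call looks the same. The sketch below uses two illustrative functions whose integrals are known exactly, and also shows the args parameter of quad, which passes extra arguments through to the integrand.

```python
from scipy import integrate

def square(x):
    """Integrand x**2; its integral over [0, 1] is exactly 1/3."""
    return x * x

def power(x, n):
    """Integrand x**n, with the exponent supplied through quad's args."""
    return x ** n

# quad returns (value, estimated absolute error); unpack both.
value, error_estimate = integrate.quad(square, 0, 1)

# Extra positional arguments for the integrand go in the args tuple.
cubic_value, cubic_error = integrate.quad(power, 0, 1, args=(3,))

print(value, error_estimate, cubic_value)
```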

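To specify infinite limits of integration in quad, use the floating-point constant inf for ∞. The example below imports it from NumPy; older code often wrote scipy.inf, an alias for the same value that later SciPy releases deprecated in favor of numpy.inf.

>>> from numpy import inf
>>> integrate.quad(lambda x: math.exp(-x*x), -inf, inf)
(1.7724538509055159, 1.4202636780944923e-008)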

The integrate sub-package contains other integration routines, such as dblquad for double integrals and tplquad for triple integrals. It also contains odeint for numerically solving systems of ordinary differential equations.
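A minimal sketch of dblquad and odeint follows. The integrand x*y and the decay equation dy/dt = -y are illustrative choices with known exact answers (1 and e^-1 respectively), not examples from the article.

```python
import math
import numpy as np
from scipy import integrate

# Double integral of x*y over 0 <= x <= 1, 0 <= y <= 2.
# dblquad calls the integrand as func(y, x): inner variable first.
# The inner limits are given as functions of the outer variable x.
area_value, area_error = integrate.dblquad(
    lambda y, x: x * y, 0, 1, lambda x: 0, lambda x: 2)

def decay(y, t):
    """Right-hand side of dy/dt = -y; odeint calls it as func(y, t)."""
    return -y

# Solve from y(0) = 1 on a grid of time points; the result has one row per time.
times = np.linspace(0.0, 1.0, 11)
solution = integrate.odeint(decay, [1.0], times)
y_at_1 = solution[-1, 0]

print(area_value, y_at_1, math.exp(-1))
```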

Probability and statistics

The stats sub-package contains a wealth of functions for probability and statistics. The library currently features 81 continuous distributions and 12 discrete distributions. The following example shows how to compute the probability that a normal (Gaussian) random variable with mean 0 and standard deviation 3 takes on a value less than 5. It also shows how to generate five random samples from the same distribution.

>>> from scipy.stats import norm
>>> norm.cdf(5, 0, 3)
0.9522096477271853
>>> norm.rvs(0, 3, size=5)
array([ 4.85229537,  3.0104119 ,  1.13189841,  5.19688369, -2.97970912])

See Probability distributions in SciPy for more examples of working with probability distributions.
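The positional arguments in the example above are the location (mean) and scale (standard deviation). The sketch below spells them out as keywords and adds two related methods of the same distribution object, ppf (the inverse of cdf) and sf (the upper tail probability), as a quick consistency check.

```python
from scipy.stats import norm

# Same probability as norm.cdf(5, 0, 3), with the parameters named.
p = norm.cdf(5, loc=0, scale=3)

# Standardizing by hand gives the same answer from the standard normal.
p_standard = norm.cdf(5 / 3.0)

# ppf inverts cdf; sf is the complementary (upper tail) probability.
x_back = norm.ppf(p, loc=0, scale=3)
tail = norm.sf(5, loc=0, scale=3)

print(p, p_standard, x_back, tail)
```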

The stats sub-package has simple functions such as std for computing the sample standard deviation of an array of numbers. It has more sophisticated functions such as glm for working with linear regression, analysis of variance, etc. It also contains functions for numerous statistical tests and common chores such as producing histograms.

Note that some statistical functionality is located outside the stats sub-package. For example, orthogonal distance regression support is contained in its own sub-package odr. Also, you might use the linalg sub-package in conjunction with stats.
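As one possible illustration of combining the two sub-packages, the sketch below fits a straight line to a small made-up data set twice: once as a least-squares problem with linalg.lstsq, and once with stats.linregress. The data values are invented for the example.

```python
import numpy as np
from scipy import linalg, stats

# Made-up data that lies roughly on a line.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Least-squares fit via linear algebra: columns are [1, x], so the
# solution vector holds (intercept, slope).
design = np.column_stack([np.ones_like(x), x])
coeffs, residues, rank, sing_vals = linalg.lstsq(design, y)
intercept_ls, slope_ls = coeffs

# The same fit from the stats sub-package, which also reports
# the correlation coefficient and a p-value.
fit = stats.linregress(x, y)

print(slope_ls, intercept_ls, fit.slope, fit.intercept)
```

The linear algebra route generalizes to several predictors by adding columns to the design matrix, while linregress is limited to one predictor but supplies the accompanying statistics.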

Further resources

The SciPy.org website has links to further documentation, including tutorials and cookbook examples.

The Mathesaurus is a sort of Rosetta stone for comparing mathematical environments such as SciPy, Matlab, R, etc. The resources on that site are useful even if you do not know one of the other packages and just want a cheat sheet for doing various tasks in Python (especially SciPy).

The IPython shell is a popular environment for working interactively with SciPy. Also, many SciPy users use matplotlib to create plots from either the standard Python command line or from IPython.

License

This article, along with any associated source code and files, is licensed under The BSD License