This module gives an overview of the landscape of Python libraries for artificial intelligence and machine learning.
Introduction
This is the fourth module of our series on learning Python and its use in machine learning (ML) and artificial intelligence (AI). We've walked through the Python basics, so now we can take a look at what libraries are available to work on AI and ML tasks.
Note that this is more of a laundry list of Python libraries with links where you can learn more. We'll dive into examples of putting these libraries to use in future modules.
NLTK
NLTK (Natural Language Toolkit) offers many packages for natural language processing (NLP), a branch of artificial intelligence related to computational linguistics that focuses on understanding and interpreting written human language. With NLTK, you can analyze sentences, categorize words, perform sentiment analysis, and more.
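As a small taste of what "categorizing words" can look like, here is a minimal sketch that tokenizes a sentence, stems the words, and counts them. The sample text is made up for illustration, and the sketch deliberately sticks to NLTK pieces that should not need any extra corpus downloads.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Made-up sample text, purely for illustration.
text = "The cats are chasing the mice. The mice are running away."

# Split the text into word and punctuation tokens (regex-based tokenizer).
tokens = wordpunct_tokenize(text.lower())

# Keep only alphabetic tokens, dropping the punctuation.
words = [token for token in tokens if token.isalpha()]

# Reduce each word to its stem, e.g. "cats" -> "cat".
stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in words]

# Count how often each stem occurs.
freq = FreqDist(stems)
print(freq.most_common(3))
```

More advanced tasks, such as part-of-speech tagging or sentiment analysis, typically require downloading additional NLTK data first.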
OpenCV
OpenCV (Open Source Computer Vision Library) is a library for optimized real-time computer vision and machine learning. It can process images using filters and transformations, detect features in images, and extract data from them. It's used for applications such as optical character recognition (OCR), face detection, object tracking, and more.
Keras
Keras is a high-level library for deep learning and neural networks in Python. It runs on top of the TensorFlow, Theano, or CNTK deep learning frameworks. Keras has become popular because, compared with using TensorFlow on its own or using the PyTorch framework, it's the simplest of the three for quickly prototyping deep learning projects.
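The sketch below hints at why Keras is considered quick for prototyping: a small classifier is defined, compiled, and trained in a handful of lines. It assumes the Keras version bundled with TensorFlow (`tensorflow.keras`), and the data is random noise invented for the example, so the trained model is not meaningful.

```python
import numpy as np
from tensorflow import keras

# A tiny network: 4 input features, one hidden layer, 3 output classes.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random stand-in data (hypothetical): 32 samples with integer class labels.
rng = np.random.default_rng(0)
features = rng.random((32, 4)).astype("float32")
labels = rng.integers(0, 3, size=32)

# Train briefly, then predict class probabilities for five samples.
model.fit(features, labels, epochs=2, verbose=0)
probabilities = model.predict(features[:5], verbose=0)
print(probabilities.shape)
```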
TensorFlow
TensorFlow is an open-source library for developing and training machine learning models. It offers both high-level and low-level APIs. Compared to Keras, it has higher performance and is therefore used more commonly on large datasets. Keras is now included in the TensorFlow toolkit as TensorFlow's high-level API for building and training prototypes of deep learning models.
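For contrast with the high-level API, here is a rough sketch of TensorFlow's lower-level style, where you write the training loop yourself. It fits a single weight so that `w * x` approximates `3 * x`; the data, learning rate, and step count are arbitrary choices made for this illustration.

```python
import tensorflow as tf

# Toy data for the relationship y = 3x (made up for this sketch).
x = tf.constant([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

# A single trainable weight, starting at zero.
w = tf.Variable(0.0)
learning_rate = 0.01

for _ in range(200):
    # Record the computation so TensorFlow can differentiate it.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x - y))
    # Compute d(loss)/dw and take one gradient descent step.
    gradient = tape.gradient(loss, w)
    w.assign_sub(learning_rate * gradient)

print(float(w))  # should end up close to 3.0
```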
PyTorch
PyTorch is a deep learning framework. It focuses on tensor computation (like NumPy, but accelerated using the GPU) and deep neural networks.
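The two ideas mentioned above, NumPy-like tensor computation and the automatic differentiation that neural network training relies on, can be sketched in a few lines. The sketch falls back to the CPU when no GPU is available.

```python
import torch

# Tensor computation, much like NumPy arrays.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.ones(2, 2)
product = a @ b  # matrix multiplication
print(product)

# The same computation on the GPU if one is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
product_on_device = (a.to(device) @ b.to(device)).cpu()

# Automatic differentiation: compute dy/dx for y = x^3 + 2x at x = 2.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x
y.backward()
print(x.grad)  # 3 * 2^2 + 2 = 14
```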
scikit-learn
scikit-learn is a library that offers a variety of "traditional" machine learning methods (linear models, support vector machines, decision trees, and so on). It includes almost no deep learning features (only a basic multi-layer perceptron), because deep learning is rather specialized, and machine learning libraries usually focus on either that or on the traditional machine learning methods. That makes scikit-learn an entirely different beast from Keras, TensorFlow, and PyTorch.
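As an example of one of these traditional methods, the sketch below trains a decision tree on the Iris dataset that ships with scikit-learn. The split ratio, tree depth, and random seed are arbitrary choices for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the bundled Iris dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a shallow decision tree and measure accuracy on the held-out data.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share this `fit` / `score` / `predict` pattern, so swapping in a different model usually means changing only the line that creates it.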
NumPy and SciPy
NumPy and SciPy are not specifically libraries for machine learning, but they are an important part of this overview because they offer the fundamental tools for scientific computing.
NumPy offers powerful N-dimensional arrays, linear algebra, and Fourier transforms, and it works smoothly with other libraries such as OpenCV (which is not surprising, as many of them use NumPy arrays as their data format).
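A minimal sketch of the three features just mentioned: an array operation, a linear algebra routine, and a Fourier transform. The numbers are made up for the example.

```python
import numpy as np

# N-dimensional arrays: element-wise arithmetic without explicit loops.
grid = np.arange(6).reshape(2, 3)
print(grid * 10)

# Linear algebra: solve the system 2a + b = 3, a + 3b = 5.
matrix = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(matrix, rhs)
print(solution)  # approximately [0.8, 1.4]

# Fourier transform: find the dominant frequency of a 5 Hz sine wave
# sampled at 100 Hz for one second.
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)
dominant = freqs[np.argmax(spectrum)]
print(dominant)  # 5.0
```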
SciPy works on top of NumPy's arrays and offers functions for numerical analysis, such as interpolation, optimization, integration, differential equation solvers, statistics, and more.
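Two of the SciPy capabilities listed above, integration and optimization, can be sketched as follows. The functions being integrated and minimized are simple examples chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: the area under sin(x) from 0 to pi is exactly 2.
area, error_estimate = integrate.quad(np.sin, 0, np.pi)
print(area)

# Optimization: find the minimum of (x - 2)^2 + 1, which lies at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(result.x, result.fun)
```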
Matplotlib
Matplotlib is a library for creating visualizations in Python. You can use it to create static, animated, or interactive charts and figures, in 2D or 3D.
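Here is a minimal sketch of a static 2D chart. It selects the non-interactive "Agg" backend so that it also runs on machines without a display, and it saves the figure to a file whose name, `sine_cosine.png`, is just a placeholder chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display required

import matplotlib.pyplot as plt
import numpy as np

# Sample two curves over one full period.
x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Write the chart to an image file (hypothetical file name).
fig.savefig("sine_cosine.png")
```

In an interactive session you would typically skip the backend line and call `plt.show()` instead of saving to a file.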
Pandas
Pandas is a library for data analysis and manipulation. It can read/write files of many different formats and offers many features to manage your data: indexing, selecting, merging, reshaping, and much more.
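The sketch below touches on several of these features: reading CSV data, selecting rows, merging two tables, and aggregating. The data is invented for the example and is read from an in-memory string rather than a real file.

```python
import io

import pandas as pd

# Made-up CSV data, read from a string instead of a file on disk.
csv_text = """name,team,score
Ana,red,90
Ben,blue,72
Cleo,red,85
Dev,blue,64
"""
scores = pd.read_csv(io.StringIO(csv_text))

# A second, hypothetical table to merge with.
teams = pd.DataFrame({"team": ["red", "blue"], "coach": ["Ruiz", "Okafor"]})

# Selecting: keep only the rows with a score of at least 80.
high_scores = scores[scores["score"] >= 80]
print(high_scores)

# Merging: attach each player's coach by matching on the "team" column.
merged = scores.merge(teams, on="team")

# Aggregating: average score per team.
mean_by_team = merged.groupby("team")["score"].mean()
print(mean_by_team)
```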
What to Choose?
As you can see, there are a lot of choices you can make.
If you are an enterprise developer, you will likely want to choose a high-level library such as NLTK, OpenCV, or Keras. That will allow you to get on track as fast as possible.
Libraries such as the higher-level parts of TensorFlow or PyTorch help you construct neural networks without having to worry about the low-level complexity.
Low-level solutions, such as some parts of TensorFlow, or doing everything with NumPy, usually require a lot more work, so it takes longer to deliver solutions, and those solutions are harder to maintain.
Next Steps
Now that we've seen an overview of the available libraries, we can dive into learning how to use them. The next four modules will cover the usage of OpenCV, NLTK, Keras, SciPy, NumPy, and TensorFlow.