Article Series
Why Open Source?
While there is no question about the capabilities and strengths of licensed and proprietary databases, their highly restrictive pricing model and a significantly higher Total Cost of Ownership (TCO) for large installation bases has made enterprises and medium businesses look towards less expensive open-source solutions. Enterprises have been shying away from Open Source software primarily due to the fact that there is no support in case of any major issues. Open Source Databases provide the same or even better functionality at a significantly lower cost. Migrating from commercial to open source databases results in significant cost savings for enterprises in terms of licensing and support.
Moving from Licensed Software to Open Source
Licensed software like SQL Server and Oracle are backed by their respective vendors and provide support as part of their license or separately, depending on the licensing model. They have your back (at least on paper) when your databases are down due to software bugs & vulnerabilities, heavy loads or other hardware and network issues.
On the other hand, open source software is provided “as is” if you host it on your servers and you’re on your own when there is any issue. But the good news is that there are third-party companies that provide support to Open source databases. Note that the software itself is still free to use, the price is for support only. One such company, EnterpriseDB, provides support for Open Source PostgreSQL as well as its own fork of PostgreSQL.
Cloud solution providers also provide PostgreSQL as PaaS offerings. One such offering is AWS Aurora (PostgreSQL compatible). This model has high availability and support built-in with a pay-as-you-go pricing model and is best suited for small and medium businesses. Note that AWS also provides PostgreSQL as an RDS service at roughly half the cost but without built-in high availability. You can also run PostgreSQL on Cloud at a much cheaper cost by manually hosting PostgreSQL instances in Linux instances running on the Cloud.
Why PostgreSQL?
PostgreSQL is an enterprise class, feature-rich open source database system that is highly reliable and extremely performant and very suited for real-time and mission critical applications. Until a decade ago, MySQL used to be the default choice of enterprises for migration to Open Source databases. Ever since its acquisition by Oracle bin 2010, its roadmap became uncertain and enterprises started moving away from it.
MariaDB is an open source fork of MySQL that was intended to continue the open source legacy of MySQL, but it didn’t really take off in the enterprise world. There had to be a solution to fill the gap and PostgreSQL came in as a viable alternative and has become the choice of enterprises for moving their databases to open source. According to DB Engines Ranking for 2019, PostgreSQL is the fourth most popular database MariaDB is 14th.
The Contenders
SQL Server: History, Editions and Versions
SQL Server is a relational database management system developed by Microsoft. It was first released in 1989 and has since been in active development. The latest stable version as of Sep 2019 is SQL Server 2017. According to DB Engines Ranking, SQL Server is the third most popular Database in 2019 and also the DBMS of the year 2016.
It comes in various editions such as Enterprise, Standard, Developer, Express, Azure, etc. SQL Server Express Edition is a scaled down, free edition of SQL Server, which includes the core database engine. While there are no limitations on the number of databases or users supported, it is limited to using one processor, 1 GB memory and 10 GB database files.
SQL Server, being a proprietary Microsoft software, has traditionally been available only on the Windows operating system. But starting SQL Server 2017, it is available on Linux too. It is also available as PaaS offerings from Cloud providers like Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).
For more information, visit SQL Server Home Page or the Wikipedia page or the DB Engines page.
PostgreSQL: History, Editions and Versions
PostgreSQL is the most advanced open source object-relational database system. It has been in development for over 30 years and is managed by the PostgreSQL Global Development Group. According to DB Engines Ranking, PostgreSQL is the fourth most popular Database in 2019 and also the DBMS of the year for 2017 and 2018.
PostgreSQL started as a fork of the Ingres Database in 1982 and was first released in 1996. The latest version is PostgreSQL 11.5 as of Sep 2019. It is a very mature and stable product with an extremely rich set of features and tools. This article discusses migrating to PostgreSQL 11.x unless otherwise specifically mentioned.
PostgreSQL is available on Linux, FreeBSD, OpenBSD, macOS and Windows, and also as PaaS offerings from Cloud providers like Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).
To get started with PostgreSQL, visit its website or the Wikipedia article or the DB Engines page.
Choosing the Right PostgreSQL Edition and Version for Your Needs
PostgreSQL is an open source database and is available in a wide range of varieties. The first thing to do is to decide where to host your PostgreSQL database. The available options are:
- On-premise hosting on Linux infrastructure
- On-premise hosting on Windows infrastructure
- Cloud hosting using IaaS such AWS EC2 VMs running Linux
- Cloud hosting using IaaS such AWS EC2 VMs running Windows
- Cloud hosting using PaaS solutions like AWS RDS or Aurora. These solutions almost always run on Linux due to stability and economy concerns.
The next thing to do is to choose your PostgreSQL version. For On-prem hosting, the rule of thumb is to always start with the latest stable release (PostgreSQL 11.5 as of this writing) and then keep upgrading to newly released stable versions (keeping support lifecycle in mind). Cloud providers like AWS allow you to choose the last three or four stable releases and take some time to certify and make available the latest stable release. They also allow you to do in-place upgrade for minor version upgrades and also major version upgrades with minimal downtime.
Check out the Feature Matrix page for a version-wise comparison of features. Check out the relevant PostgreSQL offering page of the different Cloud providers for features and limitations before signing up.
Production Workloads
PostgreSQL works best with Linux, so always choose Linux for your production workloads for maximum performance, stability and high-availability. SQL Server traditionally runs on Windows and chances are that if you are planning to reuse your SQL Server hardware, you may want to run PostgreSQL on Windows. You will most probably not be able to do this since you will have to have your PostgreSQL databases instances up and running along with your SQL Server databases in parallel for a period before you can retire your SQL Server databases.
Development, Testing, Staging or Pilot
For development and testing, you can either choose Linux or Windows running on on-premise or cloud infrastructure depending on what works best for your organization and the skillset of your team.
Here’s a cost-saving tip. You can run all your non-production PostgreSQL databases in Cloud and turn them off when not needed to save on billing. For example, AWS EC2s can be stopped and they’ll not be billed until you boot-up the instances again. AWS RDS does not allow you to stop an RDS instance once it has been launched, you can only Terminate the instances.
A neat trick is to take a backup of the database, move the backup to an S3 bucket and then terminate the instance. When development resumes later, launch a new instance and restore from the backup. To avoid changing the database endpoints in code whenever a new database is launched, you can create a DNS entry that points to the current database and use the DNS endpoint in your code and jobs. The only place which will require a change when a new database is launched is the DNS record.
This is much cheaper than having physical servers in your data centers throughout the year and spending on their hardware, power and maintenance.
On-Premise vs Cloud
| On-Premise | Cloud IaaS | Cloud PaaS |
Hosting | PostgreSQL is installed on servers in your Data Center | You launch Cloud Virtual Machines like AWS EC2 and install PostgreSQL | This is Database As A Service, no installation required. You select the launch parameters and launch the service |
Clustering | Manual clustering setup is required | Manual clustering setup is required | Clustering is built-in depending on the type of service offering |
High Availability / Automatic Failover | High Availability, Load Balancing, and Replication must be set up manually. | High Availability, Load Balancing, and Replication must be set up manually. | High Availability is built-in depending on the service offering. For example, AWS Aurora provides High Availability whereas plain vanilla AWS RDS instance does not. |
Tools | You have control over the hardware and therefore should be able to run all PostgreSQL tools on the servers. | You have control over the hardware and therefore should be able to run all PostgreSQL tools on the servers. | The Cloud Platform may restrict you from running certain tools. For example, pgAgent is not available In AWS Aurora. |
Scheduling | You can use pgAgent or other third-party schedulers to schedule data transfer jobs and maintenance jobs on the servers. | You can use pgAgent or other third-party schedulers to schedule data transfer jobs and maintenance jobs on the servers. | Cloud providers have their own built-in services for scheduling jobs. For example, AWS has its own scheduling service and does not support pgAgent in RDS or Aurora. |
Management & Monitoring | Management & Monitoring should be done manually | Management & Monitoring should be done manually | The Cloud Platform automatically manages the databases for you and provides GUIs/portal to monitor the health of your databases. E.g., AWS CloudWatch |
Support | Support for On-Premise installations are usually the organization’s infrastructure support team. Some third-party companies provide support for On-Premise installations. | Support for On-Premise installations are usually the organization’s infrastructure support team. Some third-party companies provide support for Cloud IaaS installations. | Support is provided by the Cloud Provider through automated Support systems. |
Backup & Restore | Backup and Restore has to be done manually or automated jobs need to be set up. | Backup and Restore has to be done manually or automated jobs need to be set up. | Automated Backup and Restore feature is available with most Cloud providers. For example, you can set up automated backups and restore from backups through AWS Console. |
Cost | Infrastructure costs only. If you already own the hardware, then this will probably be the cheapest option of the three. | Clouds VMs like AWS EC2 are comparatively cheaper depending on the configuration you choose. You may have to spend additional cost if you choose to use third-party licensed software for monitoring, management, etc. | Cloud PaaS are usually expensive since they have many High Availability and several other features built-in. |
When talking about Cloud providers, I always give examples from AWS because that is the only cloud platform I have migrated databases to, so far. It is in no way an endorsement of one provider over another. Other providers have similar offerings, please visit their website to learn more about their offerings.
The post Migrating your SQL Server Workloads to PostgreSQL – Part 1 appeared first on The Developer Space.