Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Hosted-services / AWS

Migrating your SQL Server Workloads to PostgreSQL – Part 3

3.00/5 (1 vote)
22 Sep 2019CPOL10 min read 5.8K  
Part 3 of this multi-part article series discusses the actual migration process: Schema and data migration, ETL/ELT, Reporting & OLAP, Scheduling and Applications & Services migration.. The post Migrating your SQL Server Workloads to PostgreSQL – Part 3 appeared first on The Developer Space..

Article Series

Getting Started with the actual migration

The first steps in the migration process is to gather and analyze the following information: 

  1. The number of SQL Server Instances to be migrated
  2. The total number of Databases across all such instances
  3. The total size of the data to be migrated in terms of number of tables, number of rows in each table and the total size of the data (e.g. 500 tables, 20 million total rows and 10 TB of data)
  4. The number of objects to be moved (Tables, Views, Stored Procedures, Jobs etc.)
  5. The capabilities of the source platform and the target platform and the comparison
  6. Workarounds/Alternative solutions for features that are not supported by the target platform (such as ETL, OLAP, etc.)
  7. Application code migration
  8. Jobs, Queries and Reports migration
  9. Data warehousing / OLAP considerations
  10. Post migration performance comparison and performance improvement
  11. Post migration monitoring and support

Schema migration 

Once you have all the requirements gathered in hand and have already set up your PostgreSQL database instance, the first step is to migrate the database schema and the objects. 

Migration Tools 

Migrating Schema from SQL Server to PostgreSQL can be done either manually or using tools. There are free and commercial tools available to migrate schema and data from one platform to another. 

Free tools are developed and provided by PostgreSQL community enthusiasts. 

Free/Open Source Tools 

pgloader migrates SQL Server database to PostgreSQL. It supports automatic discovery of the schema, including build of the indexes, primary key and foreign keys constraints, and provides various casting rules which can convert the SQL Server data type to a PostgreSQL equivalent data type. It also supports loading data. 

Sqlserver2pgsql is a Perl script that converts SQL Server schema to a PostgreSQL schema. It can also optionally produce a Pentaho Data Integrator (Kettle) job to migrate the Table data from SQL Server to PostgreSQL. 

ETL Tools 

Though ETL tools like Talend and Pentaho Kettle provide the option of creating the schema on the go while loading data, I strongly advise against using this feature. This is because the indexes, triggers and constraints are not carried over during the migration. Only the table structure with the columns with inferred data types are created. 

Check out the ETL Considerations section for a detailed discussion on the topic. 

Cloud-specific Vendor Tools 

If you are migrating your SQL Server databases to AWS Aurora (PostgreSQL compatible), you can use AWS Schema Conversion Tool (SCT) to migrate your schemas. SCT is a desktop application that analyzes your SQL Server Schema and produces PL/PGSQL scripts to migrate the schemas to PostgreSQL. The SCT does a very good job at migrating your Tables, Views, Indexes, etc. but only partially and incompletely migrates code objects like Stored Procedures and Triggers. You’ll realize that you must do a lot of rewriting to make it work properly with PostgreSQL. My advice is to not depend on auto-generated code and rewrite all SQL Server Stored Procedures to PostgreSQL functions.

AWS Schema Conversion Tool

Note that this tool does not migrate the data. To migrate the data, you can either use an ETL tool like Talend or Pentaho or optionally use the AWS Data Migration Service, which is an AWS offering for migrating huge data from source systems to AWS Aurora databases. 

Similarly, other Cloud providers have similar tools to migrate schema and data to PostgreSQL databases. 

Manual migration of schema

Manual migration involves: 

  1. Manually analyze every database object
  2. Use the SQL Server Management Studio to generate scripts for the Tables, views, Stored Procedures, etc.
  3. Make changes to the script to make it compatible with PostgreSQL syntax.
  4. Execute the Scripts on the PostgreSQL instances using pgAdmin or a similar client tool.

This is the most accurate way of migrating the schemas and data, but unfortunately works only for small databases with few tables. This is an impractical approach for large databases with hundreds of tables and Stored Procedures. 

Data Migration 

Once the Schema is migrated and the tables have been created, the next logical step is to move the data from the SQL Server database tables to the PostgreSQL database tables. 

Open Source Tools  

As already discussed in schema migration section, there are free tools available to migrate data along with schema. pgloader and Sqlserver2pgsql allow you to migrate data. 

AWS Data Migration Service (DMS) is specifically meant for migrating data from external sources to AWS Aurora databases. Note that this service is not free and is charged based on the size of data being transferred. 

ETL Tools 

ETL tools like Talend and Pentaho are extremely good solutions to move large volume of data between databases. You can create Talend packages or Pentaho Kettle Packages to transfer data between your database tables in an easy, efficient and stable manner. 

Manual migration of data

Manual migration involves the following steps: 

  1. Use SQL Server Management studio to generate INSERT scripts with data.
  2. Modify the scripts to conform to PostgreSQL syntax. The good news is that there isn’t much difference between the INSERT statement syntax between SQL Server and PostgreSQL.
  3. Use pgAdmin or a similar client tool to execute the scripts in the PostgreSQL database.

ETL/ELT Considerations 

ETL refers to Extract, Transform and Load, it is a 3-step process applied to extract the data from various sources (which can exist in various forms), clean-up and transform, and load in to a target database. Database systems usually receive data from upstream systems and send data to downstream systems through ETL/ELT processes. 

SQL Server Integration Services (SSIS) was the ETL tool of choice of SQL Server databases and came bundled with older versions of SQL Server, but Microsoft has recently separated the two and bundled SSIS under SQL Server Data Tools (SSDT) that is free for use. That means you can literally keep your SSIS packages and make them work with PostgreSQL. I strongly advise you against this approach because SSIS does not have a native driver to connect to PostgreSQL. You’ll either have to use ODBC or use commercial third-party drivers. 

Open Source databases usually have a plethora of Open Source solutions for ETL/ELT. Talend and Pentaho are two such tools that started off as Open Source projects but were later acquired by other companies. You can use the Open Source editions without support or use the Commercial editions if you require support. 

Talend 

Talend is a fast ETL processing machine with in-built support for PostgreSQL. Talend Open Studio is an Eclipse based GUI tool with drag-drop feature that allows you to create ETL packages and test them. It creates Java code for you behind the scenes and also compiles the packages to jar files. Therefore, it can run on any platform that supports Java. Talend is required only to develop the packages and it is not required to be installed on the machines where the jobs are running. It is available both in Open Source and commercial forms. Talend Open Source does not come with a scheduler and scheduling jobs might be a challenge. You’ll have to resort to platform-provided schedulers like crontab, Windows Task Scheduler or the Cloud schedulers. 

Pentaho 

Pentaho Data Integrator (Kettle) is another popular ETL tool with in-built support for PostgreSQL. Like Talend, Pentaho Kettle is also a Java based software and can run on any platform with Java support. It started out as an Open Source project and later acquired by Hitachi systems. It is now available as both Open Source and Commercial editions. 

Reporting Platforms

SQL Server’s de facto Reporting platform is SQL Server Reporting Services (SSRS). In earlier versions of SQL Server, SSRS came bundled with the database and required a license to use. In recent versions, Microsoft is bundling SSRS as part of SQL Server Data Tools (SSDT) which is royalty free to use. SSRS supports a lot of databases and if you are already on SSRS, you can continue using your Reports by repointing your reports to your PostgreSQL databases.

PostgreSQL does not come with it’s own Reporting tools, but there are a lot of Open Source Reporting Solutions and Commercial Reporting Solutions that can connect to your PostgreSQL database and generate reports. The list is too long to warrant a discussion on each one of them, however, there few tools that stand out and deserve a mention:

JasperReports

JasperReports Server is a stand-alone and embeddable reporting server. It provides reporting and analytics that can be embedded into a web or mobile application as well as operate as a central information hub for the enterprise by delivering mission critical information on a real-time or scheduled basis to the browser, mobile device, or email inbox in a variety of file formats. JasperReports Server is optimized to share, secure, and centrally manage your Jaspersoft reports and analytic views

BIRT

The Business Intelligence and Reporting Tools Project (BIRT) is an Eclipse based open source technology platform used to create data visualizations and reports that can be embedded into rich client and web applications.

Pentaho Reporting

Pentaho Reporting is a suite of Open Source tools that allows you to create pixel-perfect reports of your data in PDF, Excel, HTML, Text, Rich-Text-File, XML and CSV. These computer generated reports easily refine data from various sources into a human readable form.

All of the above solutions are available in both Open Source and Commercial editions.

OLAP (Analytics) Platforms

SQL Server Analytics Services (SSAS), the de facto Analytics Platform for SQL Server, is an online analytical processing and data mining tool used by organizations to analyze and make sense of information possibly spread out across multiple databases, or in disparate tables or files

PostgreSQL does not have a built-in OLAP Server, but there are a host of Open Source and commercial OLAP solutions that you can look into to replace your SSAS Reports and Cubes.

Pentaho Mondrian

Mondrian is an Open Source Online Analytical Processing server (OLAP) solution that allows business users to analyze large and complex amounts of data in real-time.

Apache Kylin

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis on Hadoop and Alluxio supporting extremely large datasets. It was originally developed by eBay, and is now a project of the Apache Software Foundation

Scheduling Options

SQL Server comes with SQL Agent that allows you to schedule Jobs that run SSIS packages, execute SQL Queries and Stored Procedures and execute OS tasks and programs.

PostgreSQL has pgAgent, a job scheduling agent for PostgreSQL databases, capable of running multi-step batch or shell scripts and SQL tasks on complex schedules.

Please note that certain Cloud based PostgreSQL offerings don’t support plugins like pgAgent. In those cases, you may want to utilize the cloud specific schedulers like AWS Batch. The AWS Batch scheduler evaluates when, where, and how to run jobs that have been submitted to a job queue. Jobs run in approximately the order in which they are submitted as long as all dependencies on other jobs have been met.

Application/Services Migration  

After the Database migration is completed, you would want to migrate your Applications, Services, Reporting systems, etc. that were pointing to your SQL Server Databases to point to your new PostgreSQL databases. 

Technology Stack / Components / Drivers (Java, .NET drivers)  

PostgreSQL has support for ODBC and also has native drivers for most programming languages and frameworks: 

  • .Net
  • C (Native Library)
  • C++
  • Delphi
  • Java (JDBC)
  • JavaScript (Node.js)
  • Perl
  • PHP
  • Python
  • Tcl

Java Applications 

Connecting to PostgreSQL databases from your Java applications and services would primarily be through JDBC drivers. Checkout the PostgreSQL JDBC page for more details. 

.NET Applications 

Npgsql is by far the most widely used open source .NET connector for PostgreSQL. It is a high-performance driver build on top of ADO.NET and provides an easy migration path by using similar object names as SQL Server native driver. For example, to change the connection object, replace SqlConnection with NpgsqlConnection

For a complete list, check out the drivers and interfaces page. 

The post Migrating your SQL Server Workloads to PostgreSQL – Part 3 appeared first on The Developer Space.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)