PostgresToMsSql: Schema Mapping and Data Type Compatibility

Migrating a database from PostgreSQL to Microsoft SQL Server (MSSQL) involves more than copying tables and data: it requires careful schema mapping and attention to data type compatibility. PostgreSQL and MSSQL have different features, data types, default behaviors, and SQL dialects. This article explains key differences, practical mapping strategies, pitfalls to avoid, and examples to help you migrate schemas accurately and reliably.
1. High-level differences to keep in mind
- SQL dialect: PostgreSQL follows the SQL standard closely and adds many advanced features (e.g., arrays, JSONB, range types). MSSQL implements T-SQL (Transact-SQL), which has its own syntax and procedural extensions.
- Case sensitivity: PostgreSQL folds unquoted identifiers to lower-case; quoted identifiers are case-sensitive. MSSQL preserves identifier case as written, but name comparisons are case-insensitive by default (collation-dependent).
- Schemas and permissions: Both systems support schemas (namespaces) but manage permissions and default schemas differently.
- Extensions and features: PostgreSQL has many extensions (PostGIS, pgcrypto) that have no direct equivalents in MSSQL or require different implementations.
- Transaction semantics and DDL: Some PostgreSQL DDL operations are transactional; in MSSQL, certain DDL operations are not fully transactional.
2. Data type mapping: common types and recommended translations
Below are common PostgreSQL types and recommended MSSQL equivalents, with notes about differences and conversion considerations.
PostgreSQL type | MSSQL type | Notes / Caveats |
---|---|---|
smallint | SMALLINT | Direct match. |
integer / int / int4 | INT | Direct match. |
bigint / int8 | BIGINT | Direct match. |
serial / bigserial | INT IDENTITY / BIGINT IDENTITY | Use IDENTITY(seed,increment) or SEQUENCE in MSSQL. Remove DEFAULT nextval(…) from migrated schema. |
numeric(p,s) / decimal(p,s) | DECIMAL(p,s) | Match precision/scale. Beware precision/scale limits and rounding behavior. |
real | REAL | 32-bit float; direct map. |
double precision | FLOAT(53) | MSSQL FLOAT defaults to FLOAT(53), which is double precision; the mapping is direct. |
boolean | BIT | In MSSQL, BIT stores 0/1 and allows NULL. T-SQL has no true boolean type in expressions, so compare explicitly (e.g., WHERE flag = 1). |
text | VARCHAR(MAX) or NVARCHAR(MAX) | Use NVARCHAR(MAX) if Unicode (recommended). For performance, map small texts to VARCHAR(n)/NVARCHAR(n). |
varchar(n) / character varying(n) | VARCHAR(n) or NVARCHAR(n) | Choose NVARCHAR for Unicode; length semantics similar. |
char(n) / character(n) | CHAR(n) or NCHAR(n) | Fixed-length semantics similar. |
bytea | VARBINARY(MAX) | Use VARBINARY for binary data. |
timestamp [without time zone] | DATETIME2 or DATETIME | DATETIME2 has higher precision (up to 100ns) and is recommended. Note: timestamp without time zone in Postgres stores no timezone info. |
timestamp with time zone (timestamptz) | DATETIMEOFFSET or DATETIME2 + separate offset handling | DATETIMEOFFSET preserves timezone offset; DATETIME2 does not. Converting timestamptz values to UTC and storing in DATETIME2 is common. |
date | DATE | Direct mapping. |
time [without time zone] | TIME | Use TIME or TIME(7) for precision. |
interval | TIME, DATETIMEOFFSET, or custom representation (e.g., BIGINT seconds) | MSSQL lacks a direct interval type — store as seconds, or structured fields, or use custom functions. |
uuid | UNIQUEIDENTIFIER | MSSQL UNIQUEIDENTIFIER stores GUIDs; conversion functions needed. |
json / jsonb | NVARCHAR(MAX), VARCHAR(MAX), or SQL Server JSON functions | MSSQL has JSON support via functions but no native JSON type; store as text and use OPENJSON/JSON_VALUE/JSON_QUERY. For heavy JSON use, consider schema or hybrid approach. |
ARRAY | Normalized child tables, JSON in NVARCHAR(MAX), or delimited strings | MSSQL doesn’t support array column types. Normalize arrays into child tables or store them as JSON. |
hstore | NVARCHAR(MAX) or mapping to key-value table | No native hstore; map to JSON or separate table. |
cidr / inet / macaddr | VARCHAR(n) or specialized types via extension | MSSQL has no inet type; store as VARCHAR and validate with functions. |
money | MONEY or DECIMAL(19,4) | MONEY has rounding quirks; DECIMAL is safer for precise calculations. |
XML | XML | MSSQL supports XML type with XQuery functions; behavior differs. |
geometric types | Custom tables / geometry (use SQL Server Spatial types) | Use SQL Server geometry/geography types for spatial data; map carefully (SRID differences). |
range types (int4range, tsrange) | Separate start/end columns or custom types | No direct equivalent—use two columns or normalized representation. |
enum | CHECK constraint on VARCHAR/INT or separate lookup table | Use constrained VARCHAR or small INT referencing lookup table for better extensibility. |
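As a sketch of the enum mapping above, here are both options in T-SQL (table and value names are illustrative):

```sql
-- Option A: constrained VARCHAR (simple, but adding a value needs an ALTER)
CREATE TABLE dbo.tickets (
    id INT IDENTITY(1,1) PRIMARY KEY,
    status VARCHAR(20) NOT NULL
        CONSTRAINT ck_tickets_status CHECK (status IN ('open', 'closed', 'archived'))
);

-- Option B: lookup table (new values are plain INSERTs; the FK enforces validity)
CREATE TABLE dbo.issue_status (
    status_id TINYINT PRIMARY KEY,
    name VARCHAR(20) NOT NULL UNIQUE
);
CREATE TABLE dbo.issues (
    id INT IDENTITY(1,1) PRIMARY KEY,
    status_id TINYINT NOT NULL REFERENCES dbo.issue_status(status_id)
);
```

Option B is usually the better choice when the set of values is expected to grow, since it avoids schema changes.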
3. Keys, defaults, and identity columns
- PostgreSQL serial/bigserial: these create sequences and set DEFAULT nextval(…). In MSSQL replace with IDENTITY or create SEQUENCE objects and set DEFAULT NEXT VALUE FOR sequence_name.
- Primary keys, unique constraints, and indexes map directly; review clustered vs nonclustered choices — MSSQL has clustered index concept (one per table) which affects physical ordering.
- Foreign keys: translate directly, but watch for ON DELETE/UPDATE behaviors.
- Default expressions: some Postgres expressions (e.g., now(), uuid_generate_v4(), gen_random_uuid()) must be mapped to MSSQL equivalents (GETUTCDATE()/GETDATE(), NEWID(), NEWSEQUENTIALID(), or custom CLR functions).
- Computed/generated columns: Postgres GENERATED AS IDENTITY or computed columns map to MSSQL computed columns or identity — verify persisted vs non-persisted behavior.
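The SEQUENCE-based replacement for serial/bigserial might look like this (names are hypothetical):

```sql
-- Replaces the Postgres DEFAULT nextval('invoice_id_seq') pattern
CREATE SEQUENCE dbo.invoice_id_seq AS BIGINT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.invoices (
    id BIGINT NOT NULL
        CONSTRAINT df_invoices_id DEFAULT (NEXT VALUE FOR dbo.invoice_id_seq)
        PRIMARY KEY,
    total DECIMAL(19,4) NOT NULL
);

-- After loading migrated rows, restart the sequence past the highest id:
-- ALTER SEQUENCE dbo.invoice_id_seq RESTART WITH 100001;
```

Unlike IDENTITY, a SEQUENCE can be shared across tables and queried independently, which more closely matches PostgreSQL sequence semantics.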
4. Constraints, indexes, and advanced indexing
- Check constraints and unique constraints translate directly; ensure constraint names do not exceed MSSQL length limits.
- Partial indexes: PostgreSQL partial indexes (CREATE INDEX ... WHERE ...) map to MSSQL filtered indexes, which also accept a WHERE clause but with stricter rules (simple comparison predicates only; no functions or references to computed columns in the filter).
- Expression indexes: Postgres expression-based indexes may need computed columns in MSSQL (persisted computed columns can be indexed).
- GIN/GiST indexes: No direct equivalents. For text search use MSSQL Full-Text Search; for arrays or JSON use inverted/FTS or normalized tables.
- Full-text search: PostgreSQL uses tsvector + GIN/GiST. MSSQL offers Full-Text Search (CONTAINS, FREETEXT) with different configuration and behavior.
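Filtered indexes and expression indexes from the bullets above can be sketched as follows (table and column names are illustrative):

```sql
-- Filtered index: the T-SQL counterpart of a Postgres partial index
CREATE INDEX ix_orders_open
    ON dbo.orders (customer_id)
    WHERE status = 'open';

-- Expression index: emulate with a persisted computed column, then index it.
-- LOWER() is deterministic, so the column can be PERSISTED and indexed.
ALTER TABLE dbo.users ADD email_lower AS LOWER(email) PERSISTED;
CREATE INDEX ix_users_email_lower ON dbo.users (email_lower);
```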
5. Procedural code, triggers, and functions
- PostgreSQL uses PL/pgSQL (and other languages). MSSQL uses T-SQL (Transact-SQL).
- Stored procedures and functions must be rewritten for T-SQL — syntax and built-in functions differ.
- Triggers: convert triggers to MSSQL triggers; understand AFTER vs INSTEAD OF behavior differences.
- Set-returning functions in PostgreSQL (returning table rows) map to T-SQL table-valued functions, but implementation differs.
- Error handling: PL/pgSQL’s EXCEPTION blocks map to TRY…CATCH in T-SQL.
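A minimal sketch of the EXCEPTION-to-TRY…CATCH translation (procedure and table names are hypothetical):

```sql
CREATE PROCEDURE dbo.transfer_funds
    @src INT, @dst INT, @amount DECIMAL(19,4)
AS
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.accounts SET balance = balance - @amount WHERE id = @src;
        UPDATE dbo.accounts SET balance = balance + @amount WHERE id = @dst;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        -- ERROR_NUMBER()/ERROR_MESSAGE() take the place of SQLSTATE/SQLERRM
        THROW;  -- re-raise, roughly like PL/pgSQL RAISE
    END CATCH
END;
```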
6. Dealing with PostgreSQL-specific features
- Arrays: normalize or use JSON. Example: a tags text[] column → create tags table with (parent_id, tag) rows, or tags as JSON array and use OPENJSON for queries.
- JSONB: MSSQL lacks native binary JSON but supports JSON functions. JSON storage in NVARCHAR(MAX) is typical; performance and indexing require computed columns or full-text/search indexing strategies.
- Extensions (PostGIS): use SQL Server spatial types (geometry/geography) and translate SRIDs, functions, and indexes carefully.
- Window functions: both support window functions, but some syntax/function names may differ.
- Common Table Expressions (CTEs): both support CTEs; conversion generally straightforward.
- WITH ORDINALITY and some advanced SQL constructs may need rewriting.
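Querying a JSON-stored array with OPENJSON, per the array bullet above, might look like this (table and column names are illustrative):

```sql
-- items.tags holds a JSON array such as '[1,2,3]' in NVARCHAR(MAX)
SELECT i.id, t.tag
FROM dbo.items AS i
CROSS APPLY OPENJSON(i.tags) WITH (tag INT '$') AS t
WHERE t.tag = 2;
```

This replaces a Postgres query like `SELECT id FROM items WHERE 2 = ANY(tags)`, at the cost of a per-row JSON parse; normalize into a child table if the query is hot.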
7. Collation, encoding, and locale
- PostgreSQL typically uses UTF-8; MSSQL can use NVARCHAR for Unicode and collations for case sensitivity and accent sensitivity. Choose appropriate collation to match sorting and comparison behavior.
- Collation affects string comparisons, ORDER BY, and uniqueness. Test indexes and unique constraints if collation differs.
8. Migration strategy and practical steps
- Inventory schema and features:
- List tables, columns, types, constraints, indexes, sequences, triggers, functions, views, and extensions.
- Choose type mappings and document exceptions:
- Decide NVARCHAR vs VARCHAR, DATETIME2 vs DATETIMEOFFSET, how to handle arrays/json/enums.
- Create target schema in MSSQL:
- Prefer generating DDL scripts programmatically. Adjust identity, computed columns, and defaults.
- Migrate static reference data first, then tables without FKs, then dependent tables (or disable FK checks and re-enable after).
- Convert data:
- For types requiring transformation (UUIDs, JSON, bytea), apply conversion functions.
- Use bulk load tools (bcp, BULK INSERT, SSIS, Azure Data Factory) or ETL tools.
- Recreate indexes, constraints, and permissions.
- Translate and deploy stored procedures, functions, triggers.
- Validate:
- Row counts, checksums, sample queries, and application tests.
- Performance tuning:
- Update statistics, adjust indexes, consider clustered index choice, examine query plans and rewrite slow queries.
- Cutover planning:
- Consider near-zero downtime techniques (replication, dual writes, logical replication + sync, or ETL with change data capture), testing fallback plans.
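For the bulk-load step, a typical BULK INSERT of a CSV exported with Postgres `COPY ... TO ... WITH (FORMAT csv, HEADER)` might look like this (path and table are hypothetical; CODEPAGE 65001 requires SQL Server 2016+):

```sql
BULK INSERT dbo.orders
FROM 'C:\migration\orders.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '0x0a',
    FIRSTROW        = 2,        -- skip the header row
    TABLOCK,                    -- enables minimal logging under bulk-logged/simple recovery
    CODEPAGE        = '65001'   -- UTF-8, matching Postgres COPY output
);
```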
9. Examples

Example: serial to IDENTITY

PostgreSQL:

    id SERIAL PRIMARY KEY

MSSQL equivalent:

    id INT IDENTITY(1,1) PRIMARY KEY

Example: jsonb to NVARCHAR(MAX) plus an indexed computed column

PostgreSQL:

    payload jsonb
    CREATE INDEX idx_payload_title ON mytable ((payload->>'title'));

MSSQL:

    payload NVARCHAR(MAX)
    -- Create a persisted computed column to extract the title, then index it:
    ALTER TABLE mytable ADD payload_title AS JSON_VALUE(payload, '$.title') PERSISTED;
    CREATE INDEX idx_payload_title ON mytable(payload_title);

Example: array of integers

PostgreSQL:

    tags INT[]

MSSQL options:

- Normalize:

      CREATE TABLE item_tags (item_id INT, tag INT, PRIMARY KEY(item_id, tag));

- Or store JSON:

      tags NVARCHAR(MAX) -- JSON array like '[1,2,3]'
10. Testing and validation checklist
- Schema parity: column counts/types/constraints match intended mapping.
- Referential integrity: FK constraints enforced and validated.
- Sample queries: compare result sets on representative queries.
- Aggregate checksums: use hashing (checksum functions) for critical tables.
- Performance benchmarks: compare slowest queries and tune indexes.
- Application-level tests: full test suite passing against MSSQL environment.
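A quick parity check on the MSSQL side combines a row count with an order-independent checksum (table and columns are illustrative). Note that engine-native checksum functions differ between products, so for a cross-engine comparison compute a matching hash of a canonical per-row string on both sides instead of comparing native checksums directly:

```sql
SELECT COUNT(*) AS row_count,
       CHECKSUM_AGG(CHECKSUM(id, status, total)) AS table_checksum
FROM dbo.orders;
```

Run the same query before and after any post-migration fixups on MSSQL to confirm nothing drifted.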
11. Tools that help
- ETL/replication: SQL Server Integration Services (SSIS), Azure Data Factory, Pentaho, Talend, Apache NiFi.
- Migration assistants: Microsoft SQL Server Migration Assistant (SSMA) for PostgreSQL can automate many conversions.
- Custom scripts: Python (psycopg2 + pyodbc), Go, or other ETL code for complex transforms.
- Change Data Capture/replication: consider logical replication, Debezium + Kafka + sink, or commercial replication tools for minimal downtime.
12. Common pitfalls and gotchas
- Relying on PostgreSQL-specific types (array, jsonb, hstore, range, enum) without a conversion plan.
- Differences in NULL handling and empty string semantics.
- Time zone mishandling when converting timestamptz.
- Assumptions about index behavior and planner choices — queries may need rewriting for optimal T-SQL performance.
- Collation and case-sensitivity causing duplicate-key errors or missing matches.
- Oversights in default values that reference sequences or functions.
13. Summary recommendations
- Use NVARCHAR and DATETIME2 by default for Unicode text and timestamps unless you have a reason otherwise.
- Normalize arrays and enums into tables for portability and queryability.
- Treat JSONB as NVARCHAR with planned computed columns for indexing when needed.
- Convert sequences/serials to IDENTITY or MSSQL SEQUENCE carefully, preserving next values.
- Run thorough validation and performance testing; expect to rewrite stored procedures and queries.