PostgresToMsSql: Schema Mapping and Data Type Compatibility

Migrating a database from PostgreSQL to Microsoft SQL Server (MSSQL) involves more than copying tables and data: it requires careful schema mapping and attention to data type compatibility. PostgreSQL and MSSQL have different features, data types, default behaviors, and SQL dialects. This article explains key differences, practical mapping strategies, pitfalls to avoid, and examples to help you migrate schemas accurately and reliably.
1. High-level differences to keep in mind
- SQL dialect: PostgreSQL follows the SQL standard closely and adds many advanced features (e.g., arrays, JSONB, range types). MSSQL implements T-SQL (Transact-SQL), which has its own syntax and procedural extensions.
- Case sensitivity: PostgreSQL folds unquoted identifiers to lower-case; quoted identifiers are case-sensitive. MSSQL preserves identifier case as written, but name comparisons are case-insensitive by default (collation-dependent).
- Schemas and permissions: Both systems support schemas (namespaces) but manage permissions and default schemas differently.
- Extensions and features: PostgreSQL has many extensions (PostGIS, pgcrypto) that have no direct equivalents in MSSQL or require different implementations.
- Transaction semantics and DDL: Some PostgreSQL DDL operations are transactional; in MSSQL, certain DDL operations are not fully transactional.
2. Data type mapping: common types and recommended translations
Below are common PostgreSQL types and recommended MSSQL equivalents, with notes about differences and conversion considerations.
PostgreSQL type | MSSQL type | Notes / Caveats |
---|---|---|
smallint | SMALLINT | Direct match. |
integer / int / int4 | INT | Direct match. |
bigint / int8 | BIGINT | Direct match. |
serial / bigserial | INT IDENTITY / BIGINT IDENTITY | Use IDENTITY(seed,increment) or SEQUENCE in MSSQL. Remove DEFAULT nextval(…) from migrated schema. |
numeric(p,s) / decimal(p,s) | DECIMAL(p,s) | Match precision/scale. Beware precision/scale limits and rounding behavior. |
real | REAL | 32-bit float; direct map. |
double precision | FLOAT(53) | MSSQL FLOAT defaults to FLOAT(53), which is double precision; the mapping is direct. |
boolean | BIT | In MSSQL, BIT stores 0/1 and allows NULL. T-SQL has no true boolean type in expressions, so compare explicitly (e.g., WHERE flag = 1). |
text | VARCHAR(MAX) or NVARCHAR(MAX) | Use NVARCHAR(MAX) if Unicode (recommended). For performance, map small texts to VARCHAR(n)/NVARCHAR(n). |
varchar(n) / character varying(n) | VARCHAR(n) or NVARCHAR(n) | Choose NVARCHAR for Unicode; length semantics similar. |
char(n) / character(n) | CHAR(n) or NCHAR(n) | Fixed-length semantics similar. |
bytea | VARBINARY(MAX) | Use VARBINARY for binary data. |
timestamp [without time zone] | DATETIME2 or DATETIME | DATETIME2 has higher precision (up to 100ns) and is recommended. Note: timestamp without time zone in Postgres stores no timezone info. |
timestamp with time zone (timestamptz) | DATETIMEOFFSET or DATETIME2 + separate offset handling | DATETIMEOFFSET preserves timezone offset; DATETIME2 does not. Converting timestamptz values to UTC and storing in DATETIME2 is common. |
date | DATE | Direct mapping. |
time [without time zone] | TIME | Use TIME or TIME(7) for precision. |
interval | TIME, DATETIMEOFFSET, or custom representation (e.g., BIGINT seconds) | MSSQL lacks a direct interval type — store as seconds, or structured fields, or use custom functions. |
uuid | UNIQUEIDENTIFIER | MSSQL UNIQUEIDENTIFIER stores GUIDs; conversion functions needed. |
json / jsonb | NVARCHAR(MAX), VARCHAR(MAX), or SQL Server JSON functions | MSSQL has JSON support via functions but no native JSON type; store as text and use OPENJSON/JSON_VALUE/JSON_QUERY. For heavy JSON use, consider schema or hybrid approach. |
ARRAY | Normalized child tables, JSON in NVARCHAR(MAX), or delimited strings | MSSQL doesn’t support array column types. Normalize arrays into child tables or store them as JSON. |
hstore | NVARCHAR(MAX) or mapping to key-value table | No native hstore; map to JSON or separate table. |
cidr / inet / macaddr | VARCHAR(n) or specialized types via extension | MSSQL has no inet type; store as VARCHAR and validate with functions. |
money | MONEY or DECIMAL(19,4) | MONEY has rounding quirks; DECIMAL is safer for precise calculations. |
XML | XML | MSSQL supports XML type with XQuery functions; behavior differs. |
geometric types | Custom tables / geometry (use SQL Server Spatial types) | Use SQL Server geometry/geography types for spatial data; map carefully (SRID differences). |
range types (int4range, tsrange) | Separate start/end columns or custom types | No direct equivalent—use two columns or normalized representation. |
enum | CHECK constraint on VARCHAR/INT or separate lookup table | Use constrained VARCHAR or small INT referencing lookup table for better extensibility. |
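As a sketch of the enum mapping above, here are both options in T-SQL (table and value names are illustrative):

```sql
-- Option A: constrained VARCHAR (simple, but adding a value needs an ALTER)
CREATE TABLE dbo.tickets (
    id INT IDENTITY(1,1) PRIMARY KEY,
    status VARCHAR(20) NOT NULL
        CONSTRAINT ck_tickets_status CHECK (status IN ('open', 'closed', 'archived'))
);

-- Option B: lookup table (new values are plain INSERTs; the FK enforces validity)
CREATE TABLE dbo.issue_status (
    status_id TINYINT PRIMARY KEY,
    name VARCHAR(20) NOT NULL UNIQUE
);
CREATE TABLE dbo.issues (
    id INT IDENTITY(1,1) PRIMARY KEY,
    status_id TINYINT NOT NULL REFERENCES dbo.issue_status(status_id)
);
```

Option B is usually the better choice when the set of values is expected to grow, since it avoids schema changes.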
3. Keys, defaults, and identity columns
- PostgreSQL serial/bigserial: these create sequences and set DEFAULT nextval(…). In MSSQL replace with IDENTITY or create SEQUENCE objects and set DEFAULT NEXT VALUE FOR sequence_name.
- Primary keys, unique constraints, and indexes map directly; review clustered vs nonclustered choices — MSSQL has clustered index concept (one per table) which affects physical ordering.
- Foreign keys: translate directly, but watch for ON DELETE/UPDATE behaviors.
- Default expressions: some Postgres expressions (e.g., now(), uuid_generate_v4(), gen_random_uuid()) must be mapped to MSSQL equivalents (GETUTCDATE()/GETDATE(), NEWID(), NEWSEQUENTIALID(), or custom CLR functions).
- Computed/generated columns: Postgres GENERATED AS IDENTITY or computed columns map to MSSQL computed columns or identity — verify persisted vs non-persisted behavior.
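The SEQUENCE-based replacement for serial/bigserial might look like this (names are hypothetical):

```sql
-- Replaces the Postgres DEFAULT nextval('invoice_id_seq') pattern
CREATE SEQUENCE dbo.invoice_id_seq AS BIGINT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.invoices (
    id BIGINT NOT NULL
        CONSTRAINT df_invoices_id DEFAULT (NEXT VALUE FOR dbo.invoice_id_seq)
        PRIMARY KEY,
    total DECIMAL(19,4) NOT NULL
);

-- After loading migrated rows, restart the sequence past the highest id:
-- ALTER SEQUENCE dbo.invoice_id_seq RESTART WITH 100001;
```

Unlike IDENTITY, a SEQUENCE can be shared across tables and queried independently, which more closely matches PostgreSQL sequence semantics.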
4. Constraints, indexes, and advanced indexing
- Check constraints and unique constraints translate directly; ensure constraint names do not exceed MSSQL length limits.
- Partial indexes: PostgreSQL partial indexes (CREATE INDEX ... WHERE ...) map to MSSQL filtered indexes, which also accept a WHERE clause but with stricter rules (simple comparison predicates only; no functions or references to computed columns in the filter).
- Expression indexes: Postgres expression-based indexes may need computed columns in MSSQL (persisted computed columns can be indexed).
- GIN/GiST indexes: No direct equivalents. For text search use MSSQL Full-Text Search; for arrays or JSON use inverted/FTS or normalized tables.
- Full-text search: PostgreSQL uses tsvector + GIN/GiST. MSSQL offers Full-Text Search (CONTAINS, FREETEXT) with different configuration and behavior.
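Filtered indexes and expression indexes from the bullets above can be sketched as follows (table and column names are illustrative):

```sql
-- Filtered index: the T-SQL counterpart of a Postgres partial index
CREATE INDEX ix_orders_open
    ON dbo.orders (customer_id)
    WHERE status = 'open';

-- Expression index: emulate with a persisted computed column, then index it.
-- LOWER() is deterministic, so the column can be PERSISTED and indexed.
ALTER TABLE dbo.users ADD email_lower AS LOWER(email) PERSISTED;
CREATE INDEX ix_users_email_lower ON dbo.users (email_lower);
```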
5. Procedural code, triggers, and functions
- PostgreSQL uses PL/pgSQL (and other languages). MSSQL uses T-SQL (Transact-SQL).
- Stored procedures and functions must be rewritten for T-SQL — syntax and built-in functions differ.
- Triggers: convert triggers to MSSQL triggers; understand AFTER vs INSTEAD OF behavior differences.
- Set-returning functions in PostgreSQL (returning table rows) map to T-SQL table-valued functions, but implementation differs.
- Error handling: PL/pgSQL’s EXCEPTION blocks map to TRY…CATCH in T-SQL.
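A minimal sketch of the EXCEPTION-to-TRY…CATCH translation (procedure and table names are hypothetical):

```sql
CREATE PROCEDURE dbo.transfer_funds
    @src INT, @dst INT, @amount DECIMAL(19,4)
AS
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.accounts SET balance = balance - @amount WHERE id = @src;
        UPDATE dbo.accounts SET balance = balance + @amount WHERE id = @dst;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        -- ERROR_NUMBER()/ERROR_MESSAGE() take the place of SQLSTATE/SQLERRM
        THROW;  -- re-raise, roughly like PL/pgSQL RAISE
    END CATCH
END;
```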
6. Dealing with PostgreSQL-specific features
- Arrays: normalize or use JSON. Example: a tags text[] column → create tags table with (parent_id, tag) rows, or tags as JSON array and use OPENJSON for queries.
- JSONB: MSSQL lacks native binary JSON but supports JSON functions. JSON storage in NVARCHAR(MAX) is typical; performance and indexing require computed columns or full-text/search indexing strategies.
- Extensions (PostGIS): use SQL Server spatial types (geometry/geography) and translate SRIDs, functions, and indexes carefully.
- Window functions: both support window functions, but some syntax/function names may differ.
- Common Table Expressions (CTEs): both support CTEs; conversion generally straightforward.
- WITH ORDINALITY and some advanced SQL constructs may need rewriting.
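Querying a JSON-stored array with OPENJSON, per the array bullet above, might look like this (table and column names are illustrative):

```sql
-- items.tags holds a JSON array such as '[1,2,3]' in NVARCHAR(MAX)
SELECT i.id, t.tag
FROM dbo.items AS i
CROSS APPLY OPENJSON(i.tags) WITH (tag INT '$') AS t
WHERE t.tag = 2;
```

This replaces a Postgres query like `SELECT id FROM items WHERE 2 = ANY(tags)`, at the cost of a per-row JSON parse; normalize into a child table if the query is hot.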
7. Collation, encoding, and locale
- PostgreSQL typically uses UTF-8; MSSQL can use NVARCHAR for Unicode and collations for case sensitivity and accent sensitivity. Choose appropriate collation to match sorting and comparison behavior.
- Collation affects string comparisons, ORDER BY, and uniqueness. Test indexes and unique constraints if collation differs.
8. Migration strategy and practical steps
- Inventory schema and features:
- List tables, columns, types, constraints, indexes, sequences, triggers, functions, views, and extensions.
- Choose type mappings and document exceptions:
- Decide NVARCHAR vs VARCHAR, DATETIME2 vs DATETIMEOFFSET, how to handle arrays/json/enums.
- Create target schema in MSSQL:
- Prefer generating DDL scripts programmatically. Adjust identity, computed columns, and defaults.
- Migrate static reference data first, then tables without FKs, then dependent tables (or disable FK checks and re-enable after).
- Convert data:
- For types requiring transformation (UUIDs, JSON, bytea), apply conversion functions.
- Use bulk load tools (bcp, BULK INSERT, SSIS, Azure Data Factory) or ETL tools.
- Recreate indexes, constraints, and permissions.
- Translate and deploy stored procedures, functions, triggers.
- Validate:
- Row counts, checksums, sample queries, and application tests.
- Performance tuning:
- Update statistics, adjust indexes, consider clustered index choice, examine query plans and rewrite slow queries.
- Cutover planning:
- Consider near-zero downtime techniques (replication, dual writes, logical replication + sync, or ETL with change data capture), testing fallback plans.
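For the bulk-load step, a typical BULK INSERT of a CSV exported with Postgres `COPY ... TO ... WITH (FORMAT csv, HEADER)` might look like this (path and table are hypothetical; CODEPAGE 65001 requires SQL Server 2016+):

```sql
BULK INSERT dbo.orders
FROM 'C:\migration\orders.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '0x0a',
    FIRSTROW        = 2,        -- skip the header row
    TABLOCK,                    -- enables minimal logging under bulk-logged/simple recovery
    CODEPAGE        = '65001'   -- UTF-8, matching Postgres COPY output
);
```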
9. Examples

Example: serial to IDENTITY

PostgreSQL:

    id SERIAL PRIMARY KEY

MSSQL equivalent:

    id INT IDENTITY(1,1) PRIMARY KEY

Example: jsonb to NVARCHAR(MAX) plus an indexed computed column

PostgreSQL:

    payload jsonb
    CREATE INDEX idx_payload_title ON mytable ((payload->>'title'));

MSSQL:

    payload NVARCHAR(MAX)
    -- Create a persisted computed column to extract the title, then index it:
    ALTER TABLE mytable ADD payload_title AS JSON_VALUE(payload, '$.title') PERSISTED;
    CREATE INDEX idx_payload_title ON mytable(payload_title);

Example: array of integers

PostgreSQL:

    tags INT[]

MSSQL options:

- Normalize:

      CREATE TABLE item_tags (item_id INT, tag INT, PRIMARY KEY(item_id, tag));

- Or store JSON:

      tags NVARCHAR(MAX) -- JSON array like '[1,2,3]'
10. Testing and validation checklist
- Schema parity: column counts/types/constraints match intended mapping.
- Referential integrity: FK constraints enforced and validated.
- Sample queries: compare result sets on representative queries.
- Aggregate checksums: use hashing (checksum functions) for critical tables.
- Performance benchmarks: compare slowest queries and tune indexes.
- Application-level tests: full test suite passing against MSSQL environment.
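A quick parity check on the MSSQL side combines a row count with an order-independent checksum (table and columns are illustrative). Note that engine-native checksum functions differ between products, so for a cross-engine comparison compute a matching hash of a canonical per-row string on both sides instead of comparing native checksums directly:

```sql
SELECT COUNT(*) AS row_count,
       CHECKSUM_AGG(CHECKSUM(id, status, total)) AS table_checksum
FROM dbo.orders;
```

Run the same query before and after any post-migration fixups on MSSQL to confirm nothing drifted.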
11. Tools that help
- ETL/replication: SQL Server Integration Services (SSIS), Azure Data Factory, Pentaho, Talend, Apache NiFi.
- Migration assistants: Microsoft SQL Server Migration Assistant (SSMA) for PostgreSQL can automate many conversions.
- Custom scripts: Python (psycopg2 + pyodbc), Go, or other ETL code for complex transforms.
- Change Data Capture/replication: consider logical replication, Debezium + Kafka + sink, or commercial replication tools for minimal downtime.
12. Common pitfalls and gotchas
- Relying on PostgreSQL-specific types (array, jsonb, hstore, range, enum) without a conversion plan.
- Differences in NULL handling and empty string semantics.
- Time zone mishandling when converting timestamptz.
- Assumptions about index behavior and planner choices — queries may need rewriting for optimal T-SQL performance.
- Collation and case-sensitivity causing duplicate-key errors or missing matches.
- Oversights in default values that reference sequences or functions.
13. Summary recommendations
- Use NVARCHAR and DATETIME2 by default for Unicode text and timestamps unless you have a reason otherwise.
- Normalize arrays and enums into tables for portability and queryability.
- Treat JSONB as NVARCHAR with planned computed columns for indexing when needed.
- Convert sequences/serials to IDENTITY or MSSQL SEQUENCE carefully, preserving next values.
- Run thorough validation and performance testing; expect to rewrite stored procedures and queries.