The Manual Schema Mapping Problem
When you ingest a new JSON API into your database, the workflow looks like this:
- Get a sample JSON response
- Open a spreadsheet
- Map each field to a SQL column
- Guess the type (is
pricean integer or a float?) - Write the CREATE TABLE by hand
- Run it
- Realize
created_atis a string, not a timestamp - Alter the table
- Repeat for the next 12 API endpoints
This is slow, error-prone, and doesn't scale. Every new API endpoint means another manual schema mapping session. And if the API changes — a field goes from integer to string, a new field appears — you're back to step 2.
json2sql's --schema-only flag eliminates the manual mapping:
json2sql convert api-response.json --schema-only --dialect postgres
Output:
No spreadsheet. No guessing. The types are inferred from the actual data values.
Quick Start: Generate Your First Schema
Step 1: Install json2sql
# pip
pip install json2sql
# Homebrew (macOS / Linux)
brew tap Coding-Dev-Tools/tap
brew install json2sql
# Scoop (Windows)
scoop bucket add Coding-Dev-Tools https://github.com/Coding-Dev-Tools/scoop-bucket
scoop install json2sql
Step 2: Save a Sample JSON Response
# Fetch a sample from your API
curl -s https://api.example.com/users | python3 -m json.tool > users.json
The JSON file:
[
{
"id": 1,
"name": "Alice Chen",
"email": "alice@example.com",
"age": 30,
"active": true,
"balance": 1250.50,
"created_at": "2024-01-15T10:30:00Z"
},
{
"id": 2,
"name": "Bob Martinez",
"email": "bob@example.com",
"age": 25,
"active": false,
"balance": 890.25,
"created_at": "2024-02-20T14:00:00Z"
}
]
Step 3: Generate the Schema
json2sql convert users.json --schema-only --dialect postgres
Output:
Step 4: Write to a Migration File
json2sql convert users.json --schema-only --dialect postgres --output migrations/001_create_users.sql
Type Inference: How It Works
json2sql inspects every value in your JSON data and maps it to the correct SQL type for your dialect:
| JSON Value | Example | PostgreSQL | MySQL | SQLite |
|---|---|---|---|---|
| String | "Alice" |
TEXT |
VARCHAR(255) |
TEXT |
| Integer | 42 |
INTEGER |
INT |
INTEGER |
| Float | 3.14 |
DOUBLE PRECISION |
DOUBLE |
REAL |
| Boolean | true |
BOOLEAN |
TINYINT(1) |
INTEGER |
| Null | null |
TEXT |
TEXT |
TEXT |
Type upgrading: If the first record has "count": null and a later record has "count": 42, json2sql upgrades the column from TEXT to INTEGER. It scans all records and uses the most specific type found — so you get accurate types even when the first record has nulls.
Schema-Only vs. Full Conversion
The --schema-only flag generates DDL without data — perfect for migration workflows. The default (without the flag) generates both DDL and INSERT statements:
json2sql convert |
json2sql convert --schema-only |
|
|---|---|---|
| Output | CREATE TABLE + INSERT | CREATE TABLE only |
| Best for | One-shot data loading | Schema-first migration, incremental loads |
| Review workflow | Hard to review DDL mixed with data | Clean DDL for human review |
| CI/CD | Too much output for PR reviews | Compact DDL diffs in PR comments |
| Schema changes | Re-run with new data | Compare old vs. new DDL |
Five Real-World Schema Generation Workflows
Workflow 1: New API Onboarding
You're integrating a third-party payments API. You have the docs but no schema. Fetch a sample response, generate the DDL, review it, and run it. Five minutes instead of an hour of manual mapping.
# Fetch sample and generate schema in one pipeline
curl -s https://api.payment.com/transactions | \
json2sql convert --schema-only --dialect postgres --table transactions \
--output migrations/001_create_transactions.sql
Workflow 2: Multi-API Schema Bootstrap
You're building a data warehouse that ingests from five different APIs. Each has its own JSON shape. Generate all five schemas in one script:
#!/bin/bash
# bootstrap_warehouse.sh
APIS=("users" "orders" "products" "payments" "analytics")
for api in "${APIS[@]}"; do
curl -s "https://api.example.com/$api" > "/tmp/${api}.json"
json2sql convert "/tmp/${api}.json" \
--schema-only \
--dialect postgres \
--table "$api" \
--output "migrations/001_create_${api}.sql"
echo "Generated schema for $api"
done
Workflow 3: Schema Drift Detection in CI
Your API partner changed their response format — a new field appeared, a type changed. Detect it before it breaks your ETL:
# .github/workflows/schema-drift.yml
name: API Schema Drift Detection
on:
schedule:
- cron: '0 8 * * 1' # Monday 8 AM
jobs:
drift-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install json2sql
run: pip install json2sql
- name: Fetch current API schema
run: |
curl -s $API_URL > /tmp/current.json
json2sql convert /tmp/current.json \
--schema-only --dialect postgres > /tmp/current_schema.sql
- name: Compare with baseline
run: |
if ! diff -q schemas/baseline_schema.sql /tmp/current_schema.sql > /dev/null 2>&1; then
echo "::warning::API schema has changed!"
diff schemas/baseline_schema.sql /tmp/current_schema.sql
# Create an issue or notify
else
echo "No schema drift detected."
fi
Workflow 4: Nested JSON → Multiple Tables
Your JSON has nested objects and arrays. --schema-only --flatten generates DDL for all the relational tables at once:
json2sql convert orders.json --schema-only --flatten --dialect postgres
Output:
Workflow 5: Multi-Dialect Schema Generation
Your dev team uses SQLite locally, staging runs MySQL, and production is PostgreSQL. Generate schemas for all three from the same JSON:
# Generate for all three dialects
for dialect in postgres mysql sqlite; do
json2sql convert api-data.json \
--schema-only \
--dialect "$dialect" \
--table "users" \
--output "schemas/users_${dialect}.sql"
done
PostgreSQL output:
MySQL output:
SQLite output:
Type Inference Edge Cases
Null Values Get Upgraded
When the first record has a null value but later records have a real value, json2sql upgrades the type:
[
{"id": 1, "nickname": null},
{"id": 2, "nickname": "Ace"}
]
Result: nickname is inferred as TEXT, not left as a generic nullable column. json2sql scans all records before deciding the final type.
Booleans Are Not Integers
In JSON, true/false are distinct from 1/0. json2sql handles this correctly — even though Python's bool is a subclass of int, the type inference checks for booleans first:
[
{"id": 1, "active": true},
{"id": 2, "active": false}
]
Result: active → BOOLEAN (PostgreSQL), TINYINT(1) (MySQL), INTEGER (SQLite).
Mixed Types Default to TEXT
If a field contains both strings and numbers across records (which shouldn't happen in a well-designed API but does in practice), json2sql falls back to TEXT for safety:
[
{"id": 1, "code": "ABC"},
{"id": 2, "code": 42}
]
Result: code → TEXT. The integer value is preserved — it'll be inserted as '42' — and your pipeline doesn't crash on a type mismatch.
Tip: If you see unexpected TEXT types in your schema, check your source data for mixed types. This is usually a sign that the upstream API is inconsistent — worth flagging with the API owner.
CI/CD Integration: Schema-as-Code
Pattern 1: PR-Driven Schema Changes
When someone changes the JSON schema sample in your repository, automatically generate the DDL and post it as a PR comment:
# .github/workflows/schema-review.yml
name: Schema Review
on:
pull_request:
paths: ['schemas/*.json']
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install json2sql
run: pip install json2sql
- name: Generate DDL for changed schemas
run: |
for file in $(git diff --name-only origin/main -- 'schemas/*.json'); do
table=$(basename "$file" .json)
echo "### Schema for \`$table\`" >> /tmp/ddl_review.md
echo "" >> /tmp/ddl_review.md
echo '```sql' >> /tmp/ddl_review.md
json2sql convert "$file" --schema-only --dialect postgres --table "$table" >> /tmp/ddl_review.md
echo '```' >> /tmp/ddl_review.md
echo "" >> /tmp/ddl_review.md
done
- name: Post DDL as PR comment
if: hashFiles('/tmp/ddl_review.md') != ''
run: |
gh pr comment ${{ github.event.pull_request.number }} \
--body-file /tmp/ddl_review.md
Pattern 2: Automated Migration Generation
Generate migration files from JSON samples as part of your build process:
# .github/workflows/generate-migrations.yml
name: Generate Migrations
on:
push:
paths: ['schemas/*.json']
branches: [main]
jobs:
generate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install json2sql
run: pip install json2sql
- name: Generate migration for each schema
run: |
VERSION=$(date +%Y%m%d%H%M%S)
for file in schemas/*.json; do
table=$(basename "$file" .json)
json2sql convert "$file" \
--schema-only \
--dialect postgres \
--table "$table" \
--output "migrations/${VERSION}_create_${table}.sql"
done
- name: Commit migrations
run: |
git add migrations/
git commit -m "Auto-generate migrations from schema samples" || true
git push
Combining Schema-Only with Flatten
The --schema-only and --flatten flags work together to generate DDL for nested JSON — producing multiple CREATE TABLE statements from a single nested JSON document:
json2sql convert orders.json --schema-only --flatten --dialect postgres
For a JSON document with nested customer info and line items, this produces three tables:
orders— the parent table with flattened customer fieldsorders_items— a child table for the line items array
Each child table automatically gets a foreign key reference to its parent (using the parent's id, name, or {table}_id field if present).
Schema-first flattening: Use --schema-only --flatten to design your relational schema before loading any data. Review the DDL, adjust column names or types if needed, then run the full conversion with INSERT statements to load the data.
Install json2sql
# pip
pip install json2sql
# Homebrew (macOS / Linux)
brew tap Coding-Dev-Tools/tap
brew install json2sql
# Scoop (Windows)
scoop bucket add Coding-Dev-Tools https://github.com/Coding-Dev-Tools/scoop-bucket
scoop install json2sql
Star json2sql on GitHub
Related Reading
- Convert JSON to SQL in One Command — json2sql basic tutorial
- Nested JSON to Relational SQL — flatten mode deep dive
- Catch Data Schema Drift in CI — DataMorph schema validation
- Block Deployments on Config Drift — ConfigDrift CI gating