· engineering  · 8 min read

Enforce Data Integrity: Master your JSON with JSON Schema

Struggling with inconsistent JSON data? JSON Schema offers a powerful, standardized approach to validate, structure, and streamline your data management.

JSON is everywhere, and the popularity of this flexible, ubiquitous, and somewhat verbose data exchange format shows little sign of slowing. But with simplicity and flexibility comes risk: how do we know that the JSON we’re exchanging has the right “shape”, and how do we communicate and validate that shape?
The answer lies with JSON Schema.

What is JSON Schema?

JSON Schema is a specification for defining what JSON can look like: a blueprint describing the shape of your data and, crucially, the relationships within it.

Its simplicity is key: it is itself “just” JSON, and libraries and tools supporting its use are available in many languages.

A quick glance at the implementations available for JSON Schema shows just how portable it is.

And it’s easy to get started. This basic example defines a user object with a required name and an optional age (an integer with a minimum value of 18).

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer", "minimum": 18 }
  },
  "required": ["name"]
}

This simple JSON, defining the shape of our user data, can be used to validate user input, whether in the application or in the infrastructure, for example with AWS API Gateway Models or MongoDB.
It can also be used for creating TypeScript types, generating forms, and sharing API documentation dynamically.
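
To make the validation step concrete, here is a minimal Python sketch of what a validator does with the user schema above. The `validate_user` helper is hypothetical and hand-rolled purely for illustration; it covers only the keywords that schema uses, and in practice you would reach for an established JSON Schema library instead.

```python
# Hand-rolled illustration only: covers just "type", "properties",
# "required" and "minimum" as used by the user schema above.
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 18},
    },
    "required": ["name"],
}


def is_integer(value):
    # JSON booleans are not integers, but Python's bool subclasses int.
    # Simplification: recent drafts also treat 18.0 as an integer; this sketch does not.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_user(data, schema=USER_SCHEMA):
    """Return a list of error messages; an empty list means valid."""
    if not isinstance(data, dict):
        return ["expected an object"]
    errors = [f"missing required property: {key}"
              for key in schema["required"] if key not in data]
    for key, rules in schema["properties"].items():
        if key not in data:
            continue  # optional properties may be absent
        value = data[key]
        if rules["type"] == "string" and not isinstance(value, str):
            errors.append(f"{key}: expected a string")
        elif rules["type"] == "integer":
            if not is_integer(value):
                errors.append(f"{key}: expected an integer")
            elif "minimum" in rules and value < rules["minimum"]:
                errors.append(f"{key}: must be >= {rules['minimum']}")
    return errors
```

Calling `validate_user({"name": "Ada"})` returns an empty list, while `validate_user({"name": "Ada", "age": 17})` reports the minimum violation.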

By describing our data this way we get improved data consistency, better error handling, faster and cheaper rejection of incorrect data, and quicker, easier implementation, because the docs you share are the docs you use.

An aside: Zod et al

Zod is a very popular, powerful JSON validation library for TypeScript.
JSON Schema, however, offers two main advantages over Zod and similar libraries:

  • It is a portable, established standard: it can be used in non-TypeScript environments and form part of your living documentation.
  • It can express much more complex relationships and restrictions on and between data.

That said, it doesn’t need to be an either/or situation: in many cases you can convert JSON Schema to Zod and vice versa.

Complex schemas: limits, lists and patterns

The example below is slightly more complex. We have:

  • default values for AutoScaleLimit and LogFileRetention.
  • a regular expression and string length limits for LogFileName.
  • a range of numbers for AutoScaleLimit.
  • a list of valid entries for LogFileRetention.
  • a required property (AutoScaleLimit).
  • no non-defined properties allowed (“additionalProperties”: false).
  • a “dependentRequired” property: if LogFileName is given, LogFileRetention is required.

{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "LogFileName": {
      "type": "string",
      "pattern": "^[\\w+-]+$",
      "minLength": 3,
      "maxLength": 10
    },
    "AutoScaleLimit": {
      "description": "The maximum number instances of this service to spin up",
      "default": 1,
      "minimum": 0,
      "maximum": 10,
      "type": "integer"
    },
    "LogFileRetention": {
      "type": "integer",
      "description": "How long to retain log files for (days)",
      "default": 7,
      "enum": [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365]
    }
  },
  "required": ["AutoScaleLimit"],
  "dependentRequired": {
    "LogFileName": ["LogFileRetention"]
  }
}
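
As a rough illustration of how these keywords behave, the Python sketch below hand-checks a settings object against the same rules. The `check_settings` function and its error messages are hypothetical, and the LogFileName pattern and length checks are left out to keep it short; a real validator library would do all of this for you.

```python
ALLOWED = {"LogFileName", "AutoScaleLimit", "LogFileRetention"}
RETENTION_DAYS = [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365]
DEPENDENT_REQUIRED = {"LogFileName": ["LogFileRetention"]}


def check_settings(data):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    # "additionalProperties": false rejects properties the schema does not declare.
    for key in sorted(set(data) - ALLOWED):
        errors.append(f"unexpected property: {key}")
    # "required", then "type", "minimum" and "maximum" for AutoScaleLimit.
    if "AutoScaleLimit" not in data:
        errors.append("missing required property: AutoScaleLimit")
    else:
        limit = data["AutoScaleLimit"]
        is_int = isinstance(limit, int) and not isinstance(limit, bool)
        if not is_int or not 0 <= limit <= 10:
            errors.append("AutoScaleLimit: must be an integer from 0 to 10")
    # "enum" only accepts the listed values.
    if "LogFileRetention" in data and data["LogFileRetention"] not in RETENTION_DAYS:
        errors.append(f"LogFileRetention: {data['LogFileRetention']} is not an allowed value")
    # "dependentRequired": if the trigger is present, its dependants must be too.
    for trigger, needed in DEPENDENT_REQUIRED.items():
        if trigger in data:
            errors.extend(f"{trigger} requires {key}" for key in needed if key not in data)
    return errors
```

Note that an empty object still fails here even though AutoScaleLimit has a default: in JSON Schema, "default" is an annotation, and validators are not required to fill it in before checking "required".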

Subschemas: dependencies and conditions

As we saw above, an object can have a “required” property, listing which properties in the data are required.
A more powerful feature is schema composition, where data is validated against one or more subschemas using these keywords:

  • allOf: must be valid against all subschemas (AND).
  • anyOf: must be valid against at least one of the subschemas (OR).
  • oneOf: must be valid against exactly one of the subschemas (XOR).
  • not: must not be valid against the given subschema.
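
The boolean logic behind these four keywords can be sketched in a few lines of Python. Here each "subschema" is stood in for by a plain predicate function; the helper names and the two example predicates are assumptions made for illustration, not part of any library.

```python
def all_of(checks, value):
    # allOf: every subschema must accept the value.
    return all(check(value) for check in checks)


def any_of(checks, value):
    # anyOf: at least one subschema must accept the value.
    return any(check(value) for check in checks)


def one_of(checks, value):
    # oneOf: exactly one subschema must accept the value.
    return sum(1 for check in checks if check(value)) == 1


def not_(check, value):
    # not: the single subschema must reject the value.
    return not check(value)


def multiple_of_3(n):
    return n % 3 == 0


def multiple_of_5(n):
    return n % 5 == 0


CHECKS = [multiple_of_3, multiple_of_5]
```

With these predicates, 15 satisfies `all_of` and `any_of` but fails `one_of`, because it matches both subschemas; 9 passes `one_of` because it matches only the first.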

We can also have conditional subschemas using if/then/else.
If the data matches the “if” subschema, it must also match the “then” subschema; otherwise it must match the optional “else” subschema.

dependentSchemas builds on the dependentRequired keyword we saw above, specifying an entire subschema to be applied if a property is present.

This example uses “if/then” and “allOf” to restrict Memory to the sizes allowed for each CPU size, and to make both fields required.

    "Parameters": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "CPU": {
          "default": 256,
          "type": "integer",
          "enum": [256, 512, 1024, 2048, 4096, 8192],
          "enumNames": [
            "256 (0.25 vCPU)",
            "512 (0.5 vCPU)",
            "1024 (1 vCPU)"
          ]
        },
        "Memory": {
          "type": "integer"
        }
      },
      "allOf": [
        {
          "if": {
            "properties": {
              "CPU": {
                "const": 256
              }
            }
          },
          "then": {
            "properties": {
              "Memory": {
                "description": "TaskDefinition Memory for 256 (0.25 vCPU)",
                "default": 512,
                "type": "integer",
                "enum": [512, 1024, 2048]
              }
            }
          }
        },
        {
          "if": {
            "properties": {
              "CPU": {
                "const": 512
              }
            }
          },
          "then": {
            "properties": {
              "Memory": {
                "description": "TaskDefinition Memory for 512 (0.5 vCPU)",
                "default": 1024,
                "type": "integer",
                "enum": [1024, 2048, 3072, 4096]
              }
            }
          }
        },
        {
          "if": {
            "properties": {
              "CPU": {
                "const": 1024
              }
            }
          },
          "then": {
            "properties": {
              "Memory": {
                "description": "TaskDefinition Memory for 1024 (1 vCPU)",
                "default": 2048,
                "type": "integer",
                "enum": [2048, 3072, 4096, 5120, 6144, 7168, 8192]
              }
            }
          }
        },
        {
          "required": [
            "Memory",
            "CPU"
          ]
        }
      ]
    },
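
The effect of the three if/then branches above can be approximated in Python as a lookup from CPU size to allowed Memory sizes. This is a sketch under simplifying assumptions: `check_parameters` is a hypothetical helper, and it skips the CPU "enum" and "additionalProperties" checks.

```python
# Allowed Memory values per CPU size, copied from the "then" branches above.
MEMORY_FOR_CPU = {
    256: [512, 1024, 2048],
    512: [1024, 2048, 3072, 4096],
    1024: [2048, 3072, 4096, 5120, 6144, 7168, 8192],
}


def check_parameters(params):
    """Return a list of error messages; an empty list means valid."""
    # The last allOf entry makes both properties required.
    errors = [f"missing required property: {key}"
              for key in ("Memory", "CPU") if key not in params]
    cpu = params.get("CPU")
    # An "if" that matches brings its "then" into force; when no "if"
    # matches, no extra constraint is applied to Memory.
    allowed = MEMORY_FOR_CPU.get(cpu) if isinstance(cpu, int) else None
    if allowed is not None and "Memory" in params and params["Memory"] not in allowed:
        errors.append(f"Memory: {params['Memory']} is not allowed for CPU {cpu}")
    return errors
```

One subtlety the sketch glosses over: an "if" built only from "properties" also matches when CPU is absent, so every "then" would apply. In the schema above the "required" entry rejects such objects anyway, but adding "required": ["CPU"] inside each "if" is a common way to make the intent explicit.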

For more on schema composition, refer to the docs.

Referencing other schemas

We like our code to be DRY, and schemas are no exception.
By giving our schemas an $id property, we can reference them elsewhere later.

Using the $defs keyword (in previous versions of the spec, this was definitions), we can define a number of schemas that can be referenced from within the same schema, even recursively.

In the example below we define an “address” schema in $defs, then use it twice in our schema: once for the user’s main address and once for their other addresses.

{
  "$defs": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" },
        "zip": {
          "$comment": "Regex for a 5-digit US zip code",
          "type": "string",
          "pattern": "^[0-9]{5}$"
        }
      },
      "required": ["street", "city", "zip"]
    }
  },
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": {
      "$comment": "Email format; some validators only enforce this when format assertion is enabled",
      "type": "string",
      "format": "email"
    },
    "age": { "type": "integer", "minimum": 18 },
    "mainAddress": {
      "$comment": "Reference the address subschema",
      "$ref": "#/$defs/address"
    },
    "otherAddresses": {
      "$comment": "An array of unique address objects",
      "type": "array",
      "minItems": 0,
      "maxItems": 5,
      "uniqueItems": true,
      "items": { "$ref": "#/$defs/address" }
    }
  }
}
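
A same-document reference such as "#/$defs/address" is a JSON Pointer into the schema itself. The Python sketch below shows roughly how one could be resolved; `resolve_ref` is a hypothetical helper that ignores percent-encoding in the fragment and does not handle references to other documents.

```python
SCHEMA = {
    "$defs": {"address": {"type": "object"}},
    "properties": {"mainAddress": {"$ref": "#/$defs/address"}},
}


def resolve_ref(root, ref):
    """Resolve a same-document reference such as '#/$defs/address'."""
    if not ref.startswith("#"):
        raise ValueError("only same-document references are handled in this sketch")
    node = root
    # "#/a/b" -> ["", "a", "b"]; the leading empty token is skipped.
    for token in ref[1:].split("/")[1:]:
        # JSON Pointer escapes: "~1" means "/" and "~0" means "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

Resolving the "$ref" stored under `mainAddress` walks from the root through "$defs" to "address" and returns that subschema; a bare "#" returns the root schema itself.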

We’re not limited to referencing schemas in the same document, however.
If we treat the $id as the base URI of our schema, we can reference other schemas relative to it (for example, in the directory structure, or as a URL).
However, how these schemas are fetched varies between implementations and is not part of the specification. It’s important to remember that schema URIs are primarily identifiers, not necessarily locations to download them from.

Read more about referencing other schemas in the docs.

Note: tools like AJV for TypeScript can also be used to dereference a schema. Dereferencing essentially means replacing all the references with the actual schemas. This can sometimes be necessary, as not all tools support references, or the schemas may reference internal URIs that are not publicly accessible. Bear in mind that not all schemas can be dereferenced, as references can be recursive or circular.
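
To show what dereferencing involves, and why circular references get in the way, here is a small Python sketch that inlines same-document references. Both `lookup` and `dereference` are hypothetical helpers written for this post; the sketch drops any keywords that sit alongside a "$ref" and ignores JSON Pointer escapes, so treat it as an outline rather than a replacement for a real tool.

```python
DOC = {
    "$defs": {"name": {"type": "string"}},
    "properties": {"first": {"$ref": "#/$defs/name"}},
}


def lookup(root, ref):
    # Follow a same-document pointer such as "#/$defs/name".
    node = root
    for token in ref.lstrip("#").split("/")[1:]:
        node = node[token]
    return node


def dereference(node, root, seen=()):
    """Return a copy of node with every "$ref" replaced by its target."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            if ref in seen:
                # Inlining would never terminate, so give up.
                raise ValueError(f"circular reference: {ref}")
            return dereference(lookup(root, ref), root, seen + (ref,))
        return {key: dereference(value, root, seen) for key, value in node.items()}
    if isinstance(node, list):
        return [dereference(item, root, seen) for item in node]
    return node
```

Running `dereference(DOC, DOC)` replaces the reference under `first` with a copy of the "name" subschema. A schema whose definition refers back to itself raises `ValueError` instead, which is the recursive case mentioned above.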

Advanced arrays and objects

We saw in the examples above how arrays can be implemented, their lengths constrained, uniqueness enforced, and items defined, which we can of course combine with keywords such as oneOf to allow arrays with differing item types.
Another keyword for arrays to be aware of is contains, which makes an array valid only if at least one item matches a given schema:

{
  "type": "array",
  "contains": { "type": "integer", "minimum": 5 }
}
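
In Python terms, "contains" behaves like `any()` over the items. The sketch below mirrors the schema above with two hypothetical helpers; it is an illustration of the semantics, not a validator.

```python
def integer_at_least_5(value):
    # Stand-in for the subschema {"type": "integer", "minimum": 5}.
    # bool is excluded because JSON booleans are not integers.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 5


def array_contains(items, matches):
    # "type": "array" plus "contains": at least one item must match.
    return isinstance(items, list) and any(matches(item) for item in items)
```

So `["a", 3, 7]` is valid because 7 matches, while `[1, 2, 3]` and the empty array are not, since no item satisfies the subschema.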

We’ve also seen basic objects above, and how their shape, required properties and dependencies can be defined.
We also saw "additionalProperties": false used to forbid undeclared properties; without it, objects can carry additional properties.
For these, propertyNames lets us enforce a pattern on property names, while patternProperties lets us apply specific schemas to properties whose names match different patterns. minProperties and maxProperties let us restrict the number of properties in an object.

{
  "type": "object",
  "minProperties": 2,
  "maxProperties": 5,
  "propertyNames": {
    "pattern": "^[A-Za-z]+$"
  },
  "patternProperties": {
    "^S": { "type": "string" },
    "^I": { "type": "integer" }
  }
  ...
}
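
The Python sketch below approximates how these object keywords interact. It assumes a propertyNames pattern that permits the uppercase "S" and "I" prefixes used by patternProperties, and `check_object` is a hypothetical helper. Python's `re` module is close to, but not identical to, the ECMA-262 regex dialect that JSON Schema recommends, so treat the matching as indicative.

```python
import re

NAME_PATTERN = r"^[A-Za-z]+$"             # propertyNames: applies to every name
PATTERN_TYPES = {r"^S": str, r"^I": int}  # patternProperties: schema chosen by name


def check_object(obj, min_props=2, max_props=5):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    # minProperties / maxProperties count all properties on the object.
    if not min_props <= len(obj) <= max_props:
        errors.append(f"expected {min_props} to {max_props} properties, got {len(obj)}")
    for name, value in obj.items():
        # JSON Schema patterns are unanchored, hence re.search rather than re.match.
        if not re.search(NAME_PATTERN, name):
            errors.append(f"{name}: invalid property name")
        for pattern, expected in PATTERN_TYPES.items():
            wrong_type = not isinstance(value, expected) or isinstance(value, bool)
            if re.search(pattern, name) and wrong_type:
                errors.append(f"{name}: expected {expected.__name__}")
    return errors
```

A property whose name matches neither "^S" nor "^I" is only subject to the name pattern and the property count, so `{"Sname": "web", "other": 1.5}` is accepted.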

Conclusion

JSON Schema is not just a powerful tool for ensuring data consistency and integrity; it’s a standardized solution that integrates seamlessly with a wide array of technologies.
By adopting JSON Schema, you can enhance error handling, reduce development time, and create robust, self-validating documentation that evolves alongside your codebase. Whether you’re validating API requests, defining database schemas, or generating TypeScript types, JSON Schema provides a versatile and reliable framework for managing your JSON data.

Next time you’re facing the challenge of validating or structuring your JSON data, consider implementing JSON Schema. Its benefits in terms of consistency, efficiency, and maintainability make it an invaluable asset for any developer. Start exploring the numerous libraries and tools available, and see how JSON Schema can transform your approach to data management.

James Babington

About James Babington

A cloud architect and engineer with a wealth of experience across AWS, web development, and security, James enjoys writing about the technical challenges and solutions he's encountered, but most of all he loves it when a plan comes together and it all just works.
