If you're a network engineer stepping into the world of automation, you've probably encountered the term schema more times than you can count.
But what exactly is a data schema, and why should you care?
The short answer: A schema specifies how data is organized and interpreted in databases or data models. Since network automation is only as good as the data it relies on, a solid understanding of schemas and how they model data is essential for automation engineers.
What is a data schema?
A schema defines the structure, format, and constraints of data.
A data schema specifies:
- What types of data exist
- What attributes each type has
- How different pieces of data relate to each other
- What rules the data must follow
Schemas play an incredibly important role in network automation: they ensure data consistency and integrity, and they facilitate communication between different systems by providing a shared understanding of the data's structure.
There's no perfect data schema
Everything in technology brings trade-offs. It's no different with schema types.
You will never find the perfect schema. But if you understand the different schema types, and the strengths and weaknesses of each one, you can find the best schema for your particular use case.
With schemas, the biggest trade-off is always flexibility vs data integrity.

When schemas are loosely defined or unenforced, you get flexibility. It's incredibly easy to add new fields, change data structures, and move fast.
This works beautifully for quick scripts and proof-of-concepts, small projects with one or two contributors, or rapidly evolving data models where you're still figuring out what you actually need.
But as your system scales, flexibility becomes a liability. Without schema enforcement, you end up with inconsistent data.
Some devices have a "site" field, others have "location," and nobody's quite sure which one is correct. You get silent failures when your automation script expects an integer but gets a string. Integration becomes a nightmare because each consumer of your data has to handle all the edge cases differently.
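To make those failure modes concrete, here's a minimal sketch of application-level validation in Python. The field names and rules are illustrative, not from any particular tool:

```python
def validate_device(record: dict) -> list[str]:
    """Return a list of problems found in a device record."""
    errors = []
    # Type check: automation code downstream expects an integer here,
    # so catch the string before it fails silently.
    if not isinstance(record.get("vlan_id"), int):
        errors.append("'vlan_id' must be an integer, got "
                      + type(record.get("vlan_id")).__name__)
    # Naming drift: catch the 'site' vs 'location' inconsistency early.
    if "location" in record and "site" not in record:
        errors.append("use 'site', not 'location'")
    return errors

# A record that would fail silently without these checks:
print(validate_device({"name": "sw-nyc-01", "vlan_id": "100", "location": "NYC"}))
```

Checks like these are exactly what a formal, enforced schema gives you for free.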
When schemas are formally defined and enforced, you trade some flexibility for integrity and consistency. But that integrity is essential if you want data that will support network automation in production.
Where a data schema can be defined
There are three places, or levels, where a schema can be defined and implemented. It's important to understand the differences between the levels because where the schema is defined has a big impact on those trade-offs we just discussed.
- Storage level: Schema is enforced at the database level. You get solid data structure and integrity.
- Application level: Schema is enforced through code at the application layer. You get more flexibility but your data integrity depends entirely on the quality of your code.
- User level: Schema is not formally enforced at all. Users are simply expected to follow conventions documented somewhere (hopefully). You get maximum flexibility but minimum guarantees.
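Here's a small sketch of storage-level enforcement, using Python's built-in SQLite. The table and constraints are illustrative:

```python
import sqlite3

# Storage-level enforcement: the database itself rejects invalid rows,
# no matter which application wrote them.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE devices (
        name    TEXT NOT NULL UNIQUE,
        vlan_id INTEGER CHECK (vlan_id BETWEEN 1 AND 4094)
    )
""")
conn.execute("INSERT INTO devices VALUES ('sw-nyc-01', 100)")  # accepted
try:
    conn.execute("INSERT INTO devices VALUES ('sw-nyc-02', 9999)")  # out of range
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

With the same rules at the application level, every program touching the data would have to reimplement (and remember) those checks.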

There's always a data schema
Here's something important to understand: Everything has a schema, whether you realize it or not.
You might have heard terms like "schema-less" databases or "schema-free" configuration management. These are misnomers.
Even the loosest possible schema in the world—no consistency, no defined types, any kind of data in any field—is still a schema. If data can be stored and interpreted at all, some structure is being assumed.

Systems marketed as "schema-less" simply mean the schema isn't enforced at the database level. Instead, the schema lives implicitly in the code (at the application level) or in the minds of engineers (at the user level).
The 3 components of every data schema
No matter the type, every data schema has three fundamental components: structure, relationships, and constraints.
1. Structure
Structure defines what types of things exist in your data model and what attributes they have. Think about modeling a network device. You might have a name (string), a model (string), a serial number (string), a management IP address, an install date, and a flag indicating whether it's active (boolean).
Each attribute has a type, and types matter because they determine what operations are valid. You can do math on integers but not on strings. You can validate IP address formats but not arbitrary text.
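As a sketch, here's that device structure expressed with one Python type per attribute (the names and defaults are illustrative):

```python
from dataclasses import dataclass
from datetime import date
from ipaddress import IPv4Address

# Each attribute gets an explicit type, which determines what
# operations are valid on it.
@dataclass
class Device:
    name: str
    model: str
    serial_number: str
    management_ip: IPv4Address  # a real IP type, not arbitrary text
    install_date: date
    active: bool = True

dev = Device("sw-nyc-01", "Cat9300", "FCW1234A01B",
             IPv4Address("10.0.0.1"), date(2024, 1, 15))
print(dev.active)
```

Using `IPv4Address` instead of a plain string means a malformed address fails at construction time rather than deep inside an automation run.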
2. Relationships
Relationships define how different objects relate to each other, and this is where schemas really start to reflect how we think about infrastructure.
The most common relationship pattern is a one-to-many relationship. One device has many interfaces. One site contains many devices. These hierarchical relationships are everywhere in network modeling.
You'll also encounter many-to-many relationships. A device can have multiple tags, and each tag can apply to multiple devices. A prefix can be allocated to multiple sites in a multi-site deployment. These are trickier to model but crucial for real-world scenarios.
One-to-one relationships are less common but do exist. A device might have one primary management interface that you model separately from the rest.

The type of relationship matters for network automation. When you query "Show me all interfaces on devices in the NYC datacenter," you're traversing relationships. You find the site object for NYC, then find all device objects related to that site, then find all interface objects related to those devices. Your schema defines these relationships, and your database uses them to answer queries efficiently.
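That traversal can be sketched with a toy in-memory model of the one-to-many relationships above (the data is illustrative):

```python
# One site has many devices; one device has many interfaces.
devices = {
    "sw1": {"site": "nyc", "interfaces": ["eth0", "eth1"]},
    "sw2": {"site": "nyc", "interfaces": ["eth0"]},
    "sw3": {"site": "sfo", "interfaces": ["eth0"]},
}

# "Show me all interfaces on devices in the NYC datacenter":
# traverse site -> devices -> interfaces.
nyc_interfaces = [
    (name, iface)
    for name, dev in devices.items()
    if dev["site"] == "nyc"
    for iface in dev["interfaces"]
]
print(sorted(nyc_interfaces))
```

A real database does the same traversal, but uses the schema's relationship definitions (and indexes) to do it efficiently at scale.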
3. Constraints and validation
Constraints are the rules that keep your data clean and consistent. They prevent impossible or invalid states.
Think about the constraints that make sense for network data. Examples:
- Required fields: Every device must have a name.
- Uniqueness constraints: No two devices can have the same name, and no two interfaces on the same device can share a name.
- Format validation: IP addresses must be valid IPv4 or IPv6 addresses, and MAC addresses must match the standard format.
- Range constraints: Interface speed must be a positive integer, and VLAN IDs must be between 1 and 4094.
Then there's referential integrity, which is all about maintaining consistency across relationships. An interface can't be assigned to a device that doesn't exist. When you delete a device, what happens to its interfaces? Do they get deleted too, or does the system prevent you from deleting a device that still has interfaces attached?
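The delete question can be answered in the schema itself. Here's a sketch using Python's built-in SQLite, with an illustrative two-table layout:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this opt-in
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE interfaces (
        id        INTEGER PRIMARY KEY,
        device_id INTEGER NOT NULL
                  REFERENCES devices(id) ON DELETE CASCADE,
        name      TEXT NOT NULL,
        UNIQUE (device_id, name)  -- no duplicate interface names per device
    );
    INSERT INTO devices VALUES (1, 'sw-nyc-01');
    INSERT INTO interfaces VALUES (1, 1, 'eth0'), (2, 1, 'eth1');
""")

# An interface can't point at a device that doesn't exist:
try:
    conn.execute("INSERT INTO interfaces VALUES (3, 99, 'eth0')")
except sqlite3.IntegrityError:
    print("orphan interface rejected")

# With ON DELETE CASCADE, deleting the device deletes its interfaces too:
conn.execute("DELETE FROM devices WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM interfaces").fetchone()[0])
```

Swapping `ON DELETE CASCADE` for `ON DELETE RESTRICT` gives the other behavior: the database refuses to delete a device that still has interfaces attached.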
Without constraints, your data becomes a mess, and it can't be relied on to accurately drive automation.
Data schema languages
Now that you understand what schemas are and what they do, let's look at the different languages and tools used to define them. Each has its strengths and weaknesses.
SQL: The original and still dominant
SQL has been around since the 1970s and is the language of relational databases. It's a language for both defining data schemas and querying data.
Here's what a simple SQL schema looks like:
CREATE TABLE devices (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    model VARCHAR(100),
    serial_number VARCHAR(100),
    site_id INTEGER REFERENCES sites(id)
);
The strengths of SQL are hard to beat. It's mature, well-understood, and universally supported. The schema is mandatory and enforced at the database level, so you can't accidentally violate it. The query capabilities are powerful, and you get strong consistency with ACID transaction guarantees.
SQL shines in traditional applications that require strict data integrity, transactional workloads where consistency is critical, and systems where the data model is relatively stable over time.
JSON Schema: Modern and flexible
JSON Schema came along in 2010 as a way to define the structure of JSON documents. It's widely used for API validation, configuration files, and NoSQL databases.
Here's a sample JSON document, followed by the JSON Schema that validates it:
{
  "title": "New Blog Post",
  "content": "content of the blog...",
  "publishedDate": "2023-08-25T15:00:00Z",
  "author": {
    "username": "authoruser",
    "email": "[email protected]"
  },
  "tags": ["Technology", "Programming"]
}
{
  "$id": "https://example.com/blog-post.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "A representation of a blog post",
  "type": "object",
  "required": ["title", "content", "author"],
  "properties": {
    "title": {
      "type": "string"
    },
    "content": {
      "type": "string"
    },
    "publishedDate": {
      "type": "string",
      "format": "date-time"
    },
    "author": {
      "$ref": "https://example.com/user-profile.schema.json"
    },
    "tags": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}
What makes JSON Schema appealing is that it's human-readable and easy to understand. The tooling for validation and code generation is excellent. It supports external references and schema composition, which means you can build complex schemas from reusable pieces. And since it's designed for JSON, it's a natural fit for JSON-based APIs and documents.
JSON Schema is useful when you're building APIs and need to validate requests and responses, when you're working with configuration files that need validation, or when you want to document data structures in your API documentation.
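Full-featured JSON Schema validators exist for most languages (the Python `jsonschema` package is a common choice). To keep things self-contained, here's a toy validator in plain Python that handles just the `required` and scalar `type` keywords from the schema above:

```python
import json

schema = {
    "type": "object",
    "required": ["title", "content", "author"],
    "properties": {
        "title": {"type": "string"},
        "content": {"type": "string"},
        "tags": {"type": "array"},
    },
}

# Map JSON Schema type names to Python types.
TYPE_MAP = {"object": dict, "string": str, "array": list}

def check(doc: dict, schema: dict) -> list[str]:
    """Toy validator: only handles 'required' and scalar 'type' keywords."""
    errors = [f"missing required field: {f}"
              for f in schema.get("required", []) if f not in doc]
    for field, rules in schema.get("properties", {}).items():
        if field in doc and not isinstance(doc[field], TYPE_MAP[rules["type"]]):
            errors.append(f"{field}: expected {rules['type']}")
    return errors

doc = json.loads('{"title": "New Blog Post", "content": "...", "tags": "oops"}')
print(check(doc, schema))
```

A real validator also handles `$ref`, `format`, nested objects, and the rest of the keyword vocabulary, but the principle is the same: the schema is data, and validation is mechanical.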
YANG: Built for networks
YANG was introduced in 2010 and designed specifically for modeling network configuration and operational data. If you're working with network devices, this is the industry standard.
A YANG model looks like this:
module example-network {
  namespace "http://example.com/network";
  prefix "ex";

  import ietf-inet-types {
    prefix inet;
  }

  container device {
    leaf hostname {
      type string;
    }
    leaf ip-address {
      type inet:ipv4-address;
    }
    leaf model {
      type string;
    }
    list interfaces {
      key "name";
      leaf name {
        type string;
      }
      leaf enabled {
        type boolean;
      }
    }
  }
}
YANG's strength is that it's purpose-built for network management. It integrates seamlessly with NETCONF, RESTCONF, and gNMI protocols. It has strong typing and validation for network-specific data types, and it's an industry standard with extensive vendor support across major equipment manufacturers.
YANG is useful when managing network device configurations, modeling operational state, or integrating with standard network management protocols.
GraphQL: Schema + query language
GraphQL, developed at Facebook in 2012 and open-sourced in 2015, takes a different approach. It combines a schema definition language with a query language, giving you a complete system for building modern APIs.
Here's part of a GraphQL schema, showing a Mutation type and its input types (the Post, User, and Comment object types it references would be defined elsewhere in the schema):
type Mutation {
  createPost(input: CreatePostInput!): Post
  createUser(input: CreateUserInput!): User
  addComment(input: AddCommentInput!): Comment
}

input CreatePostInput {
  title: String!
  content: String!
  authorId: ID!
}

input CreateUserInput {
  name: String!
  email: String!
}

input AddCommentInput {
  postId: ID!
  content: String!
  authorId: ID!
}
The power of GraphQL is that a single query can fetch exactly the data you need, with no over-fetching or under-fetching. It's strongly typed with excellent developer tooling. The schema serves as both documentation and a contract between your API and its consumers. And it has built-in support for mutations, which are how you change data.
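As a sketch of how a client would call one of those mutations, here's a request that creates a post and asks for only two fields back (assuming the Post type exposes id and title):

```graphql
mutation {
  createPost(input: { title: "Hello", content: "First post", authorId: "1" }) {
    id
    title
  }
}
```

The selection set after the mutation call is what prevents over-fetching: the server returns exactly those fields and nothing more.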
GraphQL is ideal for APIs where clients need flexible querying capabilities, applications with complex nested data requirements, and teams that want type safety across both frontend and backend code.
Infrahub schema: Infrastructure-specific
Infrahub takes a modern approach to data modeling with a schema optimized specifically for networking and infrastructure management.
An Infrahub schema looks like this:
---
nodes:
  - name: Device
    namespace: Dcim
    label: Network Device
    icon: clarity:network-switch-solid
    inherit_from:
      - DcimGenericDevice
      - DcimPhysicalDevice
    attributes:
      - name: name
        kind: Text
        unique: true
        order_weight: 1000
      - name: height
        label: Height (U)
        optional: false
        default_value: 1
        kind: Number
        order_weight: 1400
    relationships:
      - name: platform
        peer: DcimPlatform
        cardinality: one
        kind: Attribute
        order_weight: 1300
What sets Infrahub apart is its native support for infrastructure concepts like hierarchies, inheritance, and relationships. And the system automatically generates a GraphQL API from your schema, so you don't have to write and maintain that layer yourself.
The Infrahub schema excels as the foundation for a network source of truth.
Choosing the right data schema to work with
So which schema approach should you use? Here's a practical framework based on what you're actually trying to accomplish.
- If you're working with network device configuration, YANG is the industry standard. Start here if you're working with NETCONF/RESTCONF or building device configuration templates.
- For API development, look at GraphQL or JSON Schema. Choose GraphQL if you want flexible querying capabilities. Go with JSON Schema if you need straightforward validation and documentation.
- For a network source of truth, consider a graph-oriented schema like the one in Infrahub.
- If your team is already comfortable with relational databases and your data model fits naturally into tables and rows, you might stick with traditional SQL.
For production systems at scale, you'll need to invest in formal schemas with enforcement. Your future self, and your team, will thank you when you're not debugging mysterious data inconsistencies at 2 am.
Practical data schema takeaways
Here's what you need to remember as you start thinking about schemas in your own work building and evolving network automation.
- Everything has a schema. The question is whether it's explicit and enforced, or implicit and living in someone's head.
- There are always trade-offs between flexibility and data integrity, speed and consistency. Choose based on your actual requirements, not industry hype or what seems trendy.
- All schemas share three core components: structure (objects and their attributes), relationships (how things connect), and constraints (validation rules). Understanding these fundamentals helps you evaluate any schema language or approach.
- Different tools exist for different jobs. SQL excels at transactional systems. JSON Schema shines for APIs. YANG is purpose-built for network devices. GraphQL enables flexible querying. Infrahub specializes in network data. Match the tool to the problem.
- Finally, it's perfectly okay to start simple and plan for growth. Beginning with loose schemas for prototypes is fine, but build in enforcement as your system matures and the stakes get higher.
Understanding schemas is the foundation for making smart decisions about how you model and store your network data. But once you've defined your schema, you need somewhere to actually store that data… which brings us to databases.
Next, we'll explore different database types, their performance characteristics and trade-offs, and why graph databases are the perfect fit for network automation.