Operational data: 10 must-know concepts for the non-technical SME leader

Pete Harvey
May 14
12 min read

Structured and unstructured operational data

10 things every SME owner needs to know about data

What is data?

Data is information that is recorded in a form that can be stored, processed, or communicated. It can be numbers, text, images, signals or any other raw facts that represent something about the world.

When you think about data in your business, you probably think about the data that is stored inside your software systems. This data - whether it’s in your finance system, CRM, or an Excel spreadsheet - is often called “structured data”. The rule of thumb is if it’s got rows and columns, it’s structured data.

You’ve probably got a lot of “unstructured data” too. That’s data that has no predefined format. Some examples might be emails, job descriptions, support tickets/resolutions, notes fields in CRMs. There’s a lot of valuable information in unstructured data, but it’s traditionally been harder to analyse than structured data.

Data without meaning is useless

A column called job_status containing the value 3 is data. It tells you nothing. The meaning - that 3 means "job complete" - lives somewhere else: probably inside your head, or the head of the person that configured your job management system.

Whenever you think about data, you also need to think about what it means. One of the core questions to always ask is “would a stranger be able to understand what this data means” - if not, then AI isn’t going to be able to either.

Your data is certainly imperfect

Most businesses - even huge corporations - didn’t sit down and design their data landscape. A businesses data landscape evolves over time. You buy a CRM to automate the sales process. You implement a finance system to automate book-keeping. You build a spreadsheet to log when something happens.

Over time, these systems begin to fight against each other. It becomes impossible to answer simple questions like “how profitable are my products”. You know that the data - theoretically - exists.

But getting answers is impossible.

If you can master the 10 concepts below, you’ll be able to diagnose why.

1. Keys: how data gets connected

Every entity that matters to your business (e.g. customer, job, product) should have one unique identifier that every system agrees on. In technical circles, a unique identifier that appears in every source of data about an entity is called a key. With it, it becomes easy to pull together all the information you business knows about a given entity.

What good looks like: Say you have a customer called Acme Ltd. You could assign this customer a unique customer number (e.g. #1042). That unique identifier is present next to the customer name in your CRM, your invoicing tool, and your support desk. One customer. One ID.

The reality in most businesses: "Acme Ltd", "Acme Limited", and "ACME" exist as separate records with no means of joining them together. Sales are reported separately for each of them, instead of for a single customer.

Failure modes:

Which customers have bought from us more than once? - The same customer exists under three different names across your systems. Each looks like a new customer.
What is our total revenue from this client across all their sites? - Each site was set up as a separate customer record. There's no link between them.
How many active customers do we actually have right now? - Your CRM has 340 records but dozens are duplicates with different IDs. The real number could be 280. You don't know.

2. Types: how data stays consistent

Every column in every system should have a defined, enforced format. A date field contains only dates. A number field contains only numbers. A status field only accepts values from a fixed, agreed list. In technical circles, these constraints are called data types. With them, your data stays clean at the point of entry - before it ever reaches a report.

What good looks like: In your CRM spreadsheet, your deal stage field accepts exactly four values: Prospect, Qualified, Proposal, Closed. Nobody can type anything else.

The reality in most businesses: Over three years, your team has typed "closed - won", "Closed Won", "CW", "won!", and "yes" into the same field. Your pipeline report now has eleven stages. Five of them mean the same thing.

Failure modes:

Which deals closed in Q3? - Half your team typed the close date as text ("July 2024") rather than a date field. The system can't filter or sort it.
How many leads are currently in the proposal stage? - Your pipeline has "Proposal", "proposal sent", "PROPOSAL", and "Sent Proposal" as separate stages. The query only catches one of them.
What percentage of jobs were completed on time last month? - Completion status was entered as free text. "Done", "complete", "finished", "yes", and "✓" all mean the same thing. None of them group together.

3. Raw vs. processed data: protecting the original record

Ideally, the data your systems output is never directly touched by human hands. Any cleaning, correcting, or transforming happens in a separate layer, on top of the original. In technical circles, this separation is called the distinction between raw and processed data. With it, you always have something to go back to.

What good looks like: You export a report from a system into Excel. Nobody edits that export directly. You open another tab that reads from the raw data and outputs a new dataset. The raw data is always kept separate. The source is always intact.

The reality in most businesses: Someone edits a revenue figure directly in the table that the system outputted - "just to fix a typo". The original value is gone. When the team doesn’t agree with the answer that is produced at the end of the spreadsheet, they can’t easily draw a line from the upstream system, to the wrong answer.

Failure modes:

What did our revenue look like before we corrected for that bad batch of invoices? - Someone edited the figures directly in the source table. The original values are gone.
Can we prove to the auditor what our figures were at year end? - The data has been modified since then and there's no log of what changed or when.
When did this transaction actually occur - and who changed it? - There's no audit trail. The record just shows the current value with no history.

4. Data ownership: who is accountable when systems disagree

Each dataset should have a named person responsible for its accuracy. When two systems show different values for the same thing, there should a clear rule for which system wins, and a clear person whose job it is to resolve the discrepancy. Without this, nobody fixes anything because it is always someone else's problem.

What good looks like: It is agreed that the CRM owns customer status. If the billing system shows a different status, the CRM is right, and the billing system gets corrected. One person is responsible for making sure that happens.

The reality in most businesses: The CRM says a customer is active. The billing system says they cancelled three months ago. Both teams are aware. Neither feels it is their problem to resolve. The discrepancy sits there for another quarter.

Failure modes:

Is this customer still active or have they churned? - The CRM says active. The billing system shows no payment in six months. Nobody knows which to trust and nobody has been asked to decide.
What is the correct address on file for this client? - Three systems have three different addresses, all updated at different times by different people. None is designated as the master.
Which system do we trust when the CRM and the billing tool disagree? - There is no agreed rule. Different people in different teams are working from different answers.

5. Single source of truth: one number you can defend

Ideally, for each type of entity or metric that matters to your business, there is one agreed-upon definition. When someone asks for the revenue figure, there is one place to look and one answer to give.

What good looks like: Finance owns the revenue number. They have made the decisions about when revenue is recognised, what counts as revenue, and have written this down. If a dashboard number differs from the finance system, the dashboard is wrong - full stop.

Everyone in the business knows this. The board pack number is never questioned.

The reality in most businesses: Finance pulled from Xero, sales pulled from the CRM, and ops pulled from the job management tool. All three are different. The debate about which is right takes longer than the board meeting itself.

Failure modes:

Are we on track against target or not? - Depends who you ask. The sales team says yes. Finance says no. Both are using data from the same period.
How many open orders do we have right now? - Your order management system and your CRM have different counts and nobody has established which one is authoritative.

6. Grain: what one row actually means

Ideally, every table in your data has a clear, consistent definition of what a single row represents. One row is one invoice. Or one row is one line item. Or one row is one day's summary. Not a mixture. In technical circles this is called the grain of a table. Without it, any aggregation - a sum, an average, a count - can silently produce the wrong answer.

What good looks like: Your sales table has one row per invoice line item. When people sum the revenue column they know they are getting total line-item value, not total invoice value. They know they need to expect multiple line items in an invoice. Reports built on the table are consistent because the foundation is consistent.

The reality in most businesses: Some rows are whole invoices, some are line items, and some are monthly summaries - all in the same table, with no column to tell them apart. Summing the revenue column double and triple counts without any warning.

Failure modes:

What is our average deal size? - Some rows in your deals table are individual line items, some are whole orders. Averaging the revenue column mixes the two and the result is meaningless.
How much did we spend with this supplier last year? - Your purchase table has both invoice-level and payment-level rows mixed together. Summing it double counts everything paid in instalments.
What is our revenue per customer? - Your revenue table is at transaction line level but your customer table is at account level. Joining them without accounting for grain inflates every customer's revenue figure.

7. Lineage: where does this number come from

Ideally, every figure in every report can be traced back to the original records it was built from. If a number looks wrong, you can follow the chain back - from dashboard to source system - and either verify it or find where it broke. In technical circles this is called data lineage.

What good looks like: A revenue figure in your monthly management meeting links to a specific set of invoices in your accounting system. If someone questions it, you can open those invoices and show them exactly where the number comes from.

The reality in most businesses: The dashboard was built by a contractor two years ago. The logic lives in an undocumented spreadsheet that feeds into another spreadsheet. Nobody currently in the business knows how the number is calculated.

Failure modes:

Why is this month's revenue figure different from what we reported last week? - The Excel spreadsheet is built on a custom report from your ERP system, but you didn’t know that and used another export.
Where does this number in the dashboard actually come from? - It was built by a contractor two years ago. The logic is buried in an undocumented spreadsheet that feeds into another spreadsheet.
Can we trust this report enough to share it with an investor? - You believe it is right but you cannot prove it. If an investor asks you to walk them through the calculation, you cannot.

8. Timestamps and history: knowing what changed and when

It’s important that key properties (e.g. deal stage in the CRM) capture when it was created and when it last changed. In an ideal world, previous versions of a record are preserved rather than overwritten — so you can reconstruct what your data looked like at any point in the past. Without this, your data only ever tells you about right now.

What good looks like: Your CRM stores the date a deal moved into each stage. If you want to know what your pipeline looked like on the 1st of last month, you can reconstruct it exactly. You can calculate how long deals typically sit at each stage and where they tend to stall.

The reality in most businesses: Your database only holds the current state of each record.

When a status changes, the old value is overwritten. The history is gone. You can see where things are, but not where they have been.

Failure modes:

How long does it typically take us to close a deal? - Your CRM only stores the current stage. There is no record of when each deal moved between stages, so you cannot calculate time in stage.
What did our customer base look like six months ago? - Your database only holds current state. Records that were deleted or updated have no history. The snapshot does not exist.
When did this client first start having issues - and did we spot it at the time? - Support tickets have no created date, just a last-updated date. You can see the current state of the problem but not when it started or how it evolved.

9. Null vs. zero vs. unknown: three things that look the same but aren't

Your data should distinguish clearly between three different situations: a value that is zero, a value that was never entered, and a value that exists but was not captured. In technical circles, an empty field is called a null. It is not the same as zero. Confusing them corrupts aggregations, averages, and trends in ways that are very hard to detect.

What good looks like: A product with no sales this month is recorded as 0. A product that was not tracked this month is recorded as an empty cell (null). Your reporting tool knows to exclude nulls from averages and include zeros - so the average reflects only the products you actually measured.

The reality in most businesses: Untracked products are stored as 0 because it was easier to leave the field blank and have the system default to zero. Every average that touches that data is wrong, and it looks completely normal.

Failure modes:

What is our average revenue per product line? - Products that were never tracked are stored as zero. They are included in the average and drag it down, making your real performers look worse than they are.
Which customers have never bought product X? - Customers with no purchase record are indistinguishable from customers whose data was never imported. You cannot tell the difference between "never bought" and "we don't know".
What is our no-show rate for booked appointments? - Appointments where the outcome was not recorded are treated as zeros rather than unknowns. Your no-show rate looks artificially low because the denominator includes appointments you have no data on.

10. Metadata: the key to unlocking unstructured data

Ideally, every unstructured asset your business holds - an email, a PDF contract, a call recording, a scanned invoice - has a structured set of labels attached to it: who it relates to, when it was created, what type of document it is, which job or customer it belongs to. In technical circles, this is called metadata - data about data. Without it, your unstructured files are a pile. With it, they become a searchable, usable archive connected to the rest of your business data.

What good looks like: Every contract in your system is tagged with a customer ID, a start date, a contract type, and a status. You can instantly find all contracts up for renewal in the next 90 days, or pull every document related to a specific client, without opening a single file.

The reality in most businesses: Contracts live in a shared drive organised by folder and filename. Finding everything related to a specific customer means searching by name and hoping the file was named consistently. There is no link between the document and the customer record in your CRM.

Failure modes:

Which customers have a contract expiring in the next 90 days? - Contracts are stored as PDFs in a folder. There is no expiry date field attached to them. Someone has to open each one to find out.
What did we agree with this client in our last three calls? - Call recordings exist but are not tagged to a customer record. Finding the right ones means scrolling through a chronological list and listening.
Have we received a signed contract from every new customer onboarded this year? - Documents are stored by filename with no consistent naming convention and no link to your onboarding records. You cannot run that check without doing it manually.

Your job as a leader is to know and use these concepts

Most business data is broken because either:

Nobody stepped back and designed the way data is captured and stored; or
People didn’t have the discipline to keep the data clean

Your job as a business leader is to play both of these roles. Ask the difficult questions when a system is being implemented. Call people out when they don’t keep their data clean. Build a culture of treating data as an asset, rather than an afterthought.

Opbox applies these concepts automatically

When you connect your business systems to Opbox, the data fundamentals are handled for you. The concepts that take most businesses years to get right — consistent references across systems, clean field types, protected source data, trustworthy aggregations — are applied automatically as part of building your single source of truth. You get a data foundation you can actually rely on, without needing a data team to build and maintain it.

10 things every SME owner needs to know about data

What is data?

Data without meaning is useless

Your data is certainly imperfect

1. Keys: how data gets connected

2. Types: how data stays consistent

3. Raw vs. processed data: protecting the original record

4. Data ownership: who is accountable when systems disagree

5. Single source of truth: one number you can defend

6. Grain: what one row actually means

7. Lineage: where does this number come from

8. Timestamps and history: knowing what changed and when

9. Null vs. zero vs. unknown: three things that look the same but aren't

10. Metadata: the key to unlocking unstructured data

Your job as a leader is to know and use these concepts

Opbox applies these concepts automatically

Comments