An auto-discovery pattern: eliminating hardcoded device-to-site mapping

We were trying to solve a fundamental issue in our system: one that looks simple written down, but didn’t feel simple at all when we first started working on it.

In our system, there are around 50–100 IoT devices constantly pushing data. Each device sends more than 10 messages per second. At any given moment, at least 30 devices are actively publishing telemetry.

So even at a minimum level, the system is always receiving a steady stream of messages.

A typical message is published to the broker on a topic like:

system/locationX/device1

And the payload looks something like this:

```json
{
  "data": {
    "amps_a": 0,
    "amps_b": 0,
    "amps_c": 0,
    "battery_voltage": 12.2,
    "coolant_temp": 0,
    "engine_hours": 485.15,
    "frequency": 0,
    "kvar": 0,
    "kw": 0,
    "kwh": 5869.8,
    "not_in_auto": 0,
    "oil_pressure": -0.72,
    "power_factor": 0,
    "rpm": 0,
    "volts_ab": 0,
    "volts_bc": 0,
    "volts_ca": 0
  },
  "device": {
    "id": "deviceX",
    "serialNum": "serial"
  }
}
```
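In code, extracting the identifiers from such a message is a small parsing step. The sketch below assumes the three-segment topic layout and the payload fields shown above; the function name and return shape are illustrative, not our actual implementation:

```python
import json

def parse_message(topic: str, payload: bytes) -> dict:
    """Split a topic like 'system/locationX/device1' into its parts
    and pull the device identifiers out of the JSON payload."""
    _, location, topic_device = topic.split("/", 2)
    body = json.loads(payload)
    return {
        "location": location,           # second topic segment
        "topic_device": topic_device,   # third topic segment
        "device_id": body["device"]["id"],
        "serial": body["device"]["serialNum"],
        "metrics": body["data"],
    }

msg = parse_message(
    "system/locationX/device1",
    b'{"data": {"kw": 0}, "device": {"id": "deviceX", "serialNum": "serial"}}',
)
# msg["device_id"] -> "deviceX", msg["location"] -> "locationX"
```

Note that the device identity arrives twice: once in the topic and once in the payload, which is exactly what makes discovery possible later.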

The data itself isn’t the problem here. The real issue is where this data is sent and how the system determines which site each device belongs to.

For every location, there can be multiple devices. In the example above, I’ve shown just one device in one location, but the real system is much more dynamic.


What’s the problem, really?

Inside the application (our backend system), devices are usually identified using internal identifiers. These identifiers are often database primary keys or some form of system-generated id.

This leads to a few common approaches.

1. ID-based device identification

In this approach, each device needs to know the database id of the site or system it belongs to. That means the device must be synced with information from the backend.

This is effectively a system → device sync.

It works, but it introduces friction:

  • Devices must be provisioned carefully.
  • Any mismatch between database ids and device configuration breaks ingestion.
  • Replacing or resetting devices becomes painful.
  • Environments (dev, staging, prod) need separate configs.

The system becomes tightly coupled to the device configuration.


2. String-based identifiers

Instead of numeric ids, we use string identifiers. These are easier to read and remember than auto-generated numeric keys.

This is slightly better, but the core issue remains:

  • Devices still need prior knowledge about the system.
  • Configuration mistakes still happen.
  • The mapping logic still lives on the device side.

So while this improves readability, it doesn’t really solve the problem.


Rethinking the problem

Instead of asking “How does the device know where to send data?”, we flipped the question.

What if the device doesn’t need to know anything at all?

Devices already publish messages. They already have topics. They already send identifiers like serial numbers or device ids.

So we let devices publish messages as they normally would.

On the system side, the MQTT client simply listens to everything and writes all incoming topics into a table. At this stage, the system does not try to understand or process the data fully.

This becomes an auto-discovery phase.

The system passively discovers:

  • new devices
  • new topics
  • last seen timestamps
  • sample payloads

No hardcoded mapping. No pre-configuration.
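A minimal sketch of this discovery phase, using an in-memory dict in place of the real table (the field names and row shape are assumptions for illustration, not our actual schema):

```python
import json
import time

# In-memory stand-in for the discovery table; a real deployment would
# persist these rows in the database.
discovered = {}

def on_any_message(topic: str, payload: bytes) -> None:
    """Passively record every incoming topic without processing it.

    The first message on a topic creates a row with a sample payload;
    later messages only refresh last_seen."""
    row = discovered.setdefault(topic, {
        "sample_payload": json.loads(payload),
        "status": "unmapped",  # nothing is processed in this state
    })
    row["last_seen"] = time.time()

# Two messages on the same topic: the sample payload stays the first one,
# last_seen keeps moving forward.
on_any_message("system/locationX/device1", b'{"device": {"id": "deviceX"}}')
on_any_message("system/locationX/device1", b'{"device": {"id": "deviceX"}}')
```

The key property is that this handler never rejects a message and never needs configuration: anything that publishes becomes visible.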


Assigning devices to sites

Once topics are discovered, we still need to answer one question:

Which site does this device belong to?

This is where a human-in-the-loop step makes sense.

Newly discovered devices start in an unmapped state. They appear in an admin dashboard where someone can inspect them and decide what to do.

The lifecycle looks like this:

  • Unmapped

    • Device is discovered.
    • Messages are ignored.
    • Device appears in the admin UI.
  • Mapped

    • Admin assigns the device to a site.
    • Messages start getting processed.
    • Data is stored normally.
  • Ignored

    • Device is marked as test, invalid, or noise.
    • Messages are discarded.

This keeps ingestion safe by default and avoids accidental data pollution.

```mermaid
stateDiagram-v2
    [*] --> Unmapped: New device discovered

    Unmapped --> Mapped: Admin assigns to site
    Mapped --> Unmapped: Admin removes mapping

    Unmapped --> Ignored: Mark as test device
    Ignored --> Mapped: Re-enable & map

    note right of Unmapped
        Messages ignored
        Device appears in
        admin dashboard
    end note

    note right of Mapped
        Messages processed
        Data saved
    end note

    note right of Ignored
        Messages discarded
    end note
```

The state flow looks like this:

  • New device → Unmapped
  • Unmapped → Mapped (admin action)
  • Unmapped → Ignored
  • Ignored → Mapped (if re-enabled)
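The state flow translates into a small dispatch function on the ingestion path. This is a sketch: the function name, arguments, and return values are illustrative, not our production code:

```python
def handle_message(device: dict, payload: dict, store: list) -> str:
    """Dispatch one message based on the device's mapping status.

    Only 'mapped' devices reach storage; everything else is dropped,
    which keeps ingestion safe by default."""
    if device["status"] == "mapped":
        store.append((device["site_id"], payload))
        return "processed"
    if device["status"] == "ignored":
        return "discarded"
    return "held"  # unmapped: waiting for an admin decision

store = []
handle_message({"status": "unmapped"}, {"kw": 0}, store)            # held
handle_message({"status": "mapped", "site_id": 147}, {"kw": 0}, store)  # stored
```

An unmapped device’s messages never touch storage; once an admin flips its status, the same call path starts persisting data with no device-side change.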

```mermaid
erDiagram
    DEVICE_MAPPING ||--o| SITES : "maps to"

    DEVICE_MAPPING {
        string device_id PK "sensor_alpha_42"
        int site_id FK "147"
        enum status "mapped | unmapped | ignored"
    }

    SITES {
        int site_id PK
        string name
    }
```

Data model (simple and intentional)

At the database level, the mapping is straightforward.

Each device has:

  • a device identifier
  • an optional site id
  • a status (mapped / unmapped / ignored)

The site table remains unchanged.

This separation makes it clear:

  • devices exist independently
  • mapping is an explicit decision
  • processing depends on status

Why this worked better for us

This approach removed a lot of hidden assumptions:

  • Devices no longer depend on database ids.
  • Provisioning became easier.
  • New devices can be added without touching backend configs.
  • Mistakes are visible and reversible.
  • Test devices don’t interfere with production data.

Most importantly, the system became observable first, instead of configured first.


Closing thoughts

This pattern won’t eliminate all complexity, but it moves it to a place where it’s easier to manage — the system side, not the device side.

By separating discovery from identity assignment, we made the system more flexible and safer to operate at scale.