I have been dealing with more JSON data than ever before. All of it comes from a webhook provider when certain events occur. One thing I was not comfortable with was the nested JSON structure, until I found a clean way to handle it. Our Django app uses Django REST Framework (DRF) and relies heavily on its serialization capabilities.

What we are going to see

  • Looking at the data (JSON/dict)
  • Looking at the model where the data is about to be stored
  • Writing a DRF serializer to handle the nested data
    • More details on how it’s done using to_internal_value()
  • Conclusion

A sample JSON data for context

This is the webhook event sent when a charging session starts or stops (the sample below is a session.ended event). Looking at the JSON itself, it has a max depth of 5.

{
    "_id": "ae1f90e0910ca",
    "data": {
        "_id": "5afca099e0445dc",
        "configurations": {
            "include_raw_energy_report": false,
            "include_id_tag": false
        },
        "user": "dc09a8ff10d2",
        "chargestation": "2cce1e46865115",
        "connector": "85946dc6ee4d29",
        "status": "Ended",
        "rate": "fca0bc98e8f55",
        "metrics": {
            "chargingStart": "2024-05-06T10:19:13.834Z",
            "meterStart": 60171,
            "timezone": "America/Los_Angeles",
            "chargingStop": "2024-05-06T10:24:52.879Z",
            "meterStop": 62525,
            "energyPeriod": 339.045,
            "energyConsumed": 2354
        },
        "energy_report": {
            "current": {
                "value": 0,
                "unit": "A"
            },
            "power": {
                "value": 0,
                "unit": "kW"
            },
            "energy_meter": {
                "value": 60171,
                "unit": "Wh"
            },
            "state_of_charge": {
                "unit": "Percent",
                "value": "0.00"
            },
            "timestamp": "2024-05-06T10:24:33.959Z"
        },
        "charging_activity": [
            {
                "startTime": "2024-05-06T10:19:13.834Z",
                "endTime": "2024-05-06T10:24:52.879Z",
                "status": "Charging"
            }
        ],
        "cost": {
            "amount": 0.7,
            "currency": "usd",
            "breakdown": [
                {
                    "type": "TIME",
                    "period": {
                        "startTime": "2024-05-06T10:19:12Z",
                        "endTime": "2024-05-06T10:24:50Z"
                    },
                    "rateName": "Demo Rate",
                    "quantity": 0.094,
                    "unit": "hours",
                    "unitPrice": 2,
                    "beforeTax": 0.188,
                    "tax": 0,
                    "totalTax": 0,
                    "totalCost": 0.188
                }
            ]
        },
        "createdAt": "2024-05-06T10:19:13.889Z",
        "updatedAt": "2024-05-06T10:24:53.078Z"
    },
    "object": "session",
    "type": "session.ended",
    "createdAt": "2024-05-06T10:24:53.403Z"
}

The values of those fields are not important; the structure of the JSON is what we are interested in, because most of the fields we want sit inside the nested "data" object rather than at the top level.
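
To make the point about structure concrete, here is a small sketch that measures how deeply a payload is nested. It counts nested objects only (arrays are looked through but do not add a level), which is one way of arriving at the depth of 5 mentioned earlier; that counting rule is my assumption. The helper name max_depth and the trimmed payload are just for illustration.

```python
def max_depth(node):
    """Count levels of nested dicts; lists are looked through but add no level."""
    if isinstance(node, dict):
        return 1 + max((max_depth(value) for value in node.values()), default=0)
    if isinstance(node, list):
        return max((max_depth(item) for item in node), default=0)
    return 0  # scalars (str, int, bool, None) add nothing


# Trimmed copy of the webhook event, keeping only its deepest branch.
event = {
    "_id": "ae1f90e0910ca",
    "data": {
        "status": "Ended",
        "cost": {
            "amount": 0.7,
            "breakdown": [
                {"type": "TIME", "period": {"startTime": "2024-05-06T10:19:12Z"}}
            ],
        },
    },
    "type": "session.ended",
}

print(max_depth(event))  # 5: event -> data -> cost -> breakdown item -> period
```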

Now that we have seen the JSON we’re dealing with, let’s look at the model where this data is eventually going to be stored.

The Django Model to store the payload

Since we are looking at a charger event, we have a Session table to store it.

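from django.db import models

# Chargers is another model in the app; its definition is not shown in this post.


class Session(models.Model):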
    """To dump global session data across all APIs"""

    charger = models.ForeignKey(
        Chargers, on_delete=models.CASCADE, related_name="sessions", db_index=True
    )
    session_id = models.CharField(max_length=50, null=True, blank=True)
    connector = models.CharField(max_length=50, null=True, blank=True)
    status = models.CharField(max_length=20, null=True, blank=True)
    event_time = models.DateTimeField(null=True, blank=True, default=None, db_index=True)
    cost = models.JSONField(null=True, blank=True)
    metrics = models.JSONField(null=True, blank=True)
    energy_report = models.JSONField(null=True, blank=True)
    charging_activity = models.JSONField(null=True, blank=True)

    def __str__(self):
        return f"{self.session_id}"

    class Meta:
        verbose_name_plural = "Session"
        verbose_name = "Session"

Again, the fields are just for context. Now that we have seen how the data looks, we would like to store some of the fields in their own columns so that we can query by them later.

Restructure nested data - to_internal_value()

I would still like to take advantage of serializers.ModelSerializer. It’s easy and straightforward as long as it gets the data in the shape it expects.

As I mentioned, the idea is to pull out what we need from the nested data and give it to the serializer as if it were a flat dict.

Here is the thing - there is no direct way to do this, and DRF does not support it out of the box like magic, simply because it does not know where or how to find what we are looking for inside the nested data.

So we need to transform the data into a flat dict before it reaches the create() method. That transformation, which happens before the serializer validates the fields, is what we do in the to_internal_value() method.

DRF serializers have a method called to_internal_value().

The docs describe it like this:

Takes the unvalidated incoming data as input and should return the validated data that will be made available as serializer.validated_data. The return value will also be passed to the .create() or .update() methods if .save() is called on the serializer class.

Here’s the flow as a diagram:

+---------------------+
|   Incoming JSON     |
|   (nested data)     |
+----------+----------+
           |
           ▼
+----------+----------+
| serializer is called|
+----------+----------+
           |
           ▼
+----------+----------+
| to_internal_value() |
| (Transform/Raw Data)|
+----------+----------+
           |
           ▼
+----------+----------+
|   serializer.       |
|   validated_data    |
+----------+----------+
           |
           ▼
+----------+----------+
| create() or update()|
+----------+----------+
           |
           ▼
+---------------------+
|   Database Save     |
+---------------------+

In short:

  • The input to to_internal_value() is unvalidated data (in our case the nested payload, which can’t be given to the serializer fields directly).
  • The output is validated, Python-native data (stored in serializer.validated_data).
  • to_internal_value() does the following:
    • Parses/transforms raw input (e.g., JSON → Python types).
    • Validates field-level data (e.g., checks whether a required field is present).
    • Lets us modify incoming data before it reaches create()/update().
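
The order of those steps can be mimicked in plain Python. The class below is a toy stand-in, not DRF code: ToySerializer and its flattening and "required" rules are made up for illustration, and only the method names (to_internal_value, is_valid, create, save) mirror the ones on a DRF serializer.

```python
class ToySerializer:
    """Toy stand-in that mimics the call order of a DRF serializer (not DRF itself)."""

    def __init__(self, data):
        self.initial_data = data
        self.validated_data = None
        self.errors = {}

    def to_internal_value(self, data):
        # Transform: pull nested values up into a flat dict.
        session = data.get("data", {})
        flat = {
            "session_id": session.get("_id"),
            "status": session.get("status"),
            "event_time": data.get("createdAt"),
        }
        # Validate: a stand-in for field-level checks such as "required".
        missing = [name for name, value in flat.items() if value is None]
        if missing:
            raise ValueError({name: "This field is required." for name in missing})
        return flat

    def is_valid(self):
        # Validation is the step that triggers to_internal_value().
        try:
            self.validated_data = self.to_internal_value(self.initial_data)
        except ValueError as exc:
            self.errors = exc.args[0]
            return False
        return True

    def create(self, validated_data):
        # A real serializer would create a model instance here.
        return dict(validated_data)

    def save(self):
        if self.validated_data is None:
            raise AssertionError("call is_valid() before save()")
        return self.create(self.validated_data)


event = {
    "data": {"_id": "5afca099e0445dc", "status": "Ended"},
    "createdAt": "2024-05-06T10:24:53.403Z",
}
serializer = ToySerializer(event)
if serializer.is_valid():
    saved = serializer.save()
    print(saved)
```

The point is only the ordering: nothing reaches create() until to_internal_value() has returned the flat, checked dict.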

Writing the serializer itself to handle nested data

Okay, now that we know the nested data can be transformed in to_internal_value(), let’s write it. The method itself sits inside the DRF serializer.

As you saw in the Session model above, there are some fields I would like to pull out of the payload, since they are stored in their own columns.

For this, we follow the structure of the data: we simply use get() if the field is not nested. Otherwise, get_nested_value() handles the nested lookup (I will explain it below).

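from rest_framework import serializers

# Session (the model above) and get_nested_value (defined below) are assumed to be importable here.


class SessionSerializerV2(serializers.ModelSerializer):
    """eDRV Session data serializer"""

    def to_internal_value(self, data):
        """Modifies incoming data (like a transform function) and passes it to the 'create' method"""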
        event_time = data.get("createdAt")
        session_id = get_nested_value(data, ["data", "_id"])
        connector = get_nested_value(data, ["data", "connector"])
        status = get_nested_value(data, ["data", "status"])
        cost = get_nested_value(data, ["data", "cost"])
        metrics = get_nested_value(data, ["data", "metrics"])
        energy_report = get_nested_value(data, ["data", "energy_report"])
        charging_activity = get_nested_value(data, ["data", "charging_activity"])
        # FK to charger. This key is not in the sample payload, so the caller is assumed to add it.
        charger = data.get("charger_id")

        internal_value = super().to_internal_value(
            {
                "session_id": session_id,
                "connector": connector,
                "status": status,
                "event_time": event_time,
                "cost": cost,
                "metrics": metrics,
                "charger": charger,
                "energy_report": energy_report,
                "charging_activity": charging_activity,
            }
        )

        return internal_value

    class Meta:
        fields = "__all__"
        model = Session

Inside to_internal_value() we pull out our fields, build the flat dict, and hand it to super().to_internal_value(), which validates it against the model’s fields. All of this runs when is_valid() is called, and the returned dict becomes serializer.validated_data.
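
For completeness, here is a sketch of how the serializer might be called. The sample payload has no top-level charger_id, so this assumes the caller looks up the charger’s primary key and adds it before validation. store_session_event and charger_pk are hypothetical names, and the serializer class is passed in as an argument so the helper does not depend on project imports.

```python
def store_session_event(serializer_class, payload, charger_pk):
    """Validate and save one webhook event with a DRF-style serializer.

    Assumes `serializer_class` follows the usual DRF interface:
    `Serializer(data=...)`, `.is_valid(raise_exception=True)`, `.save()`.
    """
    # Copy the top level so the caller's payload is left untouched.
    data = dict(payload)
    # The serializer's to_internal_value() reads this key for the FK.
    data["charger_id"] = charger_pk

    serializer = serializer_class(data=data)
    serializer.is_valid(raise_exception=True)  # runs to_internal_value()
    return serializer.save()
```

In a view this could be called as store_session_event(SessionSerializerV2, request.data, charger.pk), where charger is whatever Chargers row the event belongs to.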

Clean way to get fields from a nested dict/list[dict]

Now, let’s do some Python.

I did not really enjoy chaining get() calls to extract field values from nested data.

In the above serializer, you’ve noticed that I use get_nested_value. This is a function I wrote to make it easy to get whatever we need, regardless of whether it sits in a dict or a list.

def get_nested_value(data: dict, keys: list):
    """
    Helper function to get nested value from a dictionary given a list of keys.
    Returns None if any key in the path doesn't exist.
    """
    try:
        for key in keys:
            if isinstance(data, dict):
                data = data.get(key)
                if data is None:
                    return None
            elif isinstance(data, list) and len(data) > 0:
                data = data[0].get(key) if isinstance(data[0], dict) else None
            else:
                return None
        return data
    except (AttributeError, TypeError, IndexError):
        return None

It takes two arguments: the data (the parsed JSON) itself and the path to the value we want.

Let’s consider this line from the serializer: cost = get_nested_value(data, ["data", "cost"])

Here, I’m passing the data and the path to the cost. The keys, in order, are that path.

I suggest you run this function on its own with the data we have above: get_nested_value(data, ["data", "cost"]). This way you will understand what the function does and how it works.
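
Here is that experiment as a runnable snippet. The function is repeated so the snippet stands alone, and the payload is trimmed to a handful of fields from the sample event.

```python
def get_nested_value(data, keys):
    """Same helper as above, repeated so this snippet runs on its own."""
    try:
        for key in keys:
            if isinstance(data, dict):
                data = data.get(key)
                if data is None:
                    return None
            elif isinstance(data, list) and len(data) > 0:
                data = data[0].get(key) if isinstance(data[0], dict) else None
            else:
                return None
        return data
    except (AttributeError, TypeError, IndexError):
        return None


event = {
    "data": {
        "_id": "5afca099e0445dc",
        "status": "Ended",
        "cost": {
            "amount": 0.7,
            "currency": "usd",
            "breakdown": [{"type": "TIME", "rateName": "Demo Rate"}],
        },
    },
    "createdAt": "2024-05-06T10:24:53.403Z",
}

print(get_nested_value(event, ["data", "status"]))                         # Ended
print(get_nested_value(event, ["data", "cost", "amount"]))                 # 0.7
print(get_nested_value(event, ["data", "cost", "breakdown", "rateName"]))  # Demo Rate
print(get_nested_value(event, ["data", "missing", "amount"]))              # None
```

The third call passes through the breakdown list, and the last one shows that a missing key anywhere on the path gives None instead of raising.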

The basic idea is - if the current value is a dict, we get the next value using the provided key. If it is a list of dicts, we presume that we need the 0th dict, and we call get() on it with the key.

What we finally want is the value stored under a key in a dict. get_nested_value() walks the list of keys to get to that value.

That is clean. But it can only take the 0th dict from the list. For me, this was enough.
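
If the first element is ever not enough, one possible extension is to let the path contain integer indexes. This is a sketch of a variant, not the function used in the serializer above; the name get_by_path is made up, and the second breakdown entry in the example is invented so there is something to index.

```python
def get_by_path(data, path, default=None):
    """Follow `path` through nested dicts and lists.

    String keys index dicts, integer steps index lists (negative allowed).
    Returns `default` as soon as a step cannot be followed.
    """
    current = data
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                return default
            current = current[step]
        elif isinstance(current, list) and isinstance(step, int):
            if not -len(current) <= step < len(current):
                return default
            current = current[step]
        else:
            return default
    return current


# The second entry is invented for this example; the sample event has only one.
cost = {
    "breakdown": [
        {"type": "TIME", "totalCost": 0.188},
        {"type": "ENERGY", "totalCost": 0.5},
    ]
}

print(get_by_path(cost, ["breakdown", 1, "type"]))        # ENERGY
print(get_by_path(cost, ["breakdown", -1, "totalCost"]))  # 0.5
print(get_by_path(cost, ["breakdown", 5, "type"]))        # None
```

The trade-off is that the caller now has to spell out the index, so paths like ["data", "cost", "breakdown", "rateName"] no longer work without a 0 in them.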

Some final words

If you’re not using DRF, you would still write a helper function that transforms the nested structure into a flat dict and passes it to the model instance.
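
Without DRF, that helper could look roughly like this. flatten_session_event is a hypothetical name; the output keys follow the Session model’s columns, and taking the charger’s primary key as an argument is my assumption.

```python
def flatten_session_event(event, charger_pk):
    """Build a flat dict of Session column values from a nested webhook event."""
    session = event.get("data") or {}
    return {
        "charger_id": charger_pk,  # FK column; value supplied by the caller
        "session_id": session.get("_id"),
        "connector": session.get("connector"),
        "status": session.get("status"),
        "event_time": event.get("createdAt"),
        "cost": session.get("cost"),
        "metrics": session.get("metrics"),
        "energy_report": session.get("energy_report"),
        "charging_activity": session.get("charging_activity"),
    }


event = {
    "data": {
        "_id": "5afca099e0445dc",
        "connector": "85946dc6ee4d29",
        "status": "Ended",
        "cost": {"amount": 0.7, "currency": "usd"},
    },
    "createdAt": "2024-05-06T10:24:53.403Z",
}

flat = flatten_session_event(event, charger_pk=1)
print(flat["session_id"], flat["status"], flat["cost"]["amount"])
```

The resulting dict could then be passed to the model (for example as keyword arguments when creating a Session), but nothing checks the values first, which is the part the serializer gives us for free.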

If you ask me why I would write this transformation logic inside the serializer itself, my answer is: to keep the code cleaner by putting everything related to the serializer in one place. I could just as easily move the logic into a helper function and keep my serializer looking simple. The choice is ours.