Deep Learning on Structured Data: part 3

Over the course of 2018 I’ve made several attempts to apply what I have learned about deep learning to everyday problems in my job. This article describes how I applied a simple Keras model to solve the problem of predicting (and preventing) a bane for Db2 clients and Support managers: Duty Manager calls.
This article is part of a series describing my experience applying deep learning to problems involving tabular, structured data. If you are interested in an end-to-end examination of this topic involving an open data set, I am writing a book for Manning Publications, Deep Learning with Structured Data, an Early Access version of which is available online now. You can use the discount code fccryan to save on the cost of access to this book.
During the normal course of a Db2 support engagement, the client communicates with the Db2 support analyst working on the problem via a ticket. However, when there is a breakdown in communication, or when the impact of a problem becomes particularly severe, a client can request to speak to the Duty Manager. There is an active leader in the worldwide team acting as Duty Manager every hour of every day, 365 days a year. When the leader who is acting as Duty Manager gets a call, he or she calls the client back as quickly as possible and ensures that actions are taken to resolve the issue.

Duty Manager calls are inconvenient for clients and time-consuming for the Db2 team. If we could predict Duty Manager calls for active tickets, we could take proactive steps (such as pre-emptive calls to the client) to prevent the crisis before it happens.
Following the same approach that I described in Deep Learning on Structured Data Part 1 and Part 2, and taking care to avoid the Accuracy Paradox, I applied a simple deep learning model to historical Db2 ticket metadata to predict whether a given ticket would result in a Duty Manager call. You can see the complete notebook here.
To start, we read two CSV files into dataframes: one containing summary information for all tickets that resulted in Duty Manager calls (dm_cases) and one containing complete metadata for all tickets (merged_data). Next, we join the two dataframes to get a target column in merged_data:
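A sketch of this step is below; the file names, the `TICKET_ID` join key, and the `dm_flag` target column name are assumptions standing in for the identifiers used in the actual notebook:

```python
import pandas as pd

# summary of tickets that resulted in Duty Manager calls, and full ticket metadata
dm_cases = pd.read_csv('dm_cases.csv')
merged_data = pd.read_csv('merged_data.csv')

# flag every ticket that appears in dm_cases, then left-join on the ticket
# identifier: tickets with a Duty Manager call get 1.0, all others get NaN
dm_cases['dm_flag'] = 1.0
merged_data = merged_data.merge(
    dm_cases[['TICKET_ID', 'dm_flag']], on='TICKET_ID', how='left')
```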
Next, for the target column, replace nulls with zeros:
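Because of the left join, tickets that never had a Duty Manager call carry nulls in the target column, so:

```python
# tickets absent from dm_cases have NaN in the target; treat them as negatives
merged_data['dm_flag'] = merged_data['dm_flag'].fillna(0)
```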
Define the categories of columns, as sketched in code after this list:
- textcols: for text columns
- continuouscols: for columns with continuous (numeric) values
- collist: for columns with categorical values
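The column names shown here are illustrative, not the actual Db2 ticket schema:

```python
# group the feature columns by type so later steps can iterate over them
textcols = ['SUBJECT']                         # free-form text
continuouscols = ['OPEN_DAYS', 'NUM_UPDATES']  # continuous (numeric) values
collist = ['COMPONENT', 'COUNTRY', 'QUEUE']    # categorical values
```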
From this point on, we iterate through the columns by type to replace missing values, encode text and categorical values, build the Keras input structure, and build the layers of the model. Maintaining these lists of columns by type makes the code flexible: it's easy to add or drop features without breaking the code. In fact, the code for this problem was adapted from the code for the TTR problem (described in Deep Learning on Structured Data Part 1 and Part 2) with only modest updates, even though the input data structures for the two problems were significantly different.
Replace missing values:
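A minimal sketch, using a type-appropriate default for each group of columns:

```python
# categorical: placeholder category; text: empty string; continuous: column median
for col in collist:
    merged_data[col] = merged_data[col].fillna('missing')
for col in textcols:
    merged_data[col] = merged_data[col].fillna('')
for col in continuouscols:
    merged_data[col] = merged_data[col].fillna(merged_data[col].median())
```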
For columns with categorical values, define a list of label encodings:
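Something like the following, keeping one fitted encoder per column so the same encoding can be reapplied to new tickets at scoring time:

```python
from sklearn.preprocessing import LabelEncoder

# one LabelEncoder per categorical column, kept for reuse at prediction time
labelencoders = {}
for col in collist:
    labelencoders[col] = LabelEncoder()
    merged_data[col] = labelencoders[col].fit_transform(merged_data[col].astype(str))
```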
Now the categorical values have been encoded:

Encode the text columns:
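A sketch using the Keras `Tokenizer`; the `max_words` vocabulary cap is an assumed value, not one taken from the notebook:

```python
from keras.preprocessing.text import Tokenizer

max_words = 3000  # assumed vocabulary size cap

# fit one tokenizer per text column and replace each string
# with its list of word indices
tokenizers = {}
for col in textcols:
    tokenizers[col] = Tokenizer(num_words=max_words)
    tokenizers[col].fit_on_texts(merged_data[col])
    merged_data[col] = tokenizers[col].texts_to_sequences(merged_data[col])
```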
Sample text before encoding:
```
6238    C 5/24 db2 mpp v2 on cloud instance hung. RCA ...
1553    SQL0551N from db2look after restoring db from ...
6988    Proactive PMR
3112    GCGTSC@TM: Tracking DB2 deadlock
1357    DB2 WLB is not working correctly
6122    DPL PASSED
338     C-6/04-(0)-Poor and slow compression with LOGA...
3903    GCGTSC@ZY:Db2:instance crash
2557    HADR Error
1648    Uso da API "Entity Framework", da IBM, para ac...
Name: SUBJECT, dtype: object
```
and after encoding:
```
6238    [40, 8, 464, 1, 2889, 488, 4, 927, 35, 147, 15...
1553    [1152, 27, 440, 24, 546, 22, 27, 688, 59]
6988    [393, 135]
3112    [3, 201, 2890, 1, 429]
1357    [1, 1582, 14, 13, 107, 1153]
6122    [1336, 1154]
338     [40, 197, 97, 57, 841, 16, 72, 689, 12, 2000, 4]
3903    [3, 23, 1, 35, 50]
2557    [20, 9]
1648    [2891, 1155, 583, 1583, 1038, 1155, 45, 547, 2...
Name: SUBJECT, dtype: object
```
Build the Keras input structure:
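Roughly as follows, with one `Input` per feature column and a matching list of numpy arrays to feed to `fit()`; the `maxlen` cap on subject length is an assumed value:

```python
from keras.layers import Input
from keras.preprocessing.sequence import pad_sequences

maxlen = 20  # assumed maximum subject length, in tokens

inputs = []      # Keras Input tensors, one per feature column
input_data = []  # matching arrays, in the same order, to feed to fit()

for col in collist:
    inputs.append(Input(shape=(1,), dtype='int32', name='input_' + col))
    input_data.append(merged_data[col].values)
for col in textcols:
    inputs.append(Input(shape=(maxlen,), dtype='int32', name='input_' + col))
    input_data.append(pad_sequences(merged_data[col].tolist(), maxlen=maxlen))
for col in continuouscols:
    inputs.append(Input(shape=(1,), name='input_' + col))
    input_data.append(merged_data[col].values)
```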
In the model definition, build up Keras layers by column type (categorical, text, and continuous). Categorical inputs get embeddings and batch normalization. Text inputs get embeddings, batch normalization, and dropout.
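A sketch of those per-type layer stacks; the embedding sizes and dropout rate here are assumptions:

```python
from keras.layers import Embedding, Flatten, BatchNormalization, Dropout

layers = []
i = 0  # index into the inputs list (built in order: collist, textcols, continuouscols)

# categorical columns: embedding + batch normalization
for col in collist:
    vocab_size = int(merged_data[col].max()) + 1
    x = Embedding(vocab_size, 8)(inputs[i])
    x = Flatten()(x)
    layers.append(BatchNormalization()(x))
    i += 1

# text columns: embedding + batch normalization + dropout
for col in textcols:
    x = Embedding(max_words, 32)(inputs[i])
    x = Flatten()(x)
    x = BatchNormalization()(x)
    layers.append(Dropout(0.3)(x))
    i += 1

# continuous columns: batch normalization only
for col in continuouscols:
    layers.append(BatchNormalization()(inputs[i]))
    i += 1
```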
Next, the layers are concatenated:
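Which can be as simple as:

```python
from keras.layers import concatenate

# merge the per-column layer stacks into a single tensor
merged = concatenate(layers)
```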
Then we define the output layer, the overall model, and the optimizer. Finally, we use the optimizer to compile the model.
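A sketch of these steps; the learning rate is an assumed value:

```python
from keras.layers import Dense
from keras.models import Model
from keras.optimizers import SGD

# single sigmoid unit: probability that a ticket results in a Duty Manager call
output = Dense(1, activation='sigmoid', name='dm_call')(merged)

model = Model(inputs=inputs, outputs=output)
model.compile(optimizer=SGD(lr=0.003),  # assumed learning rate
              loss='binary_crossentropy',
              metrics=['accuracy'])
```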
I experimented with various optimizers in the previous TTR prediction project and SGD produced the most stable results, so I stuck with it for this project.
Fit the model:
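Roughly as follows; the batch size and validation split are assumptions:

```python
history = model.fit(
    input_data,                      # list of arrays, one per Input
    merged_data['dm_flag'].values,   # the target column built earlier
    epochs=10,
    batch_size=64,                   # assumed
    validation_split=0.2,            # assumed
    class_weight={0: zero_weight, 1: one_weight})
```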
A key detail of the fit statement is the definition of class_weight.
```python
class_weight = {0: zero_weight, 1: one_weight}
```
Using the weights defined at the beginning of the notebook:
```python
zero_weight = 1.0
one_weight = 72.8
```
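These weights were calculated from the ratio of Duty Manager calls to other tickets in the input data. A minimal sketch of that calculation, reusing the `dm_flag` target column built earlier:

```python
# weight the rare positive class by the inverse of its frequency
num_total = merged_data.shape[0]
num_dm = merged_data['dm_flag'].sum()

zero_weight = 1.0
one_weight = (num_total - num_dm) / num_dm  # works out to roughly 72.8 here
```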
By fitting the model with these weights, we avoid the Accuracy Paradox, as shown by the confusion matrix for a 10 epoch run:

On a 10 epoch run we reach 77% validation accuracy, with the prospect of further improvement on a longer run:

This example shows that it’s possible to get decent results by applying a simple Keras model to structured data. By categorizing the features as text, categorical, or continuous, we make it easy to write robust code to prepare the data and define the model. By using class_weight when we fit the model, we avoid the Accuracy Paradox and get useful predictions.