Deep Learning on Structured Data: part 1

  • Keras-based. I thought that Keras would be the right framework for what I wanted to do because it’s widely used (and thus has a large community contributing answers & ideas) and sits at the Goldilocks level of abstraction. TensorFlow was more complexity than I wanted to stomach at this stage, while more abstracted frameworks (like the fast.ai library featured in version 2 of Jeremy Howard’s Deep Learning course) were too much of a black box.
  • Table in -> deep learning result out. I was looking for a working, end-to-end example that started with structured data as input and produced a useful result from a deep learning framework.
  • Deal with three classes of data: continuous values (like elapsed time or temperature); categorical values (like country names or days of the week); and text. In particular, I wanted an example that would show how to deal with embeddings for categorical values and text.
  • start with a subset of features, excluding text features and categorical features, to work out kinks in the data, such as columns that I assumed were numeric but that actually contained strings
  • introduce categorical features with embeddings
  • introduce text features with embeddings
  • add more input data to bring the corpus close to 1 million records
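As a rough illustration of where these steps lead, here is a minimal sketch of a Keras model that takes the three classes of data as separate inputs and uses embeddings for the categorical and text inputs. It is not the model from this project: the function name `build_model`, the layer sizes, the single categorical column, and the binary output are all assumptions chosen to keep the example short.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_continuous, n_categories, vocab_size, text_len):
    """Hypothetical three-input model: continuous, categorical, and text."""
    # Continuous features are fed in as-is.
    cont_in = keras.Input(shape=(n_continuous,), name="continuous")

    # One categorical column, encoded as an integer ID and embedded.
    cat_in = keras.Input(shape=(1,), dtype="int32", name="category")
    cat_emb = layers.Flatten()(layers.Embedding(n_categories, 4)(cat_in))

    # One text column, encoded as a fixed-length sequence of token IDs.
    # Averaging the token embeddings is an assumption made for brevity.
    text_in = keras.Input(shape=(text_len,), dtype="int32", name="text")
    text_emb = layers.GlobalAveragePooling1D()(
        layers.Embedding(vocab_size, 16)(text_in)
    )

    # Join the three branches and finish with a small dense head.
    merged = layers.Concatenate()([cont_in, cat_emb, text_emb])
    hidden = layers.Dense(32, activation="relu")(merged)
    hidden = layers.Dropout(0.2)(hidden)
    # A single sigmoid output (binary target) is an assumption.
    out = layers.Dense(1, activation="sigmoid")(hidden)

    model = keras.Model([cont_in, cat_in, text_in], out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_model(n_continuous=2, n_categories=7, vocab_size=100, text_len=5)
```

Building the model in this order mirrors the list above: the continuous branch can be trained on its own first, and the categorical and text branches can be added once the data problems are worked out.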
A slow start on validation accuracy
  • Problems with the input data manifested themselves in unexpected ways. For example, I spent a couple of days struggling with the type of one of the features before I realised that commas in just 3 of the 180k input records caused the whole column to be read as strings in the resulting pandas dataframe.
  • Don’t make assumptions about what values a column can contain. I did several iterations of the model before I realised that a type error meant that almost all the label (TTR) values were being set to NaN. I also wasted several iterations because I assumed that one of the features (initial ticket severity) would always be the number 1, 2, 3, or 4, when in fact the input data also included the string values ‘1’, ‘2’, ‘3’, and ‘4’!
  • Tuning hyperparameters helped out until it stopped helping out. I learned a lot about the impact of learning rate, dropout rate, lambda for regularization etc. by varying each in isolation, and by trying out different optimizers (Adam & SGD). However, a scattershot approach to varying hyperparameters didn’t get me any better validation accuracy after the first few jumps.
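The two data problems described above can be caught early with a quick type audit. The sketch below uses a made-up, three-row stand-in for the ticket data; the column names `elapsed_hours` and `severity` and both helper functions are hypothetical, not taken from the project.

```python
import pandas as pd

# Hypothetical miniature of the input data (values are invented).
raw = pd.DataFrame({
    "elapsed_hours": ["12", "1,250", "7"],  # one comma makes the column non-numeric
    "severity": [1, "2", 3],                # numbers and strings mixed together
})


def unexpected_types(series):
    """Count the Python types actually present in a column."""
    return series.map(lambda v: type(v).__name__).value_counts().to_dict()


def to_numeric_strict(series):
    """Strip thousands separators, then convert to numbers.

    errors="raise" makes bad values fail loudly; errors="coerce" would
    silently turn them into NaN instead.
    """
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="raise")


print(unexpected_types(raw["severity"]))

clean = raw.assign(
    elapsed_hours=to_numeric_strict(raw["elapsed_hours"]),
    severity=to_numeric_strict(raw["severity"]).astype("int64"),
)
print(clean.dtypes)
```

If the data is loaded with `pd.read_csv`, passing `thousands=","` is another way to handle the comma case at read time, assuming commas only ever appear as thousands separators in numeric columns.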
A visualization of the model — not as complicated as it looks!

Mark Ryan

Technical writing manager at Google. Opinions expressed are my own.