Roger Ferrod, Federica Cena, Luigi Di Caro, Dario Mana, Rossana Simeoni

 

Objective

Dialogue analysis and classification

We propose to automatically learn user features directly from the dialogue with the chatbot, so that responses can be adapted accordingly and the interaction with the user improved. We introduce a vocabulary-centered model combined with a Deep Learning method for the automatic classification of users' expertise in the telco domain (at both the technical and the commercial level).

Teaser video

Real-time detection of user expertise by analysing the linguistic features of the language.

 
examples.png

 

Method

Vocabulary-centered Deep Learning method

Since the existing telco-oriented vocabularies only specify general and context-independent words, we manually constructed, in collaboration with domain experts, an ad hoc sentence-based annotation of 5715 terms over the 4 levels of expertise defined below.

 

Screenshot_598.png

The resulting distribution shows a noticeably stronger co-occurrence of terms belonging to adjacent levels.

 

lattice.png

Lattice of the level itemsets with the corresponding number of occurrences. The level-based lattice represents a hierarchically-ordered space of all combinations of the level itemsets and their frequencies.
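
The construction of such a lattice can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each annotated sentence is reduced to the set of levels occurring in it, and that the count attached to a lattice node is the number of sentences containing all the levels of that itemset (its support), which the text above does not spell out. The function name `level_lattice` is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence, Set


def level_lattice(sentence_levels: Iterable[Set[int]],
                  levels: Sequence[int] = (0, 1, 2, 3)) -> Dict[FrozenSet[int], int]:
    """Count, for every non-empty combination of levels, how many
    sentences contain all levels of that combination (assumed definition)."""
    sentences = [frozenset(s) for s in sentence_levels]
    lattice: Dict[FrozenSet[int], int] = {}
    # Enumerate the lattice bottom-up: single levels, pairs, triples, ...
    for size in range(1, len(levels) + 1):
        for combo in combinations(levels, size):
            itemset = frozenset(combo)
            # A sentence supports an itemset if it contains every level in it.
            lattice[itemset] = sum(1 for s in sentences if itemset <= s)
    return lattice


# Toy input: the sets of levels found in four hypothetical sentences.
toy_sentences = [{0, 1}, {1}, {1, 2}, {0, 1}]
toy_lattice = level_lattice(toy_sentences)
```

With four levels the lattice has 15 nodes; comparing the counts of pairs such as {0, 1} and {0, 3} is one way to check whether adjacent levels co-occur more often than distant ones.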

 

State of the art Deep Learning method

Our task can be framed as sequence labeling (e.g., Named Entity Recognition, Semantic Role Labeling, etc.). In our case, however, the labels represent levels of expertise. The biLSTM-CRF model exploits the well-known predictive abilities of CRFs (Conditional Random Fields) to improve the classification of tags (the current state of the art in the Named Entity Recognition task).

 

net.png

We search for the sequence of tags (levels) that maximizes the log-likelihood. This step can be carried out efficiently with dynamic programming, in our case the Viterbi algorithm. The biLSTM-CRF architecture is able to model both the input features (that is, the semantic representation of the text) and the corresponding best tag sequence. Since the LSTM layer is bidirectional, the CRF model can also exploit this knowledge (past and future tags), improving performance.
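
The decoding step can be illustrated with a small, dependency-free sketch of Viterbi decoding. It is a simplified illustration rather than the model's actual code: `emissions` stands in for the per-token tag scores produced by the biLSTM, `transitions` for the tag-to-tag scores learned by the CRF, and start/end transition scores are omitted for brevity. All names are hypothetical.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence.

    emissions[t][k]   : score of tag k at position t
    transitions[j][k] : score of moving from tag j to tag k
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[k] = best score of any path ending in tag k at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for k in range(num_tags):
            # Choose the previous tag that gives the best path into tag k.
            prev = max(range(num_tags), key=lambda j: best[j] + transitions[j][k])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(range(num_tags), key=lambda k: best[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with two tags and three tokens.
toy_emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
free_path = viterbi_decode(toy_emissions, [[0.0, 0.0], [0.0, 0.0]])
sticky_path = viterbi_decode(toy_emissions, [[0.0, -5.0], [-5.0, 0.0]])
```

With neutral transitions the decoder simply follows the emissions; once changing tag is penalized, the same emissions yield a constant sequence, which is the kind of context sensitivity the CRF layer adds on top of per-token scores.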

We assume that, given a sequence of different tags, the overall expertise is the maximum level among all tags. We also provide a purity score, associated with the expertise, to assess the reliability of the overall level. The score is based on entropy and is useful, in a production environment, to trigger different revision and integration processes, especially when its value is low.
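
As a rough illustration of this aggregation, the sketch below takes the maximum level as the overall expertise and computes a purity score as one minus the normalized entropy of the level distribution. The exact formula is not given here, so this formulation is an assumption, as are the function name `overall_expertise` and the convention of marking outsider tokens with `None`.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def overall_expertise(tags: Sequence[Optional[int]],
                      num_levels: int = 4) -> Tuple[Optional[int], float]:
    """Return (overall level, purity) for one tagged message.

    Purity is 1.0 when all level tags agree and 0.0 when the tags are
    spread uniformly over all levels (assumed definition).
    """
    # Ignore outsider tokens, which carry no expertise level.
    levels = [t for t in tags if t is not None]
    if not levels:
        return None, 0.0
    total = len(levels)
    counts = Counter(levels)
    # Shannon entropy of the observed level distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    purity = 1.0 - entropy / math.log(num_levels)
    return max(levels), purity


consistent = overall_expertise([None, 1, 1, None])
mixed = overall_expertise([0, 1, 2, 3])
```

Under this definition a message tagged only with level 1 gets purity 1.0, while a message whose tags are spread evenly over all four levels gets a purity close to 0 and would be a natural candidate for review.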

 

Experiments & Results

The messages contain 38290 words, distributed as follows:

  • 4.72%   level 0
  • 6.95%   level 1
  • 3.15%   level 2
  • 0.1%     level 3
  • 85.07% outsiders

Given the strong imbalance of the dataset and the problems that may arise from the under-representation of level 3, we modified the class distribution by oversampling the minority class, thereby balancing the dataset.
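
A minimal sketch of random oversampling is shown below. It assumes that the resampling unit is the whole message, that a message counts as minority when it contains at least one level 3 token, and that duplication is controlled by a fixed `factor`; none of these details is specified above, and the names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# A message is a list of (token, level) pairs; None marks an outsider token.
Message = List[Tuple[str, Optional[int]]]


def oversample_minority(messages: List[Message],
                        minority_level: int = 3,
                        factor: int = 5,
                        seed: int = 0) -> List[Message]:
    """Add random copies of minority messages so that they appear
    roughly `factor` times as often as in the original data."""
    rng = random.Random(seed)
    minority = [m for m in messages
                if any(level == minority_level for _, level in m)]
    # Sample with replacement from the minority messages.
    extra = [rng.choice(minority) for _ in range(len(minority) * (factor - 1))]
    result = list(messages) + extra
    rng.shuffle(result)
    return result


toy_messages = [[("fiber", 3)], [("hello", None)], [("bill", 0)]]
resampled = oversample_minority(toy_messages, factor=4)
```

Oversampling should be applied to the training split only, so that duplicated messages do not leak into the evaluation data.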

 Screenshot_599.png

 

Screenshot_600.png

 

Conclusion & Future works

We tested the model on more than 5000 real messages, reaching good accuracy levels and thus demonstrating the feasibility of the approach. The approach will be used to automatically adapt the dialogue to the user's expertise, for example by providing explanations of concepts matched to the level of knowledge, changing the style and terminology accordingly, or requesting, if necessary, the intervention of a real person.

 

Among all future steps, we first plan

  1. to consider other markers such as the presence of lexical and syntactical errors, anthropomorphization of the chatbot and deictic references;
  2. to replicate the experiment with a larger training set;
  3. to consider more complex neural mechanisms (e.g., attention based);
  4. to integrate our model with Angie and implement the personalisation of the dialogue, both in the choice of the words used and the type of explanation provided, testing the improvement of the interaction with users;
  5. to carry out a user study to compare the expertise derived from our model with the actual user expertise.