- 06 Apr 2022
- 4 minutes to read
How does the recommendations engine work?
- Updated on 06 Apr 2022
The recommendations engine is coded in Python and is outlined in the flowchart below.
Each block of this flowchart is described in detail below.
The CSV exporter block is written in PHP. It reads user, content, and interactions (user-content) data per tenant from the Totara database. The data is then saved as CSV files in the directory specified by the Site Administrator on the Recommender engine page (Quick-access menu > Plugins > Machine learning settings > Recommender engine) when the machine learning plugins are enabled.
The user and content data files contain metadata about the users and the content, respectively. The interactions data records whether a user interacted positively or only leisurely with a piece of content, and when the interaction happened.
The user metadata consists of:
- ID in the database
- City/town (free text)
- Aspiring position
- Current competencies scale
- Profile description (free text)
The content metadata consists of:
- Content type (one of course, workspace, article, microlearning article, or playlist)
- Text description (free text)
The interactions data consists of:
- User ID
- Content ID
- Interaction value (0 or 1)
- Time of interaction
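One way to picture a row of the interactions data is as a small typed record. The sketch below is illustrative only: the class name, the field names, and the assumption that the time is stored as a Unix timestamp in seconds are hypothetical and are not taken from the exporter itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class Interaction:
    """One row of the interactions data (field names are hypothetical)."""
    user_id: int
    content_id: int
    value: int           # 0 or 1, as listed above
    timestamp: datetime  # time of the interaction


def parse_interaction(row: Dict[str, str]) -> Interaction:
    """Convert a CSV row (all strings) into a typed Interaction."""
    value = int(row["value"])
    if value not in (0, 1):
        raise ValueError(f"interaction value must be 0 or 1, got {value}")
    return Interaction(
        user_id=int(row["user_id"]),
        content_id=int(row["content_id"]),
        value=value,
        # Assumption: the time is a Unix timestamp in seconds.
        timestamp=datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc),
    )


example = parse_interaction(
    {"user_id": "7", "content_id": "42", "value": "1", "timestamp": "1649203200"}
)
```

Validating the interaction value at parse time means a malformed export fails early rather than silently skewing the model.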
This block is a Python class that reads the CSV files one tenant at a time and passes the data on for further processing.
This is a decision block that is set by the Site Administrator on the Recommender engine page (Quick-access menu > Plugins > Machine learning settings > Recommender engine) when the machine learning plugins are enabled. The Site Administrator can choose one of the following modes:
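A reader of this kind might look roughly like the following minimal sketch. The class name `TenantCsvReader` and the `<kind>_<tenant>.csv` file naming scheme are assumptions made for illustration; the actual file layout produced by the exporter is not described here.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, List, Tuple

Rows = List[Dict[str, str]]


class TenantCsvReader:
    """Yields the user, content, and interactions rows one tenant at a time.

    Assumption: files are named '<kind>_<tenant>.csv' (hypothetical layout).
    """

    KINDS = ("users", "content", "interactions")

    def __init__(self, data_dir: str) -> None:
        self.data_dir = Path(data_dir)

    def tenants(self) -> List[str]:
        # Discover tenants from the interactions files present in the directory.
        prefix = "interactions_"
        return sorted(
            path.stem[len(prefix):] for path in self.data_dir.glob(prefix + "*.csv")
        )

    def _read(self, kind: str, tenant: str) -> Rows:
        path = self.data_dir / f"{kind}_{tenant}.csv"
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))

    def __iter__(self) -> Iterator[Tuple[str, Dict[str, Rows]]]:
        # Only one tenant's data is held in memory at any time.
        for tenant in self.tenants():
            yield tenant, {kind: self._read(kind, tenant) for kind in self.KINDS}
```

Iterating with `for tenant, data in TenantCsvReader(directory): ...` then hands each tenant's three data sets to the next stage in turn.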
- Matrix Factorisation
- Partial Hybrid
- Full Hybrid
Depending on the recommendation mode selected by the Site Administrator, one of the data processors transforms the data into a form that the subsequent process can consume. Each data processor outputs sparse matrices, so that memory is used efficiently.
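The choice of data processor can be thought of as a simple dispatch on the selected mode. In the sketch below, the `Mode` identifiers and the three placeholder functions are hypothetical; they only stand in for the real processors described in the following sections.

```python
from enum import Enum
from typing import Callable, Dict


class Mode(Enum):
    # Hypothetical identifiers for the three modes listed above.
    MATRIX_FACTORISATION = "matrix_factorisation"
    PARTIAL_HYBRID = "partial_hybrid"
    FULL_HYBRID = "full_hybrid"


def collaborative_processor(data: dict) -> str:
    return "interactions only"


def partial_processor(data: dict) -> str:
    return "interactions and metadata, free text ignored"


def full_processor(data: dict) -> str:
    return "interactions and metadata, free text included"


# Each mode maps to exactly one data processor.
PROCESSORS: Dict[Mode, Callable[[dict], str]] = {
    Mode.MATRIX_FACTORISATION: collaborative_processor,
    Mode.PARTIAL_HYBRID: partial_processor,
    Mode.FULL_HYBRID: full_processor,
}


def process(mode: Mode, data: dict) -> str:
    """Run the data processor that matches the selected mode."""
    return PROCESSORS[mode](data)
```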
Collaborative data processor
This block ignores the user and content data and transforms the interactions data into a format that the subsequent modules can consume.
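To illustrate what transforming the interactions into a sparse form can mean, here is a minimal pure-Python sketch that stores only the non-zero cells in a dictionary keyed by (row, column). The function name is hypothetical, and treating an interaction value of 0 as an empty cell is an assumption made for this example.

```python
from typing import Dict, Iterable, List, Tuple

SparseMatrix = Dict[Tuple[int, int], int]


def build_sparse_interactions(
    interactions: Iterable[Tuple[int, int, int]],
) -> Tuple[SparseMatrix, List[int], List[int]]:
    """Map (user_id, content_id, value) triples to a dictionary-of-keys matrix.

    Returns the matrix plus the user IDs and content IDs in row/column order.
    """
    user_index: Dict[int, int] = {}
    item_index: Dict[int, int] = {}
    matrix: SparseMatrix = {}
    for user_id, content_id, value in interactions:
        # Assign the next free row/column the first time an ID is seen.
        row = user_index.setdefault(user_id, len(user_index))
        col = item_index.setdefault(content_id, len(item_index))
        # Assumption: a value of 0 is left as an empty (implicit zero) cell.
        if value:
            matrix[(row, col)] = value
    return matrix, list(user_index), list(item_index)


matrix, user_ids, content_ids = build_sparse_interactions(
    [(10, 100, 1), (10, 101, 0), (11, 100, 1)]
)
```

With many users and many content items, most user-content pairs have no interaction at all, so storing only the filled cells is far cheaper than a dense table.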
Partial data processor
This block uses the user and content metadata as well as the interactions data and transforms it for consumption in the subsequent process. This block ignores only the free text fields of user (city/town and profile description) and content (text description) data.
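A common way to turn categorical metadata into a sparse form is one-hot encoding, sketched below. This is an illustration under assumptions: the function name and the example field name `type` are hypothetical, and the actual encoding used by the processor is not specified here.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

SparseRow = Dict[int, float]


def one_hot_features(
    records: Iterable[Dict[str, str]], fields: Sequence[str]
) -> Tuple[List[SparseRow], Dict[str, int]]:
    """Encode the chosen categorical fields as sparse one-hot rows.

    Free text fields are simply left out of `fields`, so they are ignored.
    """
    feature_index: Dict[str, int] = {}
    rows: List[SparseRow] = []
    for record in records:
        row: SparseRow = {}
        for field in fields:
            value = record.get(field)
            if value in (None, ""):
                continue  # a missing value contributes no feature
            name = f"{field}={value}"
            col = feature_index.setdefault(name, len(feature_index))
            row[col] = 1.0
        rows.append(row)
    return rows, feature_index


content = [
    {"id": "1", "type": "course", "description": "Intro to safety"},
    {"id": "2", "type": "article", "description": "Ladder checklist"},
    {"id": "3", "type": "course", "description": ""},
]
rows, feature_index = one_hot_features(content, fields=["type"])
```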
Full data processor
This block uses all the data fields in the user, content, and interactions data, including the free text fields of the user and content data. The free text fields are passed through the Natural Language Processing pipeline, where the text is cleaned, lemmatised (if possible), and then converted to a matrix of TF-IDF features. The data sets are then transformed into a form that the subsequent process can consume.
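The TF-IDF step can be sketched in a few lines of plain Python. This is a simplified illustration, not the pipeline itself: lemmatisation is omitted, cleaning is reduced to lower-casing and keeping alphabetic tokens, and the smoothed IDF formula with L2 normalisation is one common variant chosen as an assumption.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenise(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens (a stand-in for cleaning)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} row per document."""
    tokenised = [tokenise(doc) for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    rows: List[Dict[str, float]] = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term count times a smoothed inverse document frequency.
        row = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        # Scale each row to unit length so long texts do not dominate.
        norm = math.sqrt(sum(weight * weight for weight in row.values()))
        rows.append({t: w / norm for t, w in row.items()} if norm else row)
    return rows


weights = tfidf(["Python course", "Python article"])
```

In this example the word shared by both descriptions receives a lower weight than the word unique to each, which is the effect TF-IDF is designed to produce.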
Depending on the recommendation mode selected by the Site Administrator, either the matrix factorisation (which is a sub-class of the collaborative filtering approach) or the content-based filtering approach is used for building the machine learning model for recommendations.
Matrix factorisation model
If the Site Administrator has chosen the matrix factorisation mode, the model is built using the standard matrix factorisation approach. During this process the model hyper-parameters (the latent dimension and the number of epochs) are tuned using the users' past interactions with the content. The final model is then built using the tuned hyper-parameters and forwarded to the next stage.
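The idea of tuning the dimension of the factors and the number of epochs can be sketched with a toy matrix factorisation. Everything here is an assumption for illustration: plain stochastic gradient descent on squared error stands in for whatever loss and optimiser the engine uses, and the function names, learning rate, regularisation, and grid values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, float]  # (user row, item column, interaction value)
Factors = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_mf(triples: Sequence[Triple], n_users: int, n_items: int,
             latent_dim: int, epochs: int, lr: float = 0.05,
             reg: float = 0.01, seed: int = 0) -> Tuple[Factors, Factors]:
    """Fit user and item factors with SGD on squared error."""
    rng = random.Random(seed)
    users = [[rng.gauss(0, 0.1) for _ in range(latent_dim)] for _ in range(n_users)]
    items = [[rng.gauss(0, 0.1) for _ in range(latent_dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - dot(users[u], items[i])
            for k in range(latent_dim):
                pu, qi = users[u][k], items[i][k]
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)
    return users, items


def mse(triples: Sequence[Triple], users: Factors, items: Factors) -> float:
    """Mean squared error of the predictions on the given triples."""
    return sum((r - dot(users[u], items[i])) ** 2 for u, i, r in triples) / len(triples)


def tune(train: Sequence[Triple], valid: Sequence[Triple], n_users: int,
         n_items: int, dims: Sequence[int] = (2, 4),
         epoch_grid: Sequence[int] = (5, 20)) -> Tuple[int, int]:
    """Grid-search the factor dimension and epoch count on held-out data."""
    best = None
    for dim in dims:
        for epochs in epoch_grid:
            users, items = train_mf(train, n_users, n_items, dim, epochs)
            score = mse(valid, users, items)
            if best is None or score < best[0]:
                best = (score, dim, epochs)
    return best[1], best[2]
```

After `tune` returns the best pair, the final model would be trained once more on all available interactions with those values, mirroring the two-step process described above.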
Content-based filtering model
If the Site Administrator selects the partial hybrid or the full hybrid mode, the content-based filtering algorithm is used to build the model. The input to this algorithm includes the users' and items' metadata. The modelling algorithm is implemented via the LightFM library and is described in Maciej Kula, 2015. As before, the hyper-parameters are tuned using the users' past interactions with the content, together with the provided metadata of the users and the content, and the final model is then built using the tuned hyper-parameters. Note that this algorithm accepts data from either the Partial data processor or the Full data processor block, which means it can also use the free text data produced by the Natural Language Processing pipeline.
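For readers who want the gist of how metadata enters the model, the formulation in Kula (2015) can be summarised as follows. The notation is a summary of that paper rather than of the engine's code: $f_u$ and $f_i$ are the feature sets of user $u$ and item $i$, $e_j$ are feature embeddings, $b_j$ are feature biases, and $\sigma$ is the sigmoid function.

```latex
% A user (item) is represented by the sum of its feature embeddings.
q_u = \sum_{j \in f_u} e^{U}_{j}, \qquad
p_i = \sum_{j \in f_i} e^{I}_{j}

% Biases are summed over features in the same way.
b_u = \sum_{j \in f_u} b^{U}_{j}, \qquad
b_i = \sum_{j \in f_i} b^{I}_{j}

% The prediction combines the two representations and the biases.
\hat{r}_{ui} = \sigma\left( q_u \cdot p_i + b_u + b_i \right)
```

When each user and item has only a single indicator feature for its own ID, this reduces to an ordinary matrix factorisation model.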
Depending on the Site Administrator's settings at Quick-access menu > Plugins > Machine learning settings > Recommender engine, this block uses one of the models built in the previous section to predict:
- A list of similar content based on patterns of how users interact with the content (and the metadata of the content if available when either partial hybrid or full hybrid mode is selected).
- A list of content for each user that is likely to be of interest to that user.
The number of similar items produced for each content type can be set by navigating to Quick-access menu > Plugins > Machine learning settings > Recommender engine. The similar items are sorted in descending order by their cosine similarity score with the given content.
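Ranking by cosine similarity can be sketched as below. The function names are hypothetical, and the assumption is that each content item is represented by a numeric vector (for example, one learned by the model); how the engine obtains those vectors is not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def most_similar(
    item_vectors: Dict[int, Sequence[float]], content_id: int, top_n: int
) -> List[Tuple[int, float]]:
    """Return the top_n other items, most similar first."""
    target = item_vectors[content_id]
    scored = [
        (other_id, cosine(target, vector))
        for other_id, vector in item_vectors.items()
        if other_id != content_id  # an item is not listed as similar to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
similar_to_1 = most_similar(vectors, content_id=1, top_n=2)
```

Here `top_n` plays the role of the Site Administrator's setting for how many similar items to produce.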
The number of recommended items for each user is also determined by the Site Administrator's settings. The recommended content for each user is sorted in descending order by prediction score. The prediction scores (or rankings) themselves are not interpretable; they are simply a means of ranking the items. Their order is what matters: content with a higher prediction score is more likely to be of interest to the user than content with a lower prediction score.
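Because only the order of the scores matters, producing a user's recommendations amounts to a top-N selection. The sketch below assumes, for illustration only, that a score is the dot product of a user vector and an item vector; the function name and the example vectors are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def recommend(
    user_vector: Sequence[float],
    item_vectors: Dict[int, Sequence[float]],
    top_n: int,
) -> List[int]:
    """Return the IDs of the top_n items with the highest scores, best first."""
    scores = {
        content_id: sum(u * v for u, v in zip(user_vector, vector))
        for content_id, vector in item_vectors.items()
    }
    # Only the relative order of the scores is used, never their magnitude.
    return heapq.nlargest(top_n, scores, key=scores.get)


items = {10: [0.2, 0.1], 11: [1.0, 1.0], 12: [-1.0, 0.0]}
top_two = recommend([1.0, 0.5], items, top_n=2)
```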
Both outputs from the Obtain recommendations block (the list of similar content for each content item, and the list of recommended content for each user) are written as CSV files by this block. The files are written to the same directory as the input CSV files, which is the directory set by the Site Administrator.
This block reads the CSV files written by the CSV writer block and loads the data into the relevant tables of the Totara database for each tenant.
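Writing the per-user recommendations out could look like the following minimal sketch. The function name, the column names, and the fixed six-decimal score format are assumptions for illustration; the real output schema is not documented here.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def write_recommendations(
    handle: TextIO, recommendations: Dict[int, List[Tuple[int, float]]]
) -> None:
    """Write one CSV row per (user, recommended content) pair."""
    writer = csv.writer(handle, lineterminator="\n")
    # Hypothetical header; the actual column names may differ.
    writer.writerow(["user_id", "content_id", "score"])
    for user_id, ranked_items in recommendations.items():
        for content_id, score in ranked_items:
            writer.writerow([user_id, content_id, f"{score:.6f}"])


buffer = io.StringIO()
write_recommendations(buffer, {7: [(11, 1.5), (10, 0.25)]})
csv_text = buffer.getvalue()
```

Passing a file handle rather than a path keeps the function easy to test in memory, while production code would open a file in the configured directory.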
© Copyright 2023 Totara Learning Solutions. All rights reserved.