API
Core functions
fit_transform
- fit(sessions, items=None, number_of_recommendations=5, number_of_neighbors=10, sampling_strategy='common_items', sample_size=1000, weighting_func='linear', ranking_strategy='linear', return_events_from_session=True, required_sampling_event=None, required_sampling_event_index=None, sampling_str_event_weights_index=None, recommend_any=False)
Sets the input session-items and item-sessions maps and returns the fitted model.
- Parameters:
- sessions : Dict or Sessions
>>> sessions = {
...     session_id: (
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [[OPTIONAL] sequence_of_event_types],
...         [[OPTIONAL] sequence_of_event_weights]
...     )
... }
- items : Dict or Items, optional
If not provided, the item-sessions map is created from the sessions parameter.
>>> items = {
...     item_id: (
...         [sequence_of_sessions],
...         [sequence_of_the_first_session_timestamps]
...     )
... }
- number_of_recommendations : int, default=5
The number of recommended items.
- number_of_neighbors : int, default=10
The number of closest sessions from which the items are chosen.
- sampling_strategy : str, default='common_items'
How to filter the initial sample of sessions. Available strategies are:
'common_items': sample sessions that contain the same items as the input session,
'recent': sample the most recent sessions,
'random': draw a random sample of sessions,
'weighted_events': select sessions based on the weights assigned to events.
- sample_size : int, default=1000
How many sessions from the model are sampled to make a recommendation.
- weighting_func : str, default='linear'
The similarity measure between sessions. Available options: 'linear', 'log' and 'quadratic'.
- ranking_strategy : str, default='linear'
How an item rank is calculated from its position in a session sequence. Available options: 'inv', 'linear', 'log', 'quadratic'.
- return_events_from_session : bool, default=True
Whether the algorithm should return the same events as in the session if it is the only neighbor.
- required_sampling_event : int or str, default=None
Set this parameter to the event name if sessions with this event must be included in the neighbors' selection. For example, this event may be a "purchase".
- required_sampling_event_index : int, default=None
If the required_sampling_event parameter is set, you must also pass the index of the row with event names.
- sampling_str_event_weights_index : int, default=None
If sampling_strategy is set to 'weighted_events', you must pass the index of the row with event weights.
- recommend_any : bool, default=False
If the recommender returns fewer items than number_of_recommendations, random items are added to the results.
- Returns:
- wsknn : WSKNN
The trained Weighted Session-Based k-NN model.
Examples
>>> input_sessions = {
...     'session_x': (
...         ['a', 'b', 'c'],
...         [10001, 10002, 10004],
...         ['view', 'click', 'click']
...     )
... }
>>> input_items = {
...     'a': (['session_x'], [10001]),
...     'b': (['session_x'], [10001]),
...     'c': (['session_x'], [10001])
... }
>>> fitted_model = fit(input_sessions, input_items)
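The three simpler sampling strategies can be illustrated with a small pure-Python sketch. It is not the library's implementation: sample_candidate_sessions is a hypothetical name, 'recent' is assumed to mean ordering sessions by their last timestamp (newest first), and 'weighted_events' is left out because its weighting rules are not described here.

```python
import random
from typing import Dict, List, Optional, Sequence

# Each record follows the sessions map layout:
# (items, timestamps, [optional event types], [optional event weights]).
SessionsMap = Dict[str, Sequence[list]]


def sample_candidate_sessions(
    sessions: SessionsMap,
    input_items: List[str],
    strategy: str = "common_items",
    sample_size: int = 1000,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical sketch of neighbor-candidate sampling (not the WSKNN code)."""
    if strategy == "common_items":
        # Keep sessions that share at least one item with the input session.
        wanted = set(input_items)
        candidates = [sid for sid, record in sessions.items() if wanted & set(record[0])]
    elif strategy == "recent":
        # Assumption: order by each session's last timestamp, newest first.
        candidates = sorted(sessions, key=lambda sid: sessions[sid][1][-1], reverse=True)
    elif strategy == "random":
        # Shuffle all session ids; a seed makes the sample reproducible.
        candidates = list(sessions)
        random.Random(seed).shuffle(candidates)
    else:
        raise KeyError(f"Unsupported strategy in this sketch: {strategy}")
    return candidates[:sample_size]


sessions = {
    "s1": (["a", "b"], [1, 2]),
    "s2": (["c"], [5]),
    "s3": (["b", "d"], [3, 9]),
}
print(sample_candidate_sessions(sessions, ["b"]))                     # ['s1', 's3']
print(sample_candidate_sessions(sessions, ["b"], strategy="recent"))  # ['s3', 's2', 's1']
```

sample_size simply truncates the candidate list here; the real model may combine sampling with other filters such as required_sampling_event.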
predict
- predict(model, sessions, settings=None)
The function is an alias for the WSKNN.recommend() method.
- Parameters:
- model : WSKNN
Fitted WSKNN model.
- sessions : List
Sequence of items for recommendation. It must be a nested list of lists:
>>> [
...     [items],
...     [timestamps],
...     [(optional) event names],
...     [(optional) weights]
... ]
- settings : Dict, default=None
Model settings. This parameter lets you test different setups. The available parameters are grouped in the settings.yml file under the model key.
- Returns:
- recommendations : List
Item recommendations and their ranks.
>>> [
...     [item a, rank a], [item b, rank b]
... ]
- Raises:
- ValueError
Model wasn’t fitted.
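The nested-list input expected by predict() can be checked before the model is called. The helper below is a hypothetical sketch, not part of WSKNN: it assumes that all inner lists must have the same length and that timestamps must be integers (the WSKNN class raises InvalidTimestampError when the timestamp type is not int).

```python
from typing import Any, List


def validate_event_stream(stream: List[List[Any]]) -> None:
    """Hypothetical check of the [items, timestamps, (event names), (weights)] layout."""
    if not 2 <= len(stream) <= 4:
        raise ValueError("Expected 2 to 4 inner lists: items, timestamps, [events], [weights].")
    # Every inner list describes the same events, so their lengths must match.
    if len({len(row) for row in stream}) != 1:
        raise ValueError("All inner lists must have the same length.")
    if not all(isinstance(ts, int) for ts in stream[1]):
        raise TypeError("Timestamps must be integers.")


validate_event_stream([["a", "b"], [10001, 10002], ["view", "click"]])  # passes silently
```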
batch_predict
- batch_predict(model, sessions, settings=None)
The function predicts multiple sessions at once.
- Parameters:
- model : WSKNN
Fitted WSKNN model.
- sessions : List[Dict]
User sessions for recommendations:
>>> [
...     {"user A": [*]},
...     {"user B": [**]}
... ]
where each value (*) has the form:
>>> [
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
... ]
- settings : Dict, default=None
Model settings. This parameter lets you test different setups. The available parameters are grouped in the settings.yml file under the model key.
- Returns:
- inference : List[Dict]
Recommendations and their weights for each user.
>>> [
...     {"user A": [
...         [item a, rank a],
...         [item b, rank b]
...     ]},
...     {"user B": [...]},
... ]
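batch_predict() takes a list of single-user dictionaries, while WSKNN.recommend() accepts one dictionary keyed by user in batch mode. A hypothetical helper that converts the former into the latter might look like this; it is only a sketch, and treating a repeated user id as an error is an assumption.

```python
from typing import Any, Dict, List


def merge_user_sessions(sessions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: [{"user A": [...]}, {"user B": [...]}] -> {"user A": [...], "user B": [...]}."""
    merged: Dict[str, Any] = {}
    for user_entry in sessions:
        for user_id, event_stream in user_entry.items():
            if user_id in merged:
                # Assumption: a duplicated user id is an input error.
                raise KeyError(f"Duplicated user id: {user_id}")
            merged[user_id] = event_stream
    return merged


batch = [
    {"user A": [["a", "b"], [1, 2]]},
    {"user B": [["c"], [3]]},
]
print(merge_user_sessions(batch))
```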
Core WSKNN class
WSKNN
- class WSKNN(number_of_recommendations=5, number_of_neighbors=10, sampling_strategy='common_items', sample_size=1000, weighting_func='linear', ranking_strategy='linear', return_events_from_session=True, required_sampling_event=None, required_sampling_event_index=None, sampling_event_weights_index=None, recommend_any=False)
The class represents the Weighted Session-Based k-nn model.
- Parameters:
- number_of_recommendations : int, default=5
The number of recommended items.
- number_of_neighbors : int, default=10
The number of closest sessions from which the items are chosen.
- sampling_strategy : str, default='common_items'
How to filter the initial sample of sessions. Available strategies are:
'common_items': sample sessions that contain the same items as the input session,
'recent': sample the most recent sessions,
'random': draw a random sample of sessions,
'weighted_events': select sessions based on the weights assigned to events.
- sample_size : int, default=1000
How many sessions from the model are sampled to make a recommendation.
- weighting_func : str, default='linear'
The similarity measure between sessions. Available options: 'linear', 'log' and 'quadratic'.
- ranking_strategy : str, default='linear'
How an item rank is calculated from its position in a session sequence. Available options: 'inv', 'linear', 'log', 'quadratic'.
- return_events_from_session : bool, default=True
Whether the algorithm should return the same items as in the session if there are no neighbors.
- required_sampling_event : int or str, default=None
Set this parameter to the event name if sessions with this event must be included in the neighbors' selection. For example, an event name may be "purchase".
- required_sampling_event_index : int, default=None
If the required_sampling_event parameter is set, you must also pass the index of the row with event names.
- sampling_event_weights_index : int, default=None
If sampling_strategy is set to 'weighted_events', you must pass the index of the row with event weights.
- recommend_any : bool, default=False
If the recommender picks fewer items than required by number_of_recommendations, random items are added to the results.
- Raises:
- InvalidDimensionsError
Wrong number of nested sequences within session-items or item-sessions maps.
- InvalidTimestampError
Wrong type of timestamp; the int type is required.
- TypeError
Wrong type of nested structures within session-items or item-sessions maps.
- IndexError
Wrong index of event names or wrong index of event weights.
- Attributes:
- weighting_functions : List
The weighting functions: ['linear', 'log', 'quadratic'].
- ranking_strategies : List
The ranking strategies: ['linear', 'log', 'quadratic', 'inv'].
- session_item_map : Dict
The map of items that occur in the session, their timestamps and, optionally, their types and weights.
>>> sessions = {
...     session_id: (
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [(optional) event names (types)],
...         [(optional) weights]
...     )
... }
- item_session_map : Dict
The map of items, the sessions in which those items are present, and the first timestamp of each of those sessions.
>>> items = {
...     item_id: (
...         [sequence_of_sessions],
...         [sequence_of_the_first_session_timestamps]
...     )
... }
- n_of_recommendations : int
The number of recommended items.
- number_of_closest_neighbors : int
See the number_of_neighbors parameter.
- sampling_strategy : str
See the sampling_strategy parameter.
- possible_neighbors_sample_size : int
See the sample_size parameter.
- weighting_function : str
See the weighting_func parameter.
- ranking_strategy : str
See the ranking_strategy parameter.
- return_events_from_session : bool, default=True
See the return_events_from_session parameter.
- required_sampling_event : Union[int, str], default=None
See the required_sampling_event parameter.
- required_sampling_event_index : int, default=None
See the required_sampling_event_index parameter.
- sampling_event_weights_index : int, default=None
See the sampling_event_weights_index parameter.
- recommend_any : bool, default=False
See the recommend_any parameter.
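The weighting_functions and ranking_strategies attributes name decay functions that favor items close to the end of a session. The exact formulas used by WSKNN are not given here, so the sketch below assumes common decay shapes, with distance d = 1 for the last item: linear max(1 - 0.1 * (d - 1), 0), log 1 / log2(d + 1), quadratic 1 / d^2 and inv 1 / d. It also assumes that session similarity is the sum of these weights over the input items found in the neighbor. position_weight and session_similarity are hypothetical names.

```python
import math
from typing import List


def position_weight(distance: int, strategy: str = "linear") -> float:
    """Hypothetical decay weight; distance=1 is the last item of the session."""
    if distance < 1:
        raise ValueError("distance must be >= 1")
    if strategy == "linear":
        return max(1.0 - 0.1 * (distance - 1), 0.0)
    if strategy == "log":
        return 1.0 / math.log2(distance + 1)
    if strategy == "quadratic":
        return 1.0 / distance ** 2
    if strategy == "inv":
        return 1.0 / distance
    raise KeyError(f"Unknown strategy: {strategy}")


def session_similarity(input_items: List[str], neighbor_items: List[str],
                       strategy: str = "linear") -> float:
    """Sum the position weights of input items that also occur in the neighbor session."""
    neighbor_set = set(neighbor_items)
    n = len(input_items)
    return sum(
        position_weight(n - idx, strategy)
        for idx, item in enumerate(input_items)
        if item in neighbor_set
    )


# 'a' is 3 steps from the end (weight 0.8), 'c' is the last item (weight 1.0): about 1.8.
print(session_similarity(["a", "b", "c"], ["a", "c", "x"]))
```

With these shapes, a neighbor that shares the most recent items of the input session scores higher than one that only shares older items.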
Methods
fit()
Sets the input session-items and item-sessions maps.
recommend()
Predicts the n next recommendations from a given session.
set_model_params()
Resets the model parameters and maps the new values.
- fit(sessions, items=None)
Sets the input session-items and item-sessions maps.
- Parameters:
- sessions : Dict
The map of items that occur in the session, their timestamps and, optionally, their types and weights.
>>> sessions = {
...     session_id: (
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [(optional) event names (types)],
...         [(optional) weights]
...     )
... }
- items : Dict, optional
The map of items, the sessions in which those items are present, and the first timestamp of each of those sessions. If not provided, the item-sessions map is created from the sessions parameter.
>>> items = {
...     item_id: (
...         [sequence_of_sessions],
...         [sequence_of_the_first_session_timestamps]
...     )
... }
- recommend(event_stream, settings=None)
Predicts the n next recommendations from a given session.
- Parameters:
- event_stream : List or Dict
Sequence of items for recommendation. If it is a list, it is treated as a single recommendation request:
>>> [
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
... ]
If it is a dictionary, recommendations are made in batch. Every key in the dictionary is a user index, and every value is a list with the sequence of items, timestamps and optional features.
>>> {
...     "user A": [...],
...     "user B": [...]
... }
- settings : Dict, default=None
Model settings and parameters.
- Returns:
- recommendations : List or Dict
Output for a single input (list):
>>> [
...     (item a, rank a), (item b, rank b)
... ]
Output for an input with multiple users (dictionary):
>>> {
...     "user A": [(item a, rank a), (item b, rank b)],
...     "...": [...]
... }
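The (item, rank) pairs returned by recommend() come from aggregating the items of the closest neighbor sessions. The following is a simplified, hypothetical sketch of that final step. It assumes that each neighbor adds its similarity score to every item it contains that is not already in the input session, and that ties are broken alphabetically; rank_candidate_items is not a WSKNN function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_candidate_items(session_items: List[str],
                         neighbors: List[Tuple[float, List[str]]],
                         n: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical aggregation; neighbors is a list of (similarity, neighbor_items)."""
    already_seen = set(session_items)
    scores: Dict[str, float] = defaultdict(float)
    for similarity, items in neighbors:
        for item in set(items):  # count each item once per neighbor session
            if item not in already_seen:
                scores[item] += similarity
    # Highest score first; ties broken by item name to keep the output deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


neighbors = [(1.0, ["a", "b", "c"]), (0.5, ["b", "d"])]
print(rank_candidate_items(["a"], neighbors, n=2))  # [('b', 1.5), ('c', 1.0)]
```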
- set_model_params(number_of_recommendations=None, number_of_neighbors=None, sampling_strategy=None, sample_size=None, weighting_func=None, ranking_strategy=None, return_events_from_session=None, required_sampling_event=None, recommend_any=False)
Resets the model parameters and maps the new values.
- Parameters:
- number_of_recommendations : int, default=None
- number_of_neighbors : int, default=None
- sampling_strategy : str, default=None
- sample_size : int, default=None
- weighting_func : str, default=None
- ranking_strategy : str, default=None
- return_events_from_session : bool, default=None
- required_sampling_event : str or int, default=None
- recommend_any : bool, default=False
- Raises:
- TypeError
Wrong input parameter type.
- KeyError
Wrong name of sampling strategy, ranking strategy or weighting function.
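set_model_params() raises KeyError for an unknown sampling strategy, ranking strategy or weighting function, and TypeError for a wrong parameter type. The hypothetical validator below reproduces those checks using the option lists given in the WSKNN parameter and attribute descriptions; it is a sketch, not the method's actual code, and it checks the type of only one numeric parameter for brevity.

```python
from typing import Optional

# Option lists taken from the WSKNN parameter and attribute descriptions.
SAMPLING_STRATEGIES = ["common_items", "recent", "random", "weighted_events"]
WEIGHTING_FUNCTIONS = ["linear", "log", "quadratic"]
RANKING_STRATEGIES = ["linear", "log", "quadratic", "inv"]


def check_model_params(sampling_strategy: Optional[str] = None,
                       weighting_func: Optional[str] = None,
                       ranking_strategy: Optional[str] = None,
                       number_of_neighbors: Optional[int] = None) -> None:
    """Hypothetical validator; None means "keep the current value"."""
    if number_of_neighbors is not None and not isinstance(number_of_neighbors, int):
        raise TypeError("number_of_neighbors must be an int")
    for value, allowed in ((sampling_strategy, SAMPLING_STRATEGIES),
                           (weighting_func, WEIGHTING_FUNCTIONS),
                           (ranking_strategy, RANKING_STRATEGIES)):
        if value is not None and value not in allowed:
            raise KeyError(f"{value!r} is not one of {allowed}")


check_model_params(sampling_strategy="recent", ranking_strategy="inv")  # valid, no error
```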
Core WSKNN Data Structures
Sessions
The class stores the session-items map and its basic properties. The core object is a dictionary of unique sessions (keys) that point to the session items and their timestamps (lists).
Parameters
- event_session_key
The name of the session key.
- event_product_key
The name of the item key.
- event_time_key
The name of the timestamp key.
- event_action_key : str, optional
The name of the key with actions that may be used for neighbors filtering.
- event_action_weights : Dict, optional
Dictionary with weights assigned to event actions.
Attributes
- session_items_actions_map : Dict
The session-items mapper {session_id: [[items], [timestamps], [optional - actions], [optional - weights]]}.
- time_start : int, default=1_000_000_000_000_000
The initial timestamp: the first event in the whole dataset.
- time_end : int, default=0
The timestamp of the last event in the dataset.
- longest_items_vector_size : int, default=0
The length of the longest sequence of products.
- number_of_sessions : int, default=0
The number of sessions within the session_items_actions_map object.
- event_session_key
See the event_session_key parameter.
- event_product_key
See the event_product_key parameter.
- event_time_key
See the event_time_key parameter.
- event_action_key
See the event_action_key parameter.
- action_weights
See the event_action_weights parameter.
- metadata : str
A description of the Sessions class.
Methods
- append(event)
Appends a single event to the session-items map.
- export(filename)
Exports the created mapping to a pickled dictionary.
- load(filename)
Loads a pickled Sessions object into a new instance of the class.
- save_object(filename)
Stores the Sessions object as a Python pickle binary file.
- update_weights_of_purchase_session()
Adds a constant value to all weights if the session ended with a purchase.
- __add__(Users)
Adds another Sessions object.
- __str__()
The basic info about the class.
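The Sessions attributes are easier to read alongside a minimal append() sketch. SessionsSketch is a hypothetical, simplified class, not the library code: it assumes that events are dictionaries keyed by event_session_key, event_product_key and event_time_key, and it ignores actions and weights.

```python
from typing import Any, Dict, List


class SessionsSketch:
    """Hypothetical, simplified stand-in for the Sessions class."""

    def __init__(self, event_session_key: str, event_product_key: str, event_time_key: str):
        self.event_session_key = event_session_key
        self.event_product_key = event_product_key
        self.event_time_key = event_time_key
        self.session_items_actions_map: Dict[Any, List[list]] = {}
        self.time_start = 1_000_000_000_000_000
        self.time_end = 0
        self.longest_items_vector_size = 0
        self.number_of_sessions = 0

    def append(self, event: Dict[str, Any]) -> None:
        session_id = event[self.event_session_key]
        item = event[self.event_product_key]
        timestamp = event[self.event_time_key]
        # Each session maps to [[items], [timestamps]] in arrival order.
        record = self.session_items_actions_map.setdefault(session_id, [[], []])
        record[0].append(item)
        record[1].append(timestamp)
        # Keep the summary attributes up to date.
        self.time_start = min(self.time_start, timestamp)
        self.time_end = max(self.time_end, timestamp)
        self.longest_items_vector_size = max(self.longest_items_vector_size, len(record[0]))
        self.number_of_sessions = len(self.session_items_actions_map)


sessions = SessionsSketch("session_id", "product_id", "timestamp")
for event in [
    {"session_id": "s1", "product_id": "a", "timestamp": 100},
    {"session_id": "s1", "product_id": "b", "timestamp": 105},
    {"session_id": "s2", "product_id": "a", "timestamp": 90},
]:
    sessions.append(event)
print(sessions.session_items_actions_map)  # {'s1': [['a', 'b'], [100, 105]], 's2': [['a'], [90]]}
```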
Items
The class stores the item-sessions map and its basic properties. The core object is a dictionary of unique items (keys) that point to the specific sessions and their timestamps (lists).
Parameters
- event_session_key
The name of the session key.
- event_product_key
The name of the item key.
- event_time_key
The name of the timestamp key.
Attributes
- item_sessions_map : Dict
The item-sessions mapper {item_id: [[sessions], [first timestamp of each session]]}.
- time_start : int, default=1_000_000_000_000_000
The initial timestamp: the first event in the whole dataset.
- time_end : int, default=0
The timestamp of the last event in the dataset.
- longest_sessions_vector_size : int, default=0
The length of the longest sequence of sessions that contained the item.
- number_of_items : int, default=0
The number of items within the item_sessions_map object.
- event_session_key
See the event_session_key parameter.
- event_product_key
See the event_product_key parameter.
- event_time_key
See the event_time_key parameter.
- metadata : str
A description of the Items class.
Methods
- append(event)
Appends a single event to the item-sessions map.
- export(filename)
Exports the created mapping to a pickled dictionary.
- load(filename)
Loads a pickled Items object into a new instance of the class.
- save_object(filename)
Stores the Items object as a Python pickle binary file.
- __add__(Users)
Adds another Items object. This is a set operation, so sessions assigned to the same item in both Items objects are not duplicated.
- __str__()
The basic info about the class.
Raises
- TypeError
Timestamps are not datetime objects or numerical objects.
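Items.__add__() is described as a set operation in which sessions assigned to the same item in both objects are not duplicated. A hypothetical sketch of that merge on two plain item-sessions maps follows; keeping each session paired with its first timestamp and re-sorting sessions by that timestamp are assumptions, and merge_item_maps is not a WSKNN function.

```python
from typing import Dict, List, Tuple

ItemsMap = Dict[str, Tuple[List[str], List[int]]]


def merge_item_maps(first: ItemsMap, second: ItemsMap) -> ItemsMap:
    """Hypothetical union of two item-sessions maps without duplicated sessions."""
    merged: ItemsMap = {}
    for item in first.keys() | second.keys():
        pairs: Dict[str, int] = {}
        for source in (first, second):
            sessions, timestamps = source.get(item, ([], []))
            for session_id, ts in zip(sessions, timestamps):
                pairs.setdefault(session_id, ts)  # a session is stored only once
        ordered = sorted(pairs.items(), key=lambda pair: pair[1])  # oldest session first
        merged[item] = ([s for s, _ in ordered], [t for _, t in ordered])
    return merged


a = {"x": (["s1", "s2"], [10, 20])}
b = {"x": (["s2", "s3"], [20, 5]), "y": (["s3"], [5])}
print(merge_item_maps(a, b)["x"])  # (['s3', 's1', 's2'], [5, 10, 20])
```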
Transform session-items map into item-sessions map
- map_sessions_to_items(sessions_map, sort_items_map=True)
The function transforms a session-items map into an item-sessions map.
- Parameters:
- sessions_map : Dict
>>> sessions = {
...     session_id: (
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [[OPTIONAL] sequence_of_event_types],
...         [[OPTIONAL] sequence_of_event_weights]
...     )
... }
- sort_items_map : bool, default=True
Sorts the item-sessions map by the first timestamp of each session, so that sessions are in ascending order (from the oldest to the newest).
- Returns:
- items_map : Dict
Mapped item-sessions dictionary.
>>> items = {
...     item_id: (
...         [sequence_of_sessions],
...         [sequence_of_the_first_session_timestamps]
...     )
... }
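The transformation can be sketched in plain Python. This is a hypothetical reimplementation for illustration only (sessions_to_items is not a WSKNN function); it assumes that the "first session timestamp" is the first entry of each session's timestamp list and that an item repeated within one session is recorded once for that session.

```python
from typing import Dict, List, Sequence, Tuple


def sessions_to_items(sessions_map: Dict[str, Sequence[list]],
                      sort_items_map: bool = True) -> Dict[str, Tuple[List[str], List[int]]]:
    """Hypothetical sketch: build {item_id: ([sessions], [first session timestamps])}."""
    items_map: Dict[str, Tuple[List[str], List[int]]] = {}
    for session_id, record in sessions_map.items():
        items, timestamps = record[0], record[1]
        first_timestamp = timestamps[0]
        for item in dict.fromkeys(items):  # unique items, order preserved
            sessions, starts = items_map.setdefault(item, ([], []))
            sessions.append(session_id)
            starts.append(first_timestamp)
    if sort_items_map:
        for item, (sessions, starts) in items_map.items():
            ordered = sorted(zip(starts, sessions))  # ascending: oldest session first
            items_map[item] = ([s for _, s in ordered], [t for t, _ in ordered])
    return items_map


sessions = {
    "s2": (["a", "b"], [200, 210]),
    "s1": (["a", "c", "a"], [100, 110, 120]),
}
print(sessions_to_items(sessions))
# {'a': (['s1', 's2'], [100, 200]), 'b': (['s2'], [200]), 'c': (['s1'], [100])}
```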
Save and Load model
Evaluation metrics
- score_model(sessions, trained_model, k=0, skip_short_sessions=True, calc_mrr=True, calc_precision=True, calc_recall=True, sliding_window=False)
The function calculates Precision@k, Recall@k and MRR@k.
- Parameters:
- sessions : List of sessions
>>> [
...     [
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [[OPTIONAL] sequence_of_event_type]
...     ],
... ]
- trained_model : WSKNN
The trained WSKNN model.
- k : int, default=0
Number of top recommendations. A session must have at least k + 1 items to be evaluated. If k is 0 (the default), it is set to the number of recommendations of the trained model. If k is greater than the number of recommendations, the latter is adjusted to k.
- skip_short_sessions : bool, default=True
Whether the algorithm should skip short sessions when calculating the metrics or raise an error.
- calc_mrr : bool, default=True
Whether MRR should be calculated.
- calc_precision : bool, default=True
Whether precision should be calculated.
- calc_recall : bool, default=True
Whether recall should be calculated.
- sliding_window : bool, default=False
When calculating metrics, slide through each session until the number of remaining evaluation products is smaller than the number of recommendations.
- Returns:
- scores : Dict
{'MRR': float, 'Recall': float, 'Precision': float}
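The sliding_window option can be pictured as cutting one session into several (input, expected items) pairs. The sketch below is hypothetical (sliding_window_splits is not a WSKNN function). It assumes that the model sees items[:i] and is evaluated against the next k items, stopping once fewer than k items would remain, and it does not model how sessions are split when sliding_window=False.

```python
from typing import List, Tuple


def sliding_window_splits(items: List[str], k: int) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical split: input prefix items[:i] and the k items that follow it."""
    if k < 1:
        raise ValueError("k must be positive")
    splits = []
    # Stop when fewer than k evaluation items would remain after the prefix.
    for i in range(1, len(items) - k + 1):
        splits.append((items[:i], items[i:i + k]))
    return splits


print(sliding_window_splits(["a", "b", "c", "d"], k=2))
# [(['a'], ['b', 'c']), (['a', 'b'], ['c', 'd'])]
```

Under this assumption, a session with only k items yields no split at all, which matches the requirement that a session have at least k + 1 items.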
- get_mean_reciprocal_rank(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)
The function calculates the mean reciprocal rank of the top k recommendations. A given session must be longer than k events.
- Parameters:
- sessions : List
>>> [
...     [
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [[OPTIONAL] sequence_of_event_type]
...     ],
... ]
- trained_model : WSKNN
The trained WSKNN model.
- k : int, default=0
Number of top recommendations. A session must have at least k + 1 items to calculate MRR. If k is 0 (the default), it is set to the number of recommendations of the trained model. If k is greater than the number of recommendations, the latter is adjusted to k.
- skip_short_sessions : bool, default=True
Whether the algorithm should skip short sessions when calculating MRR or raise an error.
- sliding_window : bool, default=False
When calculating metrics, slide through each session until the number of remaining evaluation products is smaller than the number of recommendations.
- Returns:
- mrr : float
Mean Reciprocal Rank: the average MRR score over n sessions.
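The reciprocal rank of a session is 1 / (position of the first relevant item among the top k recommendations), or 0 if none is relevant, and MRR averages it over sessions. A minimal sketch under that standard definition; the function names are hypothetical, and the way WSKNN treats skipped or empty sessions may differ.

```python
from typing import Iterable, List, Sequence, Tuple


def reciprocal_rank(recommended: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """1 / position of the first relevant item within the top k, or 0.0 if there is none."""
    relevant_set = set(relevant)
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant_set:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results: List[Tuple[Sequence[str], Iterable[str]]], k: int) -> float:
    """Average reciprocal rank over (recommended, relevant) pairs, one pair per session."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(rec, rel, k) for rec, rel in results) / len(results)


results = [(["a", "b", "c"], ["b"]), (["x", "y"], ["z"])]
print(mean_reciprocal_rank(results, k=3))  # 0.25
```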
- get_precision(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)
The function calculates the precision score of the top k recommendations. A given session must be longer than k events.
- Parameters:
- sessions : List
>>> [
...     [
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [[OPTIONAL] sequence_of_event_type]
...     ],
... ]
- trained_model : WSKNN
The trained WSKNN model.
- k : int, default=0
Number of top recommendations. A session must have at least k + 1 items to calculate precision. If k is 0 (the default), it is set to the number of recommendations of the trained model. If k is greater than the number of recommendations, the latter is adjusted to k.
- skip_short_sessions : bool, default=True
Whether the algorithm should skip short sessions when calculating precision or raise an error.
- sliding_window : bool, default=False
When calculating metrics, slide through each session until the number of remaining evaluation products is smaller than the number of recommendations.
- Returns:
- precision : float
Precision: the average precision score over n sessions.
Notes
Precision is defined as (number of relevant recommendations) / (number of items recommended).
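The precision definition in the Notes translates directly into code. A minimal sketch with hypothetical function names, assuming that the denominator is the number of items actually recommended within the top k and that the final score is a plain average over sessions:

```python
from typing import Iterable, List, Sequence, Tuple


def precision_at_k(recommended: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """(number of relevant recommendations) / (number of items recommended), within the top k."""
    top_k = list(recommended[:k])
    if not top_k:
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    return hits / len(top_k)


def mean_precision(results: List[Tuple[Sequence[str], Iterable[str]]], k: int) -> float:
    """Average precision@k over (recommended, relevant) pairs, one pair per session."""
    if not results:
        return 0.0
    return sum(precision_at_k(rec, rel, k) for rec, rel in results) / len(results)


print(precision_at_k(["a", "b", "c", "d"], ["b", "d", "z"], k=4))  # 0.5
```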
- get_recall(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)
The function calculates the recall score of the top k recommendations. A given session must be longer than k events.
- Parameters:
- sessions : List
>>> [
...     [
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [[OPTIONAL] sequence_of_event_type]
...     ],
... ]
- trained_model : WSKNN
The trained WSKNN model.
- k : int, default=0
Number of top recommendations. A session must have at least k + 1 items to calculate recall. If k is 0 (the default), it is set to the number of recommendations of the trained model. If k is greater than the number of recommendations, the latter is adjusted to k.
- skip_short_sessions : bool, default=True
Whether the algorithm should skip short sessions when calculating recall or raise an error.
- sliding_window : bool, default=False
When calculating metrics, slide through each session until the number of remaining evaluation products is smaller than the number of recommendations.
- Returns:
- recall : float
The average recall score over n sessions.
Notes
Recall is defined as (number of relevant recommendations) / (all relevant items for a user).
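Recall differs from precision only in the denominator: the number of all relevant items rather than the number of recommended items. A minimal sketch of that definition with hypothetical function names, assuming duplicates are ignored and sessions are averaged with equal weight:

```python
from typing import Iterable, List, Sequence, Tuple


def recall_at_k(recommended: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """(number of relevant recommendations) / (all relevant items), within the top k."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = len(set(recommended[:k]) & relevant_set)
    return hits / len(relevant_set)


def mean_recall(results: List[Tuple[Sequence[str], Iterable[str]]], k: int) -> float:
    """Average recall@k over (recommended, relevant) pairs, one pair per session."""
    if not results:
        return 0.0
    return sum(recall_at_k(rec, rel, k) for rec, rel in results) / len(results)


print(recall_at_k(["a", "b", "c"], ["b", "d"], k=3))  # 0.5
```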
Preprocessing
- parse_files(dataset, session_id_key, product_key, time_key, action_key=None, time_to_numeric=False, time_to_datetime=False, datetime_format='', allowed_actions=None, purchase_action_name=None, progress_bar=False)
The function parses data from CSV, JSON and gzipped JSON files into item-sessions and session-items maps.
- Parameters:
- dataset : str
The gzipped JSONL, JSON or CSV file with events.
- session_id_key : str
The name of the session key.
- product_key : str
The name of the product key.
- action_key : str, default=None
The name of the event action type key.
- time_key : str
The name of the event timestamp key.
- time_to_numeric : bool, default=False
Transforms input timestamps to float values.
- time_to_datetime : bool, default=False
Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.
- datetime_format : str
The format of the datetime object.
- allowed_actions : Dict, optional
Allowed actions and their weights.
- purchase_action_name : Any, optional
The name of the final action (required to apply the weight to the session vector).
- progress_bar : bool, default=False
Show parsing progress.
- Returns:
- items, sessions : Items, Sessions
The item-sessions and session-items mappings.
- parse_flat_file(dataset, sep, session_index, product_index, time_index, action_index=None, use_header_row=False, time_to_numeric=False, time_to_datetime=False, datetime_format='', allowed_actions=None, purchase_action_name=None, ignore_errors=True)
The function parses data from a flat file into item-sessions and session-items maps.
- Parameters:
- dataset : str
Input file.
- sep : str
Separator used to separate values.
- session_index : int
The index of the session column.
- product_index : int
The index of the product column.
- time_index : int
The index of the event timestamp column.
- action_index : int, optional
The index of the event action column.
- use_header_row : bool, default=False
Use the first row values as a header.
- time_to_numeric : bool, default=False
Transforms input timestamps to float values.
- time_to_datetime : bool, default=False
Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.
- datetime_format : str
The format of the datetime object.
- allowed_actions : Dict, optional
Allowed actions and their weights.
- purchase_action_name : Any, optional
The name of the final action (required to apply the weight to the session vector).
- ignore_errors : bool, default=True
Ignore rows that raise exceptions.
- Returns:
- items, sessions : Items, Sessions
The item-sessions and session-items mappings.
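The index-based parameters of parse_flat_file() can be illustrated with Python's standard csv module. The sketch is hypothetical: parse_flat_lines takes lines instead of a file path, returns a plain session-items dict rather than Items and Sessions objects, assumes integer timestamps, skips action_index entirely, and mirrors use_header_row and ignore_errors only loosely.

```python
import csv
from typing import Dict, Iterable, List


def parse_flat_lines(lines: Iterable[str], sep: str, session_index: int, product_index: int,
                     time_index: int, use_header_row: bool = False,
                     ignore_errors: bool = True) -> Dict[str, List[list]]:
    """Hypothetical sketch: build {session_id: [[items], [timestamps]]} from delimited rows."""
    sessions: Dict[str, List[list]] = {}
    reader = csv.reader(lines, delimiter=sep)
    if use_header_row:
        next(reader, None)  # skip the header row
    for row in reader:
        try:
            session_id = row[session_index]
            item = row[product_index]
            timestamp = int(row[time_index])
        except (IndexError, ValueError):
            if ignore_errors:
                continue  # drop malformed rows
            raise
        record = sessions.setdefault(session_id, [[], []])
        record[0].append(item)
        record[1].append(timestamp)
    return sessions


rows = ["session,product,time", "s1,a,100", "s1,b,105", "s2,c,oops", "s2,d,90"]
print(parse_flat_lines(rows, ",", 0, 1, 2, use_header_row=True))
# {'s1': [['a', 'b'], [100, 105]], 's2': [['d'], [90]]}
```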
- parse_fn(dataset, allowed_actions, purchase_action_name, session_id_key, product_key, action_key, time_key, time_to_numeric, time_to_datetime, datetime_format, progress_bar)
The function parses the given dataset into Sessions and Items objects.
- Parameters:
- dataset : Iterable
Object with events.
- allowed_actions : Dict, optional
Allowed actions and their weights.
- purchase_action_name : Any, optional
The name of the final action (required to apply the weight to the session vector).
- session_id_key : str
The name of the session key.
- product_key : str
The name of the product key.
- action_key : str
The name of the event action type key.
- time_key : str
The name of the event timestamp key.
- time_to_numeric : bool, default=True
Transforms input timestamps to float values.
- time_to_datetime : bool, default=False
Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.
- datetime_format : str
The format of the datetime object.
- progress_bar : bool
Show parsing progress.
- Returns:
- ItemsMap, SessionsMap : Items, Sessions
- parse_pandas(df, session_id_key, product_key, time_key, action_key=None, allowed_actions=None, purchase_action_name=None, event_weights_key=None, min_session_length=3, get_items_map=True)
The function parses the given dataset into Sessions and Items objects.
- Parameters:
- df : pandas DataFrame
DataFrame with events and sessions.
- session_id_key : str
The name of the session key.
- product_key : str
The name of the product key.
- time_key : str
The name of the event timestamp key.
- action_key : str, default=None
The name of the event action type key.
- allowed_actions : List, optional
Allowed actions.
- purchase_action_name : Any, optional
The name of the final action (required to apply the weight to the session vector).
- event_weights_key : str, optional
The name of the weights column.
- min_session_length : int, default=3
Minimum length of a single session.
- get_items_map : bool, default=True
Whether the item-sessions map should be created.
- Returns:
- Dict
{"session-map": Dict, "item-map": Optional[Dict]}
Utilities
- load_gzipped_jsonl(filename, encoding='UTF-8')
The function loads data stored in gzipped JSON Lines.
- Parameters:
- filename : str
Path to the file.
- encoding : str, default='UTF-8'
- Returns:
- datadict : dict
Python dictionary with unique records.
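Reading gzipped JSON Lines needs only the standard library. In this hypothetical sketch (read_gzipped_jsonl is not the library function) each non-empty line becomes one record keyed by its line number; how load_gzipped_jsonl() keys and deduplicates its "unique records" is not documented here, so that part is an assumption.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Dict


def read_gzipped_jsonl(filename: str, encoding: str = "UTF-8") -> Dict[int, Any]:
    """Hypothetical sketch: {line_number: parsed JSON record} from a .jsonl.gz file."""
    records: Dict[int, Any] = {}
    with gzip.open(filename, mode="rt", encoding=encoding) as handle:
        for line_number, line in enumerate(handle):
            line = line.strip()
            if line:  # skip blank lines
                records[line_number] = json.loads(line)
    return records


# Write a tiny demo file, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "events.jsonl.gz")
with gzip.open(demo_path, mode="wt", encoding="UTF-8") as handle:
    handle.write('{"session": "s1", "item": "a"}\n{"session": "s1", "item": "b"}\n')
print(read_gzipped_jsonl(demo_path))
# {0: {'session': 's1', 'item': 'a'}, 1: {'session': 's1', 'item': 'b'}}
```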
- load_gzipped_pickle(filename)
The function loads a gzipped, pickled items or sessions object.
- Parameters:
- filename : str
- Returns:
- pickled_object : dict
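A gzipped pickle can be written and read back with the standard gzip and pickle modules. The two helpers below are hypothetical sketches, not the library functions. Unpickling can execute arbitrary code, so only files from trusted sources should be loaded this way.

```python
import gzip
import os
import pickle
import tempfile
from typing import Any


def save_gzipped_pickle(obj: Any, filename: str) -> None:
    """Hypothetical counterpart: pickle an object and gzip it."""
    with gzip.open(filename, "wb") as handle:
        pickle.dump(obj, handle)


def read_gzipped_pickle(filename: str) -> Any:
    """Hypothetical sketch of loading a gzipped pickle (load trusted files only)."""
    with gzip.open(filename, "rb") as handle:
        return pickle.load(handle)


demo_path = os.path.join(tempfile.mkdtemp(), "sessions.pkl.gz")
save_gzipped_pickle({"s1": [["a", "b"], [100, 105]]}, demo_path)
print(read_gzipped_pickle(demo_path))  # {'s1': [['a', 'b'], [100, 105]]}
```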