API#

Core functions#

fit_transform#

fit(sessions, items=None, number_of_recommendations=5, number_of_neighbors=10, sampling_strategy='common_items', sample_size=1000, weighting_func='linear', ranking_strategy='linear', return_events_from_session=True, required_sampling_event=None, required_sampling_event_index=None, sampling_str_event_weights_index=None, recommend_any=False)[source]

Sets input session-items and item-sessions maps.

Parameters:

sessionsDict or Sessions

>>> sessions = {
...    session_id: (
...        [ sequence_of_items ],
...        [ sequence_of_timestamps ],
...        [ [OPTIONAL] sequence_of_event_types ],
...        [ [OPTIONAL] sequence_of_event_weights]
...    )
...}

itemsDict or Items, optional

If not provided then the item-sessions map is created from the sessions parameter.

>>> items = {
...     item_id: (
...         [ sequence_of_sessions ],
...         [ sequence_of_the_first_session_timestamps ]
...     )
... }

number_of_recommendationsint, default=5

The number of recommended items.

number_of_neighborsint, default=10

The number of the closest sessions to choose the items from.

sampling_strategystr, default='common_items'

How to filter the initial sample of sessions. Available strategies are:

'common_items': sample sessions with the same items as the input session,
'recent': sample the most recent sessions,
'random': get a random sample of sessions,
'weighted_events': select sessions based on the specific weights assigned to events.

sample_sizeint, default=1000

How many sessions from the model are sampled to make a recommendation.

weighting_funcstr, default='linear'

The similarity measurement between sessions. Available options: 'linear', 'log' and 'quadratic'.

ranking_strategystr, default='linear'

How we calculate an item rank (based on its position in a session sequence). Available options are: 'inv', 'linear', 'log', 'quadratic'.

return_events_from_sessionbool, default = True

Should the algorithm return the same events as in the input session if that session is its only neighbor?

required_sampling_eventint or str, default = None

Set this parameter to the event name if sessions with this event must be included in the neighbors selection. For example, this event may be "purchase".

required_sampling_event_indexint, default = None

If the required_sampling_event parameter is set then you must pass the index of the row with event names.

sampling_str_event_weights_indexint, default = None

If sampling_strategy is set to weighted_events then you must pass the index of the row with event weights.

recommend_anybool, default = False

If the recommender returns fewer items than number_of_recommendations then random items are added to the results.

Returns:

wsknnWSKNN: The trained Weighted Session-Based k-NN model.

Examples

>>> input_sessions = {
...     'session_x': (
...         ['a', 'b', 'c'],
...         [10001, 10002, 10004],
...         ['view', 'click', 'click']
...     )
... }
>>> input_items = {
...     'a': (
...         ['session_x'],
...         [10001]
...     ),
...     'b': (
...         ['session_x'],
...         [10001]
...     ),
...     'c': (
...         ['session_x'],
...         [10001]
...     ),
... }
>>> fitted_model = fit(input_sessions, input_items)
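The session-items structure shown above can be assembled from a flat event log. The following is a minimal sketch in plain Python, not the library's own parser: the `events` list and the `build_sessions_map` helper are hypothetical names introduced for illustration only.

```python
from collections import defaultdict

# Hypothetical raw event log: (session_id, item_id, timestamp, event_type).
events = [
    ("session_x", "a", 10001, "view"),
    ("session_y", "b", 10003, "view"),
    ("session_x", "b", 10002, "click"),
    ("session_x", "c", 10004, "click"),
    ("session_y", "c", 10005, "click"),
]


def build_sessions_map(event_log):
    """Group events by session, ordered by timestamp within each session."""
    grouped = defaultdict(list)
    for session_id, item, timestamp, event_type in event_log:
        grouped[session_id].append((timestamp, item, event_type))

    sessions = {}
    for session_id, rows in grouped.items():
        rows.sort(key=lambda row: row[0])  # oldest event first
        sessions[session_id] = (
            [item for _, item, _ in rows],
            [timestamp for timestamp, _, _ in rows],
            [event_type for _, _, event_type in rows],
        )
    return sessions


sessions_map = build_sessions_map(events)
```

The resulting `sessions_map["session_x"]` has the same shape as `input_sessions["session_x"]` in the example above, so it could be passed as the sessions argument.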

predict#

predict(model, sessions, settings=None)[source]

The function is an alias for the WSKNN.recommend() method.

Parameters:

modelWSKNN: Fitted WSKNN model.
sessionsList: Sequence of items for recommendation. It must be a nested list of lists:

>>> [
...     [items],
...     [timestamps],
...     [(optional) event names],
...     [(optional) weights]
... ]

settingsDict: Settings of the model. This parameter makes it possible to test different setups. Possible parameters are grouped in the settings.yml file under the model key.

Returns:

recommendationsList

Item recommendations and their ranks.

>>> [
...     [item a, rank a], [item b, rank b]
... ]

Raises:

ValueError: Model wasn’t fitted.

batch_predict#

batch_predict(model, sessions, settings=None)[source]

The function predicts multiple sessions at once.

Parameters:

modelWSKNN

Fitted WSKNN model.

sessionsList[Dict]

User-sessions for recommendations:

>>> [
...     {"user A": [*]},
...     {"user B": [**]}
... ]

where (*) might be:

>>> [
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
... ]

settingsDict

Settings of the model. This parameter makes it possible to test different setups. Possible parameters are grouped in the settings.yml file under the model key.

Returns:

inferenceList[Dict]

Recommendations and their ranks for each user.

>>> [
...   {"user A": [
...     [item a, rank a],
...     [item b, rank b]
...   ]},
...   {"user B": [...]},
... ]
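The input and output shapes described above can be handled with a few lines of plain Python. This is only a sketch: `user_sessions`, `batch_output` and `top_item_per_user` are hypothetical, the output values are made up rather than produced by a model, and the helper assumes that a higher rank value means a better recommendation.

```python
# Hypothetical per-user sessions: [items, timestamps]; optional rows omitted.
user_sessions = {
    "user A": [["a", "b"], [10001, 10002]],
    "user B": [["c"], [10005]],
}

# Input shape described above: a list with one single-key dict per user.
batch_input = [{user: session} for user, session in user_sessions.items()]

# Hypothetical output in the documented shape (not produced by a real model).
batch_output = [
    {"user A": [["c", 0.9], ["d", 0.4]]},
    {"user B": [["a", 0.7]]},
]


def top_item_per_user(inference):
    """Return {user: best_item}, assuming a higher rank value is better."""
    best = {}
    for entry in inference:
        for user, recommendations in entry.items():
            if recommendations:  # skip users without any recommendation
                best[user] = max(recommendations, key=lambda pair: pair[1])[0]
    return best


top_items = top_item_per_user(batch_output)
```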

Core WSKNN class#

WSKNN#

class WSKNN(number_of_recommendations=5, number_of_neighbors=10, sampling_strategy='common_items', sample_size=1000, weighting_func='linear', ranking_strategy='linear', return_events_from_session=True, required_sampling_event=None, required_sampling_event_index=None, sampling_event_weights_index=None, recommend_any=False)[source]#

The class represents the Weighted Session-Based k-nn model.

Parameters:

number_of_recommendationsint, default=5

The number of recommended items.

number_of_neighborsint, default=10

The number of the closest sessions to choose the items from.

sampling_strategystr, default='common_items'

How to filter the initial sample of sessions. Available strategies are:

'common_items': sample sessions with the same items as the input session,
'recent': sample the most recent sessions,
'random': get a random sample of sessions,
'weighted_events': select sessions based on the specific weights assigned to events.

sample_sizeint, default=1000

How many sessions from the model are sampled to make a recommendation.

weighting_funcstr, default='linear'

The similarity measurement between sessions. Available options: 'linear', 'log' and 'quadratic'.

ranking_strategystr, default='linear'

How we calculate an item rank (based on its position in a session sequence). Available options are: 'inv', 'linear', 'log', 'quadratic'.

return_events_from_sessionbool, default = True

Should the algorithm return the same items as in the session if there are no neighbors?

required_sampling_eventint or str, default = None

Set this parameter to the event name if sessions with this event must be included in the neighbors selection. For example, the event name may be "purchase".

required_sampling_event_indexint, default = None

If the required_sampling_event parameter is set then you must pass the index of the row with event names.

sampling_event_weights_indexint, default = None

If sampling_strategy is set to weighted_events then you must pass an index of a row with event weights.

recommend_anybool, default = False

If recommender picks fewer items than required by the number_of_recommendations then add random items to the results.
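To make the 'common_items' strategy and the sample_size cap from the parameters above more concrete, here is a rough sketch of one way such sampling could work on an item-sessions map. It is not the library's implementation: `sample_common_items` is a hypothetical helper, and keeping the first sessions encountered when there are more candidates than sample_size is an assumption.

```python
def sample_common_items(input_items, item_session_map, sample_size):
    """Collect sessions that share at least one item with the input session."""
    candidates = []
    seen = set()
    for item in input_items:
        # Items unknown to the map contribute no candidate sessions.
        sessions, _first_timestamps = item_session_map.get(item, ([], []))
        for session_id in sessions:
            if session_id not in seen:
                seen.add(session_id)
                candidates.append(session_id)
    # Assumption: truncate to the first `sample_size` candidates found.
    return candidates[:sample_size]


item_session_map = {
    "a": (["s1", "s2"], [10, 20]),
    "b": (["s2", "s3"], [20, 30]),
    "c": (["s4"], [40]),
}
sampled = sample_common_items(["a", "b", "z"], item_session_map, sample_size=2)
```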

Raises:

InvalidDimensionsError: Wrong number of nested sequences within session-items or item-sessions maps.
InvalidTimestampError: Wrong type of timestamp - int type is required.
TypeError: Wrong type of nested structures within session-items or item-sessions maps.
IndexError: Wrong index of event names or wrong index of event weights.

Attributes:

weighting_functionsList: The weighting functions: ['linear', 'log', 'quadratic'].
ranking_strategiesList: The ranking strategies: ['linear', 'log', 'quadratic', 'inv'].
session_item_mapDict: The map of items that occur in the session and their timestamps, and (optional) their types and their weights.

>>> sessions = {
...     session_id: (
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [(optional) event names (types)],
...         [(optional) weights]
...     )
... }

item_session_mapDict: The map of items and the sessions where those items are present, and the first timestamp of those sessions.

>>> items = {
...     item_id: (
...         [sequence_of_sessions],
...         [sequence_of_the_first_session_timestamps]
...     )
... }

n_of_recommendationsint: The number of items recommended.
number_of_closest_neighborsint: See number_of_neighbors parameter.
sampling_strategystr: See sampling_strategy parameter.
possible_neighbors_sample_sizeint: See sample_size parameter.
weighting_functionstr: See weighting_func parameter.
ranking_strategystr: See ranking_strategy parameter.
return_events_from_sessionbool, default = True: See return_events_from_session parameter.
required_sampling_eventUnion[int, str], default = None: See required_sampling_event parameter.
required_sampling_event_indexint, default = None: See required_sampling_event_index parameter.
sampling_event_weights_indexint, default = None: See sampling_event_weights_index parameter.
recommend_anybool, default = False: See recommend_any parameter.

Methods

fit()	Sets input session-items and item-sessions maps.
recommend()	The method predicts the `n` next recommendations from a given session.
set_model_params()	The method resets and maps the new model parameters.

fit(sessions, items=None)[source]#

Sets input session-items and item-sessions maps.

Parameters:

sessionsDict

The map of items that occur in the session and their timestamps, and (optional) their types and their weights.

>>> sessions = {
...     session_id: (
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
...     )
... }

itemsDict

The map of items and the sessions where those items are present, and the first timestamp of those sessions. If not provided then the item-sessions map is created from the sessions parameter.

>>> items = {
...     item_id: (
...     [sequence_of_sessions],
...     [sequence_of_the_first_session_timestamps]
...     )
... }

recommend(event_stream, settings=None)[source]#

The method predicts n next recommendations from a given session.

Parameters:

event_streamList, Dict

Sequence of items for recommendation. If list then it is treated as a single recommendation:

>>> [
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
... ]

If it is a dictionary then recommendations are made in batch. Each key in the dictionary is a user index, and each value is a list with the sequence of items, timestamps and optional features.

>>> {
...     "user A": [...],
...     "user B": [...]
... }

settingsDict, default = None

Model settings and parameters.

Returns:

recommendationsList, Dict

Output for the single input (list):

>>> [
...     (item a, rank a), (item b, rank b)
... ]

Output for the input with multiple users (dictionary):

>>> {
...     "user A": [(item a, rank a), (item b, rank b)],
...     "...": [...]
... }

set_model_params(number_of_recommendations=None, number_of_neighbors=None, sampling_strategy=None, sample_size=None, weighting_func=None, ranking_strategy=None, return_events_from_session=None, required_sampling_event=None, recommend_any=False)[source]#

The method resets and maps the new model parameters.

Parameters:

number_of_recommendationsint, default = None
number_of_neighborsint, default = None
sampling_strategystr, default = None
sample_sizeint, default = None
weighting_funcstr, default = None
ranking_strategystr, default = None
return_events_from_sessionbool, default = None
required_sampling_eventstr or int, default = None
recommend_anybool, default = False

Raises:

TypeError: Wrong input parameter type.
KeyError: Wrong name of sampling strategy, ranking strategy or weighting function.
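The two error conditions listed above can be illustrated with a small validation sketch. The allowed names come from the parameter descriptions in this reference; the `check_params` helper itself is hypothetical and only mirrors the documented behavior (TypeError for a wrong type, KeyError for an unknown name), not the library's actual code.

```python
SAMPLING_STRATEGIES = ("common_items", "recent", "random", "weighted_events")
WEIGHTING_FUNCTIONS = ("linear", "log", "quadratic")
RANKING_STRATEGIES = ("linear", "log", "quadratic", "inv")


def check_params(sampling_strategy=None, weighting_func=None,
                 ranking_strategy=None, number_of_neighbors=None):
    """Validate a subset of parameters; None means 'leave unchanged'."""
    if number_of_neighbors is not None and not isinstance(number_of_neighbors, int):
        raise TypeError("number_of_neighbors must be an int")
    checks = (
        ("sampling_strategy", sampling_strategy, SAMPLING_STRATEGIES),
        ("weighting_func", weighting_func, WEIGHTING_FUNCTIONS),
        ("ranking_strategy", ranking_strategy, RANKING_STRATEGIES),
    )
    for name, value, allowed in checks:
        if value is not None and value not in allowed:
            raise KeyError(f"{name}={value!r} is not one of {allowed}")


# Valid names pass silently.
check_params(sampling_strategy="recent", ranking_strategy="inv")
```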

Core WSKNN Data Structures#

Sessions#

The class stores the session-items mapping and its basic properties. The core object is a dictionary whose keys are unique sessions and whose values are lists with the session items and their timestamps.

Parameters#

event_session_key: The name of a session key.
event_product_key: The name of an item key.
event_time_key: The name of a timestamp key.
event_action_keystr, optional: The name of a key with actions that may be used for neighbors filtering.
event_action_weightsDict, optional: Dictionary with weights that should be assigned to event actions.

Attributes#

session_items_actions_mapDict: The session-items mapper {session_id: [[items], [timestamps], [optional - actions], [optional - weights]]}.
time_startint, default = 1_000_000_000_000_000: The initial timestamp, that is the timestamp of the first event in the whole dataset.
time_endint, default = 0: The timestamp of the last event in the dataset.
longest_items_vector_sizeint, default = 0: The length of the longest sequence of products.
number_of_sessionsint, default = 0: The number of sessions within a session_items_actions_map object.
event_session_key: See the event_session_key parameter.
event_product_key: See the event_product_key parameter.
event_time_key: See the event_time_key parameter.
event_action_key: See the event_action_key parameter.
action_weights: See the event_action_weights parameter.
metadatastr: A description of the Sessions class.

Methods#

append(event): Appends a single event to the session-items map.
export(filename): The method exports the created mapping to a pickled dictionary.
load(filename): Loads a pickled Sessions object into a new instance of the class.
save_object(filename): The Sessions object is stored as a Python pickle binary file.
update_weights_of_purchase_session(): Adds a constant value to all weights if the session ended with a purchase.
__add__(Sessions): Adds other Sessions object.
__str__(): The basic info about the class.
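The default values of time_start (a very large number) and time_end (zero) make sense once events are appended one by one: each new timestamp can only lower the first and raise the second. The sketch below shows that bookkeeping with a hypothetical `SessionStore` class. It is a stand-in written for this example, not the Sessions class itself, and it assumes that an event is a dictionary addressed by the configured keys.

```python
class SessionStore:
    """Hypothetical stand-in for a session-items container."""

    def __init__(self, event_session_key, event_product_key, event_time_key):
        self.event_session_key = event_session_key
        self.event_product_key = event_product_key
        self.event_time_key = event_time_key
        self.session_items_map = {}
        self.time_start = 1_000_000_000_000_000  # sentinel, lowered by min()
        self.time_end = 0                        # sentinel, raised by max()

    def append(self, event):
        """Add one event dict and update the dataset time bounds."""
        timestamp = event[self.event_time_key]
        items, timestamps = self.session_items_map.setdefault(
            event[self.event_session_key], [[], []]
        )
        items.append(event[self.event_product_key])
        timestamps.append(timestamp)
        self.time_start = min(self.time_start, timestamp)
        self.time_end = max(self.time_end, timestamp)

    @property
    def number_of_sessions(self):
        return len(self.session_items_map)


store = SessionStore("sid", "pid", "ts")
for event in [
    {"sid": "s1", "pid": "a", "ts": 10001},
    {"sid": "s1", "pid": "b", "ts": 10002},
    {"sid": "s2", "pid": "a", "ts": 10003},
]:
    store.append(event)
```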

Items#

The class stores the item-sessions map and its basic properties. The core object is a dictionary whose keys are unique items and whose values are lists with the sessions that contain the item and the timestamps of those sessions.

Parameters#

event_session_key: The name of a session key.
event_product_key: The name of an item key.
event_time_key: The name of a timestamp key.

Attributes#

item_sessions_mapDict: The item-sessions mapper {item_id: [[sessions], [first timestamp of each session]]}.
time_startint, default = 1_000_000_000_000_000: The initial timestamp, that is the timestamp of the first event in the whole dataset.
time_endint, default = 0: The timestamp of the last event in the dataset.
longest_sessions_vector_sizeint, default = 0: The length of the longest sequence of sessions that contained the item.
number_of_itemsint, default = 0: The number of items within the item_sessions_map object.
event_session_key: See the event_session_key parameter.
event_product_key: See the event_product_key parameter.
event_time_key: See the event_time_key parameter.
metadatastr: A description of the Items class.

Methods#

append(event): Appends a single event to the item-sessions map.
export(filename): Method exports created mapping to a pickled dictionary.
load(filename): Loads pickled Items object into a new instance of a class.
save_object(filename): Items object is stored as a Python pickle binary file.
__add__(Items): Adds other Items object. It is a set operation. Therefore, sessions that are assigned to the same item within Items(1) and Items(2) won’t be duplicated.
__str__(): The basic info about the class.

Raises#

TypeError: Timestamps are neither datetime objects nor numerical objects.
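The set-like behavior of adding two Items objects can be pictured on plain item-sessions dictionaries. The sketch below is an illustration under that simplification: `merge_item_maps` is a hypothetical helper, and the order in which merged sessions appear is an assumption.

```python
def merge_item_maps(first, second):
    """Union two item-sessions maps; a session listed in both is kept once."""
    # Copy the lists so that the inputs are left untouched.
    merged = {item: (list(s), list(t)) for item, (s, t) in first.items()}
    for item, (sessions, timestamps) in second.items():
        kept_sessions, kept_timestamps = merged.setdefault(item, ([], []))
        known = set(kept_sessions)
        for session_id, timestamp in zip(sessions, timestamps):
            if session_id not in known:
                known.add(session_id)
                kept_sessions.append(session_id)
                kept_timestamps.append(timestamp)
    return merged


items_1 = {"a": (["s1", "s2"], [10, 20])}
items_2 = {"a": (["s2", "s3"], [20, 30]), "b": (["s3"], [30])}
merged_items = merge_item_maps(items_1, items_2)
```

Session `s2` is present in both inputs for item `a`, yet it appears only once in `merged_items["a"]`.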

Transform session-items map into item-sessions map#

map_sessions_to_items(sessions_map, sort_items_map=True)[source]

Function transforms sessions map into items map.

Parameters:

sessions_mapDict

>>> sessions = {
...    session_id: (
...        [ sequence_of_items ],
...        [ sequence_of_timestamps ],
...        [ [OPTIONAL] sequence_of_event_types ],
...        [ [OPTIONAL] sequence_of_event_weights]
...    )
...}

sort_items_mapbool, default = True

Sorts item-sessions map by the first timestamp of a session (then sessions are sorted in ascending order, from the oldest to the newest).

Returns:

items_mapDict: Mapped item-sessions dictionary.

>>> items = {
...     item_id: (
...         [ sequence_of_sessions ],
...         [ sequence_of_the_first_session_timestamps ]
...     )
... }
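The transformation described above amounts to inverting the sessions map. A minimal sketch of that inversion follows. It is not the library's implementation: `sessions_to_items` is a hypothetical helper, and taking the smallest timestamp of a session as its first timestamp is an assumption.

```python
def sessions_to_items(sessions_map, sort_items_map=True):
    """Invert {session: (items, timestamps, ...)} into {item: (sessions, first_ts)}."""
    items_map = {}
    for session_id, record in sessions_map.items():
        session_items, timestamps = record[0], record[1]
        first_timestamp = min(timestamps)  # assumption: earliest event of the session
        for item in dict.fromkeys(session_items):  # each item once per session
            sessions, first_timestamps = items_map.setdefault(item, ([], []))
            sessions.append(session_id)
            first_timestamps.append(first_timestamp)
    if sort_items_map:
        # Order the sessions of every item from the oldest to the newest.
        for item in list(items_map):
            sessions, first_timestamps = items_map[item]
            order = sorted(range(len(sessions)), key=lambda i: first_timestamps[i])
            items_map[item] = (
                [sessions[i] for i in order],
                [first_timestamps[i] for i in order],
            )
    return items_map


example_sessions = {
    "s2": (["b", "c"], [20, 25]),
    "s1": (["a", "b", "a"], [10, 11, 12]),
}
example_items = sessions_to_items(example_sessions)
```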

Save and Load model#

Evaluation metrics#

score_model(sessions, trained_model, k=0, skip_short_sessions=True, calc_mrr=True, calc_precision=True, calc_recall=True, sliding_window=False)[source]

The function calculates Precision@k, Recall@k and MRR@k.

Parameters:

sessionsList of sessions

>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]

trained_modelWSKNN

The trained WSKNN model.

kint, default=0

Number of top recommendations. A session must have at least n+1 items to calculate MRR. If k is left at the default of 0, it is set to the number of recommendations from the trained model. If k is greater than the number of recommendations then the latter is adjusted to it.

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating MRR or should it raise an error?

calc_mrrbool, default = True

Should MRR be calculated?

calc_precisionbool, default = True

Should precision be calculated?

calc_recallbool, default = True

Should recall be calculated?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:

scoresDict: {'MRR': float, 'Recall': float, 'Precision': float}
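The sliding_window option can be read as repeatedly splitting one session into a history part and an evaluation part of fixed length. The sketch below shows one such reading. It is an interpretation of the description, not the library's code: `sliding_splits` is a hypothetical helper, and the assumption is that every evaluation part must contain exactly k items.

```python
def sliding_splits(items, k):
    """Yield (history, evaluation) pairs with exactly k evaluation items."""
    # Stop as soon as fewer than k items would remain for evaluation.
    for split in range(1, len(items) - k + 1):
        yield items[:split], items[split:split + k]


splits = list(sliding_splits(["a", "b", "c", "d"], k=2))
```

With four items and `k=2` the session yields two splits; a session with at most `k` items yields none.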

get_mean_reciprocal_rank(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)[source]

The function calculates the mean reciprocal rank of the top k recommendations. Each given session must be longer than k events.

Parameters:

sessionsList

>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]

trained_modelWSKNN

The trained WSKNN model.

kint, default=0

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating MRR or should it raise an error?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:

mrrfloat: The average score of MRR (Mean Reciprocal Rank) per n sessions.
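For reference, the usual definition of the mean reciprocal rank can be written in a few lines. This sketch follows the textbook formula and uses hypothetical helper names. How the function above chooses the relevant items of a session is not described here, so the `(recommended, relevant)` pairs are an assumption.

```python
def reciprocal_rank(recommended, relevant):
    """1 / position of the first relevant item, or 0.0 if none is relevant."""
    relevant_set = set(relevant)
    for position, item in enumerate(recommended, start=1):
        if item in relevant_set:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(cases):
    """Average reciprocal rank over (recommended, relevant) pairs."""
    scores = [reciprocal_rank(rec, rel) for rec, rel in cases]
    return sum(scores) / len(scores) if scores else 0.0


cases = [
    (["a", "b", "c"], ["b"]),  # first hit at position 2
    (["d", "e", "f"], ["d"]),  # first hit at position 1
    (["g", "h", "i"], ["z"]),  # no hit
]
mrr = mean_reciprocal_rank(cases)
```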

get_precision(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)[source]

The function calculates the precision score of the top k recommendations. Each given session must be longer than k events.

Parameters:

sessionsList

>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]

trained_modelWSKNN

The trained WSKNN model.

kint, default=0

Number of top recommendations. A session must have at least n+1 items to calculate precision. If k is left at the default of 0, it is set to the number of recommendations from the trained model. If k is greater than the number of recommendations then the latter is adjusted to it.

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating precision or should it raise an error?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:

precisionfloat: Precision: The average score of precision per n sessions.

Notes

Precision is defined as (number of recommendations that are relevant) / (number of items recommended).
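The definition above translates directly into code. The following sketch uses a hypothetical `precision_at_k` helper and made-up item lists. Returning 0.0 when nothing is recommended is a choice made for this example.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0  # nothing recommended, nothing to score
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    return hits / len(top_k)


precision = precision_at_k(["a", "b", "c", "d"], ["b", "d", "x"], k=4)
```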

get_recall(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)[source]

The function calculates the recall score of the top k recommendations. Each given session must be longer than k events.

Parameters:

sessionsList

>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]

trained_modelWSKNN

The trained WSKNN model.

kint, default=0

Number of top recommendations. A session must have at least n+1 items to calculate recall. If k is left at the default of 0, it is set to the number of recommendations from the trained model. If k is greater than the number of recommendations then the latter is adjusted to it.

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating recall or should it raise an error?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:

recallfloat: The average score of Recall per n sessions.

Notes

Recall is defined as (number of recommendations that are relevant) / (all relevant items for a user).
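The definition above can be sketched as follows. The `recall_at_k` helper and the item lists are hypothetical. Returning 0.0 when there are no relevant items is a choice made for this example.

```python
def recall_at_k(recommended, relevant, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # no relevant items, nothing to recall
    hits = len(relevant_set.intersection(recommended[:k]))
    return hits / len(relevant_set)


recall = recall_at_k(["a", "b", "c", "d"], ["b", "d", "x", "y"], k=4)
```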

Preprocessing#

parse_files(dataset, session_id_key, product_key, time_key, action_key=None, time_to_numeric=False, time_to_datetime=False, datetime_format='', allowed_actions=None, purchase_action_name=None, progress_bar=False)[source]

The function parses data from CSV, JSON and gzipped JSON Lines files into item-sessions and session-items maps.

Parameters:

datasetstr: The gzipped JSONL, JSON, CSV file with events.
session_id_keystr: The name of the session key.
product_keystr: The name of the product key.
action_keystr: The name of the event action type key.
time_keystr: The name of the event timestamp key.
time_to_numericbool, default = False: Transforms input timestamps to float values.
time_to_datetimebool, default = False: Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.
datetime_formatstr: The format of datetime object.
allowed_actionsDict, optional: Allowed actions and their weights.
purchase_action_name: Any, optional: The name of the final action (it is required to apply weight into the session vector).
progress_barbool, default = False: Show parsing progress.

Returns:

items, sessionsItems, Sessions: The mappings of item-session and session-items.

parse_flat_file(dataset, sep, session_index, product_index, time_index, action_index=None, use_header_row=False, time_to_numeric=False, time_to_datetime=False, datetime_format='', allowed_actions=None, purchase_action_name=None, ignore_errors=True)[source]

Function parses data from flat file into item-sessions and session-items maps.

Parameters:

datasetstr: Input file.
sepstr: Separator used to separate values.
session_indexint: The index of the session.
product_indexint: The index of the product.
time_indexint: The index of the event timestamp.
action_indexint, optional: The index of the event action.
use_header_rowbool, default = False: Use first row values as a header.
time_to_numericbool, default = False: Transforms input timestamps to float values.
time_to_datetimebool, default = False: Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.
datetime_formatstr: The format of datetime object.
allowed_actionsDict, optional: Allowed actions and their weights.
purchase_action_name: Any, optional: The name of the final action (it is required to apply weight into the session vector).
ignore_errorsbool, default=True: Ignore rows that raise exceptions.

Returns:

items, sessionsItems, Sessions: The mappings of item-session and session-items.
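The index-based parameters above address columns of a delimited file by position. The sketch below shows that idea with the standard `csv` module on an in-memory file. It is not the library's parser: `parse_rows` is a hypothetical helper that returns a plain dictionary instead of Items and Sessions objects, it casts timestamps to int, and it interprets use_header_row as "skip the first line", which is an assumption.

```python
import csv
import io

# In-memory stand-in for a flat file with a header row.
raw = io.StringIO(
    "session;product;time;action\n"
    "s1;a;10001;view\n"
    "s1;b;10002;click\n"
    "s2;a;10003;view\n"
)


def parse_rows(handle, sep, session_index, product_index, time_index,
               use_header_row=False):
    """Group rows into {session: ([items], [timestamps])} by column index."""
    reader = csv.reader(handle, delimiter=sep)
    if use_header_row:
        next(reader, None)  # assumption: the header line is simply skipped
    sessions = {}
    for row in reader:
        items, timestamps = sessions.setdefault(row[session_index], ([], []))
        items.append(row[product_index])
        timestamps.append(int(row[time_index]))
    return sessions


parsed_sessions = parse_rows(raw, ";", 0, 1, 2, use_header_row=True)
```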

parse_fn(dataset, allowed_actions, purchase_action_name, session_id_key, product_key, action_key, time_key, time_to_numeric, time_to_datetime, datetime_format, progress_bar)[source]

Function parses given dataset into Sessions and Items objects.

Parameters:

datasetIterable: Object with events.
allowed_actionsDict, optional: Allowed actions and their weights.
purchase_action_name: Any, optional: The name of the final action (it is required to apply weight into the session vector).
session_id_keystr: The name of the session key.
product_keystr: The name of the product key.
action_keystr: The name of the event action type key.
time_keystr: The name of the event timestamp key.
time_to_numericbool, default = True: Transforms input timestamps to float values.
time_to_datetimebool, default = False: Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.
datetime_formatstr: The format of datetime object.
progress_barbool: Show parsing progress.

Returns:

ItemsMap, SessionsMap: Items, Sessions

parse_pandas(df, session_id_key, product_key, time_key, action_key=None, allowed_actions=None, purchase_action_name=None, event_weights_key=None, min_session_length=3, get_items_map=True)[source]

Function parses given dataset into Sessions and Items objects.

Parameters:

dfpandas DataFrame: Dataframe with events and sessions.
session_id_keystr: The name of the session key.
product_keystr: The name of the product key.
time_keystr: The name of the event timestamp key.
action_keystr, default = None: The name of the event action type key.
allowed_actionsList, optional: Allowed actions.
purchase_action_name: Any, optional: The name of the final action (it is required to apply weight into the session vector).
event_weights_keystr, optional: The name of weights column.
min_session_lengthint, default = 3: Minimum length of a single session.
get_items_mapbool, default = True: Should item-sessions map be created?

Returns:

Dict: {"session-map": Dict, "item-map": Optional[Dict]}

Utilities#

load_gzipped_jsonl(filename, encoding='UTF-8')[source]#

Function loads data stored in gzipped JSON Lines.

Parameters:

filenamestr: Path to the file.
encodingstr, default = 'UTF-8'

Returns:

datadictdict: Python dictionary with unique records.
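Reading gzipped JSON Lines needs only the standard library. The sketch below is not the function documented above: `read_gzipped_jsonl` is a hypothetical helper that returns a list of records, because the keys of the dictionary returned by the library function are not specified here. The event field names are made up.

```python
import gzip
import json
import os
import tempfile


def read_gzipped_jsonl(filename, encoding="UTF-8"):
    """Return the records of a gzipped JSON Lines file as a list of dicts."""
    records = []
    with gzip.open(filename, mode="rt", encoding=encoding) as stream:
        for line in stream:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


# Round trip through a temporary file with two hypothetical events.
sample_events = [
    {"sid": "s1", "pid": "a", "ts": 10001},
    {"sid": "s1", "pid": "b", "ts": 10002},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "events.jsonl.gz")
    with gzip.open(path, mode="wt", encoding="UTF-8") as stream:
        for event in sample_events:
            stream.write(json.dumps(event) + "\n")
    loaded_events = read_gzipped_jsonl(path)
```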

load_gzipped_pickle(filename)[source]#

The function loads gzipped and pickled items / sessions object.

Parameters:

filenamestr

Returns:

pickled_objectdict

load_jsonl(filename)[source]#

Function loads data stored in JSON Lines.

Parameters:

filenamestr: Path to the file.

Returns:

datadictdict: Python dictionary with unique records.

load_pickled(filename)[source]#

The function loads pickled items / sessions object.

Parameters:

filenamestr

Returns:

pickled_objectdict
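Plain and gzipped pickle files differ only in how the file is opened. The following sketch shows both cases with the standard library. The `read_pickled` helper and the `items_map` sample are hypothetical and do not reproduce the functions documented above. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import gzip
import os
import pickle
import tempfile


def read_pickled(filename, gzipped=False):
    """Load a pickled object from a plain or gzip-compressed file."""
    opener = gzip.open if gzipped else open
    with opener(filename, "rb") as stream:
        return pickle.load(stream)


# Round trip of a hypothetical item-sessions map through both formats.
items_map = {"a": (["s1"], [10001])}
with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "items.pkl")
    gz_path = os.path.join(tmp_dir, "items.pkl.gz")
    with open(plain_path, "wb") as stream:
        pickle.dump(items_map, stream)
    with gzip.open(gz_path, "wb") as stream:
        pickle.dump(items_map, stream)
    from_plain = read_pickled(plain_path)
    from_gzip = read_pickled(gz_path, gzipped=True)
```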