API#

Core functions#

fit_transform#

fit(sessions, items=None, number_of_recommendations=5, number_of_neighbors=10, sampling_strategy='common_items', sample_size=1000, weighting_func='linear', ranking_strategy='linear', return_events_from_session=True, required_sampling_event=None, required_sampling_event_index=None, sampling_str_event_weights_index=None, recommend_any=False)[source]

Sets input session-items and item-sessions maps.

Parameters:
sessionsDict or Sessions
>>> sessions = {
...    session_id: (
...        [ sequence_of_items ],
...        [ sequence_of_timestamps ],
...        [ [OPTIONAL] sequence_of_event_types ],
...        [ [OPTIONAL] sequence_of_event_weights]
...    )
...}
itemsDict or Items, optional

If not provided then the item-sessions map is created from the sessions parameter.

>>> items = {
...    item_id: (
...        [ sequence_of_sessions ],
...        [ sequence_of_the_first_session_timestamps ]
...    )
...}

number_of_recommendationsint, default=5

The number of recommended items.

number_of_neighborsint, default=10

The number of the closest sessions to choose the items from.

sampling_strategystr, default='common_items'
How to filter the initial sample of sessions. Available strategies are:
  • 'common_items': sample sessions with the same items as the input session,

  • 'recent': sample the most recent sessions,

  • 'random': get a random sample of sessions,

  • 'weighted_events': select sessions based on the specific weights assigned to events.

sample_sizeint, default=1000

How many sessions from the model are sampled to make a recommendation.

weighting_funcstr, default='linear'

The similarity measurement between sessions. Available options: 'linear', 'log' and 'quadratic'.

ranking_strategystr, default='linear'

How we calculate an item rank (based on its position in a session sequence). Available options are: 'inv', 'linear', 'log', 'quadratic'.

return_events_from_sessionbool, default = True

Should the algorithm return the same events as in the session if this session is the only neighbor?

required_sampling_eventint or str, default = None

Set this parameter to the event name if sessions with it must be included in the neighbors selection. For example, this event may be a "purchase".

required_sampling_event_indexint, default = None

If the required_sampling_event parameter is filled then you must pass an index of a row with event names.

sampling_str_event_weights_indexint, default = None

If sampling_strategy is set to weighted_events then you must pass an index of a row with event weights.

recommend_anybool, default = False

If the recommender returns fewer items than number_of_recommendations then random items are returned to fill the gap.

Returns:
wsknnWSKNN

The trained Weighted session-based K-nn model.

Examples

>>> input_sessions = {
...     'session_x': (
...         ['a', 'b', 'c'],
...         [10001, 10002, 10004],
...         ['view', 'click', 'click']
...     )
... }
>>> input_items = {
...     'a': (
...         ['session_x'],
...         [10001]
...     ),
...     'b': (
...         ['session_x'],
...         [10001]
...     ),
...     'c': (
...         ['session_x'],
...         [10001]
...     ),
... }
>>> fitted_model = fit(input_sessions, input_items)
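
The sessions structure used in the example above can be assembled from a flat event log. The following is a minimal sketch and not part of the library: the helper name `build_sessions_map` and the event tuple layout `(session_id, item, timestamp, event_type)` are assumptions made for illustration.

```python
from collections import defaultdict


def build_sessions_map(events):
    """Group (session_id, item, timestamp, event_type) tuples by session.

    Hypothetical helper: returns the structure documented for the
    ``sessions`` parameter, with events ordered by timestamp.
    """
    grouped = defaultdict(list)
    for session_id, item, timestamp, event_type in events:
        grouped[session_id].append((timestamp, item, event_type))

    sessions = {}
    for session_id, rows in grouped.items():
        rows.sort(key=lambda row: row[0])  # oldest event first
        sessions[session_id] = (
            [item for _, item, _ in rows],
            [timestamp for timestamp, _, _ in rows],
            [event_type for _, _, event_type in rows],
        )
    return sessions


# Unordered events for one session (illustrative data).
events = [
    ("session_x", "b", 10002, "click"),
    ("session_x", "a", 10001, "view"),
    ("session_x", "c", 10004, "click"),
]
input_sessions = build_sessions_map(events)
```

The resulting dictionary has the same shape as `input_sessions` in the example above, so it could be passed as the sessions argument.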

predict#

predict(model, sessions, settings=None)[source]

The function is an alias for the WSKNN.recommend() method.

Parameters:
modelWSKNN

Fitted WSKNN model.

sessionsList

Sequence of items for recommendation. It must be a nested list of lists:

>>> [
...     [items],
...     [timestamps],
...     [(optional) event names],
...     [(optional) weights]
... ]

settingsDict

Settings of the model. Passing this parameter makes it possible to test different setups. The available parameters are grouped in the settings.yml file under the model key.

Returns:
recommendationsList

Item recommendations and their ranks.

>>> [
...     [item a, rank a], [item b, rank b]
... ]
Raises:
ValueError

Model wasn’t fitted.
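
The nested-list layout expected by the sessions parameter can be checked before a prediction is requested. This is a minimal sketch under stated assumptions: `check_session` is a hypothetical helper, not a library function, and it only verifies the shape described above (two to four rows of equal length) and that timestamps are integers.

```python
def check_session(session):
    """Return True if `session` matches the documented nested-list layout.

    Hypothetical helper: checks only the shape (2 to 4 rows of equal,
    non-zero length) and that every timestamp is an int.
    """
    if not isinstance(session, list) or not 2 <= len(session) <= 4:
        return False
    if not all(isinstance(row, list) for row in session):
        return False
    lengths = {len(row) for row in session}
    if len(lengths) != 1 or 0 in lengths:
        return False
    # The second row holds the timestamps.
    return all(isinstance(timestamp, int) for timestamp in session[1])


valid = check_session([["a", "b"], [10001, 10002], ["view", "click"]])
too_short = check_session([["a", "b"]])
ragged = check_session([["a", "b"], [10001]])
```

A check of this kind gives an early, readable failure for malformed input instead of an error raised deep inside the recommendation step.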

batch_predict#

batch_predict(model, sessions, settings=None)[source]

The function predicts multiple sessions at once.

Parameters:
modelWSKNN

Fitted WSKNN model.

sessionsList[Dict]

User-sessions for recommendations:

>>> [
...     {"user A": [*]},
...     {"user B": [**]}
... ]

where (*) might be:

>>> [
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
... ]
settingsDict

Settings of the model. Passing this parameter makes it possible to test different setups. The available parameters are grouped in the settings.yml file under the model key.

Returns:
inferenceList[Dict]

Recommendations and their ranks for each user.

>>> [
...   {"user A": [
...     [item a, rank a],
...     [item b, rank b]
...   ]},
...   {"user B": [...]},
... ]
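
The batch input and output shapes described above can be illustrated without a fitted model. This is only a sketch: `batch_apply` and the toy recommender `newest_first` are hypothetical names, and the toy recommender stands in for a real single-session prediction.

```python
def batch_apply(recommend_fn, sessions):
    """Apply `recommend_fn` to every user session in a batch.

    Hypothetical helper: `recommend_fn` stands in for a single-session
    recommender and must return a list of [item, rank] pairs.
    """
    inference = []
    for user_record in sessions:
        for user_id, session in user_record.items():
            inference.append({user_id: recommend_fn(session)})
    return inference


def newest_first(session):
    """Toy recommender: score the session's own items, newest first."""
    items = session[0]
    return [[item, 1.0 / pos] for pos, item in enumerate(reversed(items), start=1)]


batch = [
    {"user A": [["a", "b"], [1, 2]]},
    {"user B": [["c"], [5]]},
]
result = batch_apply(newest_first, batch)
```

Each element of `result` is a one-key dictionary, mirroring the documented return structure.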

Core WSKNN class#

WSKNN#

class WSKNN(number_of_recommendations=5, number_of_neighbors=10, sampling_strategy='common_items', sample_size=1000, weighting_func='linear', ranking_strategy='linear', return_events_from_session=True, required_sampling_event=None, required_sampling_event_index=None, sampling_event_weights_index=None, recommend_any=False)[source]#

The class represents the Weighted Session-Based k-nn model.

Parameters:
number_of_recommendationsint, default=5

The number of recommended items.

number_of_neighborsint, default=10

The number of the closest sessions to choose the items from.

sampling_strategystr, default='common_items'

How to filter the initial sample of sessions. Available strategies are:

  • 'common_items': sample sessions with the same items as the input session,

  • 'recent': sample the most recent sessions,

  • 'random': get a random sample of sessions,

  • 'weighted_events': select sessions based on the specific weights assigned to events.

sample_sizeint, default=1000

How many sessions from the model are sampled to make a recommendation.

weighting_funcstr, default='linear'

The similarity measurement between sessions. Available options: 'linear', 'log' and 'quadratic'.

ranking_strategystr, default='linear'

How we calculate an item rank (based on its position in a session sequence). Available options are: 'inv', 'linear', 'log', 'quadratic'.

return_events_from_sessionbool, default = True

Should the algorithm return the same items as in the session if there are no neighbors?

required_sampling_eventint or str, default = None

Set this parameter to the event name if sessions with this event must be included in the neighbors selection. For example, the event name may be "purchase".

required_sampling_event_indexint, default = None

If required_sampling_event parameter is filled then you must pass an index of a row with event names.

sampling_event_weights_indexint, default = None

If sampling_strategy is set to weighted_events then you must pass an index of a row with event weights.

recommend_anybool, default = False

If recommender picks fewer items than required by the number_of_recommendations then add random items to the results.

Raises:
InvalidDimensionsError

Wrong number of nested sequences within session-items or item-sessions maps.

InvalidTimestampError

Wrong type of timestamp - int type is required.

TypeError

Wrong type of nested structures within session-items or item-sessions maps.

IndexError

Wrong index of event names or wrong index of event weights.

Attributes:
weighting_functions: List

The weighting functions: ['linear', 'log', 'quadratic'].

ranking_strategiesList

The ranking strategies: ['linear', 'log', 'quadratic', 'inv'].

session_item_mapDict

The map of items that occur in the session and their timestamps, and (optional) their types and their weights.

>>> sessions = {
...     session_id: (
...         [sequence_of_items],
...         [sequence_of_timestamps],
...         [(optional) event names (types)],
...         [(optional) weights]
...     )
... }

item_session_mapDict

The map of items and the sessions where those items are present, and the first timestamp of those sessions.

>>> items = {
...     item_id: (
...         [sequence_of_sessions],
...         [sequence_of_the_first_session_timestamps]
...     )
... }

n_of_recommendationsint

The number of items recommended.

number_of_closest_neighborsint

See number_of_neighbors parameter.

sampling_strategystr

See sampling_strategy parameter.

possible_neighbors_sample_sizeint

See sample_size parameter.

weighting_functionstr

See weighting_func parameter.

ranking_strategystr

See ranking_strategy parameter.

return_events_from_sessionbool, default = True

See return_events_from_session parameter.

required_sampling_eventUnion[int, str], default = None

See required_sampling_event parameter.

required_sampling_event_indexint, default = None

See required_sampling_event_index parameter.

sampling_event_weights_indexint, default = None

See sampling_event_weights_index parameter.

recommend_anybool, default = False

See recommend_any parameter.

Methods

fit()

Sets input session-items and item-sessions maps.

recommend()

The method predicts the n next recommendations from a given session.

set_model_params()

The method resets and maps the new model parameters.

fit(sessions, items=None)[source]#

Sets input session-items and item-sessions maps.

Parameters:
sessionsDict

The map of items that occur in the session and their timestamps, and (optional) their types and their weights.

>>> sessions = {
...     session_id: (
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
...     )
... }
itemsDict

The map of items and the sessions where those items are present, and the first timestamp of those sessions. If not provided then the item-sessions map is created from the sessions parameter.

>>> items = {
...     item_id: (
...     [sequence_of_sessions],
...     [sequence_of_the_first_session_timestamps]
...     )
... }
recommend(event_stream, settings=None)[source]#

The method predicts n next recommendations from a given session.

Parameters:
event_streamList, Dict

Sequence of items for recommendation. If list then it is treated as a single recommendation:

>>> [
...     [sequence_of_items],
...     [sequence_of_timestamps],
...     [(optional) event names (types)],
...     [(optional) weights]
... ]

If it is a dictionary then recommendations are made in batch. Every key in the dictionary is a user index, and every value is a list with sequences of items, timestamps and optional features.

>>> {
...     "user A": [...],
...     "user B": [...]
... }
settingsDict, default = None

Model settings and parameters.

Returns:
recommendationsList, Dict

Output for the single input (list):

>>> [
...     (item a, rank a), (item b, rank b)
... ]

Output for the input with multiple users (dictionary):

>>> {
...     "user A": [(item a, rank a), (item b, rank b)],
...     "...": [...]
... }

set_model_params(number_of_recommendations=None, number_of_neighbors=None, sampling_strategy=None, sample_size=None, weighting_func=None, ranking_strategy=None, return_events_from_session=None, required_sampling_event=None, recommend_any=False)[source]#

The method resets and maps the new model parameters.

Parameters:
number_of_recommendationsint, default = None
number_of_neighborsint, default = None
sampling_strategystr, default = None
sample_sizeint, default = None
weighting_funcstr, default = None
ranking_strategystr, default = None
return_events_from_sessionbool, default = None
required_sampling_eventstr or int, default = None
recommend_anybool, default = None
Raises:
TypeError

Wrong input parameter type.

KeyError

Wrong name of sampling strategy, ranking strategy or weighting function.
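
To make the roles of the sampling, neighbor and ranking parameters concrete, the sketch below runs a toy session-based nearest-neighbor recommendation over the two documented maps. It is an illustration only and not the library's implementation: the function `toy_session_knn`, the linear position weights and the scoring rule are all assumptions chosen for readability.

```python
from collections import defaultdict


def toy_session_knn(sessions, items, session_items, n_neighbors=2, n_recommendations=2):
    """Recommend items from the sessions most similar to `session_items`.

    Illustrative only: similarity is a linearly weighted item overlap and
    item scores are the summed similarities of neighbors containing them.
    """
    # 'common_items'-style sampling: sessions sharing at least one item.
    candidates = set()
    for item in session_items:
        candidates.update(items.get(item, ([], []))[0])

    # Linear weights: later positions in the input session count more.
    size = len(session_items)
    weights = {item: (pos + 1) / size for pos, item in enumerate(session_items)}

    similarities = {}
    for session_id in candidates:
        neighbor_items = set(sessions[session_id][0])
        similarities[session_id] = sum(
            weight for item, weight in weights.items() if item in neighbor_items
        )

    # Keep the closest sessions; ties are broken by session id.
    neighbors = sorted(similarities, key=lambda s: (-similarities[s], s))[:n_neighbors]

    scores = defaultdict(float)
    for session_id in neighbors:
        for item in sessions[session_id][0]:
            if item not in weights:  # skip items already in the input session
                scores[item] += similarities[session_id]

    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n_recommendations]


sessions = {
    "s1": (["a", "b", "c"], [1, 2, 3]),
    "s2": (["b", "d"], [4, 5]),
    "s3": (["e", "f"], [6, 7]),
}
items = {
    "a": (["s1"], [1]),
    "b": (["s1", "s2"], [1, 4]),
    "c": (["s1"], [1]),
    "d": (["s2"], [4]),
    "e": (["s3"], [6]),
    "f": (["s3"], [6]),
}
recommendations = toy_session_knn(sessions, items, ["a", "b"])
```

In this toy data, session s3 shares no items with the input and is never sampled, which is the effect the 'common_items' strategy is described as having.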

Core WSKNN Data Structures#

Sessions#

Class stores session-items mapping and its basic properties. The core object is a dictionary of unique sessions (keys) that are pointing to the session-items and their timestamps (lists).

Parameters#

event_session_key

The name of a session key.

event_product_key

The name of an item key.

event_time_key

The name of a timestamp key.

event_action_keystr, optional

The name of a key with actions that may be used for neighbors filtering.

event_action_weightsDict, optional

Dictionary with weights that should be assigned to event actions.

Attributes#

session_items_actions_mapDict

The session-items mapper {session_id: [[items], [timestamps], [optional - actions], [optional - weights]]}.

time_startint, default = 1_000_000_000_000_000

The timestamp of the first event in the whole dataset.

time_endint, default = 0

The timestamp of the last event in the dataset.

longest_items_vector_size: int, default = 0

The longest sequence of products.

number_of_sessionsint, default = 0

The number of sessions within a session_items_actions_map object.

event_session_key

See the event_session_key parameter.

event_product_key

See the event_product_key parameter.

event_time_key

See the event_time_key parameter.

event_action_key

See the event_action_key parameter.

action_weights

See the event_action_weights parameter.

metadatastr

A description of the Sessions class.

Methods#

append(event)

Appends a single event to the session-items map.

export(filename)

Method exports created mapping to pickled dictionary.

load(filename)

Loads pickled Sessions object into a new instance of a class.

save_object(filename)

Sessions object is stored as a Python pickle binary file.

update_weights_of_purchase_session()

Adds a constant value to all weights if the session ended with a purchase.

__add__(Users)

Adds other Sessions object.

__str__()

The basic info about the class.
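
The way a single event extends the session-items map, as described for append(event), can be sketched as follows. This is a hypothetical stand-alone helper, not the class method: the key names `sid`, `pid` and `ts` are placeholders for event_session_key, event_product_key and event_time_key, and optional actions and weights are left out.

```python
def append_event(session_map, event, session_key="sid", product_key="pid", time_key="ts"):
    """Append one event dict to a session-items map, in place.

    Hypothetical helper: the default key names are placeholders for
    event_session_key, event_product_key and event_time_key.
    Returns the (time_start, time_end) range of the updated map.
    """
    session_id = event[session_key]
    items, timestamps = session_map.setdefault(session_id, [[], []])
    items.append(event[product_key])
    timestamps.append(event[time_key])

    # Recompute the time range over every stored timestamp.
    all_times = [ts for _, times in session_map.values() for ts in times]
    return min(all_times), max(all_times)


session_map = {}
append_event(session_map, {"sid": "s1", "pid": "a", "ts": 10})
time_range = append_event(session_map, {"sid": "s1", "pid": "b", "ts": 12})
```

Recomputing the range on every call keeps the sketch short; a real container would more likely update time_start and time_end incrementally.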

Items#

Class stores item-sessions map and its basic properties. The core object is a dictionary of unique items (keys) that are pointing to the specific sessions and their timestamps (lists).

Parameters#

event_session_key

The name of a session key.

event_product_key

The name of an item key.

event_time_key

The name of a timestamp key.

Attributes#

item_sessions_mapDict

The item-sessions mapper {item_id: [[sessions], [first timestamp of each session]]}.

time_startint, default = 1_000_000_000_000_000

The timestamp of the first event in the whole dataset.

time_endint, default = 0

The timestamp of the last event in the dataset.

longest_sessions_vector_size: int, default = 0

The longest sequence of sessions that contained the item.

number_of_itemsint, default = 0

The number of items within the item_sessions_map object.

event_session_key

See the event_session_key parameter.

event_product_key

See the event_product_key parameter.

event_time_key

See the event_time_key parameter.

metadatastr

A description of the Items class.

Methods#

append(event)

Appends a single event to the item-sessions map.

export(filename)

Method exports created mapping to a pickled dictionary.

load(filename)

Loads pickled Items object into a new instance of a class.

save_object(filename)

Items object is stored as a Python pickle binary file.

__add__(Users)

Adds other Items object. It is a set operation. Therefore, sessions that are assigned to the same item within Items(1) and Items(2) won’t be duplicated.

__str__()

The basic info about the class.
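
The set-like behavior described for __add__ can be sketched on plain dictionaries. The helper `merge_item_maps` is hypothetical and only illustrates the idea that a session already listed under an item is not added twice; it assumes the {item_id: [[sessions], [timestamps]]} layout shown in the attributes.

```python
def merge_item_maps(first, second):
    """Merge two item-sessions maps without duplicating sessions.

    Hypothetical helper: each map is {item_id: [[sessions], [timestamps]]}.
    When a session appears under the same item in both maps, the entry
    from `first` is kept.
    """
    merged = {item: [list(s), list(t)] for item, (s, t) in first.items()}
    for item, (sessions, timestamps) in second.items():
        target = merged.setdefault(item, [[], []])
        seen = set(target[0])
        for session_id, timestamp in zip(sessions, timestamps):
            if session_id not in seen:
                target[0].append(session_id)
                target[1].append(timestamp)
                seen.add(session_id)
    return merged


items_1 = {"a": [["s1", "s2"], [10, 20]]}
items_2 = {"a": [["s2", "s3"], [20, 30]], "b": [["s3"], [30]]}
merged = merge_item_maps(items_1, items_2)
```

The inputs are copied before merging, so neither `items_1` nor `items_2` is modified.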

Raises#

TypeError

Timestamps are not datetime objects or numerical objects.

Transform session-items map into item-sessions map#

map_sessions_to_items(sessions_map, sort_items_map=True)[source]

Function transforms sessions map into items map.

Parameters:
sessions_mapDict
>>> sessions = {
...    session_id: (
...        [ sequence_of_items ],
...        [ sequence_of_timestamps ],
...        [ [OPTIONAL] sequence_of_event_types ],
...        [ [OPTIONAL] sequence_of_event_weights]
...    )
...}
sort_items_mapbool, default = True

Sorts item-sessions map by the first timestamp of a session (then sessions are sorted in ascending order, from the oldest to the newest).

Returns:
items_mapDict

Mapped item-sessions dictionary.

>>> items = {
...    item_id: (
...        [ sequence_of_sessions ],
...        [ sequence_of_the_first_session_timestamps ]
...    )
...}
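
The transformation described above can be sketched in a few lines. This is an illustrative re-implementation of the documented behavior, not the library's code: the name `sessions_to_items` is hypothetical, and the earliest timestamp of a session is taken as its first timestamp.

```python
def sessions_to_items(sessions_map, sort_items_map=True):
    """Build an item-sessions map from a session-items map.

    Illustrative sketch: each item maps to the sessions that contain it
    and to the earliest timestamp of each of those sessions.
    """
    pairs = {}
    for session_id, record in sessions_map.items():
        first_timestamp = min(record[1])  # earliest timestamp in the session
        for item in dict.fromkeys(record[0]):  # unique items, order kept
            pairs.setdefault(item, []).append((first_timestamp, session_id))

    items_map = {}
    for item, session_pairs in pairs.items():
        if sort_items_map:
            session_pairs = sorted(session_pairs)  # oldest session first
        items_map[item] = (
            [session_id for _, session_id in session_pairs],
            [timestamp for timestamp, _ in session_pairs],
        )
    return items_map


sessions = {
    "s2": (["b", "c"], [20, 21]),
    "s1": (["a", "b"], [10, 11]),
}
items_map = sessions_to_items(sessions)
```

With sorting enabled, item "b" lists session s1 before s2 because s1 started earlier, even though s2 comes first in the input dictionary.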

Save and Load model#

Evaluation metrics#

score_model(sessions, trained_model, k=0, skip_short_sessions=True, calc_mrr=True, calc_precision=True, calc_recall=True, sliding_window=False)[source]

The function computes Precision@k, Recall@k and MRR@k.

Parameters:
sessionsList of sessions
>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]
trained_modelWSKNN

The trained WSKNN model.

kint, default=0

Number of top recommendations. A session must have at least k+1 items to calculate MRR. With the default of 0, k is set to the number of recommendations of the trained model. If k is greater than the model's number of recommendations then that number is raised to k.

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating MRR or should it raise an error?

calc_mrrbool, default = True

Should MRR be calculated?

calc_precisionbool, default = True

Should precision be calculated?

calc_recallbool, default = True

Should recall be calculated?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:
scoresDict

{'MRR': float, 'Recall': float, 'Precision': float}

get_mean_reciprocal_rank(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)[source]

The function calculates the mean reciprocal rank of the top k recommendations. Each session must be longer than k events.

Parameters:
sessionsList
>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]
trained_modelWSKNN

The trained WSKNN model.

kint, default=0

Number of top recommendations. A session must have at least k+1 items to calculate MRR. With the default of 0, k is set to the number of recommendations of the trained model. If k is greater than the model's number of recommendations then that number is raised to k.

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating MRR or should it raise an error?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:
mrrfloat

Mean Reciprocal Rank: The average score of MRR per n sessions.
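
The metric itself follows the standard definition of mean reciprocal rank, which the sketch below computes on hand-made data. The helper names are hypothetical, and how the library splits a session into input items and held-out items is not shown in this reference, so the (recommended, relevant) pairs are supplied directly.

```python
def reciprocal_rank(recommended, relevant):
    """Return 1 / position of the first relevant item, or 0.0 if none."""
    relevant = set(relevant)
    for position, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(cases):
    """Average reciprocal rank over (recommended, relevant) pairs."""
    if not cases:
        return 0.0
    return sum(reciprocal_rank(rec, rel) for rec, rel in cases) / len(cases)


cases = [
    (["a", "b", "c"], ["b"]),  # first hit at position 2 -> 0.5
    (["d", "e", "f"], ["d"]),  # first hit at position 1 -> 1.0
    (["g", "h", "i"], ["z"]),  # no hit -> 0.0
]
mrr = mean_reciprocal_rank(cases)  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```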

get_precision(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)[source]

The function calculates the precision score of the top k recommendations. Each session must be longer than k events.

Parameters:
sessionsList
>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]
trained_modelWSKNN

The trained WSKNN model.

kint, default=0

Number of top recommendations. A session must have at least k+1 items to calculate precision. With the default of 0, k is set to the number of recommendations of the trained model. If k is greater than the model's number of recommendations then that number is raised to k.

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating precision or should it raise an error?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:
precisionfloat

Precision: The average score of precision per n sessions.

Notes

Precision is defined as (number of recommendations that are relevant) / (number of items recommended).
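
The definition in the note can be written out directly. This is a minimal sketch with a hypothetical helper name, `precision_at_k`, operating on one list of recommended items and one collection of relevant items. It does not reproduce how the library derives those two lists from a session.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


# Two of the four recommended items ("b" and "d") are relevant.
precision = precision_at_k(["a", "b", "c", "d"], ["b", "d", "x"], k=4)
```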

get_recall(sessions, trained_model, k=0, skip_short_sessions=True, sliding_window=False)[source]

The function calculates the recall score of the top k recommendations. Each session must be longer than k events.

Parameters:
sessionsList
>>> [
...     [
...         [ sequence_of_items ],
...         [ sequence_of_timestamps ],
...         [ [OPTIONAL] sequence_of_event_type ]
...     ],
... ]
trained_modelWSKNN

The trained WSKNN model.

kint, default=0

Number of top recommendations. A session must have at least k+1 items to calculate recall. With the default of 0, k is set to the number of recommendations of the trained model. If k is greater than the model's number of recommendations then that number is raised to k.

skip_short_sessionsbool, default=True

Should the algorithm skip short sessions when calculating recall or should it raise an error?

sliding_windowbool, default = False

When calculating metrics slide through a single session up to the point when it is not possible to have the same number of evaluation products as the number of recommendations.

Returns:
recallfloat

The average score of Recall per n sessions.

Notes

Recall is defined as (number of recommendations that are relevant) / (all relevant items for a user).
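
The definition in the note translates into a short function. The name `recall_at_k` is hypothetical and the sketch works on one list of recommended items and one collection of relevant items. The way the library selects the relevant items of a session is not covered here.

```python
def recall_at_k(recommended, relevant, k):
    """Share of the relevant items found among the top-k recommendations."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = sum(1 for item in set(recommended[:k]) if item in relevant)
    return hits / len(relevant)


# Two of the four relevant items ("b" and "d") were recommended.
recall = recall_at_k(["a", "b", "c", "d"], ["b", "d", "x", "y"], k=4)
```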

Preprocessing#

parse_files(dataset, session_id_key, product_key, time_key, action_key=None, time_to_numeric=False, time_to_datetime=False, datetime_format='', allowed_actions=None, purchase_action_name=None, progress_bar=False)[source]

The function parses data from CSV, JSON and gzipped JSONL files into item-sessions and session-items maps.

Parameters:
datasetstr

The gzipped JSONL, JSON, CSV file with events.

session_id_keystr

The name of the session key.

product_keystr

The name of the product key.

action_keystr

The name of the event action type key.

time_keystr

The name of the event timestamp key.

time_to_numericbool, default = False

Transforms input timestamps to float values.

time_to_datetimebool, default = False

Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.

datetime_formatstr

The format of datetime object.

allowed_actionsDict, optional

Allowed actions and their weights.

purchase_action_name: Any, optional

The name of the final action (it is required to apply weights to the session vector).

progress_barbool, default = False

Show parsing progress.

Returns:
items, sessionsItems, Sessions

The mappings of item-session and session-items.

parse_flat_file(dataset, sep, session_index, product_index, time_index, action_index=None, use_header_row=False, time_to_numeric=False, time_to_datetime=False, datetime_format='', allowed_actions=None, purchase_action_name=None, ignore_errors=True)[source]

Function parses data from flat file into item-sessions and session-items maps.

Parameters:
datasetstr

Input file.

sepstr

Separator used to separate values.

session_indexint

The index of the session.

product_indexint

The index of the product.

time_indexint

The index of the event timestamp.

action_indexint, optional

The index of the event action.

use_header_rowbool, default = False

Use first row values as a header.

time_to_numericbool, default = False

Transforms input timestamps to float values.

time_to_datetimebool, default = False

Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.

datetime_formatstr

The format of datetime object.

allowed_actionsDict, optional

Allowed actions and their weights.

purchase_action_name: Any, optional

The name of the final action (it is required to apply weights to the session vector).

ignore_errorsbool, default=True

Ignore rows that raise exceptions.

Returns:
items, sessionsItems, Sessions

The mappings of item-session and session-items.
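
How column indices select the session, product and timestamp from each row can be sketched with the standard csv module. This is not the library's parser: `parse_flat_text` is a hypothetical helper that reads an in-memory string, converts timestamps to int, and builds only a plain session-items dictionary rather than Items and Sessions objects.

```python
import csv
import io


def parse_flat_text(text, sep, session_index, product_index, time_index,
                    has_header=False, ignore_errors=True):
    """Parse delimited text into a session-items map.

    Hypothetical helper: timestamps are converted to int, and malformed
    rows are skipped when `ignore_errors` is True.
    """
    sessions = {}
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    if has_header:
        next(reader, None)  # drop the header line
    for row in reader:
        try:
            session_id = row[session_index]
            product = row[product_index]
            timestamp = int(row[time_index])
        except (IndexError, ValueError):
            if ignore_errors:
                continue
            raise
        items, timestamps = sessions.setdefault(session_id, ([], []))
        items.append(product)
        timestamps.append(timestamp)
    return sessions


raw = "session;product;time\ns1;a;10\ns1;b;12\nbroken-row\ns2;c;15\n"
parsed = parse_flat_text(raw, ";", 0, 1, 2, has_header=True)
```

The row `broken-row` has too few columns and is skipped. Passing `ignore_errors=False` would raise the underlying exception instead.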

parse_fn(dataset, allowed_actions, purchase_action_name, session_id_key, product_key, action_key, time_key, time_to_numeric, time_to_datetime, datetime_format, progress_bar)[source]

Function parses given dataset into Sessions and Items objects.

Parameters:
datasetIterable

Object with events.

allowed_actionsDict, optional

Allowed actions and their weights.

purchase_action_name: Any, optional

The name of the final action (it is required to apply weights to the session vector).

session_id_keystr

The name of the session key.

product_keystr

The name of the product key.

action_keystr

The name of the event action type key.

time_keystr

The name of the event timestamp key.

time_to_numericbool, default = True

Transforms input timestamps to float values.

time_to_datetimebool, default = False

Transforms input timestamps to datetime objects. Setting the datetime_format parameter is required.

datetime_formatstr

The format of datetime object.

progress_barbool

Show parsing progress.

Returns:
ItemsMap, SessionsMapItems, Sessions
parse_pandas(df, session_id_key, product_key, time_key, action_key=None, allowed_actions=None, purchase_action_name=None, event_weights_key=None, min_session_length=3, get_items_map=True)[source]

Function parses given dataset into Sessions and Items objects.

Parameters:
dfpandas DataFrame

Dataframe with events and sessions.

session_id_keystr

The name of the session key.

product_keystr

The name of the product key.

time_keystr

The name of the event timestamp key.

action_keystr, default = None

The name of the event action type key.

allowed_actionsList, optional

Allowed actions.

purchase_action_name: Any, optional

The name of the final action (it is required to apply weights to the session vector).

event_weights_keystr, optional

The name of weights column.

min_session_lengthint, default = 3

Minimum length of a single session.

get_items_mapbool, default = True

Should item-sessions map be created?

Returns:
: Dict

{"session-map": Dict, "item-map": Optional[Dict]}

Utilities#

load_gzipped_jsonl(filename, encoding='UTF-8')[source]#

Function loads data stored in gzipped JSON Lines.

Parameters:
filenamestr

Path to the file.

encodingstr, default = 'UTF-8'
Returns:
datadictdict

Python dictionary with unique records.
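
Reading gzipped JSON Lines needs only the standard library. The sketch below is a hypothetical stand-in, `read_gzipped_jsonl`, that returns a list of parsed records. The documented function is described as returning a dictionary, so its return shape differs from this sketch.

```python
import gzip
import json
import os
import tempfile


def read_gzipped_jsonl(filename, encoding="UTF-8"):
    """Read a gzipped JSON Lines file into a list of parsed records."""
    records = []
    with gzip.open(filename, "rt", encoding=encoding) as stream:
        for line in stream:
            line = line.strip()
            if line:  # skip empty lines
                records.append(json.loads(line))
    return records


# Round trip through a temporary file (illustrative event records).
events = [
    {"sid": "s1", "pid": "a", "ts": 10},
    {"sid": "s1", "pid": "b", "ts": 12},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "events.jsonl.gz")
    with gzip.open(path, "wt", encoding="UTF-8") as stream:
        for event in events:
            stream.write(json.dumps(event) + "\n")
    loaded = read_gzipped_jsonl(path)
```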

load_gzipped_pickle(filename)[source]#

The function loads gzipped and pickled items / sessions object.

Parameters:
filenamestr
Returns:
pickled_objectdict
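
A gzipped pickle can be written and read back with the standard gzip and pickle modules. The helper names below are hypothetical and the sketch stores a plain session-items dictionary. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import gzip
import os
import pickle
import tempfile


def save_gzipped_pickle(obj, filename):
    """Pickle `obj` and write it to a gzip-compressed file."""
    with gzip.open(filename, "wb") as stream:
        pickle.dump(obj, stream)


def read_gzipped_pickle(filename):
    """Load an object from a gzip-compressed pickle file."""
    with gzip.open(filename, "rb") as stream:
        return pickle.load(stream)


sessions = {"session_x": (["a", "b", "c"], [10001, 10002, 10004])}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sessions.pkl.gz")
    save_gzipped_pickle(sessions, path)
    restored = read_gzipped_pickle(path)
```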
load_jsonl(filename)[source]#

Function loads data stored in JSON Lines.

Parameters:
filenamestr

Path to the file.

Returns:
datadictdict

Python dictionary with unique records.

load_pickled(filename)[source]#

The function loads pickled items / sessions object.

Parameters:
filenamestr
Returns:
pickled_objectdict