Kerch Module

The kerch.feature.Module class is aimed at modules that must be trained through gradient descent. It extends torch.nn.modules.module.Module and adds the logging features of the kerch.feature.Logger class.

Functionalities

The class also adds the following functionalities, which are necessary for gradient descent schemes more complex than what PyTorch offers. In particular:

Before and After Step Operations

The methods before_step() and after_step() hold the operations that must be executed, respectively, before and after a parameter update through an optimization step.
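A minimal sketch of where these hooks sit around an update. The helper run_step and the stand-in class RecordingModule are hypothetical and only illustrate the call order; with a real module, update would typically be something like the step method of a PyTorch optimizer.

```python
from typing import Callable, List


def run_step(module, update: Callable[[], None]) -> None:
    """Run one parameter update wrapped by the module's hooks.

    `module` is any object exposing before_step() and after_step();
    `update` is a callable performing the actual parameter update.
    """
    module.before_step()
    update()
    module.after_step()


class RecordingModule:
    """Hypothetical stand-in that only records the order of the calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def before_step(self) -> None:
        self.calls.append("before_step")

    def after_step(self) -> None:
        self.calls.append("after_step")


module = RecordingModule()
run_step(module, update=lambda: module.calls.append("update"))
print(module.calls)
```

The helper is duck-typed on purpose: it only assumes the two hook methods described above.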

Different Parameter Types

Support for various groups of parameters that require specific learning rates or lie on the Stiefel manifold. The following group types are available:

Euclidean
Parameters lying on the Euclidean manifold (standard optimization). The optimization is performed over \(\mathbb{R}^{n \times m}\), with \(n\) and \(m\) depending on the size of each parameter.
Stiefel
Parameters that must lie on the Stiefel manifold (the optimization is performed on that manifold). The Stiefel manifold corresponds to the orthonormal parameters \(U \in \mathrm{St}(n,m)\), i.e., all \(U \in \mathbb{R}^{n \times m}\) such that \(U^\top U = I\). The dimensions \(n\) and \(m\) are specific to each parameter.
Slow
Parameters lying on the Euclidean manifold (standard optimization). The optimization is performed over \(\mathbb{R}^{n \times m}\), with \(n\) and \(m\) depending on the size of each parameter. What sets these slow Euclidean parameters apart is that they are better trained with a lower learning rate than the others, hence their name and the need to group them separately.
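To illustrate why the slow group is kept apart, the sketch below applies one plain gradient descent update to two groups with different learning rates. The group names mirror the types above, but the values, gradients and learning rates are made up for illustration, and nothing here calls kerch itself.

```python
# Hypothetical parameter groups: each one has its own learning rate.
groups = {
    "euclidean": {"lr": 0.1, "values": [1.0, -2.0]},
    "slow": {"lr": 0.001, "values": [2.0]},
}

# Made-up gradients, one per parameter value.
grads = {"euclidean": [0.5, -0.5], "slow": [0.5]}

for name, group in groups.items():
    # Plain gradient descent: value <- value - lr * grad
    group["values"] = [v - group["lr"] * g
                       for v, g in zip(group["values"], grads[name])]

print(groups["euclidean"]["values"])  # moved by 0.05 in absolute value
print(groups["slow"]["values"])       # moved by only 0.0005
```

In PyTorch, the same idea is usually expressed by passing the optimizer a list of dictionaries, each with its own params and lr entries.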

Hyperparameters Dictionaries

This is relevant for automatically recording values before, during or after training. All the relevant hyperparameters are listed in two dictionaries.

Fixed Hyperparameters
The attribute hparams_fixed returns the fixed hyperparameters of the module. By contrast with hparams_variable, these are the values that are fixed and cannot change during training. If applicable, these can be specific architecture values, for example.
Variable Hyperparameters
The attribute hparams_variable returns the variable hyperparameters of the module. By contrast with hparams_fixed, these are the values that may change during training and may be monitored at various stages of it. If applicable, these can be kernel bandwidth parameters, for example.
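One possible way to use the two dictionaries for recording is sketched below: the fixed values are stored once and the variable ones at each monitored stage. StandInModule is a hypothetical object that only mimics the two attributes; the halving of sigma simply pretends that training changed the value.

```python
class StandInModule:
    """Hypothetical object exposing the two hyperparameter dictionaries."""

    def __init__(self, sigma: float) -> None:
        self.sigma = sigma

    @property
    def hparams_fixed(self) -> dict:
        return {"Output dimension": 2}

    @property
    def hparams_variable(self) -> dict:
        return {"Kernel parameter sigma": self.sigma}


module = StandInModule(sigma=2.0)

# Fixed values are recorded once, variable ones at every monitored stage.
record = {"fixed": dict(module.hparams_fixed), "history": []}
for epoch in range(3):
    module.sigma *= 0.5  # pretend that a training epoch changed the value
    record["history"].append({"epoch": epoch, **module.hparams_variable})

print(record["fixed"])
print(record["history"][-1])
```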

Abstract Class

class kerch.feature.Module(*args, **kwargs)

Bases: Logger, Module, object

Parameters: logging_level (int, optional) – Logging level for this specific instance. If the value is None, the current default kerch global log level will be used. Defaults to None (default kerch logging level). We refer to the Logging in Kerch documentation for further information.

_euclidean_parameters(recurse=True) → Iterator[Parameter]

Iterator yielding all parameters lying on the Euclidean manifold (standard optimization). The optimization is performed over \(\mathbb{R}^{n \times m}\), with \(n\) and \(m\) depending on the size of each parameter.

Parameters: recurse (bool, optional) – If True, yields the Euclidean parameters of both this module and its potential children. Otherwise, it only yields the Euclidean parameters of this module. Defaults to True.
Returns: Euclidean parameters
Return type: Iterator[torch.nn.Parameter]

_slow_parameters(recurse=True) → Iterator[Parameter]

Iterator yielding all slow parameters lying on the Euclidean manifold (standard optimization). The optimization is performed over \(\mathbb{R}^{n \times m}\), with \(n\) and \(m\) depending on the size of each parameter.

What sets these slow Euclidean parameters apart is that they are better trained with a lower learning rate than the others, hence their name and the need to group them separately.

Parameters: recurse (bool, optional) – If True, yields the slow (Euclidean) parameters of both this module and its potential children. Otherwise, it only yields the slow (Euclidean) parameters of this module. Defaults to True.
Returns: Slow (Euclidean) parameters
Return type: Iterator[torch.nn.Parameter]

_stiefel_parameters(recurse=True) → Iterator[Parameter]

Iterator yielding all parameters that must lie on the Stiefel manifold (the optimization is performed on that manifold). The Stiefel manifold corresponds to the orthonormal parameters \(U \in \mathrm{St}(n,m)\), i.e., all \(U \in \mathbb{R}^{n \times m}\) such that \(U^\top U = I\). The dimensions \(n\) and \(m\) are specific to each parameter.

Parameters: recurse (bool, optional) – If True, yields the Stiefel parameters of both this module and its potential children. Otherwise, it only yields the Stiefel parameters of this module. Defaults to True.
Returns: Stiefel parameters
Return type: Iterator[torch.nn.Parameter]

after_step() → None: Specific operations to be performed after a training step. We refer to the documentation of Kerch Module for further information.

before_step() → None: Specific operations to be performed before a training step. We refer to the documentation of Kerch Module for further information.

property hparams_fixed: dict: Fixed hyperparameters of the module. By contrast with hparams_variable, these are the values that are fixed and cannot possibly change during the training. If applicable, these can be specific architecture values for example. We refer to the documentation of Kerch Module for further information.

property hparams_variable: dict

Variable hyperparameters of the module. By contrast with hparams_fixed, these are the values that may change during training and may be monitored at various stages of it. If applicable, these can be kernel bandwidth parameters, for example.

Note

All parameters that are potentially trainable, like a kernel bandwidth \(\sigma\) for example, are included in this dictionary, even if the corresponding trainable argument is set to False. In the latter case, they will not evolve during the training iterations, but will still be included in this dictionary.

We refer to the documentation of Kerch Module for further information.

property logging_level: int: Logging level of this specific instance. If the value is None, the current default kerch global log level will be used. Defaults to None (default global kerch level). We refer to the Logging in Kerch documentation for further information.

manifold_parameters(recurse=True, type='euclidean') → Iterator[Parameter]

Iterator yielding the parameters of a specific type. A distinction is made between three types:

'euclidean':
parameters lying on the Euclidean manifold (standard optimization). The optimization is performed over \(\mathbb{R}^{n \times m}\), with \(n\) and \(m\) depending on the size of each parameter.
'stiefel':
parameters that must lie on the Stiefel manifold (the optimization is performed on that manifold). The Stiefel manifold corresponds to the orthonormal parameters \(U \in \mathrm{St}(n,m)\), i.e., all \(U \in \mathbb{R}^{n \times m}\) such that \(U^\top U = I\). The dimensions \(n\) and \(m\) are specific to each parameter.
'slow':
parameters lying on the Euclidean manifold (standard optimization). The optimization is performed over \(\mathbb{R}^{n \times m}\), with \(n\) and \(m\) depending on the size of each parameter. What sets these slow Euclidean parameters apart is that they are better trained with a lower learning rate than the others, hence their name and the need to group them separately.

We refer to the documentation of Kerch Module for further information.

Parameters:

type (str, optional) – Which parameter group the method must return. The three values above are accepted. Defaults to 'euclidean'.
recurse (bool, optional) – If True, yields the specified parameters of both this module and its potential children. Otherwise, it only yields the specified parameters of this module. Defaults to True.

Returns:

Parameters of the specified type

Return type:

Iterator[torch.nn.Parameter]

Examples

KPCA

In the following example, we create a kerch.level.KPCA level based on a kerch.kernel.RBF kernel where we specify that only the level parameters are trainable by gradient descent.

import kerch
import torch

x = torch.randn(5, 3)
kpca = kerch.level.KPCA(sample=x,                 # random sample
                        kernel_type='rbf',        # we use an RBF kernel (this is the default value, but we specify it for clarity)
                        sigma=2,                  # we specify an RBF bandwidth value
                        representation='dual',    # we work in dual representation (also default value, but specified for clarity)
                        dim_output=2,             # we want an output dimension of 2 (the input is 3)
                        sample_trainable=False,   # the sample can be trained, but we don't want that: we want it fixed
                        sigma_trainable=False,    # the sigma can also be trained, but we don't want that either
                        level_trainable=True)     # the level parameters are trainable, meaning that the eigenvectors are trainable by gradient

Among all the parameters printed, only the eigenvectors hidden (lying on the Stiefel manifold) have requires_grad=True. The Euclidean parameters correspond here to the sample (sample), which we do not want to train. The slow (Euclidean) parameters correspond to the kernel bandwidth sigma, which we also want to keep fixed rather than optimized through gradient descent.

# Euclidean parameters
print('EUCLIDEAN PARAMETERS:')
for p in kpca.manifold_parameters(type='euclidean'):
    print(p)

# Stiefel parameters
print('\nSTIEFEL PARAMETERS:')
for p in kpca.manifold_parameters(type='stiefel'):
    print(p)

# Slow (Euclidean) parameters
print('\nSLOW (EUCLIDEAN) PARAMETERS:')
for p in kpca.manifold_parameters(type='slow'):
    print(p)

EUCLIDEAN PARAMETERS:
Parameter containing:
tensor([[-0.2198,  0.2534,  1.1931],
        [-1.3320, -0.3936, -1.1128],
        [-0.2864, -1.6942, -0.3753],
        [ 1.4518, -1.2399, -0.8248],
        [-0.3917,  0.1601, -1.9797]])

STIEFEL PARAMETERS:
Parameter containing:
tensor([[-0.5873, -0.1407,  0.6471, -0.1022,  0.4541],
        [ 0.5422, -0.0858, -0.0535, -0.5499,  0.6272]], requires_grad=True)

SLOW (EUCLIDEAN) PARAMETERS:
Parameter containing:
tensor(2.)
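As a sanity check on the Stiefel parameter printed above, the sketch below recomputes its Gram matrix in plain Python. The parameter is printed with shape 2 × 5, so the check assumes that it stores the transpose of \(U \in \mathrm{St}(5,2)\); under that assumption, \(U^\top U = I\) means that the two printed rows are orthonormal. The values are copied from the output and are therefore rounded to four decimals.

```python
# Stiefel parameter as printed above (2 x 5, values rounded to 4 decimals).
rows = [
    [-0.5873, -0.1407, 0.6471, -0.1022, 0.4541],
    [0.5422, -0.0858, -0.0535, -0.5499, 0.6272],
]

# Gram matrix of the rows: entry (i, j) is the inner product of rows i and j.
gram = [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

for line in gram:
    print([round(v, 3) for v in line])
```

Up to the rounding of the printed values, the result is the 2 × 2 identity matrix.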

We can have a look at the hyperparameters. The parameter sigma, even if not trainable, is always listed in the variable hyperparameters. Its value will simply not change during training.

print('FIXED HYPERPARAMETERS:')
for key, value in kpca.hparams_fixed.items():
    print(key, ":", value)

print('\nVARIABLE HYPERPARAMETERS:')
for key, value in kpca.hparams_variable.items():
    print(key, ":", value)

FIXED HYPERPARAMETERS:
Level type : KPCA
Level eta : 1.0
Kernel : Exponential
Squared exp. distance : True
Trainable sigma : False
Default kernel transforms : []
Output dimension : 2
Representation : dual
Parameters trainable : True
Constraint : soft
Input dimension : 3
Trainable sample : False
Default sample transforms : []

VARIABLE HYPERPARAMETERS:
Kernel parameter sigma : 2.0

Creating a Module

In this example, we create a module containing a parameter on the Euclidean manifold. We therefore override the _euclidean_parameters() method, taking care to call the parent implementation so that all the parameters returned by the parent classes are still yielded.

import kerch
import torch
from typing import Iterator


class MyModule(kerch.feature.Module):
    def __init__(self, *args, **kwargs):
        super(MyModule, self).__init__(*args, **kwargs)

        # we recover the parameter size by the argument param_size
        param_size = kwargs.pop('param_size', (1, 1))

        # we create our parameter of float type kerch.FTYPE
        # (this value can be modified and ensures that all floating types are the same throughout the code)
        self.my_param = torch.nn.Parameter(torch.randn(param_size, dtype=kerch.FTYPE), requires_grad=True)

    def _euclidean_parameters(self, recurse=True) -> Iterator[torch.nn.Parameter]:
        # important not to forget, otherwise the parameters returned by mother classes will be skipped
        yield from super(MyModule, self)._euclidean_parameters(recurse=recurse)

        # we yield our additional new parameter
        yield self.my_param

    def after_step(self):
        # after each training step, we want the columns to be centered
        with torch.no_grad():
            self.my_param.data = self.my_param - torch.mean(self.my_param, dim=0)

    @property
    def hparams_fixed(self) -> dict:
        # we add the shape of our parameter to the fixed hyperparameters
        # we don't forget to return the other possible hyperparameters issued by parent classes
        return {'my_param size': self.my_param.shape,
                **super(MyModule, self).hparams_fixed}

# We instantiate our class
my_module = MyModule(param_size=(2, 3))

We can have a look at the parameters. The _stiefel_parameters() and _slow_parameters() methods can be overridden in the same way.

# Euclidean parameters
print('EUCLIDEAN PARAMETERS:')
for p in my_module.manifold_parameters(type='euclidean'):
    print(p)

# Stiefel parameters
print('\nSTIEFEL PARAMETERS:')
for p in my_module.manifold_parameters(type='stiefel'):
    print(p)

# Slow (Euclidean) parameters
print('\nSLOW (EUCLIDEAN) PARAMETERS:')
for p in my_module.manifold_parameters(type='slow'):
    print(p)

EUCLIDEAN PARAMETERS:
Parameter containing:
tensor([[ 1.5410, -0.2934, -2.1788],
        [ 0.5684, -1.0845, -1.3986]], requires_grad=True)

STIEFEL PARAMETERS:

SLOW (EUCLIDEAN) PARAMETERS:

If after_step() is called, we can observe that each column of the parameter is centered, i.e., has zero mean.

# we suppose that an optimization step has been performed
my_module.after_step()

# Euclidean parameters
print('EUCLIDEAN PARAMETERS:')
for p in my_module.manifold_parameters(type='euclidean'):
    print(p)

EUCLIDEAN PARAMETERS:
Parameter containing:
tensor([[ 0.4863,  0.3955, -0.3901],
        [-0.4863, -0.3955,  0.3901]], requires_grad=True)
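The centering can be reproduced by hand from the two printed outputs. The sketch below uses plain Python lists instead of tensors and copies the values printed before the call to after_step(); since these are rounded to four decimals, the last digit may differ from the tensor output.

```python
# Parameter values printed before after_step() was called.
before = [[1.5410, -0.2934, -2.1788],
          [0.5684, -1.0845, -1.3986]]

# Mean of each column (the equivalent of torch.mean(..., dim=0)).
n_rows = len(before)
col_means = [sum(col) / n_rows for col in zip(*before)]

# Subtract the column mean from every entry of that column.
centered = [[v - m for v, m in zip(row, col_means)] for row in before]

for row in centered:
    print([round(v, 4) for v in row])
```

Each column of the result sums to zero, which matches the values printed after the call.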

Similarly, let us print the hyperparameters:

print('FIXED HYPERPARAMETERS:')
for key, value in my_module.hparams_fixed.items():
    print(key, ":", value)

print('\nVARIABLE HYPERPARAMETERS:')
for key, value in my_module.hparams_variable.items():
    print(key, ":", value)

FIXED HYPERPARAMETERS:
my_param size : torch.Size([2, 3])

VARIABLE HYPERPARAMETERS:

Inheritance Diagram

[Inheritance diagram: kerch.feature.Module inherits from kerch.feature.Logger and torch.nn.modules.module.Module.]