Polynomial Kernel
Class
- class kerch.kernel.Polynomial(*args, **kwargs)[source]
-
Polynomial kernel of degree \(\alpha \geq 0\) and parameter \(\beta\).
\[k(x,y) = \left(x^\top y + \beta\right)^\alpha.\]Provided the degree \(\alpha\) is a natural number, this kernel accepts both an explicit feature map and an equivalent kernel formulation not depending on the inner product of the explicit feature maps (implicit). Its components are given by
\[\left[\phi(x)\right]_k = \sqrt{\frac{\alpha!}{j_0!j_1! \ldots j_\texttt{dim_input}!}}x_0^{j_0}x_1^{j_1}\ldots x_{\texttt{dim_input}-1}^{j_{\texttt{dim_input}-1}}\sqrt{\beta}^{j_\texttt{dim_input}},\]where \(k = 0, \ldots, \texttt{dim_feature}-1\) correspond to all permutations satisfying
\[j_0 + j_1 + \ldots + j_{\texttt{dim_input}} = \alpha,\]and
\[\begin{split}\texttt{dim_feature} = \left(\begin{array}{c}\texttt{dim_input} + \alpha \\ \alpha\end{array}\right).\end{split}\]One can verify that \(k(x,y) = \phi(x)^\top\phi(y)\). An example is also given in the Example section of its documentation.
Note
For a natural number degree \(\alpha\) . The computation of a kernel matrix of \(n\) points is typically more efficient to compute as an inner product of the explicit feature map if \(\texttt{dim_feature} < n\) and using the kernel formula otherwise. If the degree is not a natural number, only the latter is possible as the explicit feature map does not exist.
This can be specified when calling
Polynomial.k()by specifying the booleanexplicittoTrue(using the explicit feature map) orFalse(directly using the kernel formula).Other considerations may come into play. If a centered or normalized kernel on an out-of-sample is required, this may require extra computations when directly using the kernel matrix as doing it on the explicit feature is more straightforward.
- Parameters:
alpha (double, optional) – Degree \(\alpha\) of the polynomial kernel. Defaults to 2.
beta (double, optional) – Value \(\beta\) of the polynomial kernel. Defaults to 1.
alpha_trainable (bool, optional) – True if the gradient of the degree is to be computed. If so, a graph is computed and the degree can be updated. False just leads to a static computation., defaults to False
beta_trainable (bool, optional) – True if the gradient of the degree is to be computed. If so, a graph is computed and the degree can be updated. False just leads to a static computation., defaults to False
kernel_transform (List[str]) – A list composed of the elements ‘normalize’ or ‘center’. For example a centered cosine kernel which is centered and normalized in order to get a covariance matrix for example can be obtained by invoking a linear kernel with default_transform = [‘normalize’, ‘center’, ‘normalize’] or just a cosine kernel with default_transform = [‘center’, ‘normalize’]. Redundancy is automatically handled., defaults to [].
sample (Tensor(num_sample, dim_input), optional) – Sample points used to compute the kernel matrix. When an out-of-sample computation is asked, it will be given relative to these samples., defaults to None
sample_trainable (bool, optional) – True if the gradients of the sample points are to be computed. If so, a graph is computed and the sample can be updated. False just leads to a static computation., defaults to False
num_sample (int, optional) – Number of sample points. This parameter is neglected if sample is not None and overwritten by the number of points contained in sample., defaults to 1
dim_input (int, optional) – Dimension of each sample point. This parameter is neglected if sample is not None and overwritten by the dimension of the sample points., defaults to 1
idx_sample (float, optional) – Initializes the indices of the samples to be updated. All indices are considered if both idx_stochastic and prop_stochastic are None., defaults to None
prop_sample – Instead of giving indices, specifying a proportion of the original sample set is also possible. The indices will be uniformly randomly chosen without replacement. The value must be chosen such that \(0 <\) prop_stochastic \(\leq 1\). All indices are considered if both idx_stochastic and prop_stochastic are None., defaults to None.
sample_transform (List[str]) – TODO
cache_level (str, optional) – Cache level for saving temporary execution results during the execution. The higher the cache, the more is saved. Defaults to
'normal'. We refer to the Cache Management documentation for further information.logging_level (int, optional) – Logging level for this specific instance. If the value is
None, the current default kerch global log level will be used. Defaults toNone(default kerch logging level). We refer to the Logging in Kerch documentation for further information.
- property C: Tensor
Returns the explicit matrix on the sample datapoints.
\[C = \frac{1}{\texttt{num_idx}}\sum_i^\texttt{num_idx} \phi(x_i)\phi(x_i)^\top.\]
- property K: Tensor
Returns the kernel matrix on the sample data. Same result as calling
k(), but faster. It is loaded from memory if already computed and unchanged since then, to avoid re-computation when recurrently called.\[K_{ij} = k(x_i,x_j).\]
- property Phi: Tensor
Returns the explicit feature map \(\phi(\cdot)\) of the sample datapoints. Same as calling
phi(), but slightly faster. It is loaded from memory if already computed and unchanged since then, to avoid re-computation when recurrently called.
- property alpha
Degree of the polynomial. This is argument plays a similar role to the bandwidth of an exponential kernel, such as the RBF kernel.
Note
The explicit feature map only exists if the degree is a natural number.
- property alpha_trainable: bool
Boolean indicating if the alpha/degree is trainable. In other words, this argument provides or not a gradient to the degree for potential gradient-based training.
- property beta
Beta of the polynomial.
- c(x=None, transform=None) Tensor
Out-of-sample explicit matrix.
\[C = \frac{1}{\texttt{num_x}}\sum_{i}^{\texttt{num_x}} \phi(x_i)\phi(x_i)^\top.\]- Parameters:
x (Tensor(num_x,dim_input), optional) – Out-of-sample points (first dimension). If None, the default sample will be used., defaults to None
- Returns:
Explicit matrix
- Return type:
Tensor[dim_feature, dim_feature]
- cache_keys(private: bool = False) Iterable[str]
Returns an iterable containing the different cache keys. We refer to the Cache Management documentation for more information.
- Parameters:
private (bool, optional) – Some cache elements are private and are not returned unless set to
True. Defaults toFalse.
- property cache_level: str
Cache level for saving temporary execution results during the execution. The higher the cache, the more is saved. Defaults to
'normal'unless set otherwise during instantiation. The different possible values are:"none": the cache is non-existent and everything is computed on the go."light": the cache is very light. For example, only the kernel matrix and statistics of the sample points are saved."normal": same as light, but the statistics of the out-of-sample points are also saved."heavy": in addition to the statistics, the final kernel matrices of the out-of-sample points are saved."total": every step of any computation is saved.
We refer to the Cache Management documentation for further information.
- property centered: bool
Indicates whether the feature map is centered relative to its sample or equivalently is the kernel in centered in its RKHS space, spanned by the sample.
- corr(x=None) Tensor
Returns the correlation matrix fo the provided input.
- Parameters:
x (Tensor[N, dim_input], optional) – Out-of-sample points (first dimension). If None, the default sample will be used., defaults to None
- Returns:
Correlation matrix
- Return type:
Tensor[dim_feature, dim_feature]
- cov(x=None) Tensor
Returns the covariance matrix fo the provided input.
- Parameters:
x (Tensor[N, dim_input], optional) – Out-of-sample points (first dimension). If None, the default sample will be used. Defaults to None.
- Returns:
Covariance matrix
- Return type:
Tensor[dim_feature, dim_feature]
- property current_sample_projected: Tensor
Returns the sample that is currently used in the computations and for the normalizing and centering statistics if relevant.
- property dim_feature: int | inf
Feature dimension. Provided the degree \(\alpha\) is a natural number, it is given by
\[\begin{split}\texttt{dim_feature} = \left(\begin{array}{c}\texttt{dim_input} + \alpha \\ \alpha\end{array}\right).\end{split}\]If the degree \(\alpha\) is not a natural number, the explicit feature does not exist and by consequence the feature dimension is infinite.
- property explicit: bool
True if the method has an explicit formulation, False otherwise. This is the case only if the degree \(\alpha\) is a natural number.
- explicit_preimage(phi: Tensor)[source]
Computes a pre-image of an explicit feature map of the kernel, given by
phi_image. Different methods are available:'explicit': Uses an explicit implementation specific to the kernel (if available). This is always preferable if available.'knn': Nearest neighbors. We refer tokerch.method.knn()for more details.'smoother': Kernel smoothing. We refer tokerch.method.smoother()for more details'iterative': Iterative optimization. We refer tokerch.method.iterative_preimage_phi()for more details
- Parameters:
phi_image (torch.Tensor [num_points, dim_feature], optional) – Explicit feature map image to be inverted. If not specified (
None), the explicit feature map on the sample is used.method (str, optional) – Pre-image method to be used. Defaults to
'explicit'.**kwargs (dict, optional) – Additional parameters of the pre-image method used. Please refer to its documentation for further details.
- Returns:
Pre-image
- Return type:
torch.Tensor [num_points, dim_input]
- forward(x, representation='dual') Tensor
Passes datapoints through the kernel.
- Parameters:
x (Tensor(,dim_input)) – Datapoints to be passed through the kernel.
representation (str, optional) – Chosen representation. If dual, an out-of-sample kernel matrix is returned. If primal is specified, it returns the explicit feature map., defaults to dual
- Returns:
Out-of-sample kernel matrix or explicit feature map depending on representation.
- Raises:
RepresentationError
- property hparams_fixed
Dictionnary containing the hyper-parameters and their values. This can be relevant for monitoring.
- property hparams_variable
Dictionnary containing the parameters and their values. This can be relevant for monitoring.
- property idx: Tensor
Indices used when performing various operations. This is only relevant in the case of stochastic training.
- implicit_preimage(k_image: Tensor | None = None, method: str = 'knn', **kwargs)
Computes a pre-image of coefficients in the RKHS of the kernel, given by
k_image. Different methods are available:'knn': Nearest neighbors. We refer tokerch.method.knn()for more details.'smoother': Kernel smoothing. We refer tokerch.method.smoother()for more details'iterative': Iterative optimization. We refer tokerch.method.iterative_preimage_k()for more details
- Parameters:
k_image (torch.Tensor [num_points, num_idx], optional) – RKHS coefficients to be inverted. If not specified (
None), the kernel matrix on the sample is used.method (str, optional) – Pre-image method to be used. Defaults to
'knn'.**kwargs (dict, optional) – Additional parameters of the pre-image method used. Please refer to its documentation for further details.
- Returns:
Pre-image
- Return type:
torch.Tensor [num_points, dim_input]
- init_sample(sample=None, idx_sample=None, prop_sample=None)
Initializes the sample set (and the stochastic indices).
- Parameters:
sample (Tensor, optional) – Sample points used for the various computations. When an out-of-sample computation is asked, it will be given relative to these samples. In case of overwriting a current sample, num_sample and dim_input are also overwritten. If None is specified, the sample data will be initialized according to num_sample and dim_input specified during the construction. If a previous sample set has been used, it will keep the same dimension by consequence. A last case occurs when sample is of the class torch.nn.Parameter: the sample will then use those values, and they can thus be shared with the level calling this method., defaults to None
idx_sample (int[], optional) – Initializes the indices of the samples to be updated. All indices are considered if both idx_sample and prop_sample are None., defaults to None
prop_sample – Instead of giving indices, specifying a proportion of the original sample set is also possible. The indices will be uniformly randomly chosen without replacement. The value must be chosen such that \(0 <\) prop_sample \(\leq 1\). All indices are considered if both idx_sample and prop_sample are None., defaults to None.
- k(x=None, y=None, explicit=None, transform=None) Tensor[source]
Note
For the specific case of the polynomial kernel, the optimal value for
explicitis automatically determined based on the size of the inputs ifexplicit=None. This does not take into account possible transforms.Returns a kernel matrix, either of the sample, either out-of-sample, either fully out-of-sample.
\[K = [k(x_i,y_j)]_{i,j=1}^{\texttt{num_x}, \texttt{num_y}},\]with \(\{x_i\}_{i=1}^\texttt{num_x}\) the out-of-sample points (x) and \(\{y_i\}_{j=1}^\texttt{num_y}\) the sample points (y).
Note
In the case of centered kernels on an out-of-sample, this computation is more expensive as it requires to center according to the sample data, which implies computing a statistic on the out-of-sample kernel matrix and thus also computing it.
- Parameters:
x (Tensor[num_x, dim_input], optional) – Out-of-sample points (first dimension). If None, the default sample will be used. Defaults to
Noney (Tensor[num_y, dim_input], optional) – Out-of-sample points (second dimension). If None, the default sample will be used. Defaults to
None
- Returns:
Kernel matrix
- Return type:
Tensor[num_x, num_y]
- Raises:
ExplicitError
- property kernel_transform: TransformTree
Default transform performed on the kernel
- property normalized: bool
Indicates whether the feature map is centered relative to its sample or equivalently is the kernel in centered in its RKHS space, spanned by the sample.
- property num_idx: int
Number of selected indices when performing various operations. This is only relevant in the case of stochastic training.
- phi(x=None, transform=None) Tensor
Returns the explicit feature map \(\phi(x)\) of the specified points.
- Parameters:
x (Tensor[num_x, dim_input], optional) – The datapoints serving as input of the explicit feature map. If None, the sample will be used. Defaults to
None.- Returns:
Explicit feature map \(\phi(x)\) of the specified points.
- Return type:
Tensor[num_x, dim_feature]
- Raises:
ExplicitError
- print_cache(private: bool = False) None
Prints the cache content. We refer to the Cache Management documentation for further information.
- Parameters:
private (bool, optional) – Some cache elements are private and are not returned unless set to
True. Defaults toFalse.
- reset(recurse=False, reset_persisting=True) None
Resets the cache to be empty. We refer to the Cache Management documentation for more information.
- Parameters:
recurse (bool, optional) – If
True, resets the cache of this module and also of its potential children. otherwise, it only resets the cache for this module. Defaults toTrue.reset_persisting (bool, optional) – Persisting elements are meant to resist to a cache reset (see
_save()). The option allows to also reset them ifTrue. Defaults toTrue.
- property sample: Parameter
Full original raw sample without any transform or potential stochastic selection.
- property sample_transform: TransformTree
Default transform used by the sample.
- stochastic(idx=None, prop=None)
Resets which subset of the samples are to be used until the next call of this function. This is relevant in the case of stochastic training.
- Parameters:
idx (int[], optional) – Indices of the sample subset relative to the original sample set., defaults to None
prop (double, optional) – Instead of giving indices, passing a proportion of the original sample set is also possible. The indices will be uniformly randomly chosen without replacement. The value must be chosen such that \(0 <\) prop_stochastic \(\leq 1\)., defaults to None.
If None is specified for both idx_stochastic and prop_stochastic, all samples are used and the subset equals the original sample set. This is also the default behavior if this function is never called, nor the parameters specified during initialization.
Note
Both idx_stochastic and prop_stochastic cannot be filled together as conflict would arise.
- train(mode=True)
Activates the training mode, which disables the gradients computation and disables stochasticity. For the gradients and other things, we refer to the torch.nn.Module documentation. For the stochastic part, when put in evaluation mode (False), all the sample points are used for the computations, regardless of the previously specified indices.
- transform_sample_revert(data) Tensor
Get back the original value from a projected value, by using the same transform as the sample, but in reverse. This is not always feasible, depending on the transform used (normalizations are typically not invertible as they are transform which are not bijective).
- update_sample(sample_values, idx_sample=None)
Updates the sample set. In contradiction to init_samples, this only updates the values of the sample and sets the gradients of the updated values to zero if relevant.
- Parameters:
sample_values (Tensor) – Values given to the updated samples.
idx_sample (int[], optional) – Indices of the samples to be updated. All indices are considered if None., defaults to None
Examples
Sine
import kerch
import numpy as np
from matplotlib import pyplot as plt
x = np.sin(np.arange(50) / np.pi) + 1.5
plt.figure(0)
plt.plot(x)
k = kerch.kernel.Polynomial(sample=x)
plt.figure(1)
plt.imshow(k.K)
plt.title(f"Alpha = {k.alpha}, Beta = {k.beta}")
plt.colorbar()
Factory
The following lines are equivalent:
k = kerch.kernel.Polynomial(**kwargs)
k = kerch.kernel.factory(kernel_type='polynomial', **kwargs)
Influence of the parameters
import kerch
import numpy as np
from matplotlib import pyplot as plt
x = np.sin(np.arange(50) / np.pi)
k1 = kerch.kernel.Polynomial(sample=x, alpha=2, beta=1)
k2 = kerch.kernel.Polynomial(sample=x, alpha=2, beta=5)
k3 = kerch.kernel.Polynomial(sample=x, alpha=5, beta=1)
k4 = kerch.kernel.Polynomial(sample=x, alpha=5, beta=5)
fig, axs = plt.subplots(2, 2)
axs[0,0].imshow(k1.K)
axs[0,0].set_title(f"Alpha = {k1.alpha}, Beta = {k1.beta}")
axs[0,1].imshow(k2.K)
axs[0,1].set_title(f"Alpha = {k2.alpha}, Beta = {k2.beta}")
axs[1,0].imshow(k3.K)
axs[1,0].set_title(f"Alpha = {k3.alpha}, Beta = {k3.beta}")
im = axs[1,1].imshow(k4.K)
axs[1,1].set_title(f"Alpha = {k4.alpha}, Beta = {k4.beta}")
for ax in axs.flat:
ax.set_xticks([])
ax.set_yticks([])
fig.colorbar(im, ax=axs.ravel().tolist(), orientation='horizontal')
(Source code, png, hires.png, pdf)
In the following example, the kernels are also to ease the readability of the effects. The diagonal always becomes the unity when normalizing, hence the more pronounced difference with the non-normalized example above.
import kerch
import numpy as np
from matplotlib import pyplot as plt
x = np.sin(np.arange(50) / np.pi)
k1 = kerch.kernel.Polynomial(sample=x, alpha=2, beta=1, kernel_transform=['normalize'])
k2 = kerch.kernel.Polynomial(sample=x, alpha=2, beta=5, kernel_transform=['normalize'])
k3 = kerch.kernel.Polynomial(sample=x, alpha=5, beta=1, kernel_transform=['normalize'])
k4 = kerch.kernel.Polynomial(sample=x, alpha=5, beta=5, kernel_transform=['normalize'])
fig, axs = plt.subplots(2, 2)
axs[0,0].imshow(k1.K)
axs[0,0].set_title(f"Alpha = {k1.alpha}, Beta = {k1.beta}")
axs[0,1].imshow(k2.K)
axs[0,1].set_title(f"Alpha = {k2.alpha}, Beta = {k2.beta}")
axs[1,0].imshow(k3.K)
axs[1,0].set_title(f"Alpha = {k3.alpha}, Beta = {k3.beta}")
im = axs[1,1].imshow(k4.K)
axs[1,1].set_title(f"Alpha = {k4.alpha}, Beta = {k4.beta}")
for ax in axs.flat:
ax.set_xticks([])
ax.set_yticks([])
fig.colorbar(im, ax=axs.ravel().tolist(), orientation='horizontal')
(Source code, png, hires.png, pdf)
Explicit and Implicit
The polynomial kernel can have its kernel matrix computed through the explicit feature map as \(k(x,y) = \phi(x)^\top\phi(y)\) and implicitly using \(k(x,y) = \left(x^\top y + \beta\right)^\alpha\). The following confirms that both are equivalent.
import kerch
import torch
from matplotlib import pyplot as plt
num, dim_input = 10,3
x = torch.randn(num, dim_input)
oos = torch.randn(num, dim_input)
k = kerch.kernel.Polynomial(sample=x, alpha=3, kernel_transform=['center', 'normalize'])
fig, axs = plt.subplots(2, 2)
axs[0,0].imshow(k.k(explicit=True))
axs[0,0].set_title("Explicit (sample)")
axs[0,1].imshow(k.k(explicit=False))
axs[0,1].set_title("Implicit (sample)")
axs[1,0].imshow(k.k(x=oos, y=oos, explicit=True))
axs[1,0].set_title("Implicit (out-of-sample)")
im = axs[1,1].imshow(k.k(x=oos, y=oos, explicit=False))
axs[1,1].set_title("Explicit (out-of-sample)")
for ax in axs.flat:
ax.set_xticks([])
ax.set_yticks([])
fig.colorbar(im, ax=axs.ravel().tolist(), orientation='horizontal')
(Source code, png, hires.png, pdf)