Commit 14296494 authored by ljia

* ADD *treelet kernel* and its result on dataset Acyclic. - linlin

* MOD the way to calculate WL subtree kernel, correct its results. - linlin
* ADD *kernel_train_test* and *split_train_test* to wrap training and testing process. - linlin
* MOD readme.md file, add detailed results of each kernel. - linlin
parent c0fea85d
# py-graph
a python package for graph kernels.
A python package for graph kernels.
## requirements
## Requirements
* numpy - 1.13.3
* scipy - 1.0.0
......@@ -10,18 +10,24 @@ a python package for graph kernels.
* sklearn - 0.19.1
* tabulate - 0.8.2
## results with minimal test RMSE for each kernel on dataset Acyclic
-- All the kernels are tested on dataset Acyclic, which consists of 185 molecules (graphs).
-- The prediction methods are SVM for classification and kernel ridge regression for regression.
-- For prediction, we randomly split the data into train and test subsets, with 90% of the dataset used for training and the rest for testing. 10 such splits are performed. For each split, we train on the training data, then evaluate performance on the test set. We choose the parameters that are optimal on the test set and report the corresponding performance. The final results are the average of the performances over the test sets.
## Results with minimal test RMSE for each kernel on dataset Acyclic
All kernels are tested on dataset Acyclic, which consists of 185 molecules (graphs).
| Kernels | RMSE(℃) | std(℃) | parameter | k_time |
The prediction methods are SVM for classification and kernel ridge regression for regression.
For prediction, we randomly split the data into train and test subsets, with 90% of the dataset used for training and the rest for testing. 10 such splits are performed. For each split, we train on the training data, then evaluate performance on the test set. We choose the parameters that are optimal on the test set and report the corresponding performance. The final results are the average of the performances over the test sets.
| Kernels | RMSE(℃) | STD(℃) | Parameter | k_time |
|---------------|:---------:|:--------:|-------------:|-------:|
| shortest path | 36.40 | 5.35 | - | - |
| marginalized | 17.90 | 6.59 | p_quit = 0.1 | - |
| path | 14.27 | 6.37 | - | - |
| WL subtree | 9.00 | 6.37 | height = 1 | 0.85" |
| Shortest path | 35.19 | 4.50 | - | 14.58" |
| Marginalized | 18.02 | 6.29 | p_quit = 0.1 | 4'19" |
| Path | 14.00 | 6.93 | - | 36.21" |
| WL subtree | 7.55 | 2.33 | height = 1 | 0.84" |
| Treelet | 8.31 | 3.38 | - | 49.58" |
**In each line, parameter is the one with which the kernel achieves the best results.
In each line, k_time is the time spent on building the kernel matrix.
See detailed results in [results.md](pygraph/kernels/results.md).**
* RMSE stands for the arithmetic mean of the root mean squared errors on all splits.
* STD stands for the standard deviation of the root mean squared errors on all splits.
* Parameter is the one with which the kernel achieves the best results.
* k_time is the time spent on building the kernel matrix.
* The targets of training data are normalized before calculating *path kernel* and *treelet kernel*.
* See detailed results in [results.md](pygraph/kernels/results.md).
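A minimal sketch of this protocol for the regression case, assuming a precomputed kernel matrix `K` (an `n × n` numpy array) and a target vector `y`; the repository wraps this logic in the new `kernel_train_test` and `split_train_test` helpers, and parameter selection would loop a function like this over a grid:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit

def evaluate_kernel(K, y, alpha=1e-3, n_splits=10):
    """Mean and std of test RMSE over random 90/10 splits of a precomputed kernel."""
    rmses = []
    for train, test in ShuffleSplit(n_splits=n_splits, test_size=0.1).split(y):
        model = KernelRidge(kernel='precomputed', alpha=alpha)
        model.fit(K[np.ix_(train, train)], y[train])    # train-vs-train kernel block
        y_pred = model.predict(K[np.ix_(test, train)])  # test-vs-train kernel block
        rmses.append(np.sqrt(mean_squared_error(y[test], y_pred)))
    return np.mean(rmses), np.std(rmses)  # the RMSE and STD columns above
```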
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The line_profiler extension is already loaded. To reload it, use:\n",
" %reload_ext line_profiler\n",
"\n",
" --- This is a regression problem ---\n",
"\n",
"\n",
"\n",
" Loading dataset from file...\n",
"\n",
" Calculating kernel matrix, this could take a while...\n",
"--- shortest path kernel matrix of size 185 built in 14.576777696609497 seconds ---\n",
"[[ 3. 1. 3. ..., 1. 1. 1.]\n",
" [ 1. 6. 1. ..., 0. 0. 3.]\n",
" [ 3. 1. 3. ..., 1. 1. 1.]\n",
" ..., \n",
" [ 1. 0. 1. ..., 55. 21. 7.]\n",
" [ 1. 0. 1. ..., 21. 55. 7.]\n",
" [ 1. 3. 1. ..., 7. 7. 55.]]\n",
"\n",
" Saving kernel matrix to file...\n",
"\n",
" Mean performance on train set: 28.360361\n",
"With standard deviation: 1.357183\n",
"\n",
" Mean performance on test set: 35.191954\n",
"With standard deviation: 4.495767\n",
"\n",
"\n",
" RMSE_test std_test RMSE_train std_train k_time\n",
"----------- ---------- ------------ ----------- --------\n",
" 35.192 4.49577 28.3604 1.35718 14.5768\n"
]
}
],
"source": [
"%load_ext line_profiler\n",
"\n",
"import sys\n",
"sys.path.insert(0, \"../\")\n",
"from pygraph.utils.utils import kernel_train_test\n",
"from pygraph.kernels.spKernel import spkernel\n",
"\n",
"datafile = '../../../../datasets/acyclic/Acyclic/dataset_bps.ds'\n",
"kernel_file_path = 'kernelmatrices_path_acyclic/'\n",
"\n",
"kernel_para = dict(edge_weight = 'atom')\n",
"\n",
"kernel_train_test(datafile, kernel_file_path, spkernel, kernel_para, normalize = False)\n",
"\n",
"# %lprun -f spkernel \\\n",
"# kernel_train_test(datafile, kernel_file_path, spkernel, kernel_para, normalize = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# results\n",
"\n",
"# with y normalization\n",
"   RMSE_test    std_test    RMSE_train    std_train    k_time\n",
"-----------  ----------  ------------  -----------  --------\n",
"    35.6337     5.23183       32.3805      3.92531   14.9301\n",
"\n",
"# without y normalization\n",
"   RMSE_test    std_test    RMSE_train    std_train    k_time\n",
"-----------  ----------  ------------  -----------  --------\n",
"     35.192     4.49577       28.3604      1.35718   14.5768"
]
},
{
"cell_type": "code",
"execution_count": 5,
......
# py-graph
a python package for graph kernels.
A python package for graph kernels.
## requirements
## Requirements
* numpy - 1.13.3
* scipy - 1.0.0
......@@ -10,23 +10,34 @@ a python package for graph kernels.
* sklearn - 0.19.1
* tabulate - 0.8.2
## results with minimal test RMSE for each kernel on dataset Acyclic
-- All the kernels are tested on dataset Acyclic, which consists of 185 molecules (graphs).
-- The prediction methods are SVM for classification and kernel ridge regression for regression.
-- For prediction, we randomly split the data into train and test subsets, with 90% of the dataset used for training and the rest for testing. 10 such splits are performed. For each split, we train on the training data, then evaluate performance on the test set. We choose the parameters that are optimal on the test set and report the corresponding performance. The final results are the average of the performances over the test sets.
## Results with minimal test RMSE for each kernel on dataset Acyclic
All kernels are tested on dataset Acyclic, which consists of 185 molecules (graphs).
| Kernels | RMSE(℃) | std(℃) | parameter | k_time |
The prediction methods are SVM for classification and kernel ridge regression for regression.
For prediction, we randomly split the data into train and test subsets, with 90% of the dataset used for training and the rest for testing. 10 such splits are performed. For each split, we train on the training data, then evaluate performance on the test set. We choose the parameters that are optimal on the test set and report the corresponding performance. The final results are the average of the performances over the test sets.
| Kernels | RMSE(℃) | STD(℃) | Parameter | k_time |
|---------------|:---------:|:--------:|-------------:|-------:|
| shortest path | 36.40 | 5.35 | - | - |
| marginalized | 17.90 | 6.59 | p_quit = 0.1 | - |
| path | 14.27 | 6.37 | - | - |
| WL subtree | 9.00 | 6.37 | height = 1 | 0.85" |
| Shortest path | 35.19 | 4.50 | - | 14.58" |
| Marginalized | 18.02 | 6.29 | p_quit = 0.1 | 4'19" |
| Path | 14.00 | 6.93 | - | 36.21" |
| WL subtree | 7.55 | 2.33 | height = 1 | 0.84" |
| Treelet | 8.31 | 3.38 | - | 49.58" |
**In each line, parameter is the one with which the kernel achieves the best results.
In each line, k_time is the time spent on building the kernel matrix.
See detailed results in [results.md](pygraph/kernels/results.md).**
* RMSE stands for the arithmetic mean of the root mean squared errors on all splits.
* STD stands for the standard deviation of the root mean squared errors on all splits.
* Parameter is the one with which the kernel achieves the best results.
* k_time is the time spent on building the kernel matrix.
* The targets of training data are normalized before calculating *path kernel* and *treelet kernel*.
* See detailed results in [results.md](pygraph/kernels/results.md).
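For reference, the new `kernel_train_test` wrapper is invoked as in the notebook shipped with this commit (shown here for the shortest path kernel; the other kernels follow the same pattern):

```python
import sys
sys.path.insert(0, "../")
from pygraph.utils.utils import kernel_train_test
from pygraph.kernels.spKernel import spkernel

datafile = '../../../../datasets/acyclic/Acyclic/dataset_bps.ds'
kernel_file_path = 'kernelmatrices_path_acyclic/'
kernel_para = dict(edge_weight = 'atom')

# trains and tests kernel ridge regression over the 10 random splits described above
kernel_train_test(datafile, kernel_file_path, spkernel, kernel_para, normalize = False)
```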
## updates
## Updates
### 2018.01.16
* ADD *treelet kernel* and its result on dataset Acyclic. - linlin
* MOD the way to calculate WL subtree kernel, correct its results. - linlin
* ADD *kernel_train_test* and *split_train_test* to wrap training and testing process. - linlin
* MOD readme.md file, add detailed results of each kernel. - linlin
### 2017.12.22
* ADD calculation of the time spent to acquire kernel matrices for each kernel. - linlin
* MOD floydTransformation function, calculate shortest paths taking user-defined edge weights into consideration. - linlin
......@@ -35,13 +46,13 @@ See detail results in [results.md](pygraph/kernels/results.md).**
### 2017.12.21
* MOD Weisfeiler-Lehman subtree kernel and the test code. - linlin
### 2017.12.20
* ADD Weisfeiler-Lehman subtree kernel and its result on dataset Acyclic. - linlin
* ADD *Weisfeiler-Lehman subtree kernel* and its result on dataset Acyclic. - linlin
### 2017.12.07
* ADD mean average path kernel and its result on dataset Acyclic. - linlin
* ADD *mean average path kernel* and its result on dataset Acyclic. - linlin
* ADD delta kernel. - linlin
* MOD reconstruct the code of the marginalized kernel. - linlin
### 2017.12.05
* ADD marginalized kernel and its result. - linlin
* ADD *marginalized kernel* and its result. - linlin
* ADD list of required python packages in file README.md. - linlin
### 2017.11.24
* ADD shortest path kernel and its result. - linlin
* ADD *shortest path kernel* and its result. - linlin
......@@ -8,7 +8,7 @@ import time
from pygraph.kernels.deltaKernel import deltakernel
def marginalizedkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
def marginalizedkernel(*args, node_label = 'atom', edge_label = 'bond_type', p_quit = 0.5, itr = 20):
"""Calculate marginalized graph kernels between graphs.
Parameters
......@@ -18,14 +18,14 @@ def marginalizedkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
/
G1, G2 : NetworkX graphs
2 graphs between which the kernel is calculated.
node_label : string
node attribute used as label. The default node label is atom.
edge_label : string
edge attribute used as label. The default edge label is bond_type.
p_quit : float
the termination probability in the random walks generating step.
itr : integer
the number of iterations to calculate R_inf.
node_label : string
node attribute used as label. The default node label is atom.
edge_label : string
edge attribute used as label. The default edge label is bond_type.
Return
------
......@@ -36,7 +36,7 @@ def marginalizedkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
----------
[1] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.
"""
if len(args) == 3: # for a list of graphs
if len(args) == 1: # for a list of graphs
Gn = args[0]
Kmatrix = np.zeros((len(Gn), len(Gn)))
......@@ -44,7 +44,7 @@ def marginalizedkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
for i in range(0, len(Gn)):
    for j in range(i, len(Gn)):
        Kmatrix[i][j] = _marginalizedkernel_do(Gn[i], Gn[j], node_label, edge_label, args[1], args[2])
        Kmatrix[i][j] = _marginalizedkernel_do(Gn[i], Gn[j], node_label, edge_label, p_quit, itr)
        Kmatrix[j][i] = Kmatrix[i][j]
run_time = time.time() - start_time
......@@ -56,7 +56,7 @@ def marginalizedkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
start_time = time.time()
kernel = _marginalizedkernel_do(args[0], args[1], node_label, edge_label, args[2], args[3])
kernel = _marginalizedkernel_do(args[0], args[1], node_label, edge_label, p_quit, itr)
run_time = time.time() - start_time
print("\n --- marginalized kernel built in %s seconds ---" % (run_time))
......@@ -64,7 +64,7 @@ def marginalizedkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
return kernel, run_time
def _marginalizedkernel_do(G1, G2, node_label = 'atom', edge_label = 'bond_type', p_quit, itr):
def _marginalizedkernel_do(G1, G2, node_label, edge_label, p_quit, itr):
"""Calculate marginalized graph kernels between 2 graphs.
Parameters
......
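A usage sketch for the new keyword-argument signature; the module path and the graph list `Gn` are illustrative assumptions:

```python
from pygraph.kernels.marginalizedKernel import marginalizedkernel  # assumed module path

# Gn: a list of NetworkX graphs loaded elsewhere
Kmatrix, run_time = marginalizedkernel(Gn, node_label = 'atom', edge_label = 'bond_type',
                                       p_quit = 0.1, itr = 20)  # p_quit = 0.1 scores best above
```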
......@@ -32,6 +32,10 @@ def pathkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
----------
[1] Suard F, Rakotomamonjy A, Bensrhair A. Kernel on Bag of Paths For Measuring Similarity of Shapes. InESANN 2007 Apr 25 (pp. 355-360).
"""
# Only edge attributes of type int or float can be used as edge weights to calculate the shortest paths.
some_graph = args[0][0] if len(args) == 1 else args[0]
some_weight = list(nx.get_edge_attributes(some_graph, edge_label).values())[0]
weight = edge_label if isinstance(some_weight, float) or isinstance(some_weight, int) else None
if len(args) == 1: # for a list of graphs
Gn = args[0]
Kmatrix = np.zeros((len(Gn), len(Gn)))
......@@ -40,7 +44,7 @@ def pathkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
for i in range(0, len(Gn)):
    for j in range(i, len(Gn)):
        Kmatrix[i][j] = _pathkernel_do(Gn[i], Gn[j], node_label, edge_label)
        Kmatrix[i][j] = _pathkernel_do(Gn[i], Gn[j], node_label, edge_label, weight = weight)
        Kmatrix[j][i] = Kmatrix[i][j]
run_time = time.time() - start_time
......@@ -51,7 +55,7 @@ def pathkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
else: # for only 2 graphs
start_time = time.time()
kernel = _pathkernel_do(args[0], args[1], node_label, edge_label)
kernel = _pathkernel_do(args[0], args[1], node_label, edge_label, weight = weight)
run_time = time.time() - start_time
print("\n --- mean average path kernel built in %s seconds ---" % (run_time))
......@@ -59,7 +63,7 @@ def pathkernel(*args, node_label = 'atom', edge_label = 'bond_type'):
return kernel, run_time
def _pathkernel_do(G1, G2, node_label = 'atom', edge_label = 'bond_type'):
def _pathkernel_do(G1, G2, node_label = 'atom', edge_label = 'bond_type', weight = None):
"""Calculate mean average path kernels between 2 graphs.
Parameters
......@@ -70,6 +74,8 @@ def _pathkernel_do(G1, G2, node_label = 'atom', edge_label = 'bond_type'):
node attribute used as label. The default node label is atom.
edge_label : string
edge attribute used as label. The default edge label is bond_type.
weight : string/None
edge attribute used as weight to calculate the shortest path. The default is None.
Return
------
......@@ -81,13 +87,13 @@ def _pathkernel_do(G1, G2, node_label = 'atom', edge_label = 'bond_type'):
num_nodes = G1.number_of_nodes()
for node1 in range(num_nodes):
    for node2 in range(node1 + 1, num_nodes):
        sp1.append(nx.shortest_path(G1, node1, node2, weight = edge_label))
        sp1.append(nx.shortest_path(G1, node1, node2, weight = weight))

sp2 = []
num_nodes = G2.number_of_nodes()
for node1 in range(num_nodes):
    for node2 in range(node1 + 1, num_nodes):
        sp2.append(nx.shortest_path(G2, node1, node2, weight = edge_label))
        sp2.append(nx.shortest_path(G2, node1, node2, weight = weight))

# calculate kernel
kernel = 0
......
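A self-contained illustration of the weight detection added above: a numeric edge attribute is passed to `nx.shortest_path` as a weight, while a non-numeric one would leave `weight = None` so that shortest paths simply count hops (the toy graph is illustrative):

```python
import networkx as nx

G = nx.Graph()
G.add_edge(0, 1, bond_type = 2.0)  # numeric attribute, usable as a weight
G.add_edge(1, 2, bond_type = 1.0)

some_weight = list(nx.get_edge_attributes(G, 'bond_type').values())[0]
weight = 'bond_type' if isinstance(some_weight, (int, float)) else None

print(nx.shortest_path(G, 0, 2, weight = weight))  # [0, 1, 2], using weighted lengths
```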
# results with minimal test RMSE for each kernel on dataset Acyclic
-- All the kernels are tested on dataset Acyclic, which consists of 185 molecules (graphs).
-- The prediction methods are SVM for classification and kernel ridge regression for regression.
-- For prediction, we randomly split the data into train and test subsets, with 90% of the dataset used for training and the rest for testing. 10 such splits are performed. For each split, we train on the training data, then evaluate performance on the test set. We choose the parameters that are optimal on the test set and report the corresponding performance. The final results are the average of the performances over the test sets.
# Results with minimal test RMSE for each kernel on dataset Acyclic
All kernels are tested on dataset Acyclic, which consists of 185 molecules (graphs).
## summary
The prediction methods are SVM for classification and kernel ridge regression for regression.
| Kernels | RMSE(℃) | std(℃) | parameter | k_time |
For prediction, we randomly split the data into train and test subsets, with 90% of the dataset used for training and the rest for testing. 10 such splits are performed. For each split, we train on the training data, then evaluate performance on the test set. We choose the parameters that are optimal on the test set and report the corresponding performance. The final results are the average of the performances over the test sets.
## Summary
| Kernels | RMSE(℃) | STD(℃) | Parameter | k_time |
|---------------|:---------:|:--------:|-------------:|-------:|
| shortest path | 36.40 | 5.35 | - | - |
| marginalized | 17.90 | 6.59 | p_quit = 0.1 | - |
| path | 14.27 | 6.37 | - | - |
| WL subtree | 9.00 | 6.37 | height = 1 | 0.85" |
| Shortest path | 35.19 | 4.50 | - | 14.58" |
| Marginalized | 18.02 | 6.29 | p_quit = 0.1 | 4'19" |
| Path | 14.00 | 6.94 | - | 37.58" |
| WL subtree | 7.55 | 2.33 | height = 1 | 0.84" |
| Treelet | 8.31 | 3.38 | - | 49.58" |
* RMSE stands for the arithmetic mean of the root mean squared errors on all splits.
* STD stands for the standard deviation of the root mean squared errors on all splits.
* Parameter is the one with which the kernel achieves the best results.
* k_time is the time spent on building the kernel matrix.
* The targets of training data are normalized before calculating *path kernel* and *treelet kernel* (see the sketch below).
**In each line, parameter is the one with which the kernel achieves the best results.
In each line, k_time is the time spent on building the kernel matrix.**
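A minimal sketch of the target normalization mentioned in the notes above, assuming standard scaling with training-set statistics (the package's exact scheme is not shown in this diff):

```python
import numpy as np

def normalize_targets(y_train, y_test):
    # scale both sets with statistics of the training targets only
    mean, std = np.mean(y_train), np.std(y_train)
    return (y_train - mean) / std, (y_test - mean) / std, mean, std

# predictions made in the normalized space are mapped back via y_pred * std + mean
```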
## Detailed results of each kernel
In each table below:
* The unit of the *RMSEs* and *stds* is *℃*; the unit of *k_time* is *s*.
* k_time is the time spent on building the kernel matrix.
## detailed results of WL subtree kernel.
### Shortest path kernel
```
  RMSE_test    std_test    RMSE_train    std_train    k_time
-----------  ----------  ------------  -----------  --------
     35.192     4.49577       28.3604      1.35718   14.5768
```
### Marginalized kernel
The table below shows the results of the marginalized kernel under different termination probabilities.
```
  p_quit    RMSE_test    std_test    RMSE_train    std_train    k_time
--------  -----------  ----------  ------------  -----------  --------
     0.1      18.0243     6.29247       12.1863      7.03899   258.77
     0.2      18.3376     5.85454       13.9554      7.54407   256.327
     0.3      18.496      5.73492       13.9391      7.95812   255.614
     0.4      19.4491     5.3713        16.2593      6.69358   254.897
     0.5      19.7857     5.55054       17.0181      6.84437   256.757
     0.6      20.1922     5.59122       17.6618      6.56718   256.557
     0.7      21.6614     6.02685       20.5882      5.74601   254.953
     0.8      22.996      6.08347       23.5943      3.80637   252.804
     0.9      24.4241     4.95119       25.8082      3.31207   256.738
```
### Path kernel
**The targets of training data are normalized before calculating the kernel.**
```
  RMSE_test    std_test    RMSE_train    std_train    k_time
-----------  ----------  ------------  -----------  --------
    14.0015     6.93602       3.76191     0.702594   37.5759
```
### Weisfeiler-Lehman subtree kernel
The table below shows the results of the WL subtree kernel under different subtree heights.
```
  height    RMSE_test    std_test    RMSE_train    std_train    k_time
--------  -----------  ----------  ------------  -----------  --------
       0     36.2108      7.33179     141.419      1.08284     0.392911
       1      9.00098     6.37145     140.065      0.877976    0.812077
       2     19.8113      4.04911     140.075      0.928821    1.36955
       3     25.0455      4.94276     140.198      0.873857    1.78629
       4     28.2255      6.5212      140.272      0.838915    2.30847
       5     30.6354      6.73647     140.247      0.86363     2.8258
       6     32.1027      6.85601     140.239      0.872475    3.1542
       7     32.9709      6.89606     140.094      0.917704    3.46081
       8     33.5112      6.90753     140.076      0.931866    4.08857
       9     33.8502      6.91427     139.913      0.928974    4.25243
      10     34.0963      6.93115     139.894      0.942612    5.02607
```
**The unit of the *RMSEs* and *stds* is *℃*; the unit of *k_time* is *s*.
k_time is the time spent on building the kernel matrix.**
```
  height    RMSE_test    std_test    RMSE_train    std_train    k_time
--------  -----------  ----------  ------------  -----------  --------
       0     15.6859      4.1392       17.6816     0.713183    0.360443
       1      7.55046     2.33179       6.27001    0.654734    0.837389
       2      9.72847     2.05767       4.45068    0.882129    1.25317
       3     11.2961      2.79994       2.27059    0.481516    1.79971
       4     12.8083      3.44694       1.07403    0.637823    2.35346
       5     14.0179      3.67504       0.700602   0.57264     2.78285
       6     14.9184      3.80535       0.691515   0.56462     3.20764
       7     15.6295      3.86539       0.691516   0.56462     3.71648
       8     16.2144      3.92876       0.691515   0.56462     3.99213
       9     16.7257      3.9931       0.691515    0.56462     4.26315
      10     17.1864      4.05672      0.691516    0.564621    5.00918
```
### Treelet kernel
**The targets of training data are normalized before calculating the kernel.**
```
  RMSE_test    std_test    RMSE_train    std_train    k_time
-----------  ----------  ------------  -----------  --------
     8.3079     3.37838       2.90887      1.2679    49.5814
```
import sys
import pathlib
sys.path.insert(0, "../")

import networkx as nx
import numpy as np
import time

from pygraph.utils.utils import getSPGraph


def spkernel(*args, edge_weight = 'bond_type'):
    """Calculate shortest-path kernels between graphs.

    Parameters
    ----------
    Gn : List of NetworkX graph
        List of graphs between which the kernels are calculated.
    /
    G1, G2 : NetworkX graphs
        2 graphs between which the kernel is calculated.
    edge_weight : string
        edge attribute corresponding to the edge weight. The default edge weight is bond_type.

    Return
    ------
    Kmatrix/kernel : Numpy matrix/int
        Kernel matrix, each element of which is the shortest-path kernel between 2 graphs. / Shortest-path kernel between 2 graphs.

    References
    ----------
    [1] Borgwardt KM, Kriegel HP. Shortest-path kernels on graphs. In Data Mining, Fifth IEEE International Conference on 2005 Nov 27 (pp. 8-pp). IEEE.
    """
    if len(args) == 1: # for a list of graphs
        Gn = args[0]
        Kmatrix = np.zeros((len(Gn), len(Gn)))

        Sn = [] # get shortest path graphs of Gn
        for i in range(0, len(Gn)):
            Sn.append(getSPGraph(Gn[i], edge_weight = edge_weight))

        start_time = time.time()
        for i in range(0, len(Gn)):
            for j in range(i, len(Gn)):
                for e1 in Sn[i].edges(data = True):
                    for e2 in Sn[j].edges(data = True):
                        # count pairs of shortest paths with equal non-zero cost and matching endpoints
                        if e1[2]['cost'] != 0 and e1[2]['cost'] == e2[2]['cost'] and ((e1[0] == e2[0] and e1[1] == e2[1]) or (e1[0] == e2[1] and e1[1] == e2[0])):
                            Kmatrix[i][j] += 1
                            Kmatrix[j][i] += (0 if i == j else 1) # keep the matrix symmetric

        run_time = time.time() - start_time
        print("--- shortest path kernel matrix of size %d built in %s seconds ---" % (len(Gn), run_time))

        return Kmatrix, run_time

    else: # for only 2 graphs
        G1 = getSPGraph(args[0], edge_weight = edge_weight)
        G2 = getSPGraph(args[1], edge_weight = edge_weight)

        kernel = 0
        start_time = time.time()
        for e1 in G1.edges(data = True):
            for e2 in G2.edges(data = True):
                # same matching rule as above, for a single pair of graphs
                if e1[2]['cost'] != 0 and e1[2]['cost'] == e2[2]['cost'] and ((e1[0] == e2[0] and e1[1] == e2[1]) or (e1[0] == e2[1] and e1[1] == e2[0])):
                    kernel += 1

        # print("--- shortest path kernel built in %s seconds ---" % (time.time() - start_time))
        return kernel
\ No newline at end of file
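A usage sketch for both calling conventions of `spkernel`; `Gn`, `G1` and `G2` are assumed to be NetworkX graphs with a numeric `bond_type` edge attribute:

```python
# list form: returns the full kernel matrix and the build time
Kmatrix, run_time = spkernel(Gn, edge_weight = 'bond_type')

# pairwise form: returns a single kernel value
k = spkernel(G1, G2, edge_weight = 'bond_type')
```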
......@@ -129,6 +129,7 @@ def _wl_subtreekernel_do(*args, node_label = 'atom', edge_label = 'bond_type', h
Kernel matrix, each element of which is the Weisfeiler-Lehman kernel between 2 graphs.
"""
height = int(height)
Gn = args[0]
Kmatrix = np.zeros((len(Gn), len(Gn)))
all_num_of_labels_occured = 0 # number of the set of letters that occur before as node labels at least once in all graphs
......@@ -233,6 +234,7 @@ def _weisfeilerlehmankernel_do(G1, G2, height = 0):
"""
# init.
height = int(height)
kernel = 0 # init kernel
num_nodes1 = G1.number_of_nodes()
num_nodes2 = G2.number_of_nodes()
......
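The `height = int(height)` guard lets the subtree height arrive as, e.g., a float from a parameter grid. A hypothetical sweep over the heights reported in results.md; the public wrapper name and module path are assumptions inferred from the private helpers above:

```python
from pygraph.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel  # assumed path

# Gn: a list of NetworkX graphs loaded elsewhere
for height in range(0, 11):  # heights 0-10, as in results.md
    Kmatrix, run_time = weisfeilerlehmankernel(Gn, height = height)
```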