diff --git a/README.md b/README.md
index 0b8d01ab4114a27a2bb2eac9a66ec2c4113ee54f..945d26b206b98b67ae2b33ed3c5fc069569a45be 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,184 @@
-# CTCModel : A transparent CTC implementation for Keras
+
+# CTCModel : A transparent CTC implementation for Keras
+
+## Description
+
+CTCModel makes the training of an RNN with the Connectionist Temporal Classification (CTC) approach completely transparent.
+
+It directly inherits from the traditional Keras 2 Model and uses the TensorFlow implementations of the CTC loss and decoding functions.
+
+## Dependencies
+- Keras
+- TensorFlow
+
+## Installation
+$ git clone https://github.com/litislab/CTCModel
+$ cd CTCModel
+
+## Getting started
+Example of a standard recurrent neural network with CTCModel in Keras (*h_features* is the number of features per timestep and *nb_labels* the number of labels, both defined elsewhere):
+
+<code>
+from keras.layers import LSTM, TimeDistributed, Dense, Activation, Input
+from keras.optimizers import Adam
+from numpy import zeros
+from CTCModel import CTCModel
+
+input_layer = Input((None, h_features))
+lstm0 = LSTM(128, return_sequences=True)(input_layer)
+lstm1 = LSTM(128, return_sequences=True)(lstm0)
+dense = TimeDistributed(Dense(nb_labels + 1))(lstm1)  # one output per label, plus the CTC blank
+output_layer = Activation("softmax")(dense)
+
+model = CTCModel(input_layer, output_layer)
+model.compile(optimizer=Adam(lr=1e-4))
+</code>
+
+
+----------
+
+
+The standard inputs x and y of a Keras Model, where x holds the observations and y the labels, are defined differently here. In CTCModel, you must provide as x:
+
+ - the **input observations**
+ - the **labels**
+ - the **lengths of the input sequences**
+ - the **lengths of the label sequences**
+
+Here, y is not used in the standard way, but it must still be supplied to the Keras methods (either the labels or an empty structure with one entry per example).
+Let *x_train*, *y_train*, *x_train_len* and *y_train_len* denote those terms. The fit, evaluate and predict methods can then be used as follows:
+
+<code>
+model.fit(x=[x_train, y_train, x_train_len, y_train_len], y=zeros(nb_train), batch_size=64)
+
+print(model.evaluate(x=[x_test, y_test, x_test_len, y_test_len], batch_size=64))
+
+model.predict([x_test, x_test_len])
+</code>
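+
+The length arrays and the padded inputs can be prepared with NumPy and the Keras padding utilities. The following is a minimal sketch, assuming *x_train* and *y_train* are Python lists of variable-length sequences and that the padding value (255 here, as in example.py) never occurs in the observations:
+
+<code>
+import numpy as np
+from keras.preprocessing import sequence
+
+# one length entry per sequence, recorded before padding
+x_train_len = np.asarray([len(seq) for seq in x_train])
+y_train_len = np.asarray([len(seq) for seq in y_train])
+
+# pad to fixed-size arrays; "post" keeps the real data at the beginning
+x_train_pad = sequence.pad_sequences(x_train, value=255.0, dtype='float32', padding='post')
+y_train_pad = sequence.pad_sequences(y_train, value=float(nb_labels), dtype='float32', padding='post')
+</code>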
+
+## Example
+
+The file example.py is an example of the use of CTCModel. The dataset is composed of sequences of digits: images from the MNIST dataset [LeCun 98] have been concatenated to get observation sequences and label sequences.
+The example shows how to use the standard fit, predict and evaluate methods. From the observation and label sequences, we create two lists per dataset containing the length of each sequence: one list for the observations and one for the labels. The data are then padded in order to provide fixed-size inputs to the Keras methods.
+A standard recurrent neural network with bidirectional layers is defined and trained using the *fit* method of CTCModel. The *evaluate* method is then performed to compute the loss, the label error rate and the sequence error rate on the test set; its output is a list containing the value of each metric. Finally, the *predict* method is applied to get the predictions on the test set, and the first predicted sequences are printed in order to compare the predicted labels with the ground truth.
+
+## Under the hood
+CTCModel works by adding three additional output layers to a recurrent network: one computes the CTC loss, one decodes, and one evaluates standard metrics for sequence analysis (the sequence error rate and the label error rate). Each one can be applied in a blind manner through the standard Keras methods such as *fit*, *predict* and *evaluate*. Generator-based methods have also been defined and can be used in the standard way, provided that the input x and the label y returned by the generator have the specific structure seen above, as sketched below.
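+
+A minimal generator sketch, assuming CTCModel keeps the Keras 2 *fit_generator* signature (the helper name *ctc_batch_generator* and the padded arrays from the sketch above are illustrative, not part of the library):
+
+<code>
+import numpy as np
+
+def ctc_batch_generator(x_pad, y_pad, x_len, y_len, batch_size):
+    nb_samples = len(x_pad)
+    while True:  # Keras expects generators to loop forever
+        for i in range(0, nb_samples, batch_size):
+            s = slice(i, i + batch_size)
+            # x is the four-part list; y is a dummy array with one entry per example
+            yield [x_pad[s], y_pad[s], x_len[s], y_len[s]], np.zeros(len(x_pad[s]))
+
+model.fit_generator(ctc_batch_generator(x_train_pad, y_train_pad, x_train_len, y_train_len, 64),
+                    steps_per_epoch=len(x_train_pad) // 64, epochs=10)
+</code>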
+
+Except for the three specific layers, CTCModel works as a standard Keras Model, and most of the overridden methods just select the right output layer and call the related Keras Model method. There are also additional methods to save or load the model parameters, and others to get specific computations, e.g. the loss using *get_loss* or the input probabilities using *get_probas* (together with the related *on_batch* and *generator* variants).
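+
+The exact signatures of these helpers are not documented here; the sketch below assumes they accept the same input lists as *evaluate* and *predict* respectively:
+
+<code>
+loss = model.get_loss([x_test_pad, y_test_pad, x_test_len, y_test_len])
+probas = model.get_probas([x_test_pad, x_test_len])
+</code>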
+
+## Credits and licence
+CTCModel was developed at the LITIS laboratory, Normandie University (http://www.litislab.fr) by Cyprien RUFFINO and Yann SOULLARD, under the supervision of Thierry PAQUET.
+
+CTCModel is under the terms of the GPL-3.0 licence.
+
+[//]: # (If you use CTCModel for research purposes, please consider adding the following citation to your paper:
+
+<code>
+@misc{ctcmodel,
+author = {Soullard, Yann and Ruffino, Cyprien and Paquet, Thierry},
+howpublished = {\url{https://arxiv.org/link}},
+title = {{CTCModel: Connectionist Temporal Classification in Keras}},
+year = {2018}
+}
+</code>
+)
+
+## References
+F. Chollet et al. Keras: Deep Learning for Python, https://github.com/keras-team/keras, 2015.
+A. Graves, S. Fernández, F. Gomez, J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (pp. 369-376). ACM, June 2006.
+Y. LeCun. The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/, 1998.
diff --git a/example.py b/example.py
new file mode 100644
index 0000000000000000000000000000000000000000..aa3598371249c26d6b87194d7d81250261a04dff
--- /dev/null
+++ b/example.py
@@ -0,0 +1,83 @@
+
+from keras.layers import TimeDistributed, Activation, Dense, Input, Bidirectional, LSTM, Masking, GaussianNoise
+from keras.optimizers import Adam
+from CTCModel import CTCModel
+import pickle
+from keras.preprocessing import sequence
+import numpy as np
+
+
+def create_network(nb_features, nb_labels, padding_value):
+
+    # Define the network architecture
+    input_data = Input(name='input', shape=(None, nb_features))  # nb_features = image height
+
+    masking = Masking(mask_value=padding_value)(input_data)
+    noise = GaussianNoise(0.01)(masking)
+    blstm = Bidirectional(LSTM(128, return_sequences=True, dropout=0.1))(noise)
+    blstm = Bidirectional(LSTM(128, return_sequences=True, dropout=0.1))(blstm)
+    blstm = Bidirectional(LSTM(128, return_sequences=True, dropout=0.1))(blstm)
+
+    dense = TimeDistributed(Dense(nb_labels + 1, name="dense"))(blstm)  # nb_labels + 1 for the CTC blank
+    outrnn = Activation('softmax', name='softmax')(dense)
+
+    network = CTCModel([input_data], [outrnn])
+    network.compile(Adam(lr=0.0001))
+
+    return network
+
+
+if __name__ == '__main__':
+    """ Example of a recurrent neural network using CTCModel,
+    applied to sequences of digits. Digits are images from the MNIST dataset that have been
+    concatenated to get observation sequences and label sequences of different lengths (from 2 to 5). """
+
+    # load data from a pickle file
+    (x_train, y_train), (x_test, y_test) = pickle.load(open('./seqDigits.pkl', 'rb'))
+
+    nb_labels = 10       # number of labels (the 10 digits)
+    batch_size = 32      # size of the batches
+    padding_value = 255  # value used for padding input observations
+    nb_epochs = 10       # number of training epochs
+    nb_train = len(x_train)
+    nb_test = len(x_test)
+    nb_features = len(x_train[0][0])
+
+    # create lists of sequence lengths
+    x_train_len = np.asarray([len(x_train[i]) for i in range(nb_train)])
+    x_test_len = np.asarray([len(x_test[i]) for i in range(nb_test)])
+    y_train_len = np.asarray([len(y_train[i]) for i in range(nb_train)])
+    y_test_len = np.asarray([len(y_test[i]) for i in range(nb_test)])
+
+    # pad inputs to fixed-size arrays
+    x_train_pad = sequence.pad_sequences(x_train, value=float(padding_value), dtype='float32',
+                                         padding="post", truncating='post')
+    x_test_pad = sequence.pad_sequences(x_test, value=float(padding_value), dtype='float32',
+                                        padding="post", truncating='post')
+    y_train_pad = sequence.pad_sequences(y_train, value=float(nb_labels),
+                                         dtype='float32', padding="post")
+    y_test_pad = sequence.pad_sequences(y_test, value=float(nb_labels),
+                                        dtype='float32', padding="post")
+
+    # define a recurrent network using CTCModel
+    network = create_network(nb_features, nb_labels, padding_value)
+
+    # CTC training
+    network.fit(x=[x_train_pad, y_train_pad, x_train_len, y_train_len], y=np.zeros(nb_train),
+                batch_size=batch_size, epochs=nb_epochs)
+
+    # Evaluation: loss, label error rate and sequence error rate are requested
+    evaluation = network.evaluate(x=[x_test_pad, y_test_pad, x_test_len, y_test_len],
+                                  batch_size=batch_size, metrics=['loss', 'ler', 'ser'])
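+
+    # evaluate returns one value per requested metric; the order below assumes
+    # it matches the metrics argument ['loss', 'ler', 'ser'] passed above
+    print("Test loss: %.4f -- LER: %.4f -- SER: %.4f" % tuple(evaluation))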
+
+    # predict label sequences
+    pred = network.predict([x_test_pad, x_test_len], batch_size=batch_size, max_value=padding_value)
+    for i in range(10):  # print the 10 first predictions
+        # predictions are padded: use [j for j in pred[i] if j != -1] to strip the padding values
+        print("Prediction :", pred[i], " -- Label : ", y_test[i])
\ No newline at end of file
diff --git a/seqDigits.pkl b/seqDigits.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..af2bea35b2d47663c20aee7fb27086b239d8def9
Binary files /dev/null and b/seqDigits.pkl differ