Jun 01, 2021
This article is reproduced from the personal Zhihu column of Charles (Bai Lu).
This article builds on that to introduce the LSTM network. Finally, it gives a "Python writes" style example of text generation, such as generating poetry and novels.
Let's start happily
Understanding LSTM Networks:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Baidu web download link:
https://pan.baidu.com/s/16hjmw1NtU9Oa4iwyj4qJuQ
Password: gmpi
Python version: 3.6.4
Related modules:
tensorflow-gpu module;
numpy module;
and some Python's own modules.
TensorFlow-GPU version: 1.7.0
Install Python, add it to the environment variables, and use pip to install the required modules.
In addition, for setting up the TensorFlow-GPU environment please refer to online tutorials, and make sure the versions and drivers correspond strictly.
I. RNNs
Human thinking is persistent. For example, as you read this article, your understanding of each word depends on some of the words you saw earlier, rather than discarding everything seen before and understanding each word from scratch.
Traditional neural networks (such as CNNs) cannot do this, which is why recurrent neural networks (RNNs) exist.
In RNNs, there are loops that enable them to preserve what they learned earlier:
In the network structure shown above, the rectangular block A takes the input Xt (the feature vector at time t) and outputs a result ht (the state or output at time t); the loop in the network makes the current state part of the next input.
Unrolling the RNN over time steps gives the following picture:
That is the RNN chain structure used in the article "Python implements a simple machine translation model". Obviously, such a structure lends itself to handling sequence-related problems.
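To make the recurrence concrete, here is a minimal numpy sketch of a single vanilla RNN step (purely illustrative; the weight names and dimensions are assumptions, not taken from the article's source code):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: the new state depends on the current
    input x_t and the previous state h_prev."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Hypothetical dimensions for illustration.
input_dim, hidden_dim, steps = 8, 16, 5
rng = np.random.RandomState(0)
W_xh = rng.randn(input_dim, hidden_dim) * 0.1
W_hh = rng.randn(hidden_dim, hidden_dim) * 0.1
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)  # initial state
for t in range(steps):
    x_t = rng.randn(input_dim)             # stand-in for the t-th feature vector
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # the state is fed back each step
```

The loop is exactly the "unrolling over time" in the figure: the same weights are reused at every step, and only the state h is carried forward.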
In recent years, RNNs have achieved great success in speech recognition, language translation and other fields.
This success is largely due to the use of special RNN structures such as LSTMs, rather than the ordinary RNN structure.
II. LSTMs
The full name is Long Short-Term Memory networks.
The limitation of ordinary RNNs is that when the gap between the point of prediction and the relevant information is large, it is difficult for them to connect the two:
In theory, as long as the parameters are appropriate, ordinary RNNs could solve this problem of long-term dependencies not being linked up well; in practice, however, this does not seem easy, at least not so far.
Fortunately, LSTMs solve this problem well: they were designed precisely to remember information over long periods of time.
Recurrent neural networks are formed by repeating copies of neural network modules of the same structure. In standard RNNs, the repeated module is very simple; for example, it can consist of a single tanh layer:
LSTMs have a similar structure, but the structure of neural network modules becomes somewhat more complex:
Next, let's take a closer look at this structure.
First, let's define the symbols used:
Pink circles: point-wise operations, such as vector addition;
Yellow rectangles: neural network layers;
Plain lines: carry and pass vectors;
Merging lines: the vectors carried on the two lines are concatenated;
Forking lines: the vector carried on the line is copied and passed to two places.
2.1 The core idea of LSTMs
Suppose each green box is a cell.
The vector passes through the entire cell along the horizontal line that runs across the top of the structure diagram, and the cell applies only a few minor linear operations to it:
Obviously, this structure makes it easy to get information through the whole cell without changing.
Of course, the single horizontal line alone cannot add or remove information; to let information selectively pass through the cell, a structure called a gate is needed.
A gate is implemented mainly by a sigmoid neural network layer and a point-wise multiplication:
Each element of the vector output by the sigmoid layer is a real number between 0 and 1 representing the weight given to the corresponding information: 0 means "let no information through here", and 1 means "let all information through here". Each LSTM has three such gate structures to protect and control information.
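As a toy illustration of a gate (assumed numbers, not from the article's source), the sigmoid output simply scales each component of the information vector:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A hypothetical gate: values near 0 block information, values near 1 pass it.
information = np.array([0.5, -1.2, 3.0, 0.7])
gate_logits = np.array([-10.0, 10.0, 0.0, 10.0])
gate = sigmoid(gate_logits)   # approximately [0, 1, 0.5, 1]
passed = gate * information   # point-wise multiplication
print(passed)                 # approximately [0.0, -1.2, 1.5, 0.7]
```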
2.2 Gradual understanding of LSTM
Forget gate:
First, the LSTM needs to decide what information to discard and what information to keep. This is achieved through a sigmoid layer called the forget gate.
Its inputs are ht-1 and xt, and its output is a vector of values between 0 and 1 indicating the weight given to each part of the cell state Ct-1: 0 means that part of the information is not allowed through, and 1 means that part is allowed through completely.
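Written out (following the Understanding LSTMs post referenced at the top of this article), the forget gate computes

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where [h_{t-1}, x_t] denotes the concatenation of the previous state and the current input.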
Specifically, in a language model, for example, we predict the next word based on all the contextual information. In this case, the gender of the current subject should be included in the cell state, so that pronouns can later be used correctly. However, when we begin to describe a new subject, we should discard the gender of the previous subject.
Input gate:
Second, the LSTM decides what new information to add to the cell state.
The implementation is done in two steps:
(1) A tanh layer generates a candidate vector representing all the information that could be added;
(2) A sigmoid layer called the input gate determines the weight of each part of the candidate information from step (1).
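In formulas (again following the referenced post), the two steps are

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

where \tilde{C}_t is the candidate vector and i_t holds the weights.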
Specifically, in the language-model example, we need to add the new subject's gender information to the cell state to replace the previous subject's gender information.
With the forget gate and the input gate, we can now update the cell state, i.e., update Ct-1 to Ct.
Take the language model again as an example: suppose the model has just output a pronoun and might next output a verb. Should the verb be singular or plural? Clearly, we need to add the pronoun-related information and the current prediction information to the cell state in order to make the correct prediction.
The calculation is as follows:
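In the standard formulation from the referenced post, this is

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

i.e., forget part of the old state, then add the weighted candidate information.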
Output gate:
Finally, we need to decide on the output value. The output value is calculated as follows:
(1) A sigmoid layer determines which parts of the information in Ct will be output;
(2) A tanh layer compresses the values of Ct to between -1 and 1;
(3) Multiplying the tanh layer's output by the sigmoid layer's output gives the final output.
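In formulas (per the referenced post):

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$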
2.3 Variants of LSTMs
(1) Use the cell state as part of the input to the gates (the so-called peephole connections).
(2) Couple the forget gate with the input gate, i.e., the decisions about what to forget and what new information to add are no longer made separately.
(3) GRU
The GRU model "simplifies" the LSTM design: the reset gate rt controls how much of the previous state is used when computing the new candidate state, and the update gate zt merges the roles of the LSTM's forget and input gates, controlling how the state is updated.
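For reference, the standard GRU equations (as given in the referenced post) are

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
$$\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

Note that the GRU has no separate cell state; the hidden state ht plays both roles.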
III. Practical application
To combine theory with practice, this article gives a simple example of a model similar to "Python writes"; no lengthy introduction is needed here.
The implementation process is detailed in the source code in the relevant file.
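For readers who just want the shape of such a model, here is a rough TensorFlow 1.x sketch of the core graph of a character-level generator. This is a minimal illustration under assumed names and sizes (vocab_size, embed_dim, etc.), not the actual contents of train.py:

```python
import tensorflow as tf

# Hypothetical hyperparameters for illustration.
vocab_size, embed_dim, hidden_dim, num_layers = 5000, 128, 256, 2

# Batches of character ids; targets are the inputs shifted by one position.
inputs = tf.placeholder(tf.int32, [None, None], name='inputs')
targets = tf.placeholder(tf.int32, [None, None], name='targets')

# Map character ids to dense vectors.
embedding = tf.get_variable('embedding', [vocab_size, embed_dim])
embedded = tf.nn.embedding_lookup(embedding, inputs)

# A stack of LSTM cells processes the sequence.
cells = [tf.nn.rnn_cell.BasicLSTMCell(hidden_dim) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, final_state = tf.nn.dynamic_rnn(cell, embedded, dtype=tf.float32)

# Project each step's output to a distribution over the vocabulary.
logits = tf.layers.dense(outputs, vocab_size)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
```

Generation then amounts to feeding one character at a time, sampling from the softmax over logits, and feeding the sample back in as the next input.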
Use the demo
Model training:
Run the 'train.py' file in the cmd window:
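For example (the exact flags, if any, depend on the downloaded source, so treat this invocation as illustrative):

```
python train.py
```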
If necessary, modify the parameters yourself:
Using the model:
Just run the generate.py file in the cmd window.
Note that the model parameters need to be consistent with the model parameters in the train.py file:
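Illustratively, again assuming no extra flags are required:

```
python generate.py
```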
The results are shown below.
To generate English text:
Results obtained using Shakespeare's works as the training material:
To generate Chinese text:
Results obtained using Jay Chou's works as the training material:
The code was tested and confirmed working as of 2018-06-24.
The model is relatively simple; interested readers can optimize it on this basis. And of course, text generation is not the only thing RNNs can do!
Other examples may follow later when the opportunity arises.