
Python RNN implements text generation


Jun 01, 2021



This article is reproduced from the personal Zhihu column of user Charles (Bai Lu).

Introduction

Building on the previous article, this one introduces the LSTM network and finally gives a "Python Writing"-style example of text generation, such as generating poetry or novels.

Let's start happily


References

Understanding LSTM Networks: http://colah.github.io/posts/2015-08-Understanding-LSTMs/


Related files

Baidu Netdisk download link: https://pan.baidu.com/s/16hjmw1NtU9Oa4iwyj4qJuQ

Password: gmpi


Development tools

Python version: 3.6.4

Related modules:

tensorflow-gpu module;

numpy module;

and some modules that come with Python.

The tensorflow-gpu version used is 1.7.0.


Environment setup

Install Python, add it to the PATH environment variable, and use pip to install the required modules.

In addition, for setting up the TensorFlow-GPU environment, please refer to tutorials online; note that the TensorFlow, CUDA, and GPU driver versions must correspond strictly.
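As a quick sanity check (a minimal sketch, assuming tensorflow-gpu 1.7.0 is installed), the following verifies the version and whether the GPU is visible:

    import tensorflow as tf

    print(tf.__version__)               # expected: 1.7.0
    print(tf.test.is_gpu_available())   # True only if CUDA and driver versions match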


Introduction to the principle

1. RNNs

Human thought is continuous. For example, as you read this article, your understanding of each word depends on words you saw earlier, rather than discarding everything that came before and understanding each word in isolation. Traditional neural networks cannot do this, which is why recurrent neural networks (RNNs) exist.

In RNNs, there are loops that enable them to preserve what they learned earlier:

[Figure 1: an RNN with a loop]

In the network structure shown above, the rectangular block A takes an input x_t (the feature vector at time step t) and outputs a value h_t (the state or output at time step t); the loop in the network feeds the current state back in as part of the next input.

Expand the RNNs on the time step to get the following image:

[Figure 2: the RNN unrolled over time steps]

This is exactly the RNN chain structure used in the earlier article "Python implements a simple machine translation model". Obviously, such a structure lends itself to handling sequence-related problems, and in recent years it has achieved great success in fields such as speech recognition and language translation.

The success of RNNs is largely due to the use of special RNN structures such as LSTMs, rather than the ordinary RNN structure.
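To make the loop concrete, here is a minimal numpy sketch of an ordinary RNN step (an illustration with toy random weights, not the article's source code): the new state h_t is computed from the current input x_t and the previous state h_{t-1}.

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
        # h_t = tanh(W_xh . x_t + W_hh . h_{t-1} + b_h)
        return np.tanh(np.dot(W_xh, x_t) + np.dot(W_hh, h_prev) + b_h)

    # Toy dimensions: 4-dimensional inputs, 3-dimensional hidden state.
    rng = np.random.RandomState(0)
    W_xh, W_hh, b_h = rng.randn(3, 4), rng.randn(3, 3), np.zeros(3)
    h = np.zeros(3)
    for x in rng.randn(5, 4):                 # a toy sequence of 5 input vectors
        h = rnn_step(x, h, W_xh, W_hh, b_h)   # the state carries information forward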

2. LSTMs

The full name is Long Short-Term Memory networks, i.e. long short-term memory networks.

The limitation of ordinary RNNs is that when there is a large gap between the point where we make a prediction and the relevant information, they find it difficult to connect the two:

[Figure 3: a large gap between the relevant information and the prediction]

In theory, with appropriate parameters, this long-term dependency problem could be solved, but in practice it does not seem easy, at least not so far.

Fortunately, LSTMs solve this problem well: they were designed specifically to remember information over long periods of time.

Recurrent neural networks are formed by repeating neural network modules of the same structure. In standard RNNs, this repeating module is very simple; for example, it can consist of a single tanh layer:

[Figure 4: the repeating module in a standard RNN, containing a single tanh layer]

LSTMs have a similar chain structure, but the repeating module is somewhat more complex:

[Figure 5: the repeating module in an LSTM]

Next, let's take a closer look at this structure. First, let's define the symbols used:

[Figure 6: notation used in the diagrams]

Pink circles:

Represents point-by-point operations such as vector addition;

Yellow rectangular box:

Represents the neural network layer;

Normal line:

Used to carry and pass vectors;

Merged lines:

Represents the consolidation of vectors carried on the two lines;

Separate lines:

Represents copying the vectors carried on the line and passing them to two places.

2.1 The core idea of LSTMs

Suppose each green box above is a cell.

The cell state passes through the entire cell along the horizontal line running across the top of the diagram, and the cell performs only a few linear operations on it:

[Figure 7: the cell state running horizontally through the cell]

Obviously, this structure makes it easy for information to flow through the whole cell unchanged.

Of course, this single horizontal line alone cannot add or remove information, i.e. let information selectively pass through the cell; that requires structures called gates.

A gate consists of a sigmoid neural network layer and a point-by-point multiplication:

[Figure 8: a gate: a sigmoid layer plus point-by-point multiplication]

Each element of the sigmoid layer's output vector is a real number between 0 and 1, representing the weight given to the corresponding information: 0 means "let no information through" and 1 means "let all information through". Each LSTM cell has three such gates to protect and control the cell state.
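In numpy terms, a gate looks like this (a toy sketch with random, untrained weights; the helper layer() and the variables defined here are reused by the sketches below):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy dimensions: hidden size 3, input size 4; zero initial states.
    rng = np.random.RandomState(0)
    n_h, n_x = 3, 4
    h_prev, x_t, C_prev = np.zeros(n_h), rng.randn(n_x), np.zeros(n_h)
    hx = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]

    def layer():                               # random parameters for one layer
        return rng.randn(n_h, n_h + n_x), np.zeros(n_h)

    # The gate itself: sigmoid weights in (0, 1), applied point-by-point.
    W, b = layer()
    gate_weights = sigmoid(np.dot(W, hx) + b)  # 0 blocks, 1 lets everything through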

2.2 Understanding LSTMs step by step

Forget gate:

First, the LSTM needs to decide what information to discard and what to keep. This is done by a sigmoid layer called the forget gate. Its inputs are h_{t-1} and x_t, and its output is a vector of values between 0 and 1 indicating the weight of each part of the information in C_{t-1}: 0 means that part of the information is not allowed through, and 1 means it passes entirely.

Concretely, in a language model, for example, we predict the next word from all the contextual information. In this case, the gender of the current subject should be kept in the cell state so that pronouns can be used correctly later. However, when we begin to describe a new subject, we should discard the gender of the previous subject.

[Figure 9: the forget gate]
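The standard forget-gate equation, continuing the numpy sketch above:

    W_f, b_f = layer()                      # forget-gate parameters
    f_t = sigmoid(np.dot(W_f, hx) + b_f)    # f_t = sigma(W_f . [h_{t-1}, x_t] + b_f)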

Input gate:

Second, the LSTM decides what new information to add to the cell state. This is done in two steps:

(1) a tanh layer generates a candidate vector representing all the information that could be added;

(2) a sigmoid layer called the input gate determines the weight of each component of the candidate information from step (1).

Concretely, in the language model, for example, we need to add the new subject's gender to the cell state to replace the previous subject's gender.

[Figure 10: the input gate and the candidate values]
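The corresponding standard equations, continuing the sketch:

    W_i, b_i = layer()                        # input-gate parameters
    W_C, b_C = layer()                        # candidate-layer parameters
    i_t = sigmoid(np.dot(W_i, hx) + b_i)      # step (2): weights for the new info
    C_tilde = np.tanh(np.dot(W_C, hx) + b_C)  # step (1): candidate values to add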

With the forget gate and the input gate, we can now update the cell state, from C_{t-1} to C_t.

Taking the language model again: suppose the model has just output a pronoun and next wants to output a verb. Should the verb be singular or plural? Clearly, we need both the pronoun-related information and the current prediction information in the cell state to make the correct prediction.

The calculation is as follows:

[Figure 11: updating the cell state]
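That is, in the notation of the sketch above:

    C_t = f_t * C_prev + i_t * C_tilde   # forget part of the old state, add the new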

Output gate:

Finally, we need to decide the output value, which is computed as follows:

(1) a sigmoid layer determines which parts of the information in C_t will be output;

(2) a tanh layer compresses the values of C_t to between -1 and 1;

(3) multiplying the tanh layer's output by the sigmoid layer's output point-by-point gives the final output.

[Figure 12: computing the output]
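Continuing the sketch, the output gate and the new hidden state:

    W_o, b_o = layer()                    # output-gate parameters
    o_t = sigmoid(np.dot(W_o, hx) + b_o)  # which parts of the state to output
    h_t = o_t * np.tanh(C_t)              # squash C_t to (-1, 1), then gate it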

3. Variants of LSTMs

(1) Let the gates also look at the cell state, i.e. use the cell state as part of the gate inputs (so-called peephole connections).

[Figure 13: the cell state as part of the gate inputs]

(2) Couple the forget gate and the input gate, i.e. no longer decide separately what to forget and what to add.

[Figure 14: coupled forget and input gates]

(3) GRU

[Figure 15: the GRU]

The GRU model "simplifies" the LSTM design: the update gate z_t combines the roles of the LSTM's forget and input gates, the reset gate r_t controls how much of the previous state is used when computing the candidate state, and the cell state and hidden state are merged into one.
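The standard GRU step, in the same toy-numpy notation as the sketches above:

    W_z, b_z = layer()                        # update gate
    W_r, b_r = layer()                        # reset gate
    W_h, b_h = layer()                        # candidate state
    z_t = sigmoid(np.dot(W_z, hx) + b_z)
    r_t = sigmoid(np.dot(W_r, hx) + b_r)
    h_hat = np.tanh(np.dot(W_h, np.concatenate([r_t * h_prev, x_t])) + b_h)
    h_t = (1.0 - z_t) * h_prev + z_t * h_hat  # a single state; no separate cell C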

Practical application

To combine theory with practice, this article gives a simple example of a "Python Writing"-style model; no further introduction is needed.

The implementation process is detailed in the source code in the related files.
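For orientation, this is roughly what the character-by-character generation loop in such a model looks like (a hypothetical sketch; predict_probs, char2idx, and idx2char stand in for the trained model and vocabulary in the downloadable files):

    import numpy as np

    def sample_text(predict_probs, char2idx, idx2char, start, length, temperature=1.0):
        # Generate `length` characters, starting from the seed string `start`.
        text = list(start)
        for _ in range(length):
            probs = predict_probs([char2idx[c] for c in text])  # next-char distribution
            logits = np.log(probs) / temperature    # lower temperature -> safer text
            probs = np.exp(logits) / np.sum(np.exp(logits))
            text.append(idx2char[np.random.choice(len(probs), p=probs)])
        return "".join(text)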

Usage demo

Model training:

Run the 'train.py' file in the cmd window:

[Figure 16: running train.py from the command line]

If necessary, modify the parameters yourself:

[Figure 17: adjustable training parameters]

Using the model:

Just run the generate.py file in the cmd window.

Note that the model parameters must match those in the train.py file:

[Figure 18: model parameters in generate.py]

Results

Generating English text:

Results after training on Shakespeare's works:

[Figure 19: generated text after training on Shakespeare]

Generating Chinese text:

Results after training on Jay Chou's lyrics:

[Figure 20: generated Chinese text after training on Jay Chou's lyrics]


More

The code was tested and confirmed working as of 2018-06-24.

The model is relatively simple; interested readers can optimize it further. Of course, RNNs can do far more than text generation!

More examples will follow when the opportunity arises.