Jun 01, 2021
This article is reproduced from the Zhihu personal column of Charles (Bai Lu).
Guilt over not posting for days led me to write an article today.
Like the previous post, "Python Plays CartPole," this is a simple example from PyTorch's official tutorials.
To show my sincerity, I'll once again cover the basic models used in this article in some depth: Seq2Seq and the Attention mechanism.
The content will still be quite long.
I hope it helps newcomers to NLP/deep learning.
Without further ado, let's get straight into the main topic.
Baidu Cloud download link:
https://pan.baidu.com/s/1y3KcMboz_xZJ9Afh5nRkUw
Password: qvhd
Official English tutorial link:
http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
In addition:
Students who have difficulty reading English material need not worry; I have translated this tutorial into Chinese and included it in the related files.
Development tools
System: Windows10
Python version: 3.6.4
Related modules:
torch module;
numpy module;
matplotlib module;
and some Python's own modules.
The PyTorch version is:
0.3.0
Install Python, add it to the PATH environment variable, and pip install the required modules.
Additional notes:
PyTorch does not support direct pip installations for the time being.
There are two options:
(1) Install Anaconda3 first, then install PyTorch inside the Anaconda3 environment (there a direct pip install works);
(2) Install from a pre-compiled whl file; the download link is:
https://pan.baidu.com/s/1dF6ayLr#list/path=%2Fpytorch
PS:
Some of the content draws on related blog posts and books.
(1) Single-layer network
The structure of a single-layer network looks like the following:
The input x is transformed as Wx + b and passed through the activation function f to get the output y.
Students who have a preliminary understanding of machine learning/deep learning will recognize that this is in fact a single-layer perceptron.
For convenience, let's draw it like this (please forgive my poor drawing skills):
x is the input vector, y is the output vector, and the arrow represents a transformation, i.e. y = f(Wx + b).
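To make this concrete, here is a minimal sketch of y = f(Wx + b) in PyTorch. The sizes (4 inputs, 3 outputs) and the choice of sigmoid as f are arbitrary assumptions for illustration, and it is written against a recent PyTorch API (under the 0.3.0 version used in this article you would additionally wrap tensors in torch.autograd.Variable):

import torch
import torch.nn as nn

# A single-layer network: y = f(Wx + b), with f = sigmoid here
layer = nn.Linear(4, 3)    # implements the affine map Wx + b
f = nn.Sigmoid()           # the activation function f

x = torch.randn(1, 4)      # one input vector x (batch of 1)
y = f(layer(x))            # y = f(Wx + b)
print(y)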
(2) Classic RNN
In practice, we encounter a lot of sequential data:
X1,X2,X3,X4...
For example, in our machine translation model, X1 can be thought of as the first word, X2 can be thought of as the second word, and so on.
The original neural networks did not handle sequential data well, so the savior RNN appeared. It introduces the concept of a hidden state h, uses h to extract features from the sequence, and then converts them into outputs. Here is a detailed description of how it is calculated (h0 in the figure below is an initial hidden state; for simplicity, assume it is a reasonable value chosen for the specific model):
where:
h1 = f(P·x1 + Q·h0 + b)
Again, all letters are vectors, and arrows represent a transformation of vectors.
h2 is calculated similarly to h1. The same parameters P, Q, and b are used at each step, which means the parameters are shared across steps:
where:
h2 = f(P·x2 + Q·h1 + b)
And so on (remember, the parameters are shared!). This calculation can continue indefinitely (it is not limited to the length of 4 shown in the figure).
So how do we get the RNN's output?
The output of the RNN is computed from h:
where:
y1 = Softmax(W·h1 + c)
Similarly, we get y2, y3, y4, ...:
Of course, as before, the parameters W and c here are shared.
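As a sanity check, here is a sketch of the classic RNN computation in plain numpy, using this article's parameter names: P, Q, b for the hidden-state update and W, c for the output. All sizes are illustrative assumptions:

import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

hidden_size, input_size, output_size = 5, 3, 4

# Shared parameters: the SAME P, Q, b, W, c are used at every step
P = np.random.randn(hidden_size, input_size)
Q = np.random.randn(hidden_size, hidden_size)
b = np.random.randn(hidden_size)
W = np.random.randn(output_size, hidden_size)
c = np.random.randn(output_size)

xs = [np.random.randn(input_size) for _ in range(4)]  # x1..x4
h = np.zeros(hidden_size)                             # h0

for x in xs:
    h = np.tanh(P @ x + Q @ h + b)   # h_t = f(P*x_t + Q*h_{t-1} + b)
    y = softmax(W @ h + c)           # y_t = Softmax(W*h_t + c)
    print(y)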
This is the most classic RNN structure, and we can see that it has a fatal drawback:
The input and output sequences must be equal length!
This shortcoming means the classic RNN has a narrower range of applications than you might expect.
(3) Improving the classic RNN
Scenario 1 (input N, output 1):
Suppose the task requires us to input a sequence and output a single value.
Then we simply apply the output transformation to the last h:
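A minimal numpy sketch of this N-to-1 variant (same assumed notation and sizes as the sketch above): run the whole sequence first, then apply the output transformation only to the last h:

import numpy as np

hidden_size, input_size, output_size = 5, 3, 1

P = np.random.randn(hidden_size, input_size)
Q = np.random.randn(hidden_size, hidden_size)
b = np.random.randn(hidden_size)
W = np.random.randn(output_size, hidden_size)
c = np.random.randn(output_size)

xs = [np.random.randn(input_size) for _ in range(4)]  # the input sequence
h = np.zeros(hidden_size)
for x in xs:                        # consume the whole sequence first...
    h = np.tanh(P @ x + Q @ h + b)
y = W @ h + c                       # ...then transform only the last h
print(y)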
Scenario 2 (input 1, output N):
What happens when the input is a single number, not a sequence?
We can feed the input only at the start of the sequence:
Of course, you can also feed the input information x into every stage:
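And a sketch of the 1-to-N variant in which the single input x is fed at every stage (sizes again assumed):

import numpy as np

hidden_size, input_size, output_size = 5, 3, 4

P = np.random.randn(hidden_size, input_size)
Q = np.random.randn(hidden_size, hidden_size)
b = np.random.randn(hidden_size)
W = np.random.randn(output_size, hidden_size)
c = np.random.randn(output_size)

x = np.random.randn(input_size)     # a single input, not a sequence
h = np.zeros(hidden_size)
for _ in range(4):                  # produce an output sequence y1..y4
    h = np.tanh(P @ x + Q @ h + b)  # the same x is fed at every stage
    y = W @ h + c
    print(y)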
Scenario 3 (input N, output M):
This is one of the most important variants of RNN, and this structure is also known as:
The Encoder-Decoder model, or Seq2Seq model.
Our machine translation model is based on it.
The Seq2Seq structure first encodes the input data into a context vector c:
where:
c = h4, or c = V(h4), or c = V(h1, h2, h3, h4)
That is, the context vector c can simply equal the last hidden state, or it can be obtained by applying a transformation V to the last hidden state, or by applying a transformation to all of the hidden states, and so on.
The RNN structure above is generally referred to as the Encoder.
Once we have c, we need another RNN to decode it: the Decoder. One option is to feed c into the Decoder as the initial state h'0:
Of course, you can also use c as an input to the Decoder at every step:
One more note:
For the missing parts (e.g. some blue squares have no x input), you can treat the missing input as 0 and plug it into the classic RNN formulas listed earlier; the other cases are analogous.
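Putting the pieces together, here is a minimal Seq2Seq sketch in PyTorch, using the first option above: the Encoder's last hidden state serves as the context c, which becomes the Decoder's initial state h'0. The dimensions, sequence lengths, and the use of plain nn.RNN modules are illustrative assumptions, not the tutorial's exact code:

import torch
import torch.nn as nn

input_size, hidden_size, output_size = 8, 16, 10

encoder = nn.RNN(input_size, hidden_size)    # the Encoder RNN
decoder = nn.RNN(output_size, hidden_size)   # the Decoder RNN
out_layer = nn.Linear(hidden_size, output_size)

src = torch.randn(5, 1, input_size)   # source sequence, length N = 5
_, c = encoder(src)                   # context c = last hidden state

h = c                                 # c as the Decoder's initial h'0
y = torch.zeros(1, 1, output_size)    # a start-of-sequence placeholder
for _ in range(3):                    # target sequence, length M = 3
    out, h = decoder(y, h)
    y = out_layer(out)                # this step's output, next step's input
    print(y.shape)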
(4) Attention mechanism
In the Encoder-Decoder structure, the Encoder encodes the entire input sequence into a single semantic feature c and then decodes from it. When the input sequence is long, c may not be able to hold all of the information in the input sequence.
The Attention mechanism solves this problem well: it feeds a different c into the Decoder at each step:
Each c is generated from the hidden states h in the Encoder:
ci = ai1·h1 + ai2·h2 + ... + ain·hn
aij represents the relevance between hj, the Encoder's stage-j hidden state, and stage i of the Decoder.
So how are the weights aij determined? They are generally considered to depend on the Encoder's stage-j hidden state and the Decoder's stage-(i-1) hidden state.
For example, suppose we want to calculate the weights a1j:
Then we calculate the weights a2j:
And so on.
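Here is a minimal numpy sketch of this attention computation: score each Encoder hidden state hj against the Decoder's previous hidden state, softmax the scores into the weights aij, and take the weighted sum to get ci. The dot-product score used here is an assumption; real models use various learned scoring functions:

import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

hidden_size, n = 5, 4
hs = np.random.randn(n, hidden_size)   # Encoder hidden states h1..hn
s_prev = np.random.randn(hidden_size)  # Decoder's stage-(i-1) hidden state

scores = hs @ s_prev   # relevance of each hj to the current Decoder stage
a = softmax(scores)    # the weights ai1..ain, which sum to 1
c_i = a @ hs           # ci = ai1*h1 + ai2*h2 + ... + ain*hn
print(a, c_i)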
(5) The final task: translating French into English
With the groundwork above, I believe everyone can understand the official tutorial.
Here I only give a brief introduction; for the detailed modeling and implementation process, refer to my translated copy of the official documents.
The Encoder network is:
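For reference, here is a sketch in the spirit of the official tutorial's EncoderRNN: each input word index is embedded and fed through a GRU one word at a time. It is written against a recent PyTorch API (under 0.3.0 you would use torch.autograd.Variable and torch.LongTensor), so treat it as a sketch rather than the tutorial's exact code:

import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    # Embed each input word index, then run it through a GRU
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, word_idx, hidden):
        embedded = self.embedding(word_idx).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

encoder = EncoderRNN(input_size=100, hidden_size=32)  # vocab of 100 words
hidden = torch.zeros(1, 1, 32)                        # initial hidden state
out, hidden = encoder(torch.tensor([3]), hidden)      # one word at a time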
The Decoder network is:
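And a sketch in the spirit of the tutorial's attention Decoder (reconstructed from memory, so consult the tutorial or the translated documents for the authoritative version): the current word embedding and the previous hidden state produce attention weights over the Encoder outputs, whose weighted sum is combined with the embedding before the GRU step:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, max_length):
        super(AttnDecoderRNN, self).__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.attn = nn.Linear(hidden_size * 2, max_length)   # attention weights
        self.attn_combine = nn.Linear(hidden_size * 2, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, word_idx, hidden, encoder_outputs):
        embedded = self.embedding(word_idx).view(1, 1, -1)
        # Attention weights from the current embedding and previous hidden state
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        # Weighted sum of the Encoder outputs: the context for this step
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = F.relu(self.attn_combine(output).unsqueeze(0))
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

decoder = AttnDecoderRNN(hidden_size=32, output_size=100, max_length=10)
enc_outs = torch.zeros(10, 32)   # placeholder Encoder outputs
hidden = torch.zeros(1, 1, 32)   # would be the Encoder's last hidden state
out, hidden, attn = decoder(torch.tensor([[0]]), hidden, enc_outs)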
The encoder's last hidden state is used as the decoder's initial hidden state. The weight calculation of the attention mechanism is similar to that described in (4). The structure of the GRU network is:
I will not introduce the GRU's structure in detail here; the article would get so long that I doubt anyone would finish it, so let's leave it at that.
In the related files I also provide 4 relevant papers for interested readers to study. (T_T They are in pure English.)
The results are shown below.
Run the Translation.py file in the cmd window.
Error curve:
Output of cmd window during training:
Model testing:
As a comparison:
It's exactly the same as the previous test result, right?!
Of course, some translation results are not very satisfactory, because the model and training data are too simple. (T_T No examples of those here.)
Attention diagram of the last four sentences:
That's all~~~
Interested students can further modify the model for better results; of course, you can also find other datasets and build, say, a Chinese-English translation model.