Asynchronous application of the ES6 Generator function

1. Traditional methods

Before ES6, there were roughly four approaches to asynchronous programming.

Callback functions
Event listeners
Publish/subscribe
Promise objects

The Generator function takes JavaScript asynchronous programming to a whole new stage.

2. Basic concepts

Asynchronous

Put simply, "asynchronous" means that a task is not completed in one continuous run. Think of the task as deliberately split into two segments: the first segment runs, the program turns to other tasks, and once things are ready it comes back to run the second segment.

For example, take a task that reads a file and processes it. The first segment asks the operating system to read the file. The program then carries on with other tasks until the operating system returns the file, and only then runs the second segment (processing the file). This kind of non-continuous execution is called asynchronous.

Correspondingly, continuous execution is called synchronous. Because execution is continuous and no other task can be slotted in, the program can only wait while the operating system reads the file from the hard disk.
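
The two-segment idea can be sketched in a few lines. This is only an illustration: setTimeout stands in for the operating system's file read, and the order array and its labels are hypothetical names used for the demonstration.

```javascript
// Record the order in which the pieces of work actually run.
const order = [];

// Segment 1: issue the request (setTimeout stands in for an OS file read).
order.push('segment 1: request issued');
setTimeout(function () {
  // Segment 2: runs later, once the "result" is ready.
  order.push('segment 2: result processed');
  console.log(order.join(' -> '));
}, 0);

// Other work runs before segment 2, because segment 1 did not block.
order.push('other work');
```

The printed order places 'other work' between the two segments, which is the non-continuous execution described above.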

Callback function

JavaScript implements asynchronous programming with callback functions. A callback function is the second segment of a task written as a separate function, which is called directly when the task is ready to resume. The English name "callback" literally means "to call again".

Reading and processing a file looks like this.

fs.readFile('/etc/passwd', 'utf-8', function (err, data) {
  if (err) throw err;
  console.log(data);
});

In the code above, the third argument of readFile is the callback function, that is, the second segment of the task. The callback does not execute until the operating system has returned the /etc/passwd file.

An interesting question: why does the Node convention require the first argument of a callback to be the error object err (null if there is no error)?

The reason is that execution is split into two segments, and once the first segment has finished, the context the task was running in no longer exists. An error thrown after that point cannot be caught by the original context, so it can only be passed into the second segment as an argument.
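
A small sketch may make this concrete. It assumes a hypothetical divideLater function, with setTimeout standing in for the asynchronous work; it illustrates the convention and is not Node's own implementation.

```javascript
// Hypothetical asynchronous function that follows the error-first convention.
function divideLater(a, b, callback) {
  setTimeout(function () {
    if (b === 0) {
      // Throwing here could not be caught by the caller's try...catch,
      // because the caller's first segment has already finished.
      // The error is passed to the callback instead.
      return callback(new Error('division by zero'));
    }
    callback(null, a / b);
  }, 0);
}

try {
  divideLater(1, 0, function (err, result) {
    if (err) {
      console.log('received as argument:', err.message);
      return;
    }
    console.log(result);
  });
} catch (e) {
  // Never reached: the first segment completed without an error.
  console.log('caught by try...catch:', e.message);
}
```

The error arrives through the first parameter of the callback; the surrounding try...catch block never sees it.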

Promise

Callback functions are not a problem in themselves; the problem appears when several callbacks are nested. Suppose you read file A and then file B, as follows.

fs.readFile(fileA, 'utf-8', function (err, data) {
  fs.readFile(fileB, 'utf-8', function (err, data) {
    // ...
  });
});

It is easy to imagine deep nesting when more than two files are read in sequence. The code grows horizontally instead of vertically and quickly becomes an unmanageable mess. Because the asynchronous operations are tightly coupled, changing any one of them may force changes to the callbacks above and below it. This situation is known as "callback hell".

The Promise object was proposed to solve this problem. It is not a new syntax feature but a new way of writing code that turns nested callbacks into chained calls. With Promise, reading several files in sequence is written as follows.

var readFile = require('fs-readfile-promise');
readFile(fileA)
.then(function (data) {
  console.log(data.toString());
})
.then(function () {
  return readFile(fileB);
})
.then(function (data) {
  console.log(data.toString());
})
.catch(function (err) {
  console.log(err);
});

The code above uses the fs-readfile-promise module, which provides a Promise-returning version of readFile. Promise provides the then method for attaching callbacks and the catch method for capturing errors thrown during execution.

As you can see, Promise is only an improvement in how callbacks are written. The then method makes the two-segment execution of an asynchronous task clearer, but beyond that there is nothing new.

The biggest problem with Promise is code redundancy. Once the original task is wrapped in Promise, every operation looks like a pile of then calls at first glance, and the original semantics become hard to see.

So, is there a better way to write it?

3. Generator function

Coroutines

Traditional programming languages have long had solutions for asynchronous programming (in fact, for multitasking). One of them is the "coroutine", in which multiple threads cooperate with one another to complete asynchronous tasks.

A coroutine is a bit like a function and a bit like a thread. It runs roughly as follows.

Step 1: Coroutine A starts executing.
Step 2: Coroutine A runs partway, pauses, and hands execution over to coroutine B.
Step 3: (After a while) coroutine B hands execution back.
Step 4: Coroutine A resumes execution.

Coroutine A in the process above is an asynchronous task, because it is executed in two (or more) segments.

For example, the coroutine version of reading a file looks like this.

function* asyncJob() {
  // ...other code
  var f = yield readFile(fileA);
  // ...other code
}

The asyncJob function in the code above is a coroutine, and its secret lies in the yield command. yield means that execution stops here and is handed over to another coroutine. In other words, the yield command is the boundary between the two asynchronous segments.

A coroutine pauses when it reaches a yield command, waits until execution is handed back, and then continues from where it paused. Its greatest advantage is that the code reads almost exactly like synchronous code: remove the yield command and it is identical.

Generator functions as an implementation of coroutines

The Generator function is the ES6 implementation of coroutines. Its most important feature is that it can hand over execution of the function (that is, pause execution).

An entire Generator function is an encapsulated asynchronous task, or a container for one. Wherever the asynchronous operation needs to pause, mark the spot with a yield statement. A Generator function is executed as follows.

function* gen(x) {
  var y = yield x + 2;
  return y;
}
var g = gen(1);
g.next() // { value: 3, done: false }
g.next() // { value: undefined, done: true }

In the code above, calling the Generator function returns an internal pointer (that is, an iterator) g. This is another way a Generator function differs from an ordinary function: calling it does not return a result but a pointer object. Calling the next method of pointer g moves the internal pointer (that is, runs the first segment of the asynchronous task) to the first yield statement it meets, which in this example is x + 2.

In other words, the next method executes the Generator function in stages. Each call to next returns an object describing the current stage (the value and done properties). The value property is the value of the expression after the yield statement and represents the value of the current stage; the done property is a Boolean indicating whether the Generator function has finished, that is, whether another stage remains.
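
To see the stages one by one, here is a minimal sketch with a hypothetical stages function that has two yield statements and a return. The names are illustrative only.

```javascript
// A Generator function with three stages: two yields and a final return.
function* stages() {
  yield 'first';
  yield 'second';
  return 'finished';
}

const it = stages();
const first = it.next();   // { value: 'first', done: false }
const second = it.next();  // { value: 'second', done: false }
const last = it.next();    // { value: 'finished', done: true }
const extra = it.next();   // { value: undefined, done: true }

console.log(first, second, last, extra);
```

Note that done becomes true on the step that reaches return, and any further call to next simply reports an undefined value.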

Data exchange and error handling in Generator functions

A Generator function can pause and resume execution, which is the fundamental reason it can encapsulate asynchronous tasks. In addition, it has two features that make it a complete solution for asynchronous programming: data exchange between the inside and outside of the function, and an error-handling mechanism.

The value property of the object returned by next is how the Generator function sends data out; the next method can also accept an argument, which sends data into the Generator function body.

function* gen(x){
  var y = yield x + 2;
  return y;
}
var g = gen(1);
g.next() // { value: 3, done: false }
g.next(2) // { value: 2, done: true }

In the code above, the value property returned by the first next call is 3, the value of the expression x + 2. The second next call takes the argument 2, which is passed into the Generator function as the result of the previous stage of the asynchronous task and is received by the variable y inside the function. The value property of this step is therefore 2 (the value of y).
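
The two-way exchange is easier to follow with more than one yield. The sketch below uses a hypothetical conversation function: each yield sends a question out, and the argument of the following next call comes back in as the answer.

```javascript
// Each yield hands a value out; the argument of the following next() call
// becomes the result of that yield expression inside the function.
function* conversation() {
  const name = yield 'What is your name?';
  const language = yield 'Hello, ' + name + '. Which language do you use?';
  return name + ' uses ' + language;
}

const talk = conversation();
const q1 = talk.next();              // value: 'What is your name?'
const q2 = talk.next('Ada');         // value: 'Hello, Ada. Which language do you use?'
const end = talk.next('JavaScript'); // value: 'Ada uses JavaScript', done: true

console.log(q1.value, q2.value, end.value);
```

The first next call takes no argument here because no yield is waiting to receive one yet; a value passed at that point would be discarded.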

Error-handling code can also be placed inside a Generator function to catch errors thrown from outside it.

function* gen(x){
  try {
    var y = yield x + 2;
  } catch (e){
    console.log(e);
  }
  return y;
}
var g = gen(1);
g.next();
g.throw('Something went wrong');
// Something went wrong

On the last line of the code above, an error thrown from outside the Generator function with the pointer object's throw method is caught by the try...catch block inside the function. This means the code that raises the error and the code that handles it are separated in both time and space, which is undoubtedly important for asynchronous programming.
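
As a further sketch, the two hypothetical functions below contrast what happens when the Generator function does and does not contain a try...catch block. The function and variable names are illustrative only.

```javascript
const log = [];

// With try...catch inside: the error is handled and execution continues.
function* guarded() {
  try {
    yield 'working';
  } catch (e) {
    log.push('caught inside: ' + e.message);
  }
  yield 'recovered';
}

const g1 = guarded();
g1.next();                                      // pauses at the first yield
const afterThrow = g1.throw(new Error('boom')); // resumes inside the catch block

// Without try...catch inside: the error propagates back to the caller.
function* unguarded() {
  yield 'working';
}

const g2 = unguarded();
g2.next();
try {
  g2.throw(new Error('boom'));
} catch (e) {
  log.push('caught outside: ' + e.message);
}

console.log(afterThrow, log);
```

After a caught error, throw behaves like next: it returns the result of the next yield, here the value 'recovered'.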

Encapsulation of asynchronous tasks

Here's a look at how to use the Generator function to perform a real asynchronous task.

var fetch = require('node-fetch');
function* gen(){
  var url = 'https://api.github.com/users/github';
  var result = yield fetch(url);
  console.log(result.bio);
}

In the code above, the Generator function encapsulates an asynchronous operation that reads a remote endpoint and then extracts information from the JSON data. As mentioned earlier, apart from the yield command, the code looks very much like a synchronous operation.

Here's how to execute this code.

var g = gen();
var result = g.next();
result.value.then(function(data){
  return data.json();
}).then(function(data){
  g.next(data);
});

In the code above, the Generator function is executed first to obtain the iterator object, and then the next method (second line) runs the first stage of the asynchronous task. Because the fetch module returns a Promise object, the then method is used to call the next method again.

As you can see, although the Generator function expresses asynchronous operations concisely, flow control is inconvenient (that is, deciding when to run the first stage and when to run the second).

4. Thunk function

The Thunk function is one way to automate the Generator function.

Evaluation strategy for parameters

Thunk functions date back to the 1960s.

At that time programming languages were in their infancy, and computer scientists were still working out how best to write compilers. One point of contention was the "evaluation strategy", that is, when exactly the arguments of a function should be evaluated.

var x = 1;
function f(m) {
  return m * 2;
}
f(x + 5)

The code above defines the function f and then passes it the expression x + 5. When should this expression be evaluated?

One view is "call by value": compute the value of x + 5 (which is 6) before entering the function body, then pass that value into f. The C language uses this strategy.

f(x + 5)
// with call by value, equivalent to
f(6)

The other view is "call by name": pass the expression x + 5 into the function body as it is and evaluate it only when it is used. The Haskell language adopts a strategy of this kind (strictly speaking call by need, a variant that caches the result).

f(x + 5)
// with call by name, equivalent to
(x + 5) * 2

Which is better, call by value or call by name?

The answer is that each has pros and cons. Call by value is simpler, but the argument is evaluated before it is actually used, which can waste work.

function f(a, b){
  return b;
}
f(3 * x * x - 2 * x - 1, x);

In the code above, the first argument to f is a complex expression that is never used in the function body. Evaluating it is unnecessary. For this reason some computer scientists prefer call by name, which evaluates an argument only at the point of use.

The meaning of the Thunk function

Compilers often implement call by name by placing the argument in a temporary function and passing that function into the function body. This temporary function is called a Thunk function.

function f(m) {
  return m * 2;
}
f(x + 5);
// is equivalent to
var thunk = function () {
  return x + 5;
};
function f(thunk) {
  return thunk() * 2;
}

In the code above, the argument x + 5 of function f is replaced by a function. Wherever the original parameter was used, the Thunk function is evaluated instead.

That is the definition of a Thunk function: an implementation strategy for call by name that stands in for an expression.
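
The trade-off between the two strategies can be observed by counting evaluations. In this sketch, expensive, secondByValue, secondByName and twice are hypothetical names, and call by name is simulated by wrapping the arguments in functions by hand.

```javascript
let evaluations = 0;

// Stand-in for a costly argument expression.
function expensive() {
  evaluations += 1;
  return 3 * 7;
}

// Call by value: the argument is computed even though it is never used.
function secondByValue(a, b) {
  return b;
}
secondByValue(expensive(), 1);
const afterByValue = evaluations; // 1

// Call by name, simulated with thunks: the unused argument is never computed.
function secondByName(aThunk, bThunk) {
  return bThunk();
}
secondByName(function () { return expensive(); }, function () { return 1; });
const afterByName = evaluations; // still 1

// The flip side: a thunk is re-evaluated every time it is used.
function twice(thunk) {
  return thunk() + thunk();
}
const doubled = twice(function () { return expensive(); }); // 42
const afterTwice = evaluations; // 3

console.log(afterByValue, afterByName, doubled, afterTwice);
```

Call by name skips work that is never needed, but repeats work when the same parameter is used more than once.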

Thunk functions in JavaScript

JavaScript uses call by value, so its Thunk functions mean something different. In JavaScript, a Thunk function replaces not an expression but a multi-parameter function, turning it into a single-parameter function that accepts only a callback as its argument.

// Normal version of readFile (multiple parameters)
fs.readFile(fileName, callback);
// Thunk version of readFile (single parameter)
var Thunk = function (fileName) {
  return function (callback) {
    return fs.readFile(fileName, callback);
  };
};
var readFileThunk = Thunk(fileName);
readFileThunk(callback);

In the code above, the readFile method of the fs module is a multi-parameter function that takes a file name and a callback. After conversion it becomes a single-parameter function that accepts only the callback. This single-parameter version is called a Thunk function.

Any function whose parameters include a callback can be written as a Thunk function. Here is a simple Thunk converter.

// ES5 version
var Thunk = function(fn){
  return function (){
    var args = Array.prototype.slice.call(arguments);
    return function (callback){
      // Copy args so that reusing the thunk does not accumulate callbacks
      return fn.apply(this, args.concat([callback]));
    }
  };
};
// ES6 version
const Thunk = function(fn) {
  return function (...args) {
    return function (callback) {
      return fn.call(this, ...args, callback);
    }
  };
};

Use the converter above to generate the Thunk function of fs.readFile.

var readFileThunk = Thunk(fs.readFile);
readFileThunk(fileA)(callback);

Here's another complete example.

function f(a, cb) {
  cb(a);
}
const ft = Thunk(f);
ft(1)(console.log) // 1

Thunkify module

For production use, the Thunkify module is recommended as the converter.

Start by installing it.

$ npm install thunkify

Here's how to use it.

var thunkify = require('thunkify');
var fs = require('fs');
var read = thunkify(fs.readFile);
read('package.json')(function(err, str){
  // ...
});

Thunkify's source code is very similar to the simple converter in the last section.

function thunkify(fn) {
  return function() {
    var args = new Array(arguments.length);
    var ctx = this;
    for (var i = 0; i < args.length; ++i) {
      args[i] = arguments[i];
    }
    return function (done) {
      var called;
      args.push(function () {
        if (called) return;
        called = true;
        done.apply(null, arguments);
      });
      try {
        fn.apply(ctx, args);
      } catch (err) {
        done(err);
      }
    }
  }
};

The main addition in its source code is a check: the variable called ensures that the callback runs only once. This design is related to the Generator functions discussed below. Consider the following example.

function f(a, b, callback){
  var sum = a + b;
  callback(sum);
  callback(sum);
}
var ft = thunkify(f);
var print = console.log.bind(console);
ft(1, 2)(print);
// 3

In the code above, because thunkify allows callback functions to execute only once, only one line of results is output.
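
The run-once guard can be isolated in a few lines. This is a minimal sketch of the idea using a hypothetical once helper, not a copy of Thunkify's code.

```javascript
// Hypothetical helper: wrap fn so that only its first invocation takes effect.
function once(fn) {
  var called = false;
  return function () {
    if (called) return;
    called = true;
    return fn.apply(this, arguments);
  };
}

var calls = [];
var record = once(function (value) {
  calls.push(value);
});

record('first');
record('second'); // ignored by the guard

console.log(calls); // [ 'first' ]
```

For a Generator executor this matters because a callback that fires twice would advance the Generator function twice for a single asynchronous operation.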

Flow control for Generator functions

You might ask what Thunk functions are good for. The answer is that they used to be of little use, but with Generator functions in ES6, Thunk functions can now be used for automatic flow control of Generator functions.

Generator functions can be executed automatically.

function* gen() {
  // ...
}
var g = gen();
var res = g.next();
while(!res.done){
  console.log(res.value);
  res = g.next();
}

In the code above, the Generator function gen automatically completes all the steps.

However, this approach does not suit asynchronous operations. If each step must finish before the next one starts, the automatic execution above will not work. This is where Thunk functions come in. Take reading files as an example. The following Generator function encapsulates two asynchronous operations.

var fs = require('fs');
var thunkify = require('thunkify');
var readFileThunk = thunkify(fs.readFile);
var gen = function* (){
  var r1 = yield readFileThunk('/etc/fstab');
  console.log(r1.toString());
  var r2 = yield readFileThunk('/etc/shells');
  console.log(r2.toString());
};

In the code above, the yield command moves execution out of the Generator function, so some mechanism is needed to hand execution back to it.

That mechanism is the Thunk function, because execution can be handed back to the Generator function inside its callback. To make this easier to follow, first look at how to execute the Generator function above manually.

var g = gen();
var r1 = g.next();
r1.value(function (err, data) {
  if (err) throw err;
  var r2 = g.next(data);
  r2.value(function (err, data) {
    if (err) throw err;
    g.next(data);
  });
});

In the code above, the variable g is the internal pointer of the Generator function, indicating which step is currently being executed. The next method moves the pointer to the next step and returns information about that step (the value and done properties).

A closer look at the code above shows that executing the Generator function amounts to passing the same callback, over and over, to the value property returned by next. This lets us automate the process with recursion.

Automatic flow control with Thunk functions

The real power of Thunk functions is that they can execute Generator functions automatically. Below is a Generator executor based on Thunk functions.

function run(fn) {
  var gen = fn();
  function next(err, data) {
    var result = gen.next(data);
    if (result.done) return;
    result.value(next);
  }
  next();
}
function* g() {
  // ...
}
run(g);

The run function in the code above is an automatic executor for Generator functions. The inner next function is the Thunk callback. It first moves the pointer to the next step of the Generator function (the gen.next method), then checks whether the Generator function has finished (the result.done property). If it has not finished, next passes itself to the Thunk function (the result.value property); otherwise it exits.

With this executor, running a Generator function is much easier. No matter how many asynchronous operations it contains, just pass the Generator function to run. The precondition, of course, is that every asynchronous operation is a Thunk function, that is, whatever follows the yield command must be a Thunk function.

var g = function* (){
  var f1 = yield readFileThunk('fileA');
  var f2 = yield readFileThunk('fileB');
  // ...
  var fn = yield readFileThunk('fileN');
};
run(g);

In the code above, function g encapsulates n asynchronous file reads, all of which complete automatically once run is called. Asynchronous operations can thus not only be written like synchronous ones but also be executed with a single line of code.
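
Here is a self-contained variant that can be run without touching the file system. It is a sketch under stated assumptions: later and failLater are hypothetical Thunk factories built on setTimeout, and runThunks extends the executor with two things the version above leaves out, namely forwarding err into the Generator function with throw and reporting the final result through a done callback.

```javascript
// Hypothetical Thunk factory: delivers `value` asynchronously.
function later(value) {
  return function (callback) {
    setTimeout(function () { callback(null, value); }, 10);
  };
}

// Hypothetical Thunk factory: fails asynchronously with an error.
function failLater(message) {
  return function (callback) {
    setTimeout(function () { callback(new Error(message)); }, 10);
  };
}

// Variant of run: errors go into the Generator via throw(), and the
// final result (or an uncaught error) is passed to `done`.
function runThunks(fn, done) {
  var gen = fn();
  function next(err, data) {
    var result;
    try {
      result = err ? gen.throw(err) : gen.next(data);
    } catch (e) {
      return done(e);
    }
    if (result.done) return done(null, result.value);
    result.value(next);
  }
  next();
}

runThunks(function* () {
  var a = yield later(1);
  var b = yield later(2);
  var c;
  try {
    c = yield failLater('disk error');
  } catch (e) {
    c = 0; // recover from the failed step
  }
  return a + b + c;
}, function (err, total) {
  console.log(err, total); // null 3
});
```

Because the error is thrown back into the Generator function, an ordinary try...catch around the yield is enough to handle a failed asynchronous step.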

Thunk functions are not the only way to execute Generator functions automatically. The key to automatic execution is a mechanism that controls the flow of the Generator function by receiving and handing back execution. Callbacks can do this, and so can Promise objects.

5. co module

Basic usage

The co module is a small tool released in June 2013 by the well-known programmer TJ Holowaychuk for executing Generator functions automatically.

The following is a Generator function that reads two files in turn.

var gen = function* () {
  var f1 = yield readFile('/etc/fstab');
  var f2 = yield readFile('/etc/shells');
  console.log(f1.toString());
  console.log(f2.toString());
};

With the co module, you do not have to write an executor for Generator functions.

var co = require('co');
co(gen);

In the code above, the Generator function is executed automatically whenever it is passed in to the co function.

The co function returns a Promise object, so you can add callback functions using the then method.

co(gen).then(function (){
  console.log('Generator function finished');
});

In the code above, a line of text is printed once the Generator function has finished executing.

The principle of the co module

Why can co execute Generator functions automatically?

As mentioned earlier, a Generator is a container for asynchronous operations. Executing it automatically requires a mechanism that hands execution back as soon as an asynchronous operation has a result.

Two approaches can do this.

(1) Callback functions. Wrap the asynchronous operation as a Thunk function and hand execution back inside the callback.

(2) Promise objects. Wrap the asynchronous operation as a Promise object and hand execution back with the then method.

The co module simply packages these two kinds of automatic executor (Thunk functions and Promise objects) into one module. The precondition for using co is that the yield command in a Generator function can only be followed by a Thunk function or a Promise object. co can also be used when every member of an array or object is a Promise object, as a later example shows.

The previous section covered the automatic executor based on Thunk functions. The automatic executor based on Promise objects comes next; it is necessary for understanding the co module.

Automatic execution based on Promise objects

Continuing with the example above, first wrap the readFile method of the fs module so that it returns a Promise object.

var fs = require('fs');
var readFile = function (fileName){
  return new Promise(function (resolve, reject){
    fs.readFile(fileName, function(error, data){
      if (error) return reject(error);
      resolve(data);
    });
  });
};
var gen = function* (){
  var f1 = yield readFile('/etc/fstab');
  var f2 = yield readFile('/etc/shells');
  console.log(f1.toString());
  console.log(f2.toString());
};

Then, manually execute the Generator function above.

var g = gen();
g.next().value.then(function(data){
  g.next(data).value.then(function(data){
    g.next(data);
  });
});

Manual execution simply uses the then method to add callbacks layer by layer. Once that is understood, an automatic executor can be written.

function run(gen){
  var g = gen();
  function next(data){
    var result = g.next(data);
    if (result.done) return result.value;
    result.value.then(function(data){
      next(data);
    });
  }
  next();
}
run(gen);

In the code above, the next function calls itself as long as the Generator function has not reached its last step, which achieves automatic execution.
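
The executor above discards the final value and has no path for rejected Promises. The sketch below is one possible extension, not the source of any library: runAsync is a hypothetical name, delay is a hypothetical stand-in for an asynchronous read, and the executor itself returns a Promise for the Generator function's return value.

```javascript
// Hypothetical stand-in for an asynchronous read.
function delay(value, ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms, value);
  });
}

// Promise-based executor that also forwards rejections into the Generator
// and resolves with the Generator function's return value.
function runAsync(genFn) {
  return new Promise(function (resolve, reject) {
    var g = genFn();
    function step(method, arg) {
      var result;
      try {
        result = g[method](arg); // method is 'next' or 'throw'
      } catch (e) {
        return reject(e);
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        function (data) { step('next', data); },
        function (err) { step('throw', err); }
      );
    }
    step('next');
  });
}

runAsync(function* () {
  var a = yield delay('A', 20);
  var b = yield delay('B', 10);
  return a + b;
}).then(function (joined) {
  console.log(joined); // AB
});
```

The second delay is shorter than the first, yet the result is still 'AB': each step starts only after the previous one has settled.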

The source code for the co module

co is an extension of the automatic executor above. Its source code is only a few dozen lines long and quite simple.

First, the co function accepts the Generator function as an argument and returns a Promise object.

function co(gen) {
  var ctx = this;
  return new Promise(function(resolve, reject) {
  });
}

Inside the returned Promise object, co first checks whether the parameter gen is a Generator function. If it is, co executes the function to obtain an internal pointer object; if not, it returns and changes the state of the Promise object to resolved.

function co(gen) {
  var ctx = this;
  return new Promise(function(resolve, reject) {
    if (typeof gen === 'function') gen = gen.call(ctx);
    if (!gen || typeof gen.next !== 'function') return resolve(gen);
  });
}

Next, co wraps the next method of the Generator function's internal pointer object in an onFulfilled function, mainly so that thrown errors can be caught.

function co(gen) {
  var ctx = this;
  return new Promise(function(resolve, reject) {
    if (typeof gen === 'function') gen = gen.call(ctx);
    if (!gen || typeof gen.next !== 'function') return resolve(gen);
    onFulfilled();
    function onFulfilled(res) {
      var ret;
      try {
        ret = gen.next(res);
      } catch (e) {
        return reject(e);
      }
      next(ret);
    }
  });
}

Finally comes the key next function, which calls itself repeatedly.

function next(ret) {
  if (ret.done) return resolve(ret.value);
  var value = toPromise.call(ctx, ret.value);
  if (value && isPromise(value)) return value.then(onFulfilled, onRejected);
  return onRejected(
    new TypeError(
      'You may only yield a function, promise, generator, array, or object, '
      + 'but the following object was passed: "'
      + String(ret.value)
      + '"'
    )
  );
}

In the code above, the body of the next function consists of only four statements.

The first line checks whether this is the last step of the Generator function and returns if so.

The second line ensures that the return value of each step is a Promise object.

The third line uses the then method to add a callback to the return value, and the onFulfilled function then calls next again.

The fourth line handles the case where the value does not meet the requirements (it is neither a Thunk function nor a Promise object): it changes the state of the Promise object to rejected, which terminates execution.

Handling concurrent asynchronous operations

co supports concurrent asynchronous operations, that is, it allows certain operations to run at the same time and moves on to the next step only after all of them have completed.

To do this, put the concurrent operations in an array or an object and place it after the yield statement.

// Array form
co(function* () {
  var res = yield [
    Promise.resolve(1),
    Promise.resolve(2)
  ];
  console.log(res);
}).catch(onerror);
// Object form
co(function* () {
  var res = yield {
    1: Promise.resolve(1),
    2: Promise.resolve(2),
  };
  console.log(res);
}).catch(onerror);

Here's another example.

co(function* () {
  var values = [n1, n2, n3];
  yield values.map(somethingAsync);
});
function* somethingAsync(x) {
  // do something async
  return y
}

The code above runs three somethingAsync operations concurrently and moves on to the next step only after all of them have completed.
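
One way to picture what an executor has to do with a yielded array or object is to combine its members into a single Promise. The sketch below uses a hypothetical whenAll helper built on Promise.all; it illustrates the idea and is not taken from co's source.

```javascript
// Hypothetical helper: turn an array or plain object of Promises
// into one Promise for the collected results.
function whenAll(collection) {
  if (Array.isArray(collection)) return Promise.all(collection);
  var keys = Object.keys(collection);
  return Promise.all(keys.map(function (key) {
    return collection[key];
  })).then(function (values) {
    var out = {};
    keys.forEach(function (key, i) {
      out[key] = values[i];
    });
    return out;
  });
}

whenAll([Promise.resolve(1), Promise.resolve(2)]).then(function (res) {
  console.log(res); // [ 1, 2 ]
});

whenAll({ a: Promise.resolve(1), b: Promise.resolve(2) }).then(function (res) {
  console.log(res); // { a: 1, b: 2 }
});
```

All members start before any of them is awaited, so the total waiting time is roughly that of the slowest member rather than the sum.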

Example: processing a Stream

Node provides Stream mode for reading and writing data. Its defining feature is that only part of the data is processed at a time, so the data is handled piece by piece like a "data stream". This is very useful for working with large volumes of data. Stream mode uses the EventEmitter API and emits three events.

data event: the next block of data is ready.
end event: the entire "data stream" has been processed.
error event: an error occurred.

Using Promise.race(), you can determine which of these three events happens first, and proceed to process the next block only when the data event happens first. All the data can therefore be read with a while loop.

const co = require('co');
const fs = require('fs');
const stream = fs.createReadStream('./les_miserables.txt');
let valjeanCount = 0;
co(function*() {
  while(true) {
    const res = yield Promise.race([
      new Promise(resolve => stream.once('data', resolve)),
      new Promise(resolve => stream.once('end', resolve)),
      new Promise((resolve, reject) => stream.once('error', reject))
    ]);
    if (!res) {
      break;
    }
    stream.removeAllListeners('data');
    stream.removeAllListeners('end');
    stream.removeAllListeners('error');
    valjeanCount += (res.toString().match(/valjean/ig) || []).length;
  }
  console.log('count:', valjeanCount); // count: 1120
});

The code above reads the text file of Les Miserables in Stream mode. For each block it uses stream.once to add one-time callbacks to the data, end, and error events. The variable res has a value only when the data event occurs, in which case the number of occurrences of the word valjean in that block is added to the running total.