Coding With Fun

MongoDB aggregation


May 17, 2021 MongoDB



Aggregation in MongoDB is primarily used to process data (for example, to compute averages or sums) and to return the computed results. It is similar to count(*) combined with group by in a SQL statement.


The aggregate() method

Aggregation in MongoDB is performed with the aggregate() method.

Syntax

The basic syntax of the aggregate() method is as follows:

> db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Example

The data in the collection is as follows:

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview',
   description: 'MongoDB is no sql database',
   by_user: 'w3cschool.cn',
   url: 'http://www.w3cschool.cn',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d),
   title: 'NoSQL Overview',
   description: 'No sql database is very fast',
   by_user: 'w3cschool.cn',
   url: 'http://www.w3cschool.cn',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e),
   title: 'Neo4j Overview',
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
}

Now let's use aggregate() to count the number of articles written by each author in the collection above:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      {
         "_id" : "w3cschool.cn",
         "num_tutorial" : 2
      },
      {
         "_id" : "Neo4j",
         "num_tutorial" : 1
      }
   ],
   "ok" : 1
}
>

The example above is equivalent to the SQL statement: select by_user, count(*) from mycol group by by_user

In the example above, we group the documents by the by_user field and count how many documents share each by_user value.
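To make the grouping step concrete, here is a minimal sketch in plain JavaScript that emulates what $group with {$sum : 1} does to the sample documents. It runs in Node without a MongoDB server; groupCount is a hypothetical helper written for illustration and is not part of MongoDB.

```javascript
// Plain-JavaScript emulation (not the MongoDB API) of:
//   db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
const sampleDocs = [
  { title: "MongoDB Overview", by_user: "w3cschool.cn", likes: 100 },
  { title: "NoSQL Overview", by_user: "w3cschool.cn", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 },
];

// Hypothetical helper: count documents per distinct value of `field`.
function groupCount(documents, field) {
  const counts = new Map(); // group key -> number of documents seen so far
  for (const doc of documents) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1); // {$sum : 1}
  }
  return [...counts].map(([key, n]) => ({ _id: key, num_tutorial: n }));
}

const perAuthor = groupCount(sampleDocs, "by_user");
console.log(perAuthor);
```

Each distinct by_user value becomes one output document, and the counter grows by 1 for every input document that falls into that group.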

The following table lists some aggregation expressions:

Expression | Description | Example
$sum | Calculates the sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg | Calculates the average. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min | Gets the minimum of the corresponding values across the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max | Gets the maximum of the corresponding values across the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push | Appends values to an array in the result document. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet | Adds values to an array in the result document, without adding duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first | Gets the first value according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last | Gets the last value according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
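The differences between these expressions are easier to see side by side. The sketch below emulates them in plain JavaScript over the sample documents; it is an illustration only, summarizeByUser and the output field names are hypothetical, and it does not call MongoDB.

```javascript
// Plain-JavaScript emulation (not the MongoDB API) of the $group accumulators.
const likedDocs = [
  { by_user: "w3cschool.cn", url: "http://www.w3cschool.cn", likes: 100 },
  { by_user: "w3cschool.cn", url: "http://www.w3cschool.cn", likes: 10 },
  { by_user: "Neo4j", url: "http://www.neo4j.com", likes: 750 },
];

// Hypothetical helper: group by by_user, then apply each accumulator per group.
function summarizeByUser(documents) {
  const groups = new Map(); // by_user -> documents in that group, in input order
  for (const doc of documents) {
    if (!groups.has(doc.by_user)) groups.set(doc.by_user, []);
    groups.get(doc.by_user).push(doc);
  }
  return [...groups].map(([user, members]) => {
    const likes = members.map((d) => d.likes);
    const urls = members.map((d) => d.url);
    const total = likes.reduce((a, b) => a + b, 0);
    return {
      _id: user,
      sum: total,                  // like {$sum : "$likes"}
      avg: total / likes.length,   // like {$avg : "$likes"}
      min: Math.min(...likes),     // like {$min : "$likes"}
      max: Math.max(...likes),     // like {$max : "$likes"}
      pushed: urls,                // like {$push : "$url"}, duplicates kept
      unique: [...new Set(urls)],  // like {$addToSet : "$url"}, duplicates dropped
      first: urls[0],              // like {$first : "$url"}
      last: urls[urls.length - 1], // like {$last : "$url"}
    };
  });
}

const summary = summarizeByUser(likedDocs);
console.log(summary);
```

Note that MongoDB does not guarantee the element order of an array built with $addToSet, whereas this sketch keeps first-seen order.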

The concept of pipelines

In Unix and Linux, a pipe passes the output of one command as the input to the next command.

MongoDB's aggregation pipeline works the same way: documents pass through a sequence of stages, and the output of one stage becomes the input of the next. The same kind of stage can appear more than once in a pipeline.
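The idea of chaining stages can be sketched in plain JavaScript, where each stage is a function that takes an array of documents and returns a new array. This is only an analogy, not how MongoDB is implemented; runPipeline and the sample scores are hypothetical.

```javascript
// Hypothetical helper: feed the output of each stage into the next one.
function runPipeline(documents, stages) {
  return stages.reduce((current, stage) => stage(current), documents);
}

// Made-up sample data for illustration.
const scoreDocs = [
  { name: "a", score: 95 },
  { name: "b", score: 80 },
  { name: "c", score: 60 },
  { name: "d", score: 72 },
];

const topTwo = runPipeline(scoreDocs, [
  (docs) => docs.filter((d) => d.score > 70),             // like { $match : { score : { $gt : 70 } } }
  (docs) => [...docs].sort((x, y) => y.score - x.score),  // like { $sort : { score : -1 } }
  (docs) => docs.slice(0, 2),                             // like { $limit : 2 }
]);

console.log(topTwo);
```

Because every stage only sees what the previous stage produced, reordering the stages can change the result.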

Expressions: an expression processes the input document and produces an output. Expressions are stateless: they are evaluated only against the current document in the aggregation pipeline and cannot refer to other documents.

Here's a look at a few of the common operations in the aggregation framework:

  • $project: Modifies the structure of the input documents. It can be used to rename, add, or delete fields, or to create computed values and nested documents.
  • $match: Filters the data and outputs only the documents that meet the conditions. $match uses MongoDB's standard query operators.
  • $limit: Limits the number of documents returned by the aggregation pipeline.
  • $skip: Skips a specified number of documents in the aggregation pipeline and returns the remaining documents.
  • $unwind: Splits an array field in a document into multiple documents, each containing one value from the array.
  • $group: Groups the documents in a collection; it can be used to compute statistics over each group.
  • $sort: Sorts the input documents and outputs them in order.
  • $geoNear: Outputs documents ordered by proximity to a geographic location.
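Of the stages listed above, $unwind is perhaps the least intuitive, so here is a rough plain-JavaScript emulation of it using the tags field of the sample documents. The unwind helper is hypothetical and assumes the field is always a non-empty array; the real stage has additional rules for missing or non-array fields.

```javascript
// Hypothetical helper emulating { $unwind : "$tags" }: one output document per array element.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    doc[field].map((value) => ({ ...doc, [field]: value })) // copy the doc, replace the array by one value
  );
}

const taggedDocs = [
  { title: "MongoDB Overview", tags: ["mongodb", "database", "NoSQL"] },
  { title: "Neo4j Overview", tags: ["neo4j", "database", "NoSQL"] },
];

const unwound = unwind(taggedDocs, "tags");
console.log(unwound);
```

Two input documents with three tags each produce six output documents, which a later $group stage could then count per tag.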

Pipeline operator examples

1. $project example

db.article.aggregate([
    { $project : {
        title : 1 ,
        author : 1
    }}
]);

In this case, the result contains only the _id, title and author fields; _id is included by default. If you want to exclude _id, do the following:

db.article.aggregate([
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }}
]);
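The effect of including or excluding _id can be emulated in plain JavaScript. The sketch below covers only simple inclusion specs (field : 1, plus optional _id : 0); the project helper and the sample document are hypothetical, and real $project supports much more.

```javascript
// Hypothetical helper emulating an inclusion-style $project.
function project(documents, spec) {
  return documents.map((doc) => {
    const out = {};
    // _id is kept by default and dropped only when the spec says _id : 0
    if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const [field, flag] of Object.entries(spec)) {
      if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

// Made-up sample document for illustration.
const postDocs = [{ _id: 1, title: "MongoDB Overview", author: "w3cschool.cn", likes: 100 }];

const withId = project(postDocs, { title: 1, author: 1 });
const withoutId = project(postDocs, { _id: 0, title: 1, author: 1 });
console.log(withId, withoutId);
```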

2. $match example

db.articles.aggregate([
    { $match : { score : { $gt : 70, $lte : 90 } } },
    { $group : { _id : null, count : { $sum : 1 } } }
]);

$match selects the records whose score is greater than 70 and less than or equal to 90, and then passes the matching records to the next stage, the $group pipeline operator, for processing.
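The boundary behaviour of $gt and $lte is worth checking by hand. The following plain-JavaScript sketch emulates the two stages on made-up scores; it is an illustration under those assumptions and does not query MongoDB.

```javascript
// Made-up sample data: 70 is excluded ($gt), 90 is included ($lte).
const scoredDocs = [{ score: 65 }, { score: 70 }, { score: 71 }, { score: 90 }, { score: 95 }];

// Like { $match : { score : { $gt : 70, $lte : 90 } } }
const matched = scoredDocs.filter((d) => d.score > 70 && d.score <= 90);

// Like { $group : { _id : null, count : { $sum : 1 } } }: one group for all documents.
// With no matching documents, $group outputs nothing rather than a count of 0.
const counted = matched.length > 0 ? [{ _id: null, count: matched.length }] : [];

console.log(counted);
```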

3. $skip example

db.article.aggregate([
    { $skip : 5 }
]);

After the $skip pipeline operator runs, the first five documents are skipped and only the remaining documents are passed on.
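A common use of $skip is paging, where it is followed by $limit. The sketch below emulates that combination in plain JavaScript on eight made-up documents; the variable names are hypothetical.

```javascript
// Eight made-up documents: { n: 1 } ... { n: 8 }
const numberedDocs = Array.from({ length: 8 }, (_, i) => ({ n: i + 1 }));

const afterSkip = numberedDocs.slice(5); // like { $skip : 5 }
const page = afterSkip.slice(0, 2);      // like { $limit : 2 }

console.log(afterSkip, page);
```

Without a preceding $sort stage, which five documents count as "the first five" depends on the order in which MongoDB returns them, so paging pipelines usually sort first.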