DevExtreme - Real World Patterns - MongoDB Data Access

24 March 2017

This post is part of a series describing a demo project that employs various real-world patterns and tools to provide access to data in a MongoDB database for the DevExtreme grid widgets. You can find the introduction and overview to the post series by following this link.

This post describes the functionality implemented in the devextreme-query-mongodb package, which can easily be used outside the context of the larger demo project that this blog series focuses on.

MongoDB is a NoSQL database that stores documents in collections. It is a commonly used system, and many programming languages and environments have easy access to it with the help of drivers or libraries. There is a very useful “getting started” overview of MongoDB that should be your starting point if you want to consider using MongoDB for your own projects.

From the SQL point of view, MongoDB collections are similar to tables. They are containers that contain documents, which are similar to table rows. However, documents are much more flexible and each document can differ from others in the same collection and contain nested structures. For purposes of querying data, collections are the starting points, just like tables are in SQL databases. Joins are not common with NoSQL databases, and while MongoDB supports joins for certain types of queries, these capabilities are not used by the project described here.

DevExtreme `loadOptions`

The goal for the implementation of devextreme-query-mongodb was to create an interface between the Custom Store querying functionality supplied by DevExtreme and the MongoDB APIs. All queries should utilize MongoDB server functionality, if possible.

At the core of the DevExtreme querying features is a structure commonly called loadOptions, which defines the parameters for a data query. Here is an overview of the supported parameters:

Parameter	Details
`take`	Integer. Restrict the number of top level results to return.
`skip`	Integer. Skip a number of items from the start of the result set. In conjunction with `take`, this is often used to implement paging.
`sort`	Array. Elements have the structure `{ selector: "field", desc: true/false }` and are applied to the data in sequence to implement multi-level sorting.
`filter`	Array. This can be a hierarchical structure of elements and arrays. Details of this can be found below.
`searchExpr`, `searchOperation` and `searchValue`	Strings. Another mechanism to define a filter, restricted to one criterion. The filtering details described below apply to these parameters.
`select`	Array. A simple list of field name strings that restricts the returned data objects to that set of fields.
`requireTotalCount`	Bool. Indicate that a total count of result set data objects must be returned in the `totalCount` field of the result. The count is expected to consider any filters that are being applied, but it is supposed to disregard any `take` parameter used for the query. The count must reflect the number of data items available after filtering, not for instance the number of grouping nodes or similar.
`totalSummary`	Array. Elements have the structure `{ selector: "field", summaryType: "type" }`, where `summaryType` can be one of `sum`, `avg`, `min`, `max` and `count`. Results of these summary calculations are returned in an array called `summary` that contains the result values in the same order used for the summary definitions.
`group`	Array. The structures in this array define the grouping levels that need to be applied to the data. Details of this can be found below.
`requireGroupCount`	Bool, default `false`. When used in conjunction with `group`, require the return of a top-level group count in the `groupCount` field of the result.
`groupSummary`	Array. The structure is the same as for `totalSummary`, but these summary values will be returned for each group if `group` is used.

If parameters are not included in the options, they don’t apply. In other words, for an empty loadOptions object the full set of data objects will be returned.

Result data structure

The result of a query operation returned by devextreme-query-mongodb looks like this:

{
    data: [ ... ],    // list of result data objects
    summary: [ ... ], // list of summary results if requested in totalSummary
    totalCount: 10,   // if required in requireTotalCount
    groupCount: 3     // if required in requireGroupCount
}

Filtering

Individual filter criteria have the format [ "field", "operator", "value" ], where operator is one of =, <>, <, >, <=, >=, startsWith, endswith, contains and notcontains. The last four are used for strings, otherwise operators apply to all value types. Criteria can be combined in hierarchies using nested arrays with and and or operators like this:

[
  [ "field", "=", 10 ],
  "and",
  [
    [ "otherField", "<", 3 ],
    "or",
    [ "otherField", ">", 11 ]
  ]
]

It is possible to negate a criterion (or hierarchy of criteria) by applying the operator ! like this: [ "!", [ "field", "=", 3 ]].

Finally, special syntax is supported to access “parts” of the information contained in date values. This feature is used by the Pivot Grid in situations where groupInterval settings for date fields are applied, but can be used independently. A field postfix in a criterion looks like this: [ "dateField.Month", "=", 4 ]. Supported postfixes are Year, Quarter (1-4), Month (1-12), Day and DayOfWeek (0=Sunday - 6=Saturday). The implementation doesn’t require particular casing for these postfixes.

Grouping

The group structure is a flat array of group definitions. Each group definition can have the following parameters:

Parameter	Details
`selector`	String, required. The field name to group on.
`desc`	Bool, default `false`. Define descending sort order on the `selector` field.
`isExpanded`	Bool, default `false`. Confusingly named, this field has meaning only for the last group definition and setting it to `true` requires the group data objects to be returned with the result set. To be clear, nested grouping data is always returned, even if `isExpanded` is `false` for a group definition that is not the last one. `isExpanded` only defines whether data objects will be returned as well.
`groupInterval`	Integer or String. For Integer values, data will be grouped in ranges of the given length. String values apply only to date fields, supported options are `year`, `quarter`, `month`, `day`, `dayOfWeek`, `hour`, `minute` and `second`.

For group queries, the result data structure is different from the simple query result structure shown above. Here it is:

{
    data: [              // list of groups
      {
        count: 3,        // count of items in this group, even if items=null
        key: "groupKey", // value that grouping was applied on
        items: [ ... ],  // sub-groups or data objects (for the last group when isExpanded=true)
                         //   can be null when isExpanded=false and there are no further groups
        summary: [ ...]  // list of group summary results if requested in groupSummary
      },
      ...                // further groups on this level
    ],
    summary: [ ... ],    // list of summary results if requested in totalSummary
    totalCount: 10,      // if required in requireTotalCount
    groupCount: 3        // if required in requireGroupCount
}

Note that the key in each group is the value used for grouping. In simple selector based group definitions, the key will be a value valid for the selector field, but if groupInterval is also used the key will contain the interval value (which, in the case of date intervals, is not a valid value for/from the selector field).

Examples

The tests implemented for the package show many examples of loadOptions structures.

Querying data

The package devextreme-query-mongodb exports a function called query. Call this and pass a MongoDB collection with a loadOptions structure, and optionally processing options (see below). Here is a simple example:

const MongoClient = require("mongodb").MongoClient;
const query = require("devextreme-query-mongodb");

async function queryData() {
  MongoClient.connect("mongodb://localhost:27017/testdatabase", (err, db) => {
    const results = await query(db.collection("values"), {
      // This is the loadOptions object - pass in any valid parameters
      take: 10,
      filter: [ "intval", ">", 47 ],
      sort: [ { selector: "intval", desc: true }]
    });

    // Now "results" contains an array of ten or fewer documents from the 
    // "values" collection that have intval > 47, sorted descendingly by intval.
  });
}

devextreme-query-mongodb Implementation Details

The following two paragraphs describe some details about the implementation of the package. Especially the paragraph Query Pipelines is mainly of interest if you would like to work with the source code, or use it as a pattern to develop similar functionality yourself.

The `query` function

The entry point to the package is the query function:

async function query(collection, loadOptions = {}, options = {}) {
  const standardContextOptions = {
    replaceIds: true,
    summaryQueryLimit: 100
  };
  const contextOptions = Object.assign(standardContextOptions, options);
  const context = createContext(contextOptions, loadOptions);

  return loadOptions.group && loadOptions.group.length > 0 ?
    context.queryGroups(collection, loadOptions) :
    context.querySimple(collection, loadOptions);
}

The function returns an async result. I decided to introduce a query context at this point, which holds the loadOptions required by various parts of the algorithm, and some query options.

The replaceIds query option is set to true by default. The effect of this is that _id field values are returned as strings, instead of using the MongoDB internal object representation. The default value is set under the assumption that data will be passed on to parts of the application system where the origin of the id values should not be visible. Since MongoDB doesn’t have a built-in way (as far as I know) of returning id values as strings, there is a certain overhead associated with this approach. In cases where you are going to process the data further using other MongoDB queries, it might be beneficial to pass false for replaceIds.

summaryQueryLimit is a safety-net style workaround for a situation where group queries are executed to return large result sets, and group summaries are required at the same time. Separate queries need to be executed internally to calculate various summaries, and if this is accidentally done for all groups it can result in an increase in processing time to the extent that other parts of your architecture encounter timeouts. Realistically this shouldn’t happen as long as group queries are combined with reasonable take values. But with certain combinations of flags (server-side grouping and summaries, but no groupPaging) the Data Grid executes queries that result in this issue. The default summaryQueryLimit prevents more than 100 summaries from being calculated, which should suffice in most cases. In case you want to deactivate this mechanism, you can set summaryQueryLimit to 0 (zero).

Finally, the query function splits the execution path in two by distinguishing between simple and group queries. The query pipelines for the simple queries are less complicated and I decided to implement the two variations separately. Technically it would be possible to combine the two, which is a consideration for future refactoring efforts.

Query Pipelines

MongoDB supports a simple querying interface through the collection.find() function. For grouping queries however, a call to collection.aggregate() is required, which uses a powerful framework of processing pipeline stages. Since there is functional overlap in the implementations of the simple and group queries, I decided to use aggregation pipelines for all queries since it allows me to reuse pipeline elements in both algorithms. You can see that the function querySimple retrieves its results using a collection.aggregate(...) call:

let resultObject = {
  data: (await collection.aggregate(dataPipeline).toArray()).map(this.replaceId)
};

Both functions querySimple and queryGroups build their querying pipelines out of parts constructed in various helper functions. These are the parts:

Part	Details
`completeFilterPipeline`	This pipeline part combines operations that filter data, based on the `loadOptions` elements `filter`, `searchExpr`, `searchOperation` and `searchValue`. If filtering is performed fields that have date part postfixes, the pipeline also includes a stage to add the nested fields. The function `createCompleteFilterPipeline` returns information on any nested fields that are being added by the part, so they can be referred to later.
`sortPipeline`	The `loadOptions` element `sort` is implemented here.
`skipTakePipeline`	The parameters `skip` and `take` from the `loadOptions` are reflected by this part.
`selectPipeline`	This part implements a projection stage to take the `loadOptions.select` option into account.
`removeNestedFieldsPipeline`	Nested fields added to the document stream as part of the `completeFilterPipeline` are removed again after they have been used.
`countPipeline`, `totalCountPipeline` and `groupCountPipeline`	These three are constructed conditionally outside the main data retrieval process, to count data objects and groups in the result set.
`summaryPipeline`	Constructed outside the main data retrieval process, to calculate total summary values.

A few additional pipelines are created in the createGroup function, which queries group data recursively for all grouping levels. The pipeline part groupKeyPipeline is used to add the group key field to the data stream, which is important in cases where nested sub-groups need to be queried at a later point, because a filter must be applied on the basis of the parent group’s key value. Group level counts and summaries are queried separately (in the case of count only when necessary), which requires separate pipelines.

Since the pipelines are nested arrays of objects, it is easy to observe the construction process in a debugger or output pipelines to the console. For illustration, consider these loadOptions:

{
  filter: [
    ["date2.Month", ">=", 4],
    "and",
    ["date2.Month", "<", 7]
  ],
  group: [
    {
      groupInterval: "month",
      isExpanded: false,
      selector: "date1"
    }
  ],
  groupSummary: [
    {
      selector: "int1",
      summaryType: "sum"
    }
  ],
  totalSummary: [{
    selector: "int1",
    summaryType: "sum"
  }],
  requireTotalCount: true
}

The main data querying pipeline for these options is this:

[
  {
    "$addFields": {
      "___date2_Month": {
        "$month": "$date2"
      }
    }
  },
  {
    "$match": {
      "$and": [
        {
          "___date2_Month": {
            "$gte": 4
          }
        },
        {
          "___date2_Month": {
            "$lt": 7
          }
        }
      ]
    }
  },
  {
    "$project": {
      "___date2_Month": 0
    }
  },
  {
    "$addFields": {
      "___group_key_0": {
        "$month": "$date1"
      }
    }
  },
  {
    "$group": {
      "_id": "$___group_key_0",
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "key": "$_id",
      "count": 1
    }
  },
  {
    "$sort": {
      "key": 1
    }
  },
  {
    "$addFields": {
      "items": null
    }
  }
]

Free DevExpress Products - Get Your Copy Today

The following free DevExpress product offers remain available. Should you have any questions about the free offers below, please submit a ticket via the DevExpress Support Center at your convenience. We'll be happy to follow-up.

Oliver Sturm (DevExpress)

Need help or require more information?

Announcing DevExpress Universal v26.1

DevExpress Wins 20 Visual Studio Reader's Choice Awards

DevExtreme - Real World Patterns - MongoDB Data Access

DevExtreme `loadOptions`

Result data structure

Filtering

Grouping

Examples

Querying data

devextreme-query-mongodb Implementation Details

The `query` function

Query Pipelines

Free DevExpress Products - Get Your Copy Today

Recent Posts

Need help or require more information?

Announcing DevExpress Universal v26.1

DevExpress Wins 20 Visual Studio Reader's Choice Awards

DevExtreme - Real World Patterns - MongoDB Data Access

DevExtreme loadOptions

Result data structure

Filtering

Grouping

Examples

Querying data

devextreme-query-mongodb Implementation Details

The query function

Query Pipelines

Free DevExpress Products - Get Your Copy Today

Recent Posts

DevExtreme `loadOptions`

The `query` function