DevExtreme - Real World Patterns - MongoDB Data Access

Oliver's Blog
24 March 2017

This post is part of a series describing a demo project that employs various real-world patterns and tools to provide access to data in a MongoDB database for the DevExtreme grid widgets. You can find the introduction and overview to the post series by following this link.

This post describes the functionality implemented in the devextreme-query-mongodb package, which can easily be used outside the context of the larger demo project that this blog series focuses on.

MongoDB is a NoSQL database that stores documents in collections. It is a commonly used system, and many programming languages and environments have easy access to it with the help of drivers or libraries. There is a very useful “getting started” overview of MongoDB that should be your starting point if you want to consider using MongoDB for your own projects.

From the SQL point of view, MongoDB collections are similar to tables. They are containers that contain documents, which are similar to table rows. However, documents are much more flexible and each document can differ from others in the same collection and contain nested structures. For purposes of querying data, collections are the starting points, just like tables are in SQL databases. Joins are not common with NoSQL databases, and while MongoDB supports joins for certain types of queries, these capabilities are not used by the project described here.

DevExtreme loadOptions

The goal for the implementation of devextreme-query-mongodb was to create an interface between the Custom Store querying functionality supplied by DevExtreme and the MongoDB APIs. All queries should utilize MongoDB server functionality, if possible.

At the core of the DevExtreme querying features is a structure commonly called loadOptions, which defines the parameters for a data query. Here is an overview of the supported parameters:

Parameter Details
take Integer. Restrict the number of top level results to return.
skip Integer. Skip a number of items from the start of the result set. In conjunction with take, this is often used to implement paging.
sort Array. Elements have the structure { selector: "field", desc: true/false } and are applied to the data in sequence to implement multi-level sorting.
filter Array. This can be a hierarchical structure of elements and arrays. Details of this can be found below.
searchExpr, searchOperation and searchValue Strings. Another mechanism to define a filter, restricted to one criterion. The filtering details described below apply to these parameters.
select Array. A simple list of field name strings that restricts the returned data objects to that set of fields.
requireTotalCount Bool. Indicate that a total count of result set data objects must be returned in the totalCount field of the result. The count is expected to consider any filters that are being applied, but it is supposed to disregard any take parameter used for the query. The count must reflect the number of data items available after filtering, not for instance the number of grouping nodes or similar.
totalSummary Array. Elements have the structure { selector: "field", summaryType: "type" }, where summaryType can be one of sum, avg, min, max and count. Results of these summary calculations are returned in an array called summary that contains the result values in the same order used for the summary definitions.
group Array. The structures in this array define the grouping levels that need to be applied to the data. Details of this can be found below.
requireGroupCount Bool, default false. When used in conjunction with group, require the return of a top-level group count in the groupCount field of the result.
groupSummary Array. The structure is the same as for totalSummary, but these summary values will be returned for each group if group is used.

If parameters are not included in the options, they don’t apply. In other words, for an empty loadOptions object the full set of data objects will be returned.

Result data structure

The result of a query operation returned by devextreme-query-mongodb looks like this:

{
    data: [ ... ],    // list of result data objects
    summary: [ ... ], // list of summary results if requested in totalSummary
    totalCount: 10,   // if required in requireTotalCount
    groupCount: 3     // if required in requireGroupCount
}

Filtering

Individual filter criteria have the format [ "field", "operator", "value" ], where operator is one of =, <>, <, >, <=, >=, startsWith, endswith, contains and notcontains. The last four are used for strings, otherwise operators apply to all value types. Criteria can be combined in hierarchies using nested arrays with and and or operators like this:

[
  [ "field", "=", 10 ],
  "and",
  [
    [ "otherField", "<", 3 ],
    "or",
    [ "otherField", ">", 11 ]
  ]
]

It is possible to negate a criterion (or hierarchy of criteria) by applying the operator ! like this: [ "!", [ "field", "=", 3 ]].

Finally, special syntax is supported to access “parts” of the information contained in date values. This feature is used by the Pivot Grid in situations where groupInterval settings for date fields are applied, but can be used independently. A field postfix in a criterion looks like this: [ "dateField.Month", "=", 4 ]. Supported postfixes are Year, Quarter (1-4), Month (1-12), Day and DayOfWeek (0=Sunday - 6=Saturday). The implementation doesn’t require particular casing for these postfixes.

Grouping

The group structure is a flat array of group definitions. Each group definition can have the following parameters:

Parameter Details
selector String, required. The field name to group on.
desc Bool, default false. Define descending sort order on the selector field.
isExpanded Bool, default false. Confusingly named, this field has meaning only for the last group definition and setting it to true requires the group data objects to be returned with the result set. To be clear, nested grouping data is always returned, even if isExpanded is false for a group definition that is not the last one. isExpanded only defines whether data objects will be returned as well.
groupInterval Integer or String. For Integer values, data will be grouped in ranges of the given length. String values apply only to date fields, supported options are year, quarter, month, day, dayOfWeek, hour, minute and second.

For group queries, the result data structure is different from the simple query result structure shown above. Here it is:

{
    data: [              // list of groups
      {
        count: 3,        // count of items in this group, even if items=null
        key: "groupKey", // value that grouping was applied on
        items: [ ... ],  // sub-groups or data objects (for the last group when isExpanded=true)
                         //   can be null when isExpanded=false and there are no further groups
        summary: [ ...]  // list of group summary results if requested in groupSummary
      },
      ...                // further groups on this level
    ],
    summary: [ ... ],    // list of summary results if requested in totalSummary
    totalCount: 10,      // if required in requireTotalCount
    groupCount: 3        // if required in requireGroupCount
}

Note that the key in each group is the value used for grouping. In simple selector based group definitions, the key will be a value valid for the selector field, but if groupInterval is also used the key will contain the interval value (which, in the case of date intervals, is not a valid value for/from the selector field).

Examples

The tests implemented for the package show many examples of loadOptions structures.

Querying data

The package devextreme-query-mongodb exports a function called query. Call this and pass a MongoDB collection with a loadOptions structure, and optionally processing options (see below). Here is a simple example:

const MongoClient = require("mongodb").MongoClient;
const query = require("devextreme-query-mongodb");

async function queryData() {
  MongoClient.connect("mongodb://localhost:27017/testdatabase", (err, db) => {
    const results = await query(db.collection("values"), {
      // This is the loadOptions object - pass in any valid parameters
      take: 10,
      filter: [ "intval", ">", 47 ],
      sort: [ { selector: "intval", desc: true }]
    });

    // Now "results" contains an array of ten or fewer documents from the 
    // "values" collection that have intval > 47, sorted descendingly by intval.
  });
}

devextreme-query-mongodb Implementation Details

The following two paragraphs describe some details about the implementation of the package. Especially the paragraph Query Pipelines is mainly of interest if you would like to work with the source code, or use it as a pattern to develop similar functionality yourself.

The query function

The entry point to the package is the query function:

async function query(collection, loadOptions = {}, options = {}) {
  const standardContextOptions = {
    replaceIds: true,
    summaryQueryLimit: 100
  };
  const contextOptions = Object.assign(standardContextOptions, options);
  const context = createContext(contextOptions, loadOptions);

  return loadOptions.group && loadOptions.group.length > 0 ?
    context.queryGroups(collection, loadOptions) :
    context.querySimple(collection, loadOptions);
}

The function returns an async result. I decided to introduce a query context at this point, which holds the loadOptions required by various parts of the algorithm, and some query options.

The replaceIds query option is set to true by default. The effect of this is that _id field values are returned as strings, instead of using the MongoDB internal object representation. The default value is set under the assumption that data will be passed on to parts of the application system where the origin of the id values should not be visible. Since MongoDB doesn’t have a built-in way (as far as I know) of returning id values as strings, there is a certain overhead associated with this approach. In cases where you are going to process the data further using other MongoDB queries, it might be beneficial to pass false for replaceIds.

summaryQueryLimit is a safety-net style workaround for a situation where group queries are executed to return large result sets, and group summaries are required at the same time. Separate queries need to be executed internally to calculate various summaries, and if this is accidentally done for all groups it can result in an increase in processing time to the extent that other parts of your architecture encounter timeouts. Realistically this shouldn’t happen as long as group queries are combined with reasonable take values. But with certain combinations of flags (server-side grouping and summaries, but no groupPaging) the Data Grid executes queries that result in this issue. The default summaryQueryLimit prevents more than 100 summaries from being calculated, which should suffice in most cases. In case you want to deactivate this mechanism, you can set summaryQueryLimit to 0 (zero).

Finally, the query function splits the execution path in two by distinguishing between simple and group queries. The query pipelines for the simple queries are less complicated and I decided to implement the two variations separately. Technically it would be possible to combine the two, which is a consideration for future refactoring efforts.

Query Pipelines

MongoDB supports a simple querying interface through the collection.find() function. For grouping queries however, a call to collection.aggregate() is required, which uses a powerful framework of processing pipeline stages. Since there is functional overlap in the implementations of the simple and group queries, I decided to use aggregation pipelines for all queries since it allows me to reuse pipeline elements in both algorithms. You can see that the function querySimple retrieves its results using a collection.aggregate(...) call:

let resultObject = {
  data: (await collection.aggregate(dataPipeline).toArray()).map(this.replaceId)
};

Both functions querySimple and queryGroups build their querying pipelines out of parts constructed in various helper functions. These are the parts:

Part Details
completeFilterPipeline This pipeline part combines operations that filter data, based on the loadOptions elements filter, searchExpr, searchOperation and searchValue. If filtering is performed fields that have date part postfixes, the pipeline also includes a stage to add the nested fields. The function createCompleteFilterPipeline returns information on any nested fields that are being added by the part, so they can be referred to later.
sortPipeline The loadOptions element sort is implemented here.
skipTakePipeline The parameters skip and take from the loadOptions are reflected by this part.
selectPipeline This part implements a projection stage to take the loadOptions.select option into account.
removeNestedFieldsPipeline Nested fields added to the document stream as part of the completeFilterPipeline are removed again after they have been used.
countPipeline, totalCountPipeline and groupCountPipeline These three are constructed conditionally outside the main data retrieval process, to count data objects and groups in the result set.
summaryPipeline Constructed outside the main data retrieval process, to calculate total summary values.

A few additional pipelines are created in the createGroup function, which queries group data recursively for all grouping levels. The pipeline part groupKeyPipeline is used to add the group key field to the data stream, which is important in cases where nested sub-groups need to be queried at a later point, because a filter must be applied on the basis of the parent group’s key value. Group level counts and summaries are queried separately (in the case of count only when necessary), which requires separate pipelines.

Since the pipelines are nested arrays of objects, it is easy to observe the construction process in a debugger or output pipelines to the console. For illustration, consider these loadOptions:

{
  filter: [
    ["date2.Month", ">=", 4],
    "and",
    ["date2.Month", "<", 7]
  ],
  group: [
    {
      groupInterval: "month",
      isExpanded: false,
      selector: "date1"
    }
  ],
  groupSummary: [
    {
      selector: "int1",
      summaryType: "sum"
    }
  ],
  totalSummary: [{
    selector: "int1",
    summaryType: "sum"
  }],
  requireTotalCount: true
}

The main data querying pipeline for these options is this:

[
  {
    "$addFields": {
      "___date2_Month": {
        "$month": "$date2"
      }
    }
  },
  {
    "$match": {
      "$and": [
        {
          "___date2_Month": {
            "$gte": 4
          }
        },
        {
          "___date2_Month": {
            "$lt": 7
          }
        }
      ]
    }
  },
  {
    "$project": {
      "___date2_Month": 0
    }
  },
  {
    "$addFields": {
      "___group_key_0": {
        "$month": "$date1"
      }
    }
  },
  {
    "$group": {
      "_id": "$___group_key_0",
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "key": "$_id",
      "count": 1
    }
  },
  {
    "$sort": {
      "key": 1
    }
  },
  {
    "$addFields": {
      "items": null
    }
  }
]
no comments
No Comments

Please login or register to post comments.