Investing analysis of the software companies that power next generation digital businesses

MongoDB World 2022 – Product Announcements

MongoDB held their annual user conference last week, bringing together customers, partners and employees at an in-person event in New York. As we would expect, MongoDB leadership used the conference to introduce a number of new product offerings and enhancements. They also hosted an Investor Session to tie the product announcements to MongoDB’s broader expansion strategy. The overall theme was to enable customers to address more application workloads with the MongoDB platform. Given the breadth and depth of the product improvements announced during MongoDB World, I think they easily achieved this goal.


Product and Market Strategy

Before I detail all the major enhancements, let’s step back and review MongoDB’s strategy and the top-level themes for the product roadmap. The leadership team reinforced these as a precursor to the announcements. I think these help provide investors with a framework for evaluating the potential of the product releases.

MongoDB is pursuing a TAM estimated to be $85B this year by IDC. IDC categorizes the market as “database management systems software.” This category encompasses several segments of databases, all within the general scope of transactional database use cases, which MongoDB can address. These include relational database management systems, operational non-schematic database management systems, navigational database management systems, data lake management systems, in-memory shared data managers and low-code database management systems.

MongoDB Investor Session, June 2022

IDC estimates the TAM will grow by about 12% a year to reach $138B by 2026. This scope primarily encompasses transactional databases – the market for data warehouses and analytics is separate. Demand for transactional databases is benefitting from significant tailwinds, primarily digital transformation and automation of business processes. New enterprise efforts to harness smart devices to optimize operations (Industrial IoT) are driving an exponential increase in data volumes. This data is collected by front-line transactional databases, processed and then shipped to larger analytics stores for deeper analysis.

“IDC survey data shows that roughly 90% of enterprises are either actively moving production data to the cloud and adopting cloud databases for new workloads or have plans to do so within the next three years,” said Carl Olofson, research vice president for Data Management Systems software research at IDC. “Data volumes are rising at an exponential rate, driven by streaming, mainly IoT data, log data, and so forth. This will drive the use of both high-speed data collection and processing technologies, including in-memory shared data managers and large volume analytic data platforms or data lakes.”

IDC Research Report, September 2021

As enterprises consider solutions to capture and process all this data, either as part of a new digital transformation project or an upgrade to an existing legacy system, they will generally consider a modern database offering. While a 12% CAGR doesn’t sound exciting on the surface, in a large market like databases, it provides a lot of growth for leading providers. A better way to consider the opportunity is that $11B of new database spend will be introduced in 2023. MongoDB has a reasonable chance of landing a fair percentage of that. Add to that some portion of the existing spend that is considered for an upgrade.

MongoDB’s product strategy is to be in a strong position to benefit from both of these motions. First, upgrades of existing application database solutions will occur within the lifecycle of an older software application. After some period of time, a software application’s architecture drifts far enough from newer standards that maintaining it creates impedance for the engineering team. They have to retain staff familiar with a legacy database technology, update dated interfaces to accommodate new data types and wrestle with scalability as application usage grows.

Many legacy applications are coded as a large “monolith”, which would benefit from a refactoring into independent services. In this case, the team will plan a project to rewrite the application, possibly breaking it up into microservices. That provides an opportunity to migrate from a relational database onto another data model that may be better suited for that specific application function. If the lifetime of an application is 10 years on average, then $8.5B of existing database spend in 2022 would be available for a modern database platform to win.
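The arithmetic behind these two pools of opportunity can be sketched directly from the IDC figures cited above (the 10-year application lifetime is the article's assumption, not an IDC number):

```python
# Back-of-envelope sizing of the two motions described above, using the
# IDC figures cited in this article (all values in $B).
tam_2022 = 85.0      # IDC TAM estimate for 2022
cagr = 0.12          # ~12% annual growth
app_lifetime = 10    # assumed average application lifetime in years

# New spend introduced next year: growth of the TAM from 2022 to 2023.
new_spend_2023 = tam_2022 * cagr

# Upgrade pool: if roughly 1/10th of applications reach end-of-life each
# year, about 1/10th of existing spend is up for re-platforming.
upgrade_pool_2022 = tam_2022 / app_lifetime

print(f"New database spend in 2023: ~${new_spend_2023:.1f}B")   # ~$10.2B
print(f"Upgrade pool in 2022:       ~${upgrade_pool_2022:.1f}B") # ~$8.5B
```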

Beyond the upgrade motion, enterprises are also creating new applications as part of digital transformation. This may be to create a new digital experience for customers, partners, supply chain providers or employees. For these new applications, the engineering team can choose a modern database solution that can address their data workloads. If the functionality is powering a site search, they may look for a search index. If it is powering a new IoT data collection service, the time series data model would be well suited. Or, they might be planning a new mobile app that needs a system to keep the device’s cache in sync with the central data store.

In all these cases, a database platform that can address multiple data access patterns provides some benefits. The developers have fewer interfaces and data storage technologies to learn. The DevOps team has fewer dependencies to manage. The leadership team can consolidate spend to fewer vendors and achieve volume based discounts. Consolidating database vendor sprawl results in lower cost and higher efficiency for the engineering team.

And this is the foundation of MongoDB’s product strategy. They seek to be in the consideration set for a greater number of application workloads. Workloads are the unit of measure for market penetration in application databases. The application database market also isn’t winner-take-all, like CRM or ERP. An engineering team usually has more than one database type in use at any time. A database vendor could land within an enterprise for a single use case on one application and expand into more workloads over time.

For MongoDB, this product strategy translates into platform improvements along three vectors. All of these enable customers to apply MongoDB to more of their application workloads.

  • Make it easier to migrate a legacy (relational) database to MongoDB.
  • Allow MongoDB to be applied to more data workloads.
  • Support additional application deployment architectures.
MongoDB Investor Session, June 2022

This provides the framework upon which the product announcements at MongoDB World hang. By supporting more data models and deployment architectures, MongoDB moves beyond its nucleus of centralized, document-oriented databases. This opens up a larger share of the $96B of application database spend in 2023 for MongoDB to capture.

Winning more Workloads

This high level strategy for capturing more application workloads was introduced at last year’s virtual conference in July. Leadership labeled MongoDB as the application data platform. This extended MongoDB’s base beyond powering document-oriented workloads to supporting a broader set of the typical data storage patterns within modern applications.

This was a reaction to the state of “sprawl” within the transactional database market. Developers have to choose from multiple options within each category of database, generally aligned with a single data model. A perusal of the listings on DB-Engines demonstrates this bounty of choices – they include about 400 different databases in their popularity tracker.

As I discussed, reducing the number of different databases that back enterprise software applications generates efficiencies, simplicity and cost savings for the engineering team. Consolidation may not be appropriate for every data access pattern, but the pendulum has swung too far in the direction of purpose-built databases over the past 10 years. We won’t have one general purpose database platform to rule them all, but we don’t need 400 options either.

MongoDB.live Investor Session, July 2021 (Author Annotations)

With these advantages in mind, MongoDB’s product strategy is to continue to improve the MongoDB data platform to become a suitable replacement for more flavors of databases. During their annual user conference in 2021, the product team presented the slide above, showing common database workloads for a typical customer application and popular point solutions for each. MongoDB’s long-term goal is to replace many of these. As part of MongoDB World this year, they announced a number of platform improvements that move MongoDB a few steps closer to this vision. I have highlighted in green boxes the targeted data workloads that received the most focus in this year’s announcements.

The team also introduced a number of generalized enhancements to the platform that apply to all data workloads. These encompass important functions like data security, cluster synchronization, secure APIs and developer tools. These enhancements make the platform more scalable, easier to use and highly secure. With that set-up, let’s run through the product announcements and how they tie back to MongoDB’s overall strategy.

Relational Migrator

MongoDB’s largest set of entrenched workloads to target are relational databases. First, I acknowledge that MongoDB won’t replace every SQL database out there. That is not reasonable or likely. However, as legacy applications are upgraded and new applications are planned, MongoDB’s data platform would be a suitable choice for many database implementations that previously defaulted to relational. Application re-architectures, particularly a refactoring into microservices, drives this motion. A service-oriented architecture makes it much easier to keep some workloads on relational, and move the rest to another model.

Migrating off of an entrenched relational database is a difficult exercise. It requires changes to the data model and the application code, as well as movement of the data itself. In the past, engineering teams had to create their own tooling to support a migration. MongoDB sales support and professional services would assist in these projects by providing basic scripts to automate some of the work. A flexible, UI-driven tool was needed, since much of this effort is repeated across implementations.

This was the genesis for MongoDB’s first big product announcement, called the Relational Migrator. The goal of this product is to provide engineering teams with a tool to easily connect to a relational database, analyze its table structure, map that to the document model and then manage the data migration.
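To make the table-to-document mapping concrete, here is a minimal, hand-rolled sketch of the kind of transformation such a migration performs. This is an illustration of the concept only, not the Relational Migrator's actual code, and the table and field names are hypothetical:

```python
# Rows from normalized "customers" and "orders" tables are denormalized
# into a single customer document with the one-to-many relationship
# embedded as an array -- the document-model equivalent of a JOIN.
customers = [
    {"customer_id": 1, "name": "Acme Corp"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250.0},
    {"order_id": 11, "customer_id": 1, "total": 99.0},
]

def to_documents(customers, orders):
    docs = []
    for c in customers:
        # Embed each customer's orders directly in the customer document.
        c_orders = [{"order_id": o["order_id"], "total": o["total"]}
                    for o in orders if o["customer_id"] == c["customer_id"]]
        docs.append({"_id": c["customer_id"], "name": c["name"], "orders": c_orders})
    return docs

docs = to_documents(customers, orders)
print(docs[0]["name"], len(docs[0]["orders"]))   # Acme Corp 2
```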

Interestingly, this capability was introduced by Mark Porter, MongoDB’s CTO. This is relevant because Mark has deep history working with the relational database model. Over his career, he was part of the core kernel group at Oracle, ran Amazon’s RDS group and managed Aurora. He knows databases, is hands-on and very technical. He claims that at one point, his code was used in every Visa transaction and AT&T mobile call. He joined MongoDB because he felt it provides a brighter future for developers.

Mark contends that relational data models are dated and have limited use going forward for application development. They were designed 50 years ago, when systems needed to optimize for expensive storage. This constraint drove multiple design choices that now represent encumbrances for modern application development.

MongoDB Investor Session, June 2022

The document model, on the other hand, provides a more natural fit for developers. It maps directly to the computer science concept of object-oriented design, which is how most software programs manage data internally. A relational model requires mapping data from application objects to the table structure, typically through an object-relational mapper (ORM). The document structure and the relationships between documents can accommodate many data models. Within an architecture like MongoDB’s distributed nodes, a cluster of database servers using the document model can be scaled to high concurrency.

MongoDB Investor Session, June 2022

With these inherent design and architectural advantages, many engineering organizations are choosing the document model for new and upgraded application workloads, often over the relational model. In fact, Mark Porter claims that he has met with over 400 CTO’s in the last two years and every one of them has an “off relational” plan, or at least an intent to create one. While a strong statement, I can see some percentage of databases falling into this category.

MongoDB Investor Session, June 2022

The Relational Migrator will initially be released to MongoDB’s internal pre-sales and professional services teams. This will provide value to customers while refining the tool’s functionality in a controlled environment. In 2023, a cloud-hosted version of the tool will be released for customers to use in a self-serve fashion.

Time Series Support

The upcoming release of MongoDB version 6.0 will come with a number of new capabilities for managing time series data. Notable improvements include support for secondary indexes on time series measurements, which let users index the measurement fields themselves, not just the timestamp. Version 6.0 also improves read performance for sort operations. MongoDB will support time series data for the full data lifecycle, including ingestion, storage, querying, analysis, visualization and online archiving. MongoDB version 6.0 is currently in preview mode.

Time series data has existed for a long time. The primary use case has been in collecting telemetry data for system observability in software infrastructure. Example data types for this context are CPU utilization and available disk space at points in time. The definition of a time series data point is straightforward – a data value associated with a timestamp. With the proliferation of smart devices, whether for human or industrial use cases, the volume of time series data is increasing exponentially. Collecting, processing and aggregating this data at enormous scale is becoming a real database technology challenge.

MongoDB Investor Session, June 2022

Traditionally, working with large volumes of time series data has been difficult for several reasons. The data storage and query patterns differ from those of standard relational databases, limiting the relational model’s suitability for high volumes of time series data:

  • Massive data throughput from a large number of inputs (sensors).
  • Latest data is most valuable, old data needs to be archived out.
  • Query pattern of retrieving time slices with fast response times.
  • Data can have gaps (if a device goes offline).

Customers have been using MongoDB for time series data for years, but the MongoDB team realized that the platform could better serve this use case. A year ago, they started a series of improvements to make time series workloads easier to manage, faster to query, less error prone and less expensive to maintain. The team created a new data collection within MongoDB that stores time series data in an optimized format.
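The core idea behind an optimized time series collection can be illustrated with a toy bucketing sketch: rather than storing one document per measurement, measurements that share a metadata value are grouped into time-windowed buckets and stored column-wise, which compresses well and makes time-slice queries cheap. This is an illustration of the general technique only, not MongoDB's internal bucket format:

```python
# Group raw sensor readings into hourly buckets keyed by (sensor, window),
# storing timestamps and values as parallel arrays (column-wise layout).
from collections import defaultdict

readings = [
    {"sensor": "s1", "ts": 60,   "temp": 20.1},   # hour 0
    {"sensor": "s1", "ts": 120,  "temp": 20.4},   # hour 0
    {"sensor": "s1", "ts": 3630, "temp": 21.0},   # hour 1
]

def bucketize(readings, window=3600):
    buckets = defaultdict(lambda: {"ts": [], "temp": []})
    for r in readings:
        key = (r["sensor"], r["ts"] // window)   # (metadata, time window)
        buckets[key]["ts"].append(r["ts"])
        buckets[key]["temp"].append(r["temp"])
    return dict(buckets)

b = bucketize(readings)
# Two buckets for sensor s1: hour 0 holds two readings, hour 1 holds one.
print(len(b), len(b[("s1", 0)]["ts"]))   # 2 2
```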

MongoDB Investor Session, June 2022

The graphic above lists all the improvements made over the past year. Sharding support allowed time series data to be written to multiple nodes in parallel, which is important to accommodate the extremely high throughput of most IoT systems. They also added cardinality improvements for better performance, compression techniques, methods to fill data gaps and more efficient data archiving.

Geo indexing, introduced in version 6.0, is particularly powerful, as it allows developers to slice time series data by geographic location, which is a common use case. The MongoDB team combined two primitives within the platform to deliver this capability. They brought their geo-spatial libraries to the time series processing engine and unified them to support the creation and querying of indexes by geospatial definitions.

There are currently a number of point database solutions that focus on time series data. InfluxDB is the most popular, with Prometheus, Graphite, Kdb+ and TimescaleDB representing other alternatives. InfluxData is the commercial entity behind InfluxDB. They have raised about $120M to date and claim over 1,300 paying customers. Many of the larger ones overlap with MongoDB customers, like Cisco, eBay and Adobe. The time series database market is estimated to reach $575M in annual spend by 2028. I think it could become much larger than that based on the number of emerging use cases in Industrial IoT.

Search

While time series is exciting, powering application search likely represents a larger opportunity. We are all familiar with the free form search box on most web sites. An even more prevalent search use case is the slicing of results by product dimensions, like on an e-commerce site. These slices are called facets.

MongoDB Investor Session, June 2022

Search has even broader applications across a number of industries. In a generic sense, search represents targeted data retrieval across a number of dimensions that can return results extremely fast. It can be applied to customer loyalty programs, risk assessment, employer ratings, music retrieval, targeted content and order management. MongoDB is already seeing a number of search-related use cases from its customers.

MongoDB Investor Session, June 2022

The traditional method for enabling application search involved setting up a stand-alone tier of servers with a search index loaded onto them. These search servers were optimized for data retrieval, often employing an inverted index and the open source library Lucene. In order to keep the search index updated, engineering teams would need to maintain a data synchronization process between the search index and the primary database.
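The inverted index at the heart of Lucene-style search is a simple idea: each term maps to the set of documents containing it, so a query becomes a set intersection rather than a scan of every record. A minimal sketch of the concept:

```python
# Build an inverted index over a tiny corpus, then answer a multi-term
# query by intersecting the posting sets for each term.
docs = {
    1: "modern database platform",
    2: "search platform for modern applications",
    3: "legacy relational database",
}

index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def search(*terms):
    results = set(docs)
    for t in terms:
        results &= index.get(t, set())   # keep docs containing every term
    return sorted(results)

print(search("modern", "platform"))   # docs containing both terms -> [1, 2]
```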

MongoDB Investor Session, June 2022

This created extra overhead for engineering teams in managing two separate systems. At minimum, they would set up a separate search tier using popular solutions like Elasticsearch and Solr. They also had to set up a dedicated ETL process to pull data out of the primary data store and write it to the search index. These resulted in labor costs and expense for the commercial version of the search servers (Elastic or Lucidworks).

The MongoDB team saw an opportunity to make this process more efficient. In 2020, they introduced Atlas Search in GA. In MongoDB’s implementation, the search engine is integrated directly into the platform and sits beside the database. This provides a unified and fully managed system. Developers don’t need to maintain two sets of servers. They also avoid the overhead of managing a data sync mechanism, writing custom transformation logic and then remapping search indexes every time the database schema is updated.

MongoDB Investor Session, June 2022

MongoDB’s Atlas Search implementation is built using the open source Lucene search libraries. This is the same core search functionality at the center of Solr and Elasticsearch. While Lucene provides a lot of the basic search functionality, there are a number of use cases within search that need to be implemented outside of what comes with the library. The MongoDB team has been adding these in incremental releases over the last two years.

MongoDB Investor Session, June 2022

With the MongoDB 6.0 release, customer engineering teams will have access to search facets, cross-collection searching, stored source fields and embedded documents in arrays. These further round out the use cases that Atlas Search can address, with facets being the major addition.
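Faceting itself is straightforward to illustrate: alongside the matching results, the engine counts how many hits fall into each value of a dimension, so the UI can render filters like "Bedrooms: 3 (2) | 2 (1)". A small sketch of the concept, with hypothetical field names:

```python
# Count facet values over the hits of a query -- the aggregation behind
# the filter sidebars on e-commerce and real estate search pages.
from collections import Counter

listings = [
    {"city": "Austin", "bedrooms": 2, "price": 400_000},
    {"city": "Austin", "bedrooms": 3, "price": 550_000},
    {"city": "Dallas", "bedrooms": 3, "price": 480_000},
]

def facet_counts(hits, field):
    return Counter(h[field] for h in hits)

hits = [l for l in listings if l["price"] < 600_000]   # the query
print(facet_counts(hits, "bedrooms"))   # Counter({3: 2, 2: 1})
print(facet_counts(hits, "city"))       # Counter({'Austin': 2, 'Dallas': 1})
```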

In the product keynote, leading real estate company Keller Williams demonstrated how they are using Atlas Search to enable consumers and agents to search for properties on KW.com, supported by the underlying MongoDB application data platform with integrated Atlas Search. The search engine supports both full text searches by address and faceted search by different parameters like price, year built, bedrooms, square footage, etc.

The opportunity for MongoDB is to displace stand-alone application search engines, particularly those that are built on Lucene, like Elasticsearch and Solr. Elastic lists a large number of customers for their enterprise search offering, which includes both application search and internal retrieval of enterprise documents. Elastic also offers pre-packaged solutions for observability and security, which aren’t relevant for MongoDB. Application search represents a direct product extension for MongoDB, where they can easily layer on application search from the core database engine. Elastic would not be able to move in the opposite direction, from search index to transactional database (at least not easily).

MongoDB’s CPO even took a veiled shot at Elastic in his keynote, reflecting what he thinks will be the value of MongoDB’s singular focus on application search use cases. We will see how much market share MongoDB gains, but I think their strategy is sound. At minimum, MongoDB should be able to harvest low hanging fruit in standard faceted and site search use cases, like the example with Keller Williams.

We are laser focused on the developer use cases for search that power modern applications. We’re not off trying to solve observability or security problems with our platform.

Chief Product Officer, MongoDB, June 2022

Analytics

When MongoDB is talking about analytics, they are referring to “in-app” analytics. These are not the rich data visualizations constructed by aggregating multiple data sources in a data warehouse. MongoDB has no aspirations to move down the stack to power those types of big data workloads in the domain of Snowflake or Google BigQuery.

MongoDB is targeting data visualization and analytics use cases that are generated in near real-time using the data set stored within a single application’s transactional database. The purpose is not to help corporate executives or analysts understand business trends. Rather, these use cases typically drive a recommendation or decision made within the application context itself. Some examples include:

  • Personalization. Make consumer offers as items populate their shopping cart.
  • Fraud prevention. Drive real time decisions for payment approval based on fraud detection analysis.
  • Process optimization. Monitor order supply issues and re-route delivery traffic in real time.
  • Preventative maintenance. Track the performance of devices or vehicles and schedule maintenance needs when most convenient.
MongoDB World Keynote, June 2022

These use cases require the data processing framework illustrated in the screenshot above. The application needs access to three different databases for these kinds of logic decisions, including the operational data, a time series data store (as many of these use cases are associated with time series data) and the analytical database to query for recommendations. MongoDB is targeting all of these workloads, represented within the green box above. Having one database eliminates the ETL and synchronization jobs necessary to keep the three separate databases aligned. MongoDB isn’t targeting the space at the bottom of the diagram, occupied by the enterprise data warehouse and data lake.

In order to consolidate all of these functions into one platform, MongoDB has built in many capabilities that enable robust in-app analytics. These include a flexible data model, a framework to aggregate and query data within time windows, support for long-running queries that generate a snapshot view and workload isolation. The last item prevents analytics load from affecting the performance of operational database transactions.
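The time-window aggregation mentioned above is the workhorse pattern for in-app analytics: events are grouped into fixed (tumbling) windows and reduced to a summary that a recommendation or fraud decision can act on immediately. A minimal sketch of the pattern, independent of any particular database:

```python
# Sum event amounts within 60-second tumbling windows -- the kind of
# rolling summary an in-app fraud or personalization decision consumes.
from collections import defaultdict

events = [
    {"ts": 5,  "amount": 10.0},
    {"ts": 35, "amount": 20.0},
    {"ts": 65, "amount": 5.0},
]

def tumbling_window_sum(events, window=60):
    out = defaultdict(float)
    for e in events:
        # Map each event to the start of its window, then accumulate.
        out[e["ts"] // window * window] += e["amount"]
    return dict(out)

print(tumbling_window_sum(events))   # {0: 30.0, 60: 5.0}
```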

In fact, MongoDB already has several customers using the platform for these types of real-time analytics use cases. Boxed is an online wholesale retailer (like Costco). They use MongoDB to optimize their delivery supply chain globally, so that they can reduce costs and compete with other distribution networks like Amazon. Amadeus powers the booking experience for many of the largest airlines worldwide. They apply MongoDB to generate the recommendations for flight options as consumers try to plan a trip.

Powering these use cases around in-app analytics is becoming another increasingly large market. As enterprises build new applications to enable digital transformation for their customers, the next logical step is to use that data to make their operations “smarter”. This could be through personalization of the customer experience or optimization of operations and fulfillment to improve outcomes and reduce costs.

This is also a target market for Snowflake. They are moving up the stack with aspirations to provide a data source for applications. This would serve the use case in which data aggregated within the data warehouse is then fed to an application to display it. Snowflake could become the direct source for this summarized data, versus standing up a separate application data store to serve it. They are leveraging the Streamlit acquisition to enable this use case.

I don’t see this as a risk to MongoDB, however. In MongoDB’s case, the application decisions are being made by using the data collected and stored within the application itself. For Snowflake’s use case, the data insights usually span multiple sources. The only impact to MongoDB is that some of the Snowflake-powered data applications might have been served previously by standing up a new MongoDB cluster as the data source.

Serverless

MongoDB introduced a serverless version of Atlas in public preview in July 2021. Customers could access the functionality on request with limits on query volume and database size. Last week, during MongoDB World, the team brought Atlas serverless instances to GA.

Serverless provides a number of advantages over a dedicated MongoDB cluster for developers. Most notably, it minimizes the configuration overhead required. Developers can simply select their desired cloud provider and region, then name the database. This eliminates the need to make additional configuration decisions related to cluster capacity, MongoDB version, back-ups and sharding configuration. Some of these inputs may not be known at creation time with a new application.

MongoDB Investor Session, June 2022

Besides having to make a number of decisions about cluster configuration, some of the more advanced features like sharding can have implications as the application scales. While the developer isn’t stuck with a decision, it can require some rework to make significant changes to a cluster’s configuration or the data structure. With serverless, these constraints don’t apply. The developer can simply treat MongoDB like a black box, reading and writing data without concern for the capacity configuration. MongoDB manages this on the developer’s behalf.

A serverless model can have implications for pricing. Serverless is paid for based on usage. If an application’s usage goes down, the enterprise isn’t continuing to pay the same rate. Of course, the flip side applies, where increased usage incurs more cost. This model works well in cases where usage can be sporadic or unpredictable.

MongoDB Investor Session, June 2022

During the Investor Session, the CPO provided the slide above to illustrate the difference in billing for MongoDB clusters versus serverless. For serverless, billing is based on atomic units that map directly to usage. MongoDB plans to offer volume discounts for higher levels of usage, so the cost/usage line will actually curve downward as usage increases.
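The shape of that pricing curve is easy to demonstrate: a pure pay-per-usage charge with tiered volume discounts means the average unit cost declines as usage grows. The tier boundaries and rates below are hypothetical, chosen only to illustrate the mechanics:

```python
# Tiered usage-based billing: the first million units bill at the highest
# rate, later tiers at progressively lower rates, so the blended rate
# falls as volume rises. Units and rates are illustrative, not MongoDB's.
def serverless_cost(million_units,
                    tiers=((1, 0.10), (9, 0.05), (float("inf"), 0.02))):
    cost, remaining = 0.0, million_units
    for size, rate_per_million in tiers:
        used = min(remaining, size)
        cost += used * rate_per_million
        remaining -= used
        if remaining <= 0:
            break
    return cost

print(serverless_cost(0.5) / 0.5)   # blended rate at low volume  -> 0.1
print(serverless_cost(20) / 20)     # blended rate at high volume -> 0.0375
```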

I think MongoDB’s launch of serverless represents a significant step forward. It has obvious advantages in terms of simplifying configuration overhead and managing costs for certain workload types. It also addresses a perception issue, where a serverless offering is viewed as being more advanced. Having a true serverless capability brings MongoDB to parity with other modern database offerings (like Amazon DynamoDB and Aurora). There are also independent globally distributed data storage services, like Fauna, Macrometa and even Cloudflare. These services add the ability to store and distribute data across all global regions automatically. MongoDB has plans to add these data distribution capabilities to its serverless solution in the future.

Edge

When MongoDB talks about enabling edge workloads, they are referring to making state management for connected devices easier. The most common example of a device is a mobile application on an iPhone or Android. However, with IoT, new types of devices are proliferating like fixed sensors, vehicles and manufacturing equipment. These IoT devices are similar to smart phones in a lot of ways – they have a local application runtime and data cache. It is this management of the data cache that adds complexity to device-based applications. That is because the developer has to assume that the device can become disconnected from the network or crash at any time, requiring some of the application state to be stored locally. Having two copies of state incurs the complexity of synchronization.

The flow charts above illustrate some of the edge cases in managing state on a device and synchronizing it with the central application back-end. This logic can create a lot of overhead for developers that really isn’t value-added work. Developer time should be spent building new features into applications, not implementing retry logic and error handling for edge cases.

MongoDB Investor Session, June 2022

MongoDB Atlas Device Sync and the Realm service make all of this much easier for the developer. They simplify the data mapping between the device’s cache and the central database, handle network drops and re-syncing of data, and address errors and recovery. All of this happens automatically, saving the developer time.

MongoDB Investor Session, June 2022

As part of MongoDB World, the team released a couple more synchronization capabilities. First, they provided granular control over the data that is managed for a particular device. This might vary based on the type of user or context of usage. It avoids a broad stroke of synchronization, which can waste network bandwidth and result in unnecessary storage. Second, they launched a one-way synchronization capability. This is optimized for sensor devices, where all the data is being sent from the device to a centralized data store. This allows data flows to be processed more quickly with less overhead as the device and central service don’t need to manage synchronizing data down to the device.
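The essence of the one-way sync pattern for sensor devices can be sketched in a few lines: readings always land in a local buffer first, and a flush pushes the unsent tail upstream, surviving transient network failures by simply retrying later. This is an illustration of the general pattern, not Atlas Device Sync's implementation:

```python
# A device-side buffer that records readings locally and flushes them
# upstream one-way; a failed uplink leaves the unsent tail for next time.
class DeviceBuffer:
    def __init__(self, uplink):
        self.uplink = uplink        # callable that may raise ConnectionError
        self.pending = []

    def record(self, reading):
        self.pending.append(reading)   # always land writes locally first

    def flush(self):
        sent = 0
        while self.pending:
            try:
                self.uplink(self.pending[0])
            except ConnectionError:
                break                  # keep unsent readings for later
            self.pending.pop(0)
            sent += 1
        return sent

# Simulated flaky uplink: fails on the first call, then succeeds.
calls = {"n": 0}
def uplink(reading):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError

buf = DeviceBuffer(uplink)
buf.record({"temp": 20.1})
buf.record({"temp": 20.3})
print(buf.flush(), buf.flush())   # first flush sends 0, second sends 2
```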

These new capabilities extend MongoDB’s addressable use cases further into the realm of devices. Asymmetric sync in particular, combined with the new time series and analytics capabilities, helps MongoDB expand their applicability to use cases in Industrial IoT, which represents a growth area. Several of MongoDB’s highlighted customers, like 7-Eleven and Toyota, are finding uses for these capabilities to address data management for fleets of devices.

General Capabilities

In addition to launching new product offerings that target specific workloads, MongoDB added several capabilities that extend the security, management and ease-of-use of the MongoDB platform for all customer workloads.

Queryable Encryption

The highlight of MongoDB’s general capabilities announcements was Queryable Encryption. Given the broader sensitivity around security and data breaches, this release is perfectly timed. While technologists can debate the finer points of the implementation, it provides MongoDB’s marketing team with a claim to have better data security than any other database solution on the market.

Specifically, MongoDB has addressed the last part of end-to-end data encryption within the modern data stack. Most software architectures encrypt data while “in transit” and “at rest”. This means that data is transmitted between systems within a virtual private network over secure channels and when the data lands on disk, it is encrypted. These measures go a long way to protect that data from hackers, who could otherwise sniff out unencrypted data on the network or access it from a compromised machine’s disk.

However, there is a subset of the lifecycle of a data packet where it isn’t fully encrypted. That is the time when the database server is actually processing a query on that data. In those cases, sensitive data can exist unencrypted in the database server’s memory or in the CPU cache. A hacker who has compromised the database server could then access the sensitive data in its raw form.

While other database technologies have addressed this problem, they provide a partial solution. The sensitive data can be encrypted in memory for a subset of data access patterns, usually just the basic look-up. If a developer has a use case that goes outside of basic data retrieval on a single cell, they either risk exposure of the raw data or have to write a lot of extra code to work around the limitation.

With Queryable Encryption, MongoDB has addressed this constraint. Now, developers will have access to a broader set of query patterns (referred to as expressive queries) without having to worry about the security of the data on the database server. This capability will be available in preview mode as part of MongoDB’s 6.0 release. Users of MongoDB will get access to the capability automatically, without having to re-architect their systems.

With the introduction of Queryable Encryption, MongoDB is the only database provider that allows customers to run expressive queries, such as equality (available now in preview) and range, prefix, suffix, substring, and more (coming soon) on fully randomized encrypted data. This is a huge advantage for organizations that need to run expressive queries while also confidently securing their data.

MongoDB Blog Post, June 2022

This capability is backed by several years of research, sourced and vetted by academics. It was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz, who are pioneers in the field of encrypted search. Several years ago, the pair cofounded a searchable encrypted database startup known as Aroki Systems. Aroki collaborated with MongoDB on a database security feature, announced in 2019. In 2021, MongoDB acquired Aroki. Kamara and Moataz continued working on a prototype of a truly searchable encrypted database. Queryable Encryption is the outcome of their work and is unique to MongoDB.

While MongoDB has offered advanced security features previously, the introduction of structured encryption at the field level addresses the last remaining weakness. Most databases have figured out how to secure data at rest or in motion but fail to secure data while in use. MongoDB’s field-level encryption protects data in memory and on disk within the server. It’s the highest level of security for breaches.

With Queryable Encryption, MongoDB announced the first-ever commercially available, structured encryption model. With structured encryption, MongoDB can transform the encrypted field in a cryptographically secure way such that it can store anonymous metadata allowing expressive and efficient queries to be performed. As an example, structured encryption enables a developer to build a bank application that can find transactions using a range of dates or dollar amounts for fraud investigation. As the example below shows, these kinds of queries can be performed without exposing the customer’s SSN on the database side.

MongoDB Blog Post June 2022

In an even more confident move, MongoDB plans to open source the technology behind the new encryption capability. They will share the code, the algorithms and the math with the broader community. This will generate the benefit of peer review and suggested improvements from academia. While opening up the code could inspire competitors, I suspect they would have challenges in implementing an exact copy. Their database architecture may not be well-suited to the technique or they may encounter performance issues. Additionally, portions of the interface into MongoDB’s system may remain obfuscated or proprietary. In sum, I think the benefits of open sourcing outweigh the risks.

If anything, this capability allows MongoDB to claim superior data security over competitive solutions. In today’s environment, just the marketing message provides a big advantage for MongoDB. Additionally, it removes yet another barrier to adoption, many of which MongoDB has been knocking down one-by-one. At first, it wasn’t performant. Then, didn’t support transactions. Then, it wasn’t as secure. All of these have been addressed, allowing MongoDB to continue its trajectory towards addressing the most mission-critical, scalable and security sensitive data workloads.

Cluster-to-Cluster Sync

By default, MongoDB allows users to keep data synchronized within a defined cluster. This provides useful replication across cloud providers and regions, so that an application has fail-over options and geographic distribution of data. It is the predominant use case for synchronization. However, there are situations where users want to synchronize data across different clusters of MongoDB. This situation may exist in several cases:

  • A customer is first migrating from licensed MongoDB to Atlas.
  • Keeping development, test and production environments in sync.
  • Supporting production deployment techniques, like “blue-green” deployments.
  • Creating a dedicated environment for analytics queries.
  • Data locality requirements.
  • Moving data to the edge.

In these cases, MongoBD instances would be deployed into separate clusters. They would have similar data structures, but require a controlled method for updating one cluster from the source of another. Previously, customers would have to manage this synchronization on their own, usually by writing and managing ETL jobs.

MongoDB World Keynote, June 2022

With Cluster-to-Cluster Sync, MongoDB will handle this cross cluster synchronization. It provides continuous unidirectional data synchronization of two MongoDB clusters (source to destination) in the same or hybrid environments. Users have full control of the synchronization process and can monitor the progress of the synchronization in real time.

Cluster-to-Cluster Sync is currently available as a Preview release and will soon be moved to GA. While a seemingly mundane feature, it has been requested by a number of customers. Amadeus (global airline scheduling system) has been eagerly awaiting this feature and mentioned it as part of a testimonial during MongoDB World.  

The ability to leverage Cluster-to-Cluster Synchronization (C2C) for our many existing MongoDB-based travel applications is something we’ve been wanting for a long time and will be a huge benefit to us. It will greatly improve many facets of our software lifecycle, such as supporting “blue/green” deployments, data distribution, cloud migration and further increasing our high levels of geographic availability for our airline customers,” said Sylvain Roy, Senior Vice-President, Technology Platforms & Engineering (TPE), Amadeus.

MongoDB press Release, June 2022

Other Capabilities

The MongoDB team announced several other new capabilities during the user conference. These all provide incremental improvements to the usability of the MongoDB platform and continue to chip away at feature differences with comparable legacy solutions.

  • MongoDB Atlas Data API to GA. This allows developers to access all aspects of their MongoDB instance through a serverless, secure API interface over HTTPS. Introduced in preview mode in November, Atlas customers have already been adopting it for a variety of use cases. These include connecting to IoT environments where MongoDB drivers aren’t supported, or to integrate Atlas with other third-party cloud services, such as AWS Lambda, Microsoft Power Apps and edge-based web services such as Vercel and Cloudflare.
  • MongoDB Atlas CLI (Command Line Interface). The Atlas CLI supplements the existing Admin GUI. It provides a command line tool for interacting with an Atlas cluster. Having a CLI makes these types of administrative interactions programmable. Developers can manually run commands directly at the prompt, or create scripts to string together commands and run them on a schedule. Examples of automation include provisioning users, establishing connectivity, controlling access, loading data and scheduling back-ups – basically everything in the GUI except some billing items. As part of the keynote introduction, MongoDB’s CTO mentioned that a few large accounting firms are already using this capability.

Investor Take-aways

MongoDB’s announcements during the MongoDB World user conference added a number of new capabilities to the platform. These bring MongoDB closer to the vision of providing a general-purpose database platform for developers to address an increasing number of use cases. As leadership focuses on their customer expansion goal to capture more application workloads, MongoDB’s platform development continues to address objections around suitability, scale, ease of use and security. These improvements allow MongoDB’s advantages in data model simplicity and developer productivity to tip the scale towards enterprise adoption.

During MongoDB World, the team invited technology leaders from several customer companies to speak about their use of MongoDB. What struck me is the evolution of MongoDB to becoming the center of these companies’ data infrastructure. MongoDB is no long being relegated to applications of convenience. It is becoming core to massive software infrastructure operations where expectations for uptime, scale, security and usability are extreme.

Identity provider Auth0 (acquired by Okta) presented during the product keynote, represented by their CPO and VP of Engineering. Auth0 provides the identity service for many consumer Internet applications, where users need to authenticate themselves in order to access personalized services securely. They have over 35,000 customer companies, which provide digital services to millions of end consumers. Auth0’s infrastructure had been running on hundreds of self-hosted MongoDB clusters and they just completed a migration to Atlas. I can’t conceive of a more demanding environment to prove the suitability of MongoDB to run at “web scale” (a reference to a popular animated meme about MongoDB that circulated years ago). Thousands of Internet applications providing authentication for millions of users who won’t tolerate latency or downtime.

And this is not just Auth0. Technology leaders from Toyota, Verizon, Wells Fargo, Mizuho, Boots, Vodafone, Forbes and Avalera discussed how MongoDB is central to their database operations. These examples reinforce that fact that MongoDB adoption isn’t dependent on growth exclusively from new Internet plays like Coinbase (which is also a customer). It is being adopted by enterprise stalwarts in finance, manufacturing, telecom, retail and publishing.

As economic uncertainty prevails, this kind of broad customer adoption will provide some stability for MongoDB’s growth. Granted, in their Q1 report, leadership has already cautioned that expansion will slow with their small to medium sized customers for the remainder of 2022. They expect less impact to enterprise customers. At the foundation of the IT spending value chain, application databases are at the least risk of being shut off.

While valuations will compress, a challenging macro environment may even provide some tailwinds for MongoDB. First, it’s likely that enterprises will look to optimize their IT spending. One strategy would be to consolidate the number of database vendors, replacing all the point solutions that have proliferated over the last several years. While these point database solutions provided some performance and feature advantages, the overhead of maintaining multiple billing contracts, duplicate code, cognitive switching and myriad API interfaces looks much less appealing in a constrained spending environment. CFO’s can quickly break down technical arguments when savings can be realized.

MongoDB’s strategy of expanding their data platform’s capabilities to address more application workloads is timely. The announcements at MongoDB World bring new workloads into focus, providing an argument to absorb time series, search, mobile sync and in-app analytics onto the MongoDB platform. The engineering team realizes the benefits of a single bill (eligible for volume discounts), one programmatic interface, a single system for DevOps to manage and less time interacting with multiple vendors.

Additionally, the dirty little secret of macro pullback and evaporation of VC funding is that competitive pressure from private start-ups will ease. Flush with VC money and little accountability for profitability, these start-ups create distractions and undercut deals. They make sales for incumbents more costly and split market share. However, without the VC money faucet flowing, these start-ups have to pull back, hire less (or layoff employees), reduce marketing or even fold. There is nothing like a down round to kill morale. With $1.8B in cash and roughly break-even on a FCF basis, MongoDB can ride out the volatility and emerge in a more solidified competitive position.

Obviously, I can’t ignore the macro environment. It will affect valuations of high multiple software companies and may even slow their growth in a recession. I can’t do much about that, except to ensure I have enough cash on hand. Investors would be wise to consider portfolio management, balancing short term and long term needs. Otherwise, I will focus on the companies that stand to emerge from the downturn with more market share, better competitive position and great product fit.

From that perspective, I think the announcements made during MongoDB World position the company well. The platform has inflected from interesting toy database to critical infrastructure. The number of use cases it can address is expanding. Hyperscalers have largely conceded their portion of the transactional database market to MongoDB and now actively co-sell the solution. AWS and GCP were premier sponsors of MongoDB World – unimaginable just 3 years ago. These factors all combine to ensure that MongoDB’s market share expansion will continue, regardless of how the macro environment plays out.

NOTE: This article does not represent investment advice and is solely the author’s opinion for managing his own investment portfolio. Readers are expected to perform their own due diligence before making investment decisions. Please see the Disclaimer for more detail.

1 Comment

  1. Michael Orwin

    Thanks for the article and audio summary. Is Queryable Encryption the same as homomorphic encryption applied to a database, and is it as strong as fully homomorphic encryption?