Investing analysis of the software companies that power next generation digital businesses

Examining Application Data Platforms – Part 1

At their recent user conference, MongoDB positioned themselves as the first “Application Data Platform”. Their vision is to address all data storage workloads that developers typically need to build a modern, scalable software application. The scope of this goes far beyond their previous niche as a document-oriented database, to span caches, search indices, mobile app interfaces and even basic analytics for data visualizations. The premise is that developers prefer to focus on building features, versus worrying about data storage infrastructure. As cloud data solutions proliferate and IT roles converge, MongoDB wants to reduce overhead for developers through a unified data storage platform. Their goal is to increase developer productivity by eliminating the “tax on innovation”.

This posture sets a high bar and obviously encroaches on space occupied by other database vendors. At the same time, the market may be ready for this simplification, following a decade of evolving software architectures, infrastructure abstraction behind cloud services, the consolidation of engineering roles and the ascendancy of the developer. It is easy to appreciate the complexity created by an over-abundance of data storage options, where specialization may be based on optimizations that are no longer necessary. While technologists could make nuanced arguments about performance advantages of certain database architectures, at some point inexpensive compute, horizontal scalability and unlimited storage render those arguments moot.

The evolution of programming languages provides a useful parallel. High performance, statically typed, compiled programming languages like C and Java were required to build scalable back-ends for the first wave of web sites. Over time, application server architectures added capacity through horizontal scaling and monolithic applications could be partitioned with micro-services. Developer-friendly, interpreted languages like PHP, Ruby and JavaScript could be applied to many workloads. Even some of the largest consumer web sites still use these languages today.

This is because companies realized that the resource constraint was not server hardware, but rather developer cycles. Hosting web applications became a commodity service, and developer productivity provided differentiation. As keeping a web site online and performant became table stakes, feature release velocity drove competitive advantage. The industry shifted attention to constructing elaborate development frameworks, which wrapped programming languages to provide every function that a developer might need for coding web sites and mobile apps.

MongoDB is applying a similar tactic to data storage. They recognize that developers are encumbered by many variants of database types, at the same time that they are feeling maximum pressure to deliver application features. The industry may have reached a historical sweet spot, where the market is ready for a one-size-fits-all data platform that can address all software application workloads. Something that runs transparently across all cloud providers, is easy to provision, scales automatically and enforces reliable security. At the very least, it is a bold move.

In this post, I will explore this new positioning from MongoDB and the product suite supporting it. This will include coverage of the recent MongoDB.live conference and a slew of new product enhancements. Given that the Application Data Platform could disrupt other transactional data storage providers, I will give updates on a few other leading independent players and contrast their approaches. For the related publicly traded companies, I will highlight recent quarterly results. I am dividing this post into two parts. Part one will provide a background on transactional data storage and cover MongoDB’s progress. Part two will explore how these moves relate to popular database alternatives.

For more background, readers can check out my blog post from January on Architectures for Transactional Data Storage. Peer analyst Muji at Hhhypergrowth also published a broader piece on Application Development that serves as a useful primer.

Background

Over the past two decades, we have witnessed an explosion in the variety of data storage solutions. This has followed the increasing complexity and scale of Internet applications. In the early days, applications could be created as “monoliths”, which generally implied a single code base that ran in one data center and stored data in one database. That primary database was usually relational, queried through a language called SQL (Structured Query Language).

Typical Monolithic Application Stack

As application usage grew, scalability moved to the forefront. Having a large web site crash due to too much load on the primary database was common. Databases could be scaled for reads to a point with replicated, read-only copies, but this approach still had limits for writes and expensive queries that spanned multiple tables. To mitigate this, alternatives emerged to cache data results, separate from the database. Applications were instructed to check a cache first for a desired data value before querying the database. If there was a cache miss, then the cache would be updated with the value retrieved from the database. These caches could relieve read load on the database by a factor of 10x or more. Similarly, writes for transient data, like user sessions, could be offloaded to the cache. Cache implementations usually have simple data structures, like key-value pairs. Popular examples of key-value stores used for application data caching are Redis and Memcached.
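
To make the pattern concrete, here is a minimal sketch of the cache-aside flow in Node.js, assuming the node-redis client; the fetchUserFromDatabase helper stands in for a query against the primary database and is hypothetical.

```javascript
import { createClient } from 'redis';

const redis = createClient();
await redis.connect();

// Cache-aside read: check the cache first, fall back to the database on a miss.
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  const cached = await redis.get(cacheKey);
  if (cached !== null) {
    return JSON.parse(cached); // cache hit: no database query required
  }

  // Cache miss: query the primary database, then populate the cache
  // so subsequent reads for this user are served from memory.
  const user = await fetchUserFromDatabase(userId); // hypothetical DB helper
  await redis.set(cacheKey, JSON.stringify(user), { EX: 300 }); // expire after 5 minutes
  return user;
}
```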

As Internet applications expanded to address more features, search became a common use case. The initial search function was text-based, reflecting the familiar search box on most content web sites. More complicated search use cases later emerged, like faceted search on an e-commerce site. Search “facets” can represent any set of metadata about an item that has multiple fixed values, like size, rating, type, color, etc. Faceted search supports the rapid calculation of a result set based on any combination of those fixed facets. As many Internet experiences have a geographic context, use cases emerged for geospatial search, primarily involving constraining a set of results by physical proximity.

Search use cases like these are not easily addressed by a relational database. Technologists invented a separate solution to address search, originally based on the open source search technology Lucene. Lucene implements a search engine library that maintains a data format suitable for rapid retrieval of search results (technically an inverted index). Several open source projects then built stand-alone search servers around Lucene. The most popular examples are Solr and Elasticsearch. The maintainers of these projects organized into the commercial entities Lucidworks and Elastic (ESTC).

Solr Indexing Architecture, Lucidworks Documentation

A developer typically interacts with a search server separate from the relational database, in order to address search use cases. This is accomplished through network calls from the application server to the search server, in a similar way that the database server is accessed. Maintaining a separate set of servers for search introduces additional system administration overhead. In addition to setting up and maintaining the search servers, data from the primary database has to continuously be loaded into the search server in order to keep its index up to date.
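
As a rough sketch of what those network calls look like, the snippet below sends a faceted search request to a hypothetical Elasticsearch “products” index: a full-text match, a filter on one selected facet, plus bucket counts for two other facets. Index and field names are illustrative.

```javascript
// Faceted product search against a hypothetical Elasticsearch index.
const response = await fetch('http://search-server:9200/products/_search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: {
      bool: {
        must: { match: { description: 'running shoes' } }, // full-text match
        filter: [{ term: { color: 'blue' } }],             // selected facet
      },
    },
    aggs: {
      sizes: { terms: { field: 'size' } },   // result counts per size value
      brands: { terms: { field: 'brand' } }, // result counts per brand value
    },
  }),
});
const results = await response.json();
```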

As mobile applications proliferated, they required another approach to data access. Apps on iOS and Android represent stand-alone applications that can store state locally. This state can serve the resident app directly, improving responsiveness and handling a network disconnect. However, this local state needs to be kept in sync globally with the primary database and other users. Updates from any mobile client need to be pushed to the primary database and then all the connected mobile apps. Solutions to manage this synchronization on the client have been incorporated into app development frameworks, but these still need a cloud-based data store to interact with. Google created a popular data storage solution, called Firebase, to abstract out this data synchronization for developers. However, as with search indexes, this creates another data interface and infrastructure component.
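
For illustration, here is a minimal sketch of that synchronization model using the Firebase Firestore JavaScript SDK. The project configuration, document paths and renderProfile helper are hypothetical.

```javascript
import { initializeApp } from 'firebase/app';
import { getFirestore, doc, onSnapshot, setDoc } from 'firebase/firestore';

const app = initializeApp({ projectId: 'demo-app' }); // hypothetical config
const db = getFirestore(app);

// Subscribe to remote changes: the callback fires whenever any client
// updates this document, keeping the local view in sync globally.
onSnapshot(doc(db, 'profiles', 'user-123'), (snapshot) => {
  renderProfile(snapshot.data()); // hypothetical UI helper
});

// Local writes are queued while offline (with persistence enabled) and
// pushed to the primary store when connectivity returns.
await setDoc(doc(db, 'profiles', 'user-123'), { theme: 'dark' }, { merge: true });
```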

As developers kept increasing the complexity and scale of applications, the relational data model was shoehorned into addressing new data access patterns. This provided an opening for other special-purpose databases to gain adoption, like graph databases, wide column stores, document-oriented databases and time series databases. These solutions were optimized for a particular type of data structure and access pattern, generating a performance benefit over a standard relational model. However, these special-purpose databases usually supplemented the primary data store, requiring an ETL job to exchange data with the primary database.

Cloud Hosting

As cloud hosting gained adoption, databases moved out of self-managed enterprise data centers onto infrastructure provided by cloud vendors. As many of these database solutions were open source projects, engineering teams could install the database software onto compute and storage provided by the cloud vendor and manage the database themselves. This still carried the overhead of operating the database, but at least removed the requirement to provision data center space, network connectivity and hardware.

Not wanting to miss a sales opportunity, the cloud vendors determined that they could generate revenue by offering services to provision and operate these databases for their customers. The customer engineering team still made decisions about the size of the database, basic configuration and choices for clustering. After configuration, the database service was launched through automation. Common maintenance functions like patching and back-ups were handled by the cloud vendor. An example of this type of service is Amazon RDS. Within RDS, the customer can choose from several data engines to run, including MySQL, PostgreSQL, MariaDB and even commercial providers like Oracle and MS SQL Server.

The commercial entities behind most open source database projects went through a similar evolution. They first offered versions of their database software that enterprise customers could license (often called the enterprise edition) and install in their hosting environment. These enterprise versions of the software often included proprietary, add-on features not available in the free version. This generated a source of revenue for the open source project sponsor companies.

However, cloud vendors realized that they were not prohibited from hosting the non-proprietary distribution of these open source projects and charging their customers for that service. This created confusion in the marketplace, as customers thought they were licensing a product from the open source sponsor. It also cut into revenue for the sponsoring company. While legal, this approach by the cloud vendors felt unfair. After all, the project sponsors were often paying the salaries of the open source contributors.

The open source project vendors countered these practices with various licensing changes, which generally prohibited the cloud vendors from selling a hosted version of the open source project, or required the purchase of a commercial license to do so. Independent public companies like MongoDB, Elastic and Confluent all went through these kinds of licensing machinations. After making the change, cloud vendor offerings had to be pinned to an older version of the open source project. Eventually, through subsequent feature improvements, the independent database offerings surpassed the capabilities of the cloud vendor knockoffs.

The independent database solutions then achieved their big breakthrough when they created cloud-hosted versions of their database products. The independent providers offered a similar service as the cloud vendors, but based on their enhanced proprietary software package. They made it very easy for developers to provision a database on their cloud vendor of choice. The provider would handle all of the management tasks for the instance, including maintenance services like upgrades and back-ups. Examples of these cloud services are MongoDB Atlas and Elastic Cloud. In both cases, the services are available globally on AWS, Azure and GCP.

The cloud versions of these popular data storage solutions have become very appealing for engineering organizations for a few reasons. First, due to the licensing constraint mentioned earlier, the capabilities and features of the independent offering have exceeded those of the data storage services offered by the cloud vendors. Second, their implementations are cloud neutral, meaning the software and data access patterns are the same regardless of which cloud vendor they run on. That provides the enterprise with optionality, as they wouldn’t need to change the application code in order to run on a different cloud vendor. Third, support is much better from the independent provider, particularly where most of the open source project committers work for the independent.

Recently, data storage solutions have evolved further to offer solutions that run across cloud vendors (multi-cloud) and provide consistency from multiple geographic locations (data distribution). A multi-cloud capability allows the enterprise to run a database cluster with nodes that are hosted on more than one cloud vendor. These nodes can share data, effectively making data transfer between cloud installations seamless. This represents a big advantage for enterprises that are trying to avoid cloud vendor lock-in.

Global data distribution provides performance advantages. The idea is that a globally distributed application could request data from the database node that is in closest geographic proximity to the user. This reduces transit time to a central database that may be on another continent. Further, data residency requirements are emerging in many countries. Having the ability to constrain user data by geographic location offers a straightforward way for enterprise engineering teams to automatically support changing data sovereignty rules.
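
MongoDB’s zone sharding is one concrete mechanism for this. The mongosh sketch below pins documents for European users to shards hosted in EU regions; database, shard and field names are hypothetical.

```javascript
// Shard the user collection on a key that includes the user's country.
sh.enableSharding('app');
sh.shardCollection('app.users', { country: 1, _id: 1 });

// Tag shards hosted in specific regions with a zone label.
sh.addShardToZone('shard-eu-1', 'EU');
sh.addShardToZone('shard-us-1', 'US');

// Route all documents for German users to shards in the EU zone,
// keeping that data resident in European data centers.
sh.updateZoneKeyRange(
  'app.users',
  { country: 'DE', _id: MinKey }, // lower bound (inclusive)
  { country: 'DE', _id: MaxKey }, // upper bound (exclusive)
  'EU'
);
```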

Fast forward to today, and software engineering teams are presented with multiple options for hosting their data. The best solutions offer a rich set of features, easy cloud-based provisioning, automated scaling and global reach. While the hyperscalers mopped up most of the database service revenue during the initial migration of applications to the cloud, the market is opening up for multiple independent database solution providers. Specialization and cloud neutrality are emerging as significant competitive advantages. We should also keep in mind that independent database usage is still a positive for cloud vendors, who generate revenue from the underlying consumption of compute, storage and network services.

Evolving Developer Persona

In parallel to the expansion of data storage options over the last decade, the organizational roles that build and maintain software applications have also undergone significant changes. In the past, roles within an engineering team were very focused on one portion of the infrastructure or software discipline. Teams were made up of specialists: front-end versus back-end engineers, mobile developers (both Android and iOS) and dedicated engineers for services like search. Functions associated with writing code were separated from running infrastructure, with operational roles like Database Administrator, Network Engineer and SysAdmin.

The emergence of robust development frameworks started to abstract away the underlying complexity of working with client devices and application services. This allowed a software developer to be productive across more functions. The front-end developer could work on the back-end, and vice versa. All-purpose JavaScript frameworks like React (with React Native) allowed much of the same code to be used for a web browser and mobile apps. Specialized developer roles began consolidating, and teams gravitated towards a mode in which a single developer could write code across the full infrastructure stack. This led to the emergence of the term “full stack developer”.

For databases, this meant developers could focus on the functionality of their applications without having to worry about the details of a database implementation. They still had to consider their data model and access patterns, but frameworks handled the lower layer interfaces to read and write data. These abstractions even caused many developers to question the underlying data structure, preferring database storage solutions that more closely mirrored how their applications managed state. This gave rise to databases modeled around objects versus tables, with document-oriented solutions like MongoDB becoming popular.
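
A short sketch with the MongoDB Node.js driver shows the appeal: the document written to the database is the same nested object the application already works with, with no mapping layer in between. Connection details and field names are hypothetical.

```javascript
import { MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017'); // hypothetical host
await client.connect();
const users = client.db('app').collection('users');

// The application object is stored as-is: nested fields and arrays
// require no object-relational mapping or joins across normalized tables.
await users.insertOne({
  name: 'Ada',
  addresses: [{ city: 'London', primary: true }],
  preferences: { theme: 'dark', newsletter: false },
});

const ada = await users.findOne({ name: 'Ada' }); // returned as a plain object
```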

On the infrastructure side, the operation of the servers was outsourced to the cloud-service provider. Engineering teams needed fewer (or no) network engineers, database administrators and SysAdmins. These roles shifted to the cloud vendors. Engineering teams in cloud native start-ups could consist primarily of full stack developers. With management of servers, databases and network connectivity shifted to the cloud-service providers, start-up teams could focus on just building application features.

Developers tend to gravitate towards solutions that deliver simple data access interfaces, as long as they can scale reliably behind the scenes. They don’t get hung up on the nuances of database engines and are satisfied to let the data storage provider handle the administration. If one solution can address multiple use cases consistently, that saves them the trouble of learning an additional interface. If the provider works seamlessly across cloud vendors, secures data in compliance with industry standards and automatically distributes it based on usage patterns, then they have even less to worry about. These factors all favor an all-encompassing, developer-friendly database platform.

Analytical Workloads

Up to this point, I have been discussing transactional workloads that serve user-facing software applications. The industry term for this type of query load is Online Transaction Processing (OLTP). This is contrasted with analytical workloads, which are referred to as Online Analytical Processing (OLAP). These are typically found on data warehouses. The two types of data processing generally vary along these lines:

  • Concurrent Users. OLTP data stores should handle a very high number of users at once, without performance degradation. OLAP data stores usually serve a handful of users at one time. If performance slows down a bit with more users, that is okay.
  • Response Times. Average response time for OLTP databases is expected to be very low, in the millisecond range. Response times for OLAP queries can be hours or days, depending on the job. Because of this difference, OLTP databases are used for synchronous requests, where the user is actively waiting on a response and is blocked from further action. OLAP databases can treat requests as being asynchronous, where the user will go do something else before expecting a response.
  • Type of User. OLTP databases are typically accessed by application developers. OLAP databases are used by data analysts and data scientists.
  • Type of Client. Due to expectations for concurrent users and response times, OLTP databases are accessed by real-time user applications, like web sites, mobile apps and micro-services. OLAP databases, on the other hand, usually provide the data source for reporting, business intelligence and machine learning jobs. They also ingest data from many other data sources, including application databases.

Examples of OLTP databases are all the application data stores discussed thus far in this post. OLAP databases have traditionally been referred to as data warehouses, with examples like Teradata, Oracle, and IBM Netezza. Cloud vendors also offer popular data warehousing solutions, like Amazon Redshift, Google BigQuery and Azure Synapse Analytics. Of course, the most visible independent data warehouse provider is Snowflake.

While historically distinct, these dividing lines between OLTP and OLAP data workloads are starting to blur. Some transactional databases have added capabilities to serve basic analytics functions. An example is MongoDB, whose Data Lake offering provides a single interface that combines transactional and offline data sources into one response. Similarly, Snowflake refers to their offering broadly as a “cloud data platform” and lists data applications as one of the workloads that can be addressed. For data application building, they appeal to developers and offer a runtime in their new Snowpark environment with support for popular programming languages. Some technologists are even making a distinction between hot and cold analytics, implying that hot analytics serve real-time data access workloads.

For now, I don’t see Snowflake replacing MySQL or PostgreSQL as the primary transactional data store for applications. Nor do I see MongoDB or CockroachDB replacing the data warehouse. However, as user-facing applications increasingly incorporate data rich views powered by analytics, there is a growing demand for a data storage solution that can process multi-dimensional data queries, but do so with reasonable response times for many concurrent users.

One inherent advantage, or at least difference, in architectures between traditional OLTP databases and their OLAP counterparts has to do with data distribution. Transactional database providers are more quickly evolving to support the ability to serve data concurrently from multiple geographic locations. They can deploy in distributed configurations in which nodes are located across the globe. This way, application requests can be directed to the node with the closest geographic proximity. Data updates distribute rapidly across the cluster with adjustable expectations for consistency.
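
With MongoDB, for example, this routing can be expressed through a read preference on the connection. A minimal sketch with the Node.js driver (the cluster hostname is hypothetical):

```javascript
import { MongoClient } from 'mongodb';

// 'nearest' directs reads to the replica set member with the lowest measured
// network latency from this application server, rather than always the primary.
const client = new MongoClient(
  'mongodb+srv://cluster0.example.mongodb.net/?readPreference=nearest' // hypothetical cluster
);
await client.connect();

const openOrders = await client
  .db('app')
  .collection('orders')
  .find({ status: 'open' })
  .toArray();
```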

Data warehouses and advanced analytics processing systems are generally designed to deploy into a single location. Due to data gravity, they are inherently not distributed. This may be a future product direction for the more advanced providers, particularly as they evolve into a multi-cloud data platform. Another consideration is that the output of machine learning jobs is often pushed back up to the application interface through ETL jobs that feed the transactional database. These types of ML models for inference processing could be distributed to be in proximity to edge compute locations.

I will leave a detailed analysis of OLAP solutions and Snowflake for a future post. In the meantime, peer analyst Muji at Hhhypergrowth has published a detailed overview of Snowflake. This is a must-read for anyone interested in SNOW and the analytics landscape in general.

The Application Data Platform

Developers have a myriad of data storage options to back their applications. They can choose from multiple databases, purpose-built for each access pattern, including relational, graph, object, geospatial, time series, etc. Similarly, developers have other solutions to service search queries, mobile app data sync and analytics. For all of these data stores, developers also have to consider security standards, cloud hosting and data residency requirements.

An argument can be made that data storage solutions have swung too far towards specialization. In the same way that developer roles consolidated into a single “full stack developer”, it would be convenient if one database platform could address all transactional workloads. That would allow developers to learn and interact with a single, abstracted interface for reading and writing all types of data. Similarly, with a cloud-hosted, serverless database solution, the development team wouldn’t need to worry about provisioning, scaling or maintenance of each point solution.

At their annual developer event in mid-July, called MongoDB.live, MongoDB proposed just that. They essentially recast the MongoDB product suite as the Application Data Platform. This is made up of many components, most of which MongoDB has released over the last two years. They have been busy filling feature gaps and rounding out the core infrastructure. MongoDB leadership made an argument that the Application Data Platform provides all the tooling needed by developers to address the data requirements for building an end-to-end modern software application.

As part of the investor session at MongoDB.live, both the CPO and CTO discussed the data storage engines that a typical enterprise will support in order to deliver a modern software application. In order to deal with limitations of relational databases and address unique workloads, special purpose non-relational databases are set up, like document-oriented, graph, geospatial, object and key-value stores. These extend the breadth of use cases that can be addressed by the application, but also create compounding overhead in system administration and maintenance. Because these data stores are disconnected from the primary database, most require their own ETL process to stay in sync.

Micro-services didn’t necessarily lessen this burden, and arguably contributed to it. This is because a service will typically run in its own environment, with a separate codebase and data store. Micro-services are useful because they allow large development teams to sub-divide organizationally around each micro-service. The independence of services enables the selection of special-purpose data storage solutions that cater to the query demands of each micro-service. Teams then incur the overhead of learning each data storage engine and managing the system maintenance of each.

If an application needs to support text-based search, development teams will often stand up a search server that wraps Lucene. This too requires a process to update the search index from the primary database and involves separate architectural planning for master and slave nodes. If the organization offers mobile apps to their customers, they will have to contend with data synchronization from the origin database to all the local data stores on each mobile device. This is generally handled with separate mobile specific data infrastructure and push services. If the application includes data visualizations, they may stand up another database to serve a basic analytics workload. This might be separate from the data warehouse, as the application would query it directly.

MongoDB.live Investor Session, July 2021

In the diagram above, the MongoDB team modeled out a typical system architecture for a large scale, user-facing application. They represented all the types of data stores discussed and inserted a popular solution for each one. This provides a reasonable representation of what would be found at many large-scale digital enterprises. It also reflects the data footprint that the MongoDB solutions engineering team encountered at a Fortune 500 insurance company that is a customer.

The MongoDB leadership team contends that the Application Data Platform can replace all of this. MongoDB itself can be used for all the special-purpose workloads, including graph, time series, key-value and object/JSON, and it even handles relational constructs. It can address both text-based and faceted search through Atlas Search. For real-time analytics with workload isolation and native data visualization, developers can leverage Charts pulling from the core data store. They can run federated queries across the operational data stores and cloud object storage (like S3) using Atlas Data Lake. Data requirements for mobile apps (both iOS and Android) can be serviced through Realm and Sync.
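
As a small illustration of the search piece, an Atlas Search query runs as a stage in a normal MongoDB aggregation pipeline, rather than as a call to a separate search cluster. Here is a minimal sketch, assuming a connected Node.js client and an Atlas Search index covering the named fields (collection and field names are hypothetical):

```javascript
// Atlas Search runs inside the aggregation pipeline: no separate Lucene
// cluster to operate, and no ETL job to keep an external index in sync.
const results = await client
  .db('app')
  .collection('products')
  .aggregate([
    {
      $search: {
        text: {
          query: 'wireless headphones',
          path: ['name', 'description'], // fields covered by the search index
        },
      },
    },
    { $limit: 10 }, // top 10 hits by relevance score
  ])
  .toArray();
```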

MongoDB.live Investor Session, July 2021

On the infrastructure side, developers get the capabilities required to meet security, scalability and portability requirements. MongoDB supports multi-document transactions that are ACID compliant. It includes industry leading data privacy controls with client-side field level encryption. The data store can be scaled fluidly through live sharding, which can be kicked off behind the scenes without disrupting real-time data delivery. To further reduce maintenance, the runtime can be deployed in a serverless mode on MongoDB Atlas.
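
As one example, a multi-document transaction with the MongoDB Node.js driver might look like the following sketch, assuming a connected client (database, collection and account names are hypothetical). Both updates commit together or not at all:

```javascript
// Transfer funds atomically across two documents: ACID semantics
// guarantee that both balance updates apply, or neither does.
const session = client.startSession();
try {
  await session.withTransaction(async () => {
    const accounts = client.db('bank').collection('accounts');
    await accounts.updateOne({ _id: 'alice' }, { $inc: { balance: -100 } }, { session });
    await accounts.updateOne({ _id: 'bob' }, { $inc: { balance: 100 } }, { session });
  });
} finally {
  await session.endSession();
}
```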

A recent addition that is worth calling out separately is the multi-cloud capability of MongoDB Atlas. This allows the same data store to be addressed from applications running on multiple cloud vendors, reaching 81 global regions across GCP, AWS and Azure. MongoDB is one of the few database vendors that supports this capability. It allows an enterprise engineering team to make use of the best technology from each hyperscaler, like hosting the primary application on AWS, but running AI/ML analysis on GCP. MongoDB Atlas multi-cloud prevents lock-in on one hyperscaler and provides a consistent data access API across all hosting providers. Development teams don’t need to learn the interface for the proprietary data storage solutions on AWS, Azure and GCP separately.

I think the strategy and market positioning of the Application Data Platform is forward-thinking and timely by the MongoDB team. Assuming they can pull it off, it would provide a fairly compelling value proposition for any large engineering team. Developers would gain substantial productivity improvements by interacting with one data storage solution for all application related use cases. MongoDB provides a unified developer experience, a consistent operational and security model, automated data movement behind the scenes and reduced data duplication.

With this positioning, MongoDB is expanding the scope of their addressable market significantly. Previously, the argument for market share revolved around how much processing their document-oriented database could take from traditional relational databases. Now, the scope encompasses all front-facing application data stores.

It will be interesting to see how the market reacts to this. As an old-school CTO familiar with purpose-built databases, I would be skeptical initially. All of those databases emerged for a reason. However, the vision is appealing. Combined with other industry trends in developer role convergence, cloud migration and release velocity, it is timely. I think many CTOs would give the solution the benefit of the doubt and at least run an experiment, given the inherent advantages if it works. Additionally, any start-up planning their application data architecture could get started with the MongoDB platform and then stretch its capabilities as far as possible.

In some ways, this strategy mirrors what Snowflake (SNOW) is doing for analytical workloads. Their positioning is that enterprises can use Snowflake as the data warehouse solution and run all types of analytics, AI/ML, ETL and other functions on Snowflake’s data and compute infrastructure. Snowflake is cloud-neutral, available on all the hyperscalers. If Snowflake is now the go-to solution for OLAP workloads, MongoDB is trying to take on everything associated with OLTP.

For more perspective, it is worth providing some commentary on MongoDB’s CTO, Mark Porter. As investors will recall, Mark assumed the CTO role from co-founder Eliot Horowitz a year ago, after serving on the MongoDB Board of Directors. He was eager to take on an operational role, feeling that as head of technology he could cap off a long career building and using databases. For him, the data platform his team is now building represents the natural evolution of data storage, beyond the strictly relational model that dominated the past half century.

MongoDB.live Investor Session, July 2021

During the Investor Session at MongoDB.live, Mark talked about the MongoDB data platform and new features that have been launched. Echoing the product vision discussed by the CPO, Mark went on to offer some controversial perspective of his own on the direction of the application data market. He said that “single model databases, like relational, are becoming niche”. In fact, he claims that new computer science graduates don’t want to work on relational databases, and that relational skills are becoming the new COBOL (an outdated programming language). Wow.

Mark’s perspective is that a true data platform going forward needs to offer multiple methods of accessing data, including relational, but also key-value, time series, object, graph and geospatial. The MongoDB Application Data Platform is the first solution, in his opinion, that can address all of the use cases needed by developers. It also models data in object format, which most closely aligns with how data is represented within applications. This removes the translation layer (ORM) that has traditionally been maintained between an application using OOP design principles and the relational database.

MongoDB.live Investor Session, July 2021

While I would normally chalk this up to pumping his own company, Mark spent most of his career leading relational database development and usage at some of the most influential database providers and consumers in the space. He has been on both sides of the market, as a vendor and customer. If anyone understands the underpinnings of the relational model and the inherent advantages/disadvantages relative to others, it would be him. With that background, he could have taken any technology leadership job and chose MongoDB.

Single model databases, including relational, are becoming more and more niche in most of the customers I talk to personally. With general purpose, flexible databases, like MongoDB, becoming the de facto standard for modern enterprises.

Mark Porter, CTO, MongoDB

Developer Mindshare

Developers have increasing influence over technology selections within their engineering organization. This hasn’t always been the case. In the past, technology selection was delegated to architects who made tool choices, but didn’t necessarily have to work with them every day. A new CTO or VP Eng would often bring their favored stack into an organization, along with an army of vendors and consultants to support it.

Cloud based hosting and services have significantly lowered the barrier to provisioning new technologies. It is very easy for a developer to open a cloud account, provision a simple set up and produce a proof of concept. This kind of working proof quickly reduced the primacy of the architect or technology leader, because for most problems, working code wins. Technology selection couldn’t be obfuscated behind vendor PowerPoint presentations or elaborate architecture diagrams.

Additionally, as enterprises scramble to increase their software release velocity, developer productivity has been pushed to the forefront. While this created more pressure on development teams, it also empowered them. Agile retrospectives encouraged developers to point out process or tooling changes that would make them more productive for the next sprint. Scrum masters and product leads would greenlight technology explorations as tech debt that might drive future developer productivity. As the squeaky wheel, developers could get a lot of grease. This focus applied to the software tooling and infrastructure used by the team, including database choices.

Further, due to the consolidation of the developer role and outsourcing of infrastructure maintenance to cloud vendors, there are fewer individuals in the engineering organization with an alternate view. The developer is increasingly tapped to represent considerations for database management (previously the DBA) and system operations (now called DevOps). Even in large organizations that retain central architecture committees, senior engineers are often at the table.

Lastly, inexpensive and scalable cloud based infrastructure allowed every new start-up to begin creating software applications for the masses with a very low upfront investment. Because compute, storage and networking were readily available from the cloud vendors, the first couple of hires could be developers. It would be much further down the line before the engineering team could support specialists.

This implied that the technology choices made by the initial development team are often long-lasting. Once a feature-rich application has been built on a particular software stack, inertia keeps those choices in place for a long time. The technology organization may go to great lengths to keep an infrastructure choice in place. Granted, the stack will eventually evolve, but even large digital natives still retain the technology choices of their founders. Examples are Shopify with Ruby on Rails and Facebook with PHP.

With that background, we can learn a lot about the potential for database market share simply by understanding developer preferences. The popular developer site Stack Overflow conducts an annual survey in which they ask 65,000 developers about their preferences across a number of technology types. Included is input on programming languages, frameworks, tools and platforms. Transactional data storage solutions are covered in the Databases category.

Stack Overflow, Annual Developer Survey, Feb 2020, Most Wanted Database

The most recent survey was conducted in February 2020. The 2021 Survey is being assembled now and should be published in another month. Developers ranked MongoDB as the most “wanted” database for the fourth year in a row. This reflects the preferences of developers who are not working with the database, but have expressed a desire to use it. Elasticsearch, which addresses a narrower set of use cases (but is the only search solution on the list), is ranked third. Data storage products from the hyperscalers and legacy software vendors are much further down the list.

Besides Stack Overflow, another indicator of data storage engine usage across all categories is provided by DB-Engines. They maintain a ranking of popularity of solutions on their web site, with an overall score and an indication of change over time. This is constructed from a combination of inputs pulled from various public forums, discussion boards, web sites and job postings, which are all heavily developer influenced. Like the Stack Overflow survey, this provides a reasonable indicator of overall usage and change over time.

DB-Engines Ranking, Top 10 List, July 2021

The graphic above shows the list of the top 10 most popular database solutions as of July 2021. Of the ten listed, MongoDB had the largest year/year increase in their ranking and is closing the gap on PostgreSQL. Of the solutions ranked higher than MongoDB, MySQL and PostgreSQL are open source, relational databases without a real commercial effort behind them. Oracle tops the list due to their historical dominance, but is dropping rapidly. The same can be said for MS SQL Server.

This puts MongoDB in a great position as a leading transactional database solution provider, backed by a publicly traded company. Another player to watch is Redis, backed by Redis Labs, which increased its ranking by two positions in the last year. Elasticsearch is holding its own as well, with a gradual increase in popularity since the snapshot from January 2021 (below).

DB-Engines Ranking, Top 10 List, Jan 2021

MongoDB’s developer evangelism effort is behind much of this traction. Over time, they have been very successful inserting MongoDB into popular development frameworks. This is best evidenced by examining the alphabet soup of framework acronyms. For a long time, one of the most popular developer frameworks was LAMP, which stands for Linux, Apache, MySQL and PHP. For many years, this was the go-to stack for start-ups that wanted to quickly build a software product without the overhead of some of the heavier, enterprise frameworks (like Java and Microsoft .NET).

With the emergence of JavaScript support on the server-side (through Node.js), software stacks evolved to allow JavaScript to be run on both the client and server-side. This provided development teams with the efficiency of a single language. Given that JSON-modeled data structures are part of JavaScript programming, it was natural that a JSON-friendly, document-oriented database would be utilized as the data storage engine. MongoDB was a perfect fit, given its open source distribution and popularity.

New software stacks began to adopt MongoDB as the data store and named their acronym stacks with an “M” for MongoDB. The first was MEAN – MongoDB, Express, Angular and Node.js. Alternatives emerged for client-side development, creating other derivatives including MERN (React), MEEN (Ember) and MEVN (Vue). These frameworks were popularized amongst the developer community and further memorialized in many “how to” web development books. A search on Amazon for “MEAN stack web development” yields many results, several of which were recently published.

In addition to having MongoDB evangelized through framework acronyms and books, it has become a popular database taught as part of coding bootcamp curriculums. Coding bootcamps were started in 2011 with Code Academy. By 2017, there were 95 full-time coding bootcamps in the U.S., and more overseas. Bootcamp sessions typically last between 8 to 36 weeks, where the student attends classes each day all week, much like a job. The curriculum is restricted to just coding and producing project work. The idea behind a coding bootcamp is to provide the student with a condensed and accelerated learning program to prepare them for a professional career as a software engineer. This contrasts with a traditional computer science degree from a four-year university, in which time is spent taking liberal arts classes and participating in other activities. I won’t argue the merits of each. Suffice it to say that coding bootcamps and other compressed certificate programs have become a major source of new developers entering the workforce.

By examining the curriculums of these coding bootcamps, we can get insight into the database technologies being taught. Iron Hack is a popular option, with a fully online program or in-person training at locations across 8 countries. Out of a 9 week training program, a third (weeks 4-6) of the course covers the back-end. In this case, the technologies learned consist of NodeJS, Express and MongoDB. The next 3 weeks (weeks 7-9) then introduce React and focus on building an application from scratch using the MERN (MongoDB, Express, React and NodeJS) framework.

A second example is Hack Reactor. They provide a list of student projects, along with the technologies used to build them. Of the projects that list a database, many are implemented with MongoDB. The others are MySQL and PostgreSQL, which are common relational database solutions. Coding Dojo organizes their curriculum around what they claim are the 5 most popular programming languages and frameworks. This list consists of Java, Python, .NET, Ruby and MERN. Of course, MERN expressly dictates that the database is MongoDB.

Coding Dojo Development Curriculums

These approaches to developer evangelism are resonating. MongoDB has had over 175M cumulative downloads of the open source project to date. This growth is continuing at scale, as 70M of those downloads have occurred in the last 12 months. MongoDB University, their free online training program, has had more than 1.5 million developer registrations.

Recent Product Announcements

The MongoDB team has been on a product development tear this year. This was highlighted at MongoDB.live. In addition to introducing the idea of the Application Data Platform, the MongoDB product team announced a slew of other product improvements. Several of these bolster the argument that the data platform can meet all use cases for developers in building modern software applications.

MongoDB Application Data Platform, MongoDB.live Investor Session, July 2021

The major product announcements made during their annual developer event are listed below:

  • Native Time Series. This represents a new data type that is supported within the MongoDB data store. Time series data typically represents a large set of data values, each attributable to a point in time. This data is often generated by IoT sensors or other streaming sources. Developers can now ingest time series data into MongoDB, process it and join the results with other types. There is no longer a need to set up a separate database for this. The solution comes with useful aggregation functions to process data, like derivatives and integrals over time windows (see the mongosh sketch following this list, which covers this and Live Resharding).
  • Live Resharding. This feature allows the operator to change the MongoDB cluster shard key on-demand with little impact to live querying. All changes are performed in the background, without the need to schedule downtime. This represents a huge improvement for scalability, as sharding is the primary mechanism to expand database cluster capacity. Having to change the shard key in the past was a major effort, creating an adoption barrier for large datasets.
  • Enhanced Security. MongoDB now supports client-side field level encryption on all public clouds. This even works across multi-cloud deployments, which is inherently complex due to the different underlying storage systems in each cloud provider. Users can make changes to security settings online without downtime, allowing them to respond to security events in real-time.
  • Atlas for Government. MongoDB achieved FedRAMP Ready status for its new Atlas for Government offering. U.S. government agencies can take advantage of this by moving their workloads to a government specific cloud environment. This service is available on AWS GovCloud and in AWS US East/West regions. This accomplishment positions MongoDB for significant growth with government agencies. Besides federal spend, state and local governments generally draft off of FedRAMP status in making their own technology selections.
  • Atlas Search Additions. MongoDB search support is moving beyond just text search. They have added capabilities to support typical e-commerce use cases, including synonyms (product name substitution) and custom function scoring. Function scoring allows developers to create sophisticated calculations on product attribute values to yield better personalization of results. They have also added support for sorting by count and filtering by product facets (faceted search). Leadership feels this improved search functionality will help MongoDB win more deals that have a search use case. As I’ll discuss in the next post, this does start to encroach on the application search functionality offered by competing search services like Elasticsearch and Solr.
  • New SDKs for Realm. Developers are increasingly migrating to cross-platform mobile app development frameworks that support both iOS and Android. This allows development teams to write the majority of the code for a mobile app in a single language, versus the prior approach of having separate development teams for each device. To accomplish this, MongoDB added support for the three most popular cross-platform frameworks, which are Unity (for game developers), Kotlin Multiplatform and Dart/Flutter. They also improved Realm Sync, making it more selective and customizable in what the service syncs, so that data on the phone loads faster and reflects only what is needed. MongoDB’s goal is to make Realm the default mobile database, which will further drive Atlas adoption.
  • Real-time Analytics Improvements. The team added support for long running analytics queries that can execute on data snapshots, without affecting real-time query performance. The query workloads for analytics are isolated from operational workloads transparently. This prevents having to set up a dedicated analytics database to keep heavy background analytics queries from impacting user-facing systems.
  • MongoDB Charts for Data Lake. The Charts feature can now pull data from MongoDB Data Lake. This allows developers to build real-time data visualizations and embed them in web sites. Charts can load data from the transactional database, S3 buckets and even time series data.
  • Atlas Serverless. The CTO considers this their most important announcement. MongoDB Atlas can be run in serverless mode. Customers no longer have to pick a specific machine type or size. MongoDB will scale up or down the data store as the workload changes. This allows developers to build applications without having to worry about the database infrastructure. Leadership thinks this capability will drive Atlas adoption further.
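
To give a feel for the first two items on this list, here is a minimal mongosh sketch of creating a native time series collection and kicking off a live reshard. Collection, database and field names are hypothetical.

```javascript
// Create a native time series collection: MongoDB organizes storage around
// the time field, removing the need for a separate time series database.
db.createCollection('sensor_readings', {
  timeseries: {
    timeField: 'timestamp', // required: when each measurement was taken
    metaField: 'sensorId',  // optional: groups readings from the same source
    granularity: 'minutes',
  },
});

// Live resharding: change the collection's shard key in the background,
// without scheduling downtime for live queries.
sh.reshardCollection('iot.sensor_readings', { sensorId: 1, timestamp: 1 });
```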

For 6-12 months of development effort, this represents a significant set of product enhancements. If the MongoDB product and engineering teams can continue this pace of innovation, then it is realistic to assume that they will be able to round out any rough edges in their offering and achieve their vision for a single Application Data Platform.

Quarterly Results

MongoDB’s product expansions appear to be resonating with customers and driving market returns. On June 3rd, MongoDB announced earnings results for Q1 FY2022, covering the period from February – April 2021. The headline was the continued momentum of Atlas, which accelerated to 73% year/year growth, up from 66% in Q4. Atlas now makes up 51% of total revenue (49% in Q4) and has a run rate of $400M. The market loved the results, pushing MDB stock up 16% the next day. Even with that gain, the stock was still down 12% YTD. Since June, MDB has made up more ground and is now about even for the year.

Q1 revenue was $181.6M, growing by 39.4% annually and 6.2% sequentially. This beat the company’s initial estimate of $167-170M set with Q4 earnings, which would have represented about 29.3% growth. Analysts had assumed the top end of the range and targeted $170.0M. The Q2 revenue estimate was raised just slightly above the analyst estimate, with a target for 31.2% growth. While not a big raise, this is above the 29% growth target that had been set for Q1. Assuming a similar magnitude of beat would result in actual annualized revenue growth of 41%, or a slight acceleration off of Q1. MongoDB also raised their annual guidance from 31.1% growth to 35.1%. This signals to me that management expects annual revenue growth to pass 40% for the year.

Non-GAAP gross margin was 72%, compared to 73% a year ago and in line with Q4’s performance. As Atlas becomes a larger component of revenue, MongoDB is minimizing its impact on gross margin by improving operating efficiencies as the cloud component scales. Adjusted loss from operations was $8.4M for an op margin of -4.6%. This compares to an op margin of -5.7% a year ago, representing a slight improvement. MongoDB generated $8.4M in free cash flow in Q1, yielding a FCF margin of 4.6%. This is up substantially from -$8.5M a year ago for FCF margin of -6.5%. MongoDB had $935.6M in cash and equivalents at the end of the quarter.

Customer growth was strong. This underscored much of the optimism for MongoDB’s growth and the case for Atlas looking forward. Total customers increased to 26,800, up 46% year/year and 8.1% sequentially. Almost all of this growth in customer counts was attributable to Atlas, which grew to 25,300 customers, up over 50% annually. Customers with >$100k ARR crossed 1,000 to reach 1,057, which was up 36% year/year. Net expansion rate was over 120% again this quarter.

Additionally, management highlighted a few customer wins on the earnings call:

  • Commercetools, a leading enterprise commerce platform, selected Atlas to power its e-commerce sites. The platform is used by over 200 global brands, such as Audi, AT&T, and Tiffany. With Atlas, Commercetools is able to scale its platform and flexible API approach to support the world’s biggest retailers.
  • UiPath provides an end-to-end automation platform that combines robotic process automation with a full suite of capabilities to enable organizations to rapidly scale digital business operations. They chose MongoDB as their underlying data persistence platform. Their intent is to increase efficiency for developers and accelerate release of new features and functionality. This reflects the focus on developer productivity discussed earlier.
  • Avalara is a leading provider of cloud-based compliance solutions. Their engineering team is migrating from SQL Server to MongoDB to support complex data requirements on their back-end. The company has more than 30,000 customers and needs a database that can process billions of transactions. This will support their platform growth into new industries, geographies, and compliance services.
  • One of the largest grocery chains in the world chose Atlas to strengthen its digital portfolio and scale its services across their e-commerce and in-store offerings. The company has more than 2,000 stores. They selected MongoDB to replace Cosmos DB (Microsoft’s non-relational database service), after searching for a scalable solution with a flexible data model that will give developers a richer feature set and better visibility into their data.
  • Tapple, a dating application created by the leading Japanese Internet company CyberAgent, has over 6M registered users and has made over 200M matches since inception. The app uses Atlas as its data store due to ease of use, high capacity, and the ability to upgrade very large clusters. Their Atlas installation stores over 7.5B documents.

Looking forward, given the new product positioning and improvements around the Application Data Platform, we can expect Atlas to continue to drive high revenue growth. With support for multiple data storage workloads and a presence on all major cloud providers, it is easy to conceive of Atlas becoming the go-to data storage service for any developer building a user-facing application. By addressing multiple use cases, MongoDB stands to grab a larger share of the estimated $73B market for databases.

For the remainder of this fiscal year, I think MongoDB could re-accelerate revenue growth after a tepid 2020. The Q2 revenue estimate, with a similar level of beat to Q1, would creep above 40% annualized growth, which has eluded MongoDB for several quarters. Performance in the second half of 2020, with revenue growth in the high 30% range, provides an achievable bar to beat in 2021. Similarly, the updated estimate for FY2022 calls for 35% growth, which was raised 4 percentage points from the original estimate set at the beginning of the year. With Atlas revenue growth accelerating to the 70% range and making up more than half of MongoDB’s overall revenue, I could see the company exiting this fiscal year with revenue growth in the 40% range on a quarterly basis.

Partnerships and the Hyperscalers

What a difference a couple of years can make. Following its IPO in late 2017, the market feared competition from the major cloud providers would kneecap MongoDB. These concerns were justified, as the hyperscalers were scrambling to roll their own database-as-a-service offerings. These included several relational flavors, but also non-relational variants, like document-oriented stores. In fact, an offering from AWS is called Amazon DocumentDB and promotes “MongoDB compatibility”. This compatibility is pinned to an older version of MongoDB and doesn’t represent a complete implementation.

MongoDB’s licensing change to SSPL in October 2018 created this limitation for the cloud vendors. This license change explicitly prevents other parties, like the cloud vendors, from providing a hosted version of MongoDB without a commercial license. In reaction, the hyperscalers either took the Amazon approach of pinning to an older pre-SSPL version of MongoDB or developing their own proprietary document store. In Microsoft’s case, they offer Cosmos DB, which includes compatible APIs for both MongoDB and Cassandra.

While we always have to monitor the offerings from hyperscalers, if we look at DB-Engines and Stack Overflow, these hyperscaler solutions don’t appear to be gaining much traction with developers. They are certainly used, but haven’t achieved the broad market appeal of MongoDB. For example, on the latest rankings from DB-Engines, MongoDB sits at position 5 with a score of 496, while the highest ranked hyperscaler offerings are Azure SQL Database and Amazon DynamoDB at positions 15 and 16 with scores around 75 each.

I think there are a couple of advantages for a neutral, independent provider that explain why MongoDB appears to be pulling away from the hyperscaler alternatives. I discussed these advantages in a prior post on independent software providers and the cloud vendors. I think those advantages are manifesting broadly in the performance of independents across several categories of infrastructure services, including CPaaS (Twilio), observability (Datadog), security (Crowdstrike), identity (Okta), edge compute (Cloudflare, Fastly), etc.

Specific to MongoDB, they have accomplished a lot through developer evangelism. As discussed earlier, their database is taught as part of the curriculum at leading developer boot camps and even some university programs. MongoDB is the “M” in many popular developer frameworks (MERN, MEAN, etc.). Maintaining a free, open source community version also helps with adoption. This is reflected in the package’s download count, which exceeds 175M.
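
To make the developer-appeal point concrete, here is a minimal sketch of what working with MongoDB looks like from Node.js (the “N” in MERN), using the official MongoDB driver. This is my own illustration, not taken from MongoDB’s materials; the connection string, database and field names are all hypothetical.

```typescript
import { MongoClient } from "mongodb";

// The connection string and all names below are hypothetical placeholders.
const client = new MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net");

async function main() {
  await client.connect();
  const users = client.db("app").collection("users");

  // Documents map directly onto JavaScript objects -- no ORM layer or
  // schema migration required, which is a big part of the developer appeal.
  await users.insertOne({ name: "Ada", signupDate: new Date(), plan: "free" });

  // Queries are plain objects too.
  const freeUsers = await users.find({ plan: "free" }).toArray();
  console.log(freeUsers);

  await client.close();
}

main().catch(console.error);
```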

On product development, MongoDB’s focus on a single service category inevitably results in a more complete solution. While the hyperscalers have much larger technology budgets, those budgets are sub-divided many times to support each product. AWS, for example, has 14 separate offerings in their database product category alone, and that plethora of offerings is repeated across 26 product categories. While I don’t doubt that AWS can compete with independents in many product offerings, that is still a lot of landscape to cover. Where independents have chosen to specialize, their offering often surpasses the cloud vendor’s in capabilities and features.

Another challenge for the hyperscalers is engineering talent. While they attract their share of technology expertise, the best talent is usually allocated to higher-paying leadership positions or the most critical product categories. Many of the look-alike offerings are staffed by offshore teams or new college grads. The independent providers, on the other hand, can attract skilled engineers with stock option potential, brand recognition and focused impact.

Finally, multi-cloud is the Achilles’ heel of any proprietary hyperscaler solution. A custom-built data interface and storage engine creates a unique set of interactions and patterns for a developer to learn. Multiplied across several cloud vendors, that represents a lot of mental overhead. Any large enterprise has to plan for multi-cloud, even if it primarily occupies one cloud vendor, because avoiding the perception of lock-in helps maintain a strong negotiating position. Also, in some industries (like e-commerce, health care, financial services and gaming), the cloud vendor can emerge as a competitor, creating an awkward situation for the CTO when all of their infrastructure is hosted by the cloud division of that new competitor.
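
As a concrete illustration of that overhead, consider the same record lookup written against a proprietary hyperscaler store versus MongoDB. This is a hedged sketch of my own; the table, database and key names are hypothetical.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";
import { MongoClient } from "mongodb";

async function fetchOrder(orderId: string) {
  // AWS-only: this call pattern works against DynamoDB and nowhere else.
  const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({ region: "us-east-1" }));
  const proprietary = await ddb.send(
    new GetCommand({ TableName: "orders", Key: { orderId } })
  );

  // Portable: the identical MongoDB query runs against an Atlas cluster
  // hosted on AWS, Azure or Google Cloud without changing a line.
  const mongo = new MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net");
  await mongo.connect();
  const portable = await mongo.db("shop").collection("orders").findOne({ orderId });
  await mongo.close();

  return { proprietary: proprietary.Item, portable };
}
```

A team standardized on Azure or Google Cloud would have to learn yet another proprietary interface for the first approach, while the second carries over unchanged.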

Given these advantages, the disposition of the cloud vendors towards the independent providers has shifted radically over the last couple of years. They have pivoted from competitors to enablers. At the very least, we see a form of coopetition (cooperative competition). This shift coincided with the emergence of cloud versions of independent software vendor offerings, like MongoDB Atlas or Elastic Cloud. These cloud offerings generate revenue for the hyperscalers, because the independent software provider services run on top of the cloud vendor’s core compute, storage and network. Most cloud vendors have even incorporated the offerings from the independents into their own marketplace of third-party solutions (which they presumably earn a revenue share from).

This is all to say that MongoDB has a productive relationship with all three U.S. hyperscalers at this point. Atlas runs on AWS, Azure and Google Cloud. As part of the Q1 earnings release, management pointed out that they “closed our strongest quarter to date with AWS, Google Cloud and Alibaba.” These co-sell arrangements are resulting in significant usage growth for MongoDB.

During MongoDB.live, leadership presented a partner awards section. AWS was named the Cloud (Co-Sell) Partner of the Year, and Google was named the Cloud (Marketplace) Partner of the Year. MongoDB has a strong partnership with Microsoft as well. Given the progress of these relationships, it’s fair to say that the hyperscalers no longer represent an existential risk to MongoDB; in fact, MongoDB is now thriving as a result of these partnerships. Today’s situation would have been hard to imagine back in 2018.

As a final note on partnerships with cloud vendors, MongoDB has a unique relationship with Alibaba. The MongoDB open source project is very popular with developers in China. However, regulatory limitations, presumably stemming from concerns over data exfiltration, prevent MongoDB from offering Atlas on top of the cloud infrastructure provided by Chinese hyperscalers. To account for this, MongoDB licenses their software to Alibaba and Tencent, who then resell it to Chinese customers. This approach has allowed MongoDB to offer its product to the Chinese market, and they are starting to see material revenue benefit from these partnerships.

We saw a huge increase in joint wins (with Alibaba and Tencent). We started with Alibaba first, and what we have done is repurpose some of our local sales team in China to really enable the Alibaba organization to more effectively position and articulate the value proposition of MongoDB. And that’s — we’re seeing the results there get impacted quite positively very, very — in the short time frame we’ve done that, so much so that even Alibaba is investing more in building a center of excellence around MongoDB. So we feel really good about what’s happening with Alibaba. Tencent, we’re in the early days, and our plan is to do the same there in terms of operationalizing their capabilities to sell MongoDB as a Service to their customers as well.

MongoDB Q1 FY2022 Earnings Call, June 3, 2021

The Alibaba partnership will be one to watch, as that could eventually drive significant revenue by itself. This would be in addition to the growth of the Atlas offering outside of China. They have been discussing this partnership for over a year, and it now appears to be gaining real traction.

Investor Take-Aways

MongoDB is undertaking a bold product expansion. The Application Data Platform represents a disruptive approach in the evolution of data storage solutions for transactional workloads. It bets on versatility over specialization, with developer productivity as the undercurrent. If successful, it would dramatically simplify the database consideration set for modern software applications, addressing mobile apps, web sites and data-driven visualizations from one platform.

Based on data from developer surveys and usage trackers, like Stack Overflow and DB-Engines, MongoDB appears to be in pole position. Their partnerships with the major cloud vendors indicate that the existential competitive threat from hyperscaler data storage offerings has subsided. MongoDB’s multi-cloud Atlas offering adds another layer of insulation.

This traction appears to be reflected in MongoDB’s recent quarterly performance. Atlas accelerated to 73% year/year growth and now makes up 51% of total revenue with a $400M run rate. This drove overall revenue growth just shy of 40%. Customer activity provided further evidence of adoption, with total customers surging 46% year/year. Almost all of the customer growth is attributable to new Atlas installations. MongoDB leadership raised guidance for the next quarter and full year, indicating their confidence looking forward.

Risks to the one-size-fits-all data platform do exist. Engineering teams may reject the approach as diluted, preferring to retain a specialized database for each workload. While MongoDB’s CTO asserts that relational database skills are dying, SQL has demonstrated surprising longevity and is still the preferred tool for analytics. Competitors could also take a similar tack. Modern data storage solutions like Couchbase and CockroachDB are trying to appeal to a broader set of use cases as well. And the hyperscalers could always decide to reverse course and double down on their proprietary database offerings.

With that said, customer additions, developer preferences and market alignment provide MongoDB with a favorable growth environment for the remainder of 2021 and likely several years into the future. Their addressable market is huge. Digital transformation projects and legacy infrastructure upgrades will provide many new sales opportunities. Continued expansion of existing customer installations will fuel a long tail of growth. These factors should combine to drive revenue for the next several years.

When I initiated coverage of MongoDB in November 2019, MDB was trading around $141. I set a 5 year price target (by end of 2024) of $490. The stock reached a high of $429 in February and now trades in the $360 range. I think MDB could break the $490 target within the next year, representing about a 36% gain. Once it does, I will update the price target for 2024.

While I think Atlas adoption will be a significant driver of future growth for MongoDB, I would like to see more customer wins across multiple workloads. This will provide further proof that the Application Data Platform is gaining traction. I would also like to see improvement in profitability measures, demonstrating that MongoDB can realize leverage as they grow. Otherwise, I think the company will continue to benefit from their popularity with developers, cloud-neutral status and ongoing secular demand from digital transformation projects.

For my personal portfolio, I have owned MDB in the past. I closed my position in 2020 for a profit, after revenue growth slowed due to COVID-impacted IT spend. Given that growth appears likely to re-accelerate through 2021, I may restart a position. In my portfolio, I have to sell an existing holding in order to open a new position. This creates a higher barrier for me than if I were allocating new money. For first time investors, I think MDB is worth consideration, especially as gains so far in 2021 have been negligible. I will continue to monitor their performance and provide updates on future quarters.

Other Data Providers (Part 2)

In the second part of this post, I will compare the concept of the Application Data Platform to solutions from other data storage providers. While MongoDB is expanding the scope of their reach, other companies that serve the data needs of application developers are not standing still. Several of these offer a more complete and robust solution for particular types of application data workloads. These stand in contrast to MongoDB’s overarching one-size-fits-all approach. For the publicly traded companies that I cover, I will also weave in some quarterly updates and relate to my personal investment strategy.

Here is a list of the primary companies that I plan to explore in the second part of this post:

  • Elastic (ESTC). Elastic provides the leading solution for all things search. Their platform can address all types of use cases for application search with extensive customization. Additionally, they have applied the concept of search to several other IT functions, like observability, security and workforce productivity. Like MongoDB, Elastic is based on an open source project and has layered a proprietary license with advanced features over it. They too have a thriving cloud offering in Elastic Cloud.
  • Couchbase (BASE). Couchbase is a very recent IPO. They offer an open source, distributed document-oriented database. Couchbase has a colorful history: the original offering, called Membase, was derived from the popular key-value store memcached. The founding team sought to preserve the simplicity, speed and scalability of memcached while adding storage, persistence and querying capabilities to form a full-featured database. Some customers have found use cases where Couchbase outperforms MongoDB.
  • CockroachDB (Private). CockroachDB has been building momentum and garnering buzz in the developer community lately. The founding team came out of Google, where they had worked on the Google File System and applications for BigTable and Spanner. CockroachDB offers a relational, distributed database system that is highly resilient (like a cockroach, I suppose). The database has an interesting architecture, with a key value store at the core and a SQL processing engine over it. The code is open source, but with a license that prohibits offering a commercial version of CockroachDB as a service without paying.
  • Redis Labs (Private). Redis is an in-memory data structure store, commonly used as a distributed key-value database. Popular uses include application caching and message brokering. Unlike other memory-only caches, though, users can also persist the data. Redis is interesting because they extended the value part of the key-value store to accommodate several other data structures besides strings. These mirror common software application constructs, like lists, sets, hash tables, geohashes and time-series streams. This removes the overhead of serializing those structures into strings for storage (see the sketch after this list).
  • Snowflake (SNOW). Any conversation about cloud-hosted data storage has to include Snowflake. Their marketing position as the cloud data platform is purposely broad. While it is grounded in traditional data warehousing and analytics workloads, they include data applications as a target use case. Given their continued optimization of the query engine, it’s possible they drive performance to the point where Snowflake could be considered for some OLTP workloads. This would obviously start to encroach on the market share of transactional database providers.
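
On the Redis item above, here is a short sketch of my own (using the popular ioredis client; all key names are hypothetical) showing why native data structures matter: the application works with lists, hashes and geo sets directly, rather than serializing everything into strings.

```typescript
import Redis from "ioredis";

const redis = new Redis(); // assumes a local Redis server on the default port

async function demo() {
  // A list, pushed and read natively -- no JSON.stringify round-trip.
  await redis.rpush("recent:searches", "mongodb", "redis", "couchbase");
  const recent = await redis.lrange("recent:searches", 0, -1);

  // A hash maps naturally onto an object or struct, one field at a time.
  await redis.hset("user:42", "name", "Ada", "plan", "free");
  const user = await redis.hgetall("user:42");

  // Geospatial members, indexed for radius and distance queries.
  await redis.geoadd("stores", -73.99, 40.73, "store:nyc");

  console.log(recent, user);
  redis.disconnect();
}

demo().catch(console.error);
```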

NOTE: This article does not represent investment advice and is solely the author’s opinion for managing his own investment portfolio. Readers are expected to perform their own due diligence before making investment decisions. Please see the Disclaimer for more detail.

13 Comments

  1. dmg

    Well, Peter, you out-shine even your own laudatory standards with this new post. Even though you always write clearly and engagingly, never stinting to explain anything and everything throughout your text (even common acronyms)… I always seem to have a few questions. Not this time, though. I get it — I finally get it all, from beginning to end. (I could claim I get smarter but really it is because you write ever better.) Congratulations!

    I do have one discrete remark. You carve out a neat niche for yourself that is unfilled by just about anyone else I can think of (save one person); your recent commentaries are less about a company and its shares but about (an ascendant) technology. In explaining that technology, you provide context, a general backdrop; especially for us lay-people to understand how that technology fits in, micro and macro. But you do not stop there; you proceed to the company behind the technology, its culture and executives, such as Mongo’s new CTO for whom this new role fits like fingers in a glove. Which understanding only provides further context for an understanding of a company’s publicly traded shares: Why do other investors favor (or disfavor) it? You even squeeze in the company’s fundamental and valuation numbers, but not in a vacuum; you reveal how the recent quarter continues trends from its recent past and near future. (Still growing? Is the cadence of growth accelerating or decelerating? etc.)

    With Mongo’s vision as you describe it (or as I understand it), the Mongo team could find themselves in the role of M&A acquirer, buying companies such as Elastic or JFrog (or other companies) to complete their “Application Data Platform.” In the end, and if successful, they could ramp up and be a fitting competitor to AWS, no?

    Thanks again, Peter!

    • poffringa

      Hi David – Great to hear from you. Thanks for your thoughtful feedback. You always provide great detail.

      That is an interesting question regarding future MongoDB acquisitions. They haven’t done a new one for some time. I could actually see them picking up a company or two to further round out their platform, as you imply. Perhaps not as large as Elastic. But, a private, second tier niche provider might make sense, like Redis Labs, Algolia or even CockroachDB. Those would be on the transactional side. Even more interesting would be if they do something on the analytics side, like a data warehouse or data lake solution. Finally, one of the distributed edge database providers, like Fauna, would be another option.

      I don’t think they would ever be in a position to compete with AWS wholesale (in the way that Cloudflare is implying). But, MongoDB certainly is in a position to take transactional database share from all the hyperscalers. In that way, they would cut into hyperscaler services revenue, but pay fees back to the hyperscalers for underlying compute, storage and any rev share on the marketplace.

  2. Trond

    Wow, great article again! I have to wonder (half seriously) after all these great articles are you often approached by the big names of Wall Street to hire you as an analyst? 🙂

    • poffringa

      Thanks for the feedback. I am approached by analysts at large funds and we exchange notes often. I don’t expect to be paid for these engagements, as the insights gained are more valuable for my portfolio. Thanks for asking though.

  3. Paul

    This is EXCELLENT. Thank you for all your efforts.

  4. John Ritchie

    Peter, I agree with all of dmg’s comments – all of them! Thank you once again for taking the time to generously share your thoughtful and meaningful insights.

  5. Soham Bhatia

    are you actually not investing new money in the portfolio, or just acting like the hurdle is there in order to boost your threshold for adding a position?

    • poffringa

      I do not have a regular income (i.e. job). All of my living expenses are paid from my investment portfolio. Also, I am almost always fully invested. So, adding a new position or reallocating positions requires selling something. That will incur long-term capital gains (non IRA account), which are about 15-20%.

  6. Dp

    Huge earnings today from Confluent and DDOG. Seeing some re-accelerating growth, let’s go! I might get another Cloudflare position now if we can fall below 100.

    • poffringa

      Agreed. I was particularly happy to see DDOG’s report and the associated revenue acceleration. Just phenomenal results. Will provide a quarterly update in a few weeks. I don’t think you will see NET under 100, unfortunately.

  7. Scott

    Great article again!

  8. Paul Dickwin

    Excellent post, lots of useful information. I believe this is a good strategy by MDB.

    There was once a company that built a very advanced proprietary timeseries database on top of FoundationDB (now owned exclusively by Apple, for internal use). They found that “normal” TSDBs would not scale to their needs, because they were ingesting many millions of timeseries points per second. Those TSDBs measure ingestion in points per minute, and they fall over after only a few tens of thousands of points per second. Additionally, one of the hardest parts was querying the timeseries, which was slow at large scale. This company took FoundationDB, which was essentially a large B-tree, and wrote all of the necessary code for indexing, caching, buffering, streaming, and a lot more to turn it into an extremely performant and scalable TSDB. This took years to hone, and it is still not perfect. It allowed the company to go after very large fish in the metrics observability space, selling multi-million dollar contracts to companies sending several millions of points per second. It was quite possibly one of the most scalable TSDB systems out there today, aside from Splunk’s, which came via their recent acquisition of a metrics observability solution. At the end of the day, however, the big winner in this space was not Splunk or this company. It was the company that decided to focus on the majority of use cases, which were not so massive in ingestion rate. I am pretty sure you know which company that is, though you have probably never heard of Splunk’s acquisition or of the company that built this TSDB on FoundationDB. It goes to show that a company which builds a solution that works for developers and the masses, rather than chasing specific large use cases with the idea of moving down-market later, will reap huge rewards.

    • poffringa

      Hi Paul – That is a great story, which I hadn’t heard previously. Your point is spot on and likely validates MongoDB’s approach. Thanks for the feedback and the thoughtful anecdote.