Datadog Dash 2022 Recap

Datadog introduced a number of new products and enhancements at their annual user conference last week. Expectations were high coming into the event, as Datadog often holds back new releases for several months in order to roll out a parade of goodies. Once again, they stepped up the pace, highlighting 18 separate product announcements versus 10 at Dash a year ago. The breadth and scope of their product reach continues to expand.

While the feature list is sprawling, Datadog’s product strategy is consistent. They remain focused on serving all needs of a modern DevSecOps function from a unified interface and shared data set. This creates advantages in efficiency and clarity over bundling together multiple open source and commercial point solutions, eliminating toggling between tools and reconciling inconsistent performance indicators. While Datadog reaches further into new areas like developer tooling and security, they are disciplined about delivering what is relevant for a DevOps context. This deliberate strategy should allow them to continue to grow their addressable market without infringing on core offerings of entrenched competitors in adjacent categories.


In this post, we will review the evolution of Datadog’s product strategy and then delve into all the product announcements made during Dash 2022. For readers interested in a historical refresher on Datadog’s product offerings and financial performance, you can review my prior posts.

Evolving Product Strategy

While Datadog’s product set keeps expanding, its strategy remains the same. This is important for investors to appreciate, particularly as Datadog’s product extensions appear to be moving into categories that are occupied by established competitors, whether in security, developer processes or incident management. Additionally, many point solutions exist that address specific use cases, a lot of which are available as viable open source projects. Investors might be left wondering why enterprises would pay so much for Datadog’s platform (particularly in this environment), when segments of the solution can be addressed by snapping together point products or open source.

The answer can be found by going back to Datadog’s first big industry achievement in 2018. That was consolidating the “3 Pillars of Observability” into one solution. This is a reference to the fact that monitoring the health of a software application could be accomplished by observing application logs, infrastructure metrics and code execution traces. Prior to Datadog, each of these functions was often addressed by a separate tool. Going back 10 years to 2012, an engineering team (DevOps was still a nascent concept) might use Splunk to examine log data, New Relic for traces and some open source project like Graphite for metrics. And Nagios could alert the team if something went out of threshold.

While it worked, this topology created real inefficiency for engineering teams (and the VPs leading them). When an issue arose, like the Home Page not loading, team members would have to toggle between three different tools to figure out the problem. They would do searches for error messages in Splunk, scan major service tiers for CPU spikes in Graphite or look for trace timeouts in New Relic. If they were lucky, a Nagios alert might point to the measure that went out of threshold.

Datadog removed this swiveling and context switching. All the indicators above were consolidated into one toolset, providing DevOps teams with a single view of the health of their applications, logs and infrastructure. Not only was the interface common, but the data set was shared. This removed confusion where two different tools provided incongruent interpretations of the same problem. This consolidation was the magic of Datadog and explains the rapid rise in its popularity, in spite of Splunk, New Relic and others having the jump on them.

Datadog Dash, Investor Presentation, October 2022

Granted, other observability players have followed a similar tack, incorporating the missing pillars into their solutions. Splunk eventually added traces and metrics, primarily through acquisitions. New Relic brought in logs and metrics. Dynatrace addressed the three pillars as well. But Datadog was the first.

Datadog didn’t stop there, however, and that’s where they are pressing the same advantage. They kept releasing new functionality, incorporating Real User Monitoring (RUM), synthetics, network monitoring, databases, serverless, user session replay, and on and on. More recently, they pulled in security use cases and developer operations. All of these are offered as individual modules. There are so many, in fact, that the list starts to appear overwhelming.

However, there is a method to the madness. All of the disparate components of the Datadog platform are relevant for a modern DevSecOps function. DevSecOps represents the shared interests of developers, operations and security personnel. It encompasses all of the principles of DevOps and infuses security into each step, with the overall mantra that application and infrastructure security issues should be addressed before they reach the production environment.

Modern software applications delivered over the Internet at scale have become so complex that they have to be “observed” across all three of these planes in order to function properly and securely. And all 21 of the individual modules displayed above are relevant for the combined DevSecOps function and remain within the scope of application delivery and hosting. As examples, Datadog is not delving into endpoint security or developer IDEs, because those take Datadog out of scope.

To fully appreciate the importance of this, we have to go back to Datadog’s original value proposition. Datadog saved users significant time and headache by consolidating logs, metrics and traces into one view. They eliminated toggling between tools and trying to connect an event in one tool to the source data in another.

The beauty of Datadog’s platform is that all these modules are linked to a common set of data and displayed in a single view. The user can easily navigate through modules by following the data, moving from one indicator to another until they have identified the root cause of the issue. This common platform facilitates a shared view across DevSecOps and “removes silos”, which were often the source of immense frustration as developers and operators argued over root cause because their “tools” showed different interpretations of the data.

And that is why point solutions, even free open source alternatives, will always introduce a handicap for the DevSecOps team. If several solutions are bundled together to address all monitoring functions (going far beyond just logs, metrics and traces), they will create drag for the team, as operators have to go back in time and start toggling again. Datadog’s value is in the shared view and common data plane across all facets of the platform, because it saves time and eliminates arguments. It’s all or nothing to make observability really work.

And while it may appear that some of the newer additions to the platform, like security or developer experience, may be feature islands off on their own, Datadog has been very selective in the functions they are choosing to address. For security, the modules are grounded in application security, SIEM and the cloud infrastructure workload. Datadog isn’t trying to do endpoints, because those are outside the purview of a DevSecOps function. Similarly, for developer experience, they are focusing on the integration and testing steps that precede the production environment. These by definition represent the connection points between developers, operators and security teams.

As Datadog keeps entering new product segments, it may appear that there are many commercial options for the functions addressed by Datadog’s platform. This might even lead one to conclude that the DevSecOps tooling space is becoming commoditized. However, if we assume as a first principle that teams shouldn’t waste time toggling between tools and talent is scarce, then there really aren’t other options. Few, if any, other observability vendors address the full scope of Datadog’s platform for a modern DevSecOps team in a single toolset with shared data.

And even if they wanted to, competitive offerings can’t keep up with Datadog’s pace of development. The Datadog product team is adding an ever-accelerating number of relevant capabilities to the platform. Each year, they introduce additional modules and make substantial improvements to existing ones. If a new module isn’t best of breed at initial launch, Datadog continues iterating until it is. We need only look at their rapid ascent in the Gartner APM Magic Quadrant from 2020 to 2022 for evidence of that.

While Datadog has a strong culture of innovation and speed, the financial model really enables it. Because Datadog remains steadfast in addressing use cases for the DevSecOps team, they generally sell into the same customer organization. Module additions generate more revenue with little sales effort. This allows Datadog to shift the majority of their operating expense to R&D. They are the only software infrastructure company that I cover which spends significantly more on R&D than S&M.

Datadog Dash, Investor Meeting, October 2022

R&D is the fuel for even more product innovation. Datadog’s additional innovation puts pressure on competitive offerings to spend more on S&M in order to land customers, because they can’t compete on feature set alone. This further aggravates the problem. Datadog’s rapid revenue growth has compounded it further, supporting large annual increases in R&D investment. Between FY2020 and FY2021, Datadog grew their R&D spend by 79% on a Non-GAAP basis, and still delivered 16% operating margin. This is unique amongst their competitive set.

As we turn to what was announced at Dash, we can keep this theme of a common experience for the modern DevSecOps function in mind. Even as Datadog announced 18 separate product launches, they all contribute to this shared view. Every one of them makes the DevSecOps function easier and more efficient. And every one of them is more valuable as part of a shared platform, than they could be on their own, even if they were free.

And with that, let’s see what Datadog announced at Dash.

Product Announcements

As we would expect, Datadog’s scope of announcements at Dash 2022 trumped anything from prior years. Based on a chart shared during the Investor Meeting at Dash, there were 18 individual product releases highlighted. This compares to 10 product releases listed in a similar chart during the Investor Meeting at Dash 2021, representing an 80% increase.

Looking across all the announcements from Dash, a few new themes emerged. These represent slight changes in product positioning and functionality.

Shift to taking action versus just observing. Datadog’s traditional posture has been to collect and report data, but rely on the operator to take action. Datadog has introduced new features that perform expected actions in response to specific events. This posture shift is most notable in security, where tools will now block malicious users, versus just reporting their activity.
More automation. Recognizing that skilled DevOps personnel are in short supply, Datadog introduced new tools that allow operators to create basic scripts and sequences of steps to execute for common scenarios. These apply to the prior point of taking action as well. Finding ways to generate efficiency and time savings for operators is a top priority. Additionally, they support no code / low code functionality, allowing a broader set of operators to be productive.
Better user experience. One of the big advantages of the Datadog platform over an approach of cobbling together multiple open source projects or point solutions is Datadog’s ability to report on activity at the web application user level and tie any unexpected behavior back to a metric, log or trace being monitored. Most common monitoring tools provide plenty of application and infrastructure performance data, but don’t present it within the context of a user experience. Datadog’s Digital Experience tools allow operators to “see” the path taken by every user session.
Security strategy is DevOps focused. As I touched on earlier, Datadog’s security strategy is deliberately scoped. They recognize that security experts are in short supply and are well served by full-featured security platforms from other vendors that encompass endpoints, network access and user identity. Datadog’s security strategy, on the other hand, enables developers and operations personnel to round out security protection by focusing on their scope, which is applications and cloud infrastructure. In this way, Datadog’s security tools supplement the functions of the SOC, providing a multi-layered security approach.

Of the 18 announcements at Dash, five are products in GA status. The rest are in limited availability or beta mode, with an expectation that they would progress to GA status over time. Products in GA, of course, are tied to revenue. Of those, Cloud Cost Management is now listed as a separate product SKU with pricing. It starts at $7.50 / host / month.

Datadog Product Grid, Web Site, October 2022

Of course, I couldn’t write a blog post about Datadog without including their product grid, which has been expanded to 18 cells (Yes, Muji, there it is). Prior to Dash, there were 17. At the beginning of 2022, this grid had 13 cells, representing 38% growth so far this year. In Datadog fashion, we may get another 1-2 of these over the next quarter or two.

These cells represent the stand-alone product modules with pricing on Datadog’s pricing page. Investors can think of these like products in a store – the more there are, the larger the customer basket. Obviously, as customers expand their product subscriptions, Datadog will make up a larger component of their IT budget. This is where Datadog’s messaging around the value proposition is important, lest large enterprises start to look towards alternatives to lower their spend.

Datadog’s value add revolves around the savings realized through tool consolidation. While open source alternatives are “free” in theory, they still require staff to set up and maintain. Additionally, as I have discussed, open source projects or other lower cost point solutions only address one piece of the observability puzzle in isolation. Operators have to combine multiple tools in order to get the full view of application and infrastructure performance. This toggling is inefficient at best, and often the source of misdirection. Having all facets of user experience, infrastructure, logs, application traces and security incidents in one view should more than pay for itself, by eliminating the staff time wasted in disconnected views and reducing the business impact from extended outages.

Let’s run through the various product announcements made during Dash. I have to admit, grouping these is a bit challenging. As the number of product offerings keeps expanding, it seems that Datadog is experimenting with different labels to bucket them. For example, on their Platform slide from earlier, the new Cloud Cost Management product is grouped under Infrastructure Monitoring. However, on the summary slide of all the Dash 2022 announcements for the Investor Meeting, they put Cloud Cost Management under the Platform product area. Then, on their blog post summarizing all the Dash releases, they group Cloud Cost Management under “Breaking down silos”.

I will bias towards the announcements on the Dash 2022 slide and generally use those categorizations (with some smaller changes for clarity). There were many other enhancements included in the keynote address, blog posts and press releases. I will limit my coverage to those announcements that I thought were most significant.

Platform

Cloud Cost Management

For many organizations, the cost of cloud infrastructure is spinning out of control. This wasn’t as much of a problem in 2020-2021, when IT budgets were flush and growth trumped expense. But, in the current environment, IT teams are being pressured to reduce costs. One area that can yield substantial cost savings is cloud infrastructure.

Because spinning up new cloud resources can be automated, many organizations over-provision infrastructure for their applications. Additionally, they miss opportunities to optimize the performance of code to reduce the need for as much infrastructure. This is an important additional driver for infrastructure spend reductions. An AWS bill will tell the purchasing manager how much they spend on database resources, but it won’t show them that one poorly designed, frequent SQL query is generating 90% of the load.

To provide teams with tools to better manage their cloud infrastructure spend, Datadog launched Cloud Cost Management. This product is already available in GA and has pricing associated with it. The service is currently available on AWS only, with Azure and GCP coming in early 2023. Using the tool, DevOps teams can drill down and investigate changes in cloud spend by cost center, application, service and resource.

As I hinted at, what I think will make Datadog’s Cloud Cost Management tool unique among the many similar tools is its grounding in observability. Similar to my example, most existing tools will point out hotspots in spend generation, but not offer insights into ways to reduce resource utilization. In my example of database utilization, the Datadog platform would not only point out the heavy use of an RDS instance, but the Database Monitoring product would show the top 10 queries by load and even insights into how to tune them.
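To make the database example concrete, here is a minimal Python sketch of ranking queries by their share of total load. It is not Datadog's implementation or API: the function name, the `(query_text, duration_ms)` input shape and the sample numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def top_queries_by_load(samples, n=10):
    """Rank queries by their share of total execution time.

    `samples` is an iterable of (query_text, duration_ms) pairs. This input
    shape is hypothetical, not the Database Monitoring data model.
    """
    totals = defaultdict(float)
    for query, duration_ms in samples:
        totals[query] += duration_ms

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    # Heaviest queries first, reported as a fraction of all observed load.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(query, total / grand_total) for query, total in ranked[:n]]


# Made-up sample data: one query dominates the load.
samples = [
    ("SELECT * FROM orders WHERE status = ?", 900.0),
    ("SELECT id FROM users WHERE email = ?", 60.0),
    ("UPDATE carts SET total = ? WHERE id = ?", 40.0),
]
report = top_queries_by_load(samples)
for query, share in report:
    print(f"{share:6.1%}  {query}")
```

A cloud bill would only show the total database spend. A ranking like this points at the single query worth tuning.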

Another aspect of Datadog’s solution, relative to point products, is how cost can be rolled up into a top-level metric as part of an overall application health dashboard. While traditional observability metrics for a typical web application might include error rates or load times, adding cost as a tracking metric helps provide a full picture for the business. It also enables the ability to flag sudden changes in cost, as new application features are released.
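As a rough illustration of flagging a sudden change in cost, the Python sketch below compares each day's spend to a trailing average. The window size, threshold and daily figures are assumptions chosen for the example, and this is not how Datadog's monitors are actually defined.

```python
from statistics import mean


def flag_cost_spikes(daily_costs, window=7, threshold=1.5):
    """Return the indexes of days whose cost exceeds `threshold` times the
    average of the preceding `window` days. Both defaults are illustrative."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > threshold * baseline:
            spikes.append(i)
    return spikes


# Made-up daily spend in dollars; day 7 follows a hypothetical feature release.
daily_costs = [100, 102, 98, 101, 99, 100, 100, 180, 105]
spikes = flag_cost_spikes(daily_costs)
print(spikes)
```

Treating cost as one more time series means the same alerting pattern used for error rates or load times can apply to spend.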

As the service has been launched to GA, it is revenue producing. In the press release announcing the product, Datadog included a reference to at least one customer (Stitch Fix) using it. I could see many other existing observability customers adding this service, potentially replacing a similar function in a point product or consulting engagement. As an aside, industry protagonist Corey Quinn experimented with the module post launch and provided some funny commentary on Twitter. The issues he identified are easily addressable, but also highlight the complexity of cloud vendor pricing terms and billing.

Powerpacks

To provide customers with a way to bundle up common components into a sharable dashboard view, Datadog introduced Powerpacks in July. As DevOps organizations grow, it is useful to standardize the dashboards shared across teams. This provides a way to maintain consistency so that stakeholders and incident responders are viewing data in the same way. They also provide a useful knowledge sharing mechanism, as team members with deep tribal knowledge can offer their preferred system views.

Powerpacks enable this sharing by allowing team members to create a template with a set of widgets and data scope. Those templates can be named and shared across the organization, with easy access from the Datadog widget tray search bar. This standardization can speed up an organization’s adoption of new monitoring patterns and provide a scalable way to promote monitoring best practices. Powerpacks have been released to general availability as part of the broader Datadog platform.

CoScreen

Earlier this year, Datadog acquired CoScreen. CoScreen provides a collaboration tool that allows multiple users to share one or more windows in a common view. Layered over this are audio and video chat capabilities. This facilitates a lot of use cases in collaboration, including pair programming, user testing and product design. Datadog has refined the use cases for CoScreen to focus on common scenarios in DevOps, like reproducing user bugs, troubleshooting performance issues and incident management.

It appears that Datadog is offering CoScreen as a free download for individuals. They provide an enterprise licensing option for larger organizations. CoScreen is being integrated into the Datadog platform. Incident Management is the first integration point, where a DevOps team can launch a CoScreen session to collectively troubleshoot an incident (site outage, error reports, user issues, etc.).

Workflow Automation

One of the more interesting new capabilities introduced during Dash was Workflow Automation. It provides a framework for teams to create a sequence of actions to be executed. A workflow can be triggered by observability data, security signals, dashboards or can be initiated manually by the operator. Automating critical tasks with Datadog Workflows allows teams to reduce the time to resolution for incidents and kick off maintenance tasks to prevent them.

Workflow Management Documentation, October 2022

Workflows currently includes over 600 canned actions in the Action Catalog. These span many tools, cloud provider services, third party offerings, Datadog products and even custom HTTP requests. Actions can perform state and logic operations, allowing users to build complex workflows with branches, decisions and data processing. Datadog provides preconfigured flows in the form of blueprints. Dozens of blueprints help operators build processes around incident management, DevOps, change management, security and remediation.
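For readers who want a feel for what a workflow with actions and branches looks like, here is a toy sketch in Python. It is purely illustrative: the `Step` and `Workflow` classes, the action functions and the trigger fields are hypothetical and do not reflect Datadog's Action Catalog or workflow definition format.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]  # mutates the shared context
    condition: Optional[Callable[[dict], bool]] = None  # optional branch guard


@dataclass
class Workflow:
    name: str
    steps: list = field(default_factory=list)

    def run(self, trigger: dict) -> dict:
        """Execute steps in order, skipping any whose condition is false."""
        context = {"trigger": trigger, "log": []}
        for step in self.steps:
            if step.condition is not None and not step.condition(context):
                context["log"].append(f"skipped {step.name}")
                continue
            step.action(context)
            context["log"].append(f"ran {step.name}")
        return context


# Hypothetical actions; real ones would call out to other systems.
def restart_service(ctx):
    ctx["restarted"] = ctx["trigger"]["service"]


def page_on_call(ctx):
    ctx["paged"] = True


workflow = Workflow(
    name="high-error-rate",
    steps=[
        Step("restart service", restart_service),
        Step(
            "page on-call",
            page_on_call,
            condition=lambda ctx: ctx["trigger"]["severity"] == "critical",
        ),
    ],
)
result = workflow.run({"service": "checkout", "severity": "warning"})
print(result["log"])
```

In this run the restart executes and the paging step is skipped, because the triggering alert is only a warning. Blueprints can be thought of as prebuilt versions of this kind of step list.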

Given the shortage of skilled DevOps and security personnel, I think that Workflow Automation will be well received by enterprise customers. The flexibility and programmability of the workflow framework allows teams to address a large number of scenarios. It will also make the Datadog platform more sticky for customers, as their investment in writing workflow automation in Datadog creates a switching cost.

Datadog Workflows is being offered currently in private beta status for existing customers. I think this capability has a lot of potential and I wouldn’t be surprised if Datadog keeps building on these automation capabilities as a large value-add to their platform.

Event Management

Datadog Events provide records of significant changes in the production environment, ranging from deployments and service health to configuration changes and monitoring alerts. The Events tool provides operators with a consolidated user interface with which to search, filter and analyze events in one place. An event can also be categorized as an incident, if it is disrupting a service. Incidents, and all their associated monitoring data, can be tracked and coordinated with the Incident Management tool.

In a large service disruption, there may be many events getting generated, all of which are associated with the same incident. As DevOps teams troubleshoot and respond, it is useful to know that a stream of events is associated with the same issue being addressed, versus a new issue that is service impacting and has to be investigated separately.

Datadog Event Management, Blog Post, October 2022

Datadog’s new Event Management service takes this a step further by correlating, contextualizing, and prioritizing events into a single, unified view. It will automatically detect that events and alerts are related and tie them together, decreasing the number of notifications that DevOps teams need to investigate. To allow for seamless communication and escalation among teams, Event Management also integrates with collaboration tools like ServiceNow, Jira and even Slack.
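To illustrate how correlation cuts down notification volume, the Python sketch below groups events for the same service that arrive close together in time. This is a deliberately simple stand-in: Datadog has not published its correlation logic in the material covered here, and the event fields, the five-minute window and the sample events are assumptions.

```python
def correlate_events(events, window_seconds=300):
    """Group events for the same service when each arrives within
    `window_seconds` of the previous event in that group."""
    groups = []
    latest_group = {}  # service -> most recent group for that service
    for event in sorted(events, key=lambda e: e["timestamp"]):
        group = latest_group.get(event["service"])
        if group and event["timestamp"] - group[-1]["timestamp"] <= window_seconds:
            group.append(event)
        else:
            group = [event]
            groups.append(group)
            latest_group[event["service"]] = group
    return groups


# Made-up events; timestamps are seconds since the start of the example.
events = [
    {"timestamp": 0, "service": "checkout", "title": "5xx rate high"},
    {"timestamp": 45, "service": "checkout", "title": "latency p99 high"},
    {"timestamp": 60, "service": "search", "title": "disk usage high"},
    {"timestamp": 120, "service": "checkout", "title": "pod restarts"},
    {"timestamp": 4000, "service": "checkout", "title": "5xx rate high"},
]
groups = correlate_events(events)
print(f"{len(events)} events -> {len(groups)} notifications")
```

Here five raw events collapse into three groups, so the team gets one notification for the checkout disruption rather than three.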

Event Management has been introduced as a private beta. Existing customers can request access. Once ready for GA, it would likely be tucked in as a general capability of the platform.

Security

On the surface, Datadog’s move into the security space could be construed as counter-intuitive. Don’t they realize that security is a crowded market with a number of best-of-breed providers (PANW, CRWD, S, ZS, etc.)? However, Datadog’s security strategy makes more sense if we consider it through the lens of the typical enterprise IT organization and the potential buyers.

In most enterprises, the overall security function falls into the domain of the CSO/CISO (Chief Security Officer / Chief Information Security Officer). That individual would maintain a team of security professionals who are responsible for securing the entire enterprise, from endpoints to identity to workforce apps to network access to data centers and offices. Their focus would be heavily on the employee base, enterprise resources, physical locations and hosting infrastructure. This team would be the buyer for security platforms that span these areas, including EDR/XDR, Zero Trust / SASE and IAM. They would also provide oversight to cloud security, and might even extend coverage of their security platform into workload protection.

For those companies that build their own software applications and spin up cloud infrastructure to host them, there is generally another organization that encompasses development and infrastructure operations (DevOps). These teams are typically led by roles like CTO, VP of Infrastructure and VP of Engineering. They may be a peer to the security organization. In some cases with newer, cloud-native companies, the security team reports into these leaders.

For the functions of building software applications, deploying them to cloud infrastructure and monitoring their performance, these teams will often already have embedded DevOps engineers that are heavily using observability platforms (like Datadog). This activity provides natural extensions into security context, which would include application security, cloud security posture management (CSPM), workload security and application log analysis with a security context (SIEM).

For these application and workload specific security functions, it can make sense to locate them within the purview of the DevOps team. This is because they are dependent on the applications, infrastructure and log data that the DevOps team maintains. While the security team will certainly be interested in the same data and likely import it into their broader XDR platform, it also is reasonable to assume that these DevOps teams and their leadership want to have assurances that their scope of operations is secure. I can see the rationale for a CTO/VP Eng/VP Infra to pay for these security tools to ensure that their purview is not the source of a breach. Yes, the security team is overall responsible, but these individuals want to cover themselves as well and would want to proactively track security context within their overall observability function.

This organizational dynamic is the foundation for Datadog’s move into the security space. It can be confusing for investors because the demand drivers don’t apply to all enterprises. A Fortune 500 company that has little internal software development (mostly relying on other vendors to provide software) would most likely not be a candidate for Datadog’s security modules. The IT functions within these companies are usually led by a CIO. If there is internal development, it reports up into that organization.

On the other hand, for those large companies that produce a software or Internet service as their primary business, or a good portion of it, the IT organization will typically skew towards the development function and be led by a CTO. There may be a peer CIO, who is responsible for the internal corporate IT functions and has an associated security team. That company’s software applications and their associated cloud infrastructure likely already have an observability solution in place. Adding security functionality for the same set of applications and infrastructure represents a logical extension.

This extension is seamless because a software agent for the observability function is already deployed across all cloud infrastructure. Avoiding the deployment of another agent for a stand-alone security platform is a nice benefit. Multiple agents can create bloat in configuration scripts, infrastructure management and raw performance of hardware. If the Datadog agent is already deployed across the infrastructure, then leveraging the same agent for security functions reduces agent bloat.

So, when investors try to reconcile how Datadog could conceive of competing in the security space with the likes of CrowdStrike, Palo Alto or Zscaler for application and cloud workload security, it really comes down to the target organization. To put it simply, Datadog is more likely to sell security products to Airbnb, Asana, SoFi, DraftKings or Peloton than they would to P&G, Pepsi, Merck or GM. That may limit the market size a bit, but still provides Datadog with plenty of SAM to pursue.

With that context, let’s take a look at what Datadog announced in the security realm.

Cloud Security Management

Cloud Security Management represents a new top-level Datadog security product that brings together their Cloud Security Posture Management (CSPM) and Cloud Workload Security (CWS) offerings with some additional capabilities to alert DevOps teams to misconfigurations and active threats in their infrastructure. Datadog’s Cloud Security Management product is generally available now.

This product addresses the security category defined by Gartner as Cloud Native Application Protection Platform (CNAPP). The idea is that protecting cloud-native applications requires its own techniques, going beyond legacy network security postures with a “castle and moat” topology. CNAPP incorporates integration with configuration pipelines that define modern infrastructure on public and private clouds. It also actively monitors that infrastructure for threats and responds to them.

CNAPP is made more effective when combined with observability data to provide context and prioritize alerts. It also works well when supplemented by granular details on technology stacks and configurations (per the new Resource Catalog discussed below). CNAPP also enforces tighter controls over misconfigurations of cloud workloads, secrets data, storage and containers. It can proactively scan, detect and quickly remediate security and compliance risks due to misconfigurations.

CNAPP also includes a Cloud Workload Protection Platform (CWPP) component that addresses the workloads deployed across public, private, and hybrid clouds. CWPP makes it possible for enterprises to shift security left and integrate security solutions early throughout the application development lifecycle. Solutions in this category first discover workloads within an enterprise’s cloud and on-premises infrastructure. Then, they scan them to detect security issues and provide options to address the vulnerabilities. Additionally, CWPPs provide security functions such as runtime protection, network segmentation and malware detection for workloads.

Datadog’s Cloud Security Management offering includes pricing for both components of Datadog’s CNAPP solution – CSPM and Cloud Workload Security. Costs accrue on a per host per month basis. The press release announcing the product included quotes from two customers, FirstUp and Vertex.

By coming to CNAPP from the observability space, Datadog offers a few advantages over other CNAPP providers. First, existing customers emphasize the value of having observability data available alongside the CNAPP capabilities from Datadog to provide additional context, allowing them to prioritize alerts. Second, Datadog’s agent is already installed on all infrastructure components, allowing the CNAPP functionality to piggyback on monitoring.

I think Datadog is being thoughtful in their pursuit of a CNAPP solution, versus other sectors of security. Because the functions of CNAPP and remediation would involve the DevOps team more than even the security team, the DevOps organization becomes the logical buyer, or at least gains influence over the security team’s decision. With an existing relationship with DevOps through observability, and the added benefit of observability data in this case, I think Datadog is in a favorable position to capture some share of the CNAPP market.

Workload Security Profiling

Collaboration between security and DevOps teams can be difficult if they use disjointed security and monitoring tools. For example, without enough context, DevOps may not be able to determine if a newly spawned child process for a workload is a part of a larger attack. Conversely, security teams may risk disrupting production workloads by blocking traffic from a legitimate source. On top of that, disparate tools often generate false positive notifications for authorized activity, creating alert fatigue.

Workload Security Profiles, a new capability being introduced within the Cloud Security Management offering, creates a baseline of a workload’s typical behavior in order to surface unusual activity. From that baseline, it becomes much easier for the system to determine if new activity is nefarious or just part of a recurring job. This additional context enables organizations to be sure they are focusing their efforts on legitimate threats rather than wasting time sifting through spurious alerts.
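To make the baselining idea concrete, here is a minimal sketch in Python. It is purely illustrative and not Datadog’s implementation: the event fields (`parent`, `child`) and the `min_count` threshold are my own assumptions. It learns which parent/child process pairs recur for a workload, then flags any new pair that falls outside that baseline.

```python
from collections import Counter

def build_baseline(events, min_count=2):
    """Return the (parent, child) process pairs seen at least min_count times."""
    counts = Counter((e["parent"], e["child"]) for e in events)
    return {pair for pair, n in counts.items() if n >= min_count}

def unusual_events(baseline, new_events):
    """Return the events whose (parent, child) pair is absent from the baseline."""
    return [e for e in new_events if (e["parent"], e["child"]) not in baseline]

# Hypothetical history of process launches for one workload.
history = [
    {"parent": "cron", "child": "backup.sh"},
    {"parent": "cron", "child": "backup.sh"},
    {"parent": "nginx", "child": "worker"},
    {"parent": "nginx", "child": "worker"},
]
baseline = build_baseline(history)

incoming = [
    {"parent": "cron", "child": "backup.sh"},  # recurring job, matches the baseline
    {"parent": "nginx", "child": "/bin/sh"},   # never seen before, worth a look
]
flagged = unusual_events(baseline, incoming)
```

Only the shell spawned by the web server is flagged. The recurring backup job stays quiet, which is the reduction in spurious alerts described above.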

Workload Security Profiles is being offered in private beta within the Cloud Security Management product.

Resource Catalog

Simple misconfigurations in cloud resources can lead to costly data breaches if they are not identified in time. But finding these vulnerabilities is more difficult when teams do not have an efficient way to track all of their resources’ configurations, especially in large-scale cloud environments. Datadog Cloud Security Management solves this issue by providing insights into who owns specific resources as well as details about their overall security health via the Resource Catalog.

Datadog’s Resource Catalog provides a high-level overview of the hosts and resources in a customer’s cloud and hybrid environments. Tags, configuration details, relationships between assets, misconfigurations and active threats are all available in the tool. Operators can view what team is responsible for each resource, as well as any security findings that have been reported. They can also access dashboards to monitor telemetry and security data for each resource.

Datadog Documentation for Resource Catalog, October 2022

The catalog enables teams to assess their environment’s most urgent vulnerabilities and drill down to specific resources for further investigation. This visibility also gives DevOps complete ownership over their resources. For example, they can proactively monitor the resources they own to determine which ones have failed one of Datadog’s posture management rules and group them by category to assess the scope of their risk.
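The ownership-and-grouping workflow in that example can be sketched in a few lines of Python. The record layout below (`resource`, `owner`, `category`, `passed`) is hypothetical and chosen only for illustration. It does not reflect the Resource Catalog’s actual data model or API.

```python
from collections import defaultdict

def failed_by_category(findings, owner):
    """Group the resources owned by `owner` that failed a posture rule, by category."""
    grouped = defaultdict(list)
    for finding in findings:
        if finding["owner"] == owner and not finding["passed"]:
            grouped[finding["category"]].append(finding["resource"])
    return dict(grouped)

# Hypothetical posture findings across two teams.
findings = [
    {"resource": "s3-logs", "owner": "platform", "category": "storage", "passed": False},
    {"resource": "s3-assets", "owner": "platform", "category": "storage", "passed": True},
    {"resource": "vm-batch-1", "owner": "platform", "category": "compute", "passed": False},
    {"resource": "s3-exports", "owner": "platform", "category": "storage", "passed": False},
    {"resource": "db-orders", "owner": "payments", "category": "database", "passed": False},
]

scope = failed_by_category(findings, owner="platform")
# {'storage': ['s3-logs', 's3-exports'], 'compute': ['vm-batch-1']}
```

Grouping by category shows the team at a glance that most of its exposure sits in storage.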

The Resource Catalog is available to existing customers as a private beta. It is part of the Cloud Security Management service. I view this as a significant new capability to enhance Datadog’s security tooling and level it up in relation to competitive offerings.

Native Protection

Earlier this year, Datadog introduced Application Security Management (ASM) to provide application protections against code-level vulnerabilities and threats against application business logic. Common attacks of these types include SQL injection, XSS and server-side request forgery. Datadog’s ASM leverages its observability capabilities like APM to trace the flow of attacks and attackers across distributed services, giving teams insight into how their applications, APIs, and databases reacted to threats. This capability leverages Datadog’s acquisition of Sqreen in 2021.

As part of Dash 2022, the team expanded Datadog ASM to include native protection capabilities. Operators are not just alerted to attacks, but can now address them by taking proactive steps, like blocking malicious IPs directly from the Datadog interface. Datadog ASM also includes Vulnerability Monitoring, which automatically flags any code-level vulnerabilities introduced by an application’s open source library dependencies. Together, these new capabilities enable customers to identify any service at risk, improve their security posture and mitigate threats before they escalate.

Application Security Management is available now for customers with stand-alone pricing. Native protection is being introduced as part of ASM. It adds more capabilities to the product oriented around taking action to protect an application from threats. This brings Datadog’s product closer to feature parity with other Application Security products.

Developer Operations

The Datadog team continues to push into developer workflows. These are focused on pre-production steps, where multiple developers have completed their code changes and are ready to integrate them together into a build that could be released. When these integration steps are automated and run successively without developer intervention, the process is referred to as Continuous Integration (CI).

Before a software build candidate can be released to production, however, it must be tested for proper functioning across the full set of application use cases, browsers and devices. If any of these tests fail, then a developer has to investigate and fix the issue. This type of testing was done by hand long ago, and is increasingly being automated. Continuous Integration platforms can manage these software build and testing steps. Once all tests pass, the same system can even trigger the deployment. The collection of automated integration and deployment processes is referred to as CI/CD (continuous integration / continuous delivery).

A logical extension of Datadog’s observability platform has been a “shift left” into these integration testing steps and managing their automation. As these systems become increasingly complex and prolific, DevOps teams need tools to manage them. At the core is providing “visibility” into CI/CD processes. As an observability provider, Datadog is in a unique position to provide this visibility.

That has been the basis for Datadog’s CI Visibility product area, which already has pricing associated with it. At Dash, Datadog introduced two new products to push further into developer operations.

Continuous Testing

Datadog launched a new Continuous Testing product module that adds a number of automated testing capabilities to customers’ CI/CD processes. The benefit is less time spent setting up, configuring and running automated tests, and more confidence that software releases are issue free.

Continuous Testing accomplishes this through three primary workflows. First, it helps teams create effective tests quickly. It enables this through its codeless web recorder. This tool allows QA personnel to create functional tests by simply exercising the application. The web recorder then logs all the interaction detail, allowing it to replay the same set of actions automatically for each future test. The operator can designate expected outcomes for the system to check. If all expected outcomes are reached without throwing errors, then the tests would be marked as passed. These tests can be repeated across major browsers and device types to ensure thorough coverage.
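The record-and-replay idea can be sketched as follows. This is a toy model under my own assumptions, not how Datadog’s web recorder works: `ToyApp` stands in for the application under test, the recorded steps are a plain list of actions, and the expected outcome is a small dictionary that the replay checks at the end.

```python
class ToyApp:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self.cart = []
        self.page = "home"

    def click(self, target):
        # Each click target changes the application state in a simple way.
        if target == "add-to-cart":
            self.cart.append("item")
        elif target == "checkout":
            self.page = "checkout"

def replay(app, steps, expected):
    """Replay the recorded steps, then compare the final state to the expected outcome."""
    for action, target in steps:
        getattr(app, action)(target)
    actual = {"page": app.page, "cart_size": len(app.cart)}
    return actual == expected

# Steps captured once while a tester exercised the application by hand.
recorded_steps = [("click", "add-to-cart"), ("click", "checkout")]
expected = {"page": "checkout", "cart_size": 1}

passed = replay(ToyApp(), recorded_steps, expected)  # True
```

Because the steps are stored as data rather than hand-written test code, the same recording can be replayed on every future build.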

Second, the test suite is made more resilient through “self-healing” capabilities. This means the system can recognize UI code changes that would impact a functional test and update it before a test failure. This capability is very valuable, as developers and QA personnel often spend numerous cycles updating their functional tests with each set of code changes.

Finally, Continuous Testing is integrated with most popular CI/CD platforms, including GitHub, GitLab, Azure DevOps, CircleCI and Terraform. This makes the tool versatile for multiple environments. It also integrates with Datadog’s observability services, like APM and RUM. In this way, if a test fails, the operator could troubleshoot in multiple ways. They might observe that the application threw an error and fix the bug. Or, the issue could be performance related – perhaps a time-out from a poorly running query. This wouldn’t surface as a bug, but would be noticeable in an APM trace.

Continuous Testing is available in GA for customers to use. It has its own top-level pricing, based on the number of tests run per month. The product offers pricing for browser-based tests and API testing. API testing is less expensive, because it doesn’t need to be repeated for multiple browser variants. Customers can also add on a parallel testing capability, which spins up multiple test environments to execute the test suite faster.

I think Continuous Testing represents a meaningful addition to the developer workflow tooling from Datadog. It snaps nicely into capabilities that Datadog already has available in RUM and Synthetics, bringing those into developer operations. With clear added value and its own pricing, I could see customers picking up these capabilities to supplement their internal CI/CD practices. The space isn’t greenfield, however, as most CI/CD tooling providers, like GitLab, offer a similar capability. I think Datadog’s solution will differentiate by leveraging their extensive observability tooling to help developers troubleshoot issues when tests fail.

Intelligent Test Runner

To further streamline the CI/CD process, Datadog introduced a new capability called Intelligent Test Runner, which will be tucked into the CI Visibility product. A full integration test suite can consume a lot of time and resources to fully run. This can result in delays for developers who are waiting on feedback from integration tests before moving on to their next task. A full test suite can sometimes require hours to complete.

Oftentimes, a small code change only impacts a few of the tests in the suite. Without built-in intelligence, however, the CI tool will simply run the full test suite, introducing a long delay before the developer can close out the issue and move on to another task. With Intelligent Test Runner, the Datadog platform recognizes those functional tests that would be impacted by the code change. It then runs only the impacted tests and skips the rest. This significantly reduces test run times, speeding up developer cycles. It also uses testing resources more efficiently, which will lower costs by requiring fewer test environments.
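A minimal sketch of this kind of test selection is below. The announcement does not spell out how Datadog decides which tests are impacted, so I am assuming a per-test map of the source files each test exercises. The file and test names are hypothetical.

```python
def impacted_tests(coverage, changed_files):
    """Return the tests whose covered files overlap with the changed files."""
    changed = set(changed_files)
    return sorted(test for test, files in coverage.items() if changed & set(files))

# Hypothetical map of each test to the source files it exercises.
coverage = {
    "test_login": ["auth.py", "session.py"],
    "test_checkout": ["cart.py", "payment.py"],
    "test_profile": ["auth.py", "profile.py"],
}

selected = impacted_tests(coverage, ["payment.py"])
# ['test_checkout']
skipped = sorted(set(coverage) - set(selected))
# ['test_login', 'test_profile']
```

A change to `payment.py` runs one test out of three. A change to `auth.py` would instead select `test_login` and `test_profile`. In a suite with thousands of tests, that is where the time and cost savings come from.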

Existing Datadog customers can opt into the limited-access beta to try out the new Intelligent Test Runner capability. I think this provides significant value on top of the existing CI Visibility product, which has stand-alone pricing. Since these products are oriented around developer workloads, their pricing is based on the number of committers, which roughly scales with the size of the development team. This represents a different, but logical, pricing model for Datadog, as opposed to some of their observability products.

Network Device Monitoring

SNMP Traps

Datadog’s Network Device Monitoring (NDM) service is offered as part of the Network Monitoring product with its own pricing. NDM collects telemetry data from a customer’s on-premise network equipment by polling devices using the Simple Network Management Protocol (SNMP). Pulling data this way through SNMP provides valuable insights into a customer’s fleet of devices, including routers, switches and firewalls. However, polling by itself can miss network issues that occur outside of polling periods. Further, some information about devices, like hardware failures, may not be available via SNMP polling at all.

For complete visibility into network equipment, Datadog NDM now collects SNMP Traps. SNMP Traps are generated by network equipment when there is a real problem or error that needs to be addressed. The Trap provides information on the time and an identifier for the error. This is important information for operators to receive immediately, as Traps can precede equipment failure.
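The gap between polling and Traps is easy to see in a small simulation. This sketch does not use a real SNMP library. The fault window, poll interval and function names are all assumptions for illustration. A fault that starts and clears between two polls is invisible to polling, while a Trap is emitted the moment it occurs.

```python
def polled_observations(fault_windows, poll_interval, duration):
    """Return the poll times (in seconds) at which a fault was active."""
    seen = []
    for t in range(0, duration + 1, poll_interval):
        if any(start <= t < end for start, end in fault_windows):
            seen.append(t)
    return seen

def trap_observations(fault_windows):
    """Return the times at which a device would push a trap: the start of each fault."""
    return [start for start, _ in fault_windows]

# Hypothetical fault lasting from t=70s to t=100s, with polls every 60s.
faults = [(70, 100)]

polled = polled_observations(faults, poll_interval=60, duration=300)
traps = trap_observations(faults)
# polled == []    (polls at 0, 60, 120, ... all miss the fault)
# traps == [70]   (the device reports the fault as it happens)
```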

Datadog’s additional support for SNMP Traps expands the capabilities of their existing NDM suite, allowing operators to consolidate troubleshooting efforts within a single pane of glass. Users can easily view, sort and filter SNMP Traps side-by-side with other network infrastructure metrics. Users can also set up monitors for SNMP Traps, allowing them to receive notifications for issues before they impact the rest of the network.

SNMP Traps are now generally available to customers who utilize the Network Device Monitoring service within Network Monitoring.

NetFlow Monitoring

NetFlow is a network protocol system created by Cisco that collects active IP network traffic as it flows in or out of an interface. The NetFlow data can be analyzed to create a full picture of network traffic flow and volume. The NetFlow protocol is used to determine the point of origin, destination, volume and paths on the network for all traffic. Before NetFlow, network engineers and administrators used Simple Network Management Protocol (SNMP) for network traffic analysis and monitoring.
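As a rough illustration of the analysis that flow records enable, the sketch below totals traffic volume per source and destination pair. The records are heavily simplified and the field names are my own. Real NetFlow records carry more fields, such as ports, protocol and timestamps.

```python
from collections import Counter

def bytes_by_pair(flows):
    """Sum the bytes transferred for each (source, destination) pair."""
    totals = Counter()
    for flow in flows:
        totals[(flow["src"], flow["dst"])] += flow["bytes"]
    return totals

# Hypothetical, simplified flow records exported by a router.
flows = [
    {"src": "10.0.0.1", "dst": "10.0.1.5", "bytes": 1200},
    {"src": "10.0.0.2", "dst": "10.0.1.5", "bytes": 300},
    {"src": "10.0.0.1", "dst": "10.0.1.5", "bytes": 800},
    {"src": "10.0.0.3", "dst": "10.0.2.9", "bytes": 500},
]

totals = bytes_by_pair(flows)
top_talker = totals.most_common(1)[0]
# (('10.0.0.1', '10.0.1.5'), 2000)
```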

In order to support this higher level of network traffic analysis, Datadog has released NetFlow Monitoring to public beta for customers. This allows customers to tap into NetFlow to visualize and monitor flow records from their NetFlow-enabled devices. This capability further improves the effectiveness of the Network Device Monitoring service.

User Experience (Synthetics, RUM)

Mobile App Testing

As I discussed in the overview, Datadog’s observability platform is compelling for customers because of its extensive support for all types of monitoring in a single toolset. While customers looking to save money could try to leverage open source projects or less expensive point solutions, they will find that many capabilities built into Datadog at the fringes of user experience monitoring just aren’t available in alternative products or require stitching together several capabilities, creating a lot of management overhead.

One of these extensive capabilities in the Datadog platform has to do with their Digital Experience Monitoring suite and the Synthetic Monitoring product. With Synthetic Monitoring, DevOps and QA teams can create code-free tests that simulate user transactions on web applications and APIs. These can be set to run frequently and quickly detect user-facing issues with API and browser tests. Multiple failures can trigger system-wide investigations to optimize performance and enhance end-user experience.

A capability lacking in Datadog’s synthetic tests previously was the ability to support mobile devices. With the announcements at Dash, Datadog has now added support for mobile application feature testing in both iOS and Android. This allows users to create step-by-step recordings of common application workflows and schedule them to run on real devices. These mobile application tests can be run against new builds in the CI/CD pipeline or against the production environment.

The Synthetic Monitoring tool will provide detailed results of all the tests and their pass or fail state. It also includes helpful screenshots of each step to facilitate troubleshooting. The system can detect small UI changes and update the tests to account for those, versus automatically failing.

Mobile app testing is available in private beta status for existing customers. I think this feature is important as it further rounds out Datadog’s capabilities in the observability space. The continued depth and breadth in observability feature support is what drives Datadog’s annual improvement in Gartner’s Magic Quadrant for APM and Observability. As investors will recall, Datadog has rapidly ascended the rankings over the last 3 years to reach the top spot in the Leaders Quadrant in 2022. They rapidly caught up to long-time leader Dynatrace and are now poised to pass by them. This rapid expansion of capabilities in existing categories like observability makes it very difficult for other providers to compete.

Heatmaps

Another useful, high-level feature introduced at Dash was Heatmaps, which supplements the Real User Monitoring (RUM) product offering. RUM provides insight into an application’s front-end performance from the perspective of real users. Operators can set up user journeys through the application, representing common paths. That experience is then measured with synthetic tests, back-end metrics, traces, logs and network performance data. This allows operators to quickly detect poor user experience and resolve issues with context from across the stack.

To make RUM more actionable for troubleshooting poor user experience issues, in July 2021, Datadog launched Session Replays. With this feature, operators can watch individual user sessions using a video-like interface. This allows them to view exactly how users interact with the web site and correlate that to reports of poor experience in RUM. This visual troubleshooting saves time and eliminates guesswork in recreating bugs.

Besides using Session Replays for troubleshooting, product managers can watch replays to determine if their product designs are effective. They can observe hundreds of individual user sessions, noting places where navigation, expected actions or UI elements may not be clear. However, watching hundreds of session replays in a row to find common patterns can be laborious.

Datadog is introducing Heatmaps as a supplement to Session Replays to address this problem. Heatmaps shows the common user interaction points overlaid on the actual web page or application screen, and visually highlights areas of that page at different levels of intensity based on user clicks. Using this view, product managers and UI designers can quickly assess whether actual users are navigating the web site or application as intended.

Datadog Heatmaps Example, Blog Post, October 2022

This is very high signal information for product designers and provides another example of a feature that differentiates the Datadog platform from basic monitoring tools and metric visualization solutions. Heatmaps are available as a private beta for existing Datadog RUM customers.
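The aggregation behind a click heatmap can be sketched simply: lay a grid over the page and count the clicks that land in each cell. The coordinates and cell size below are made up, and this is only an illustration of the concept, not Datadog’s rendering logic.

```python
from collections import Counter

def click_grid(clicks, cell_size):
    """Count clicks per (column, row) cell of a grid laid over the page."""
    return Counter((x // cell_size, y // cell_size) for x, y in clicks)

# Hypothetical click coordinates, in pixels from the top-left of the page.
clicks = [(12, 15), (18, 11), (205, 40), (14, 19), (210, 44)]

grid = click_grid(clicks, cell_size=50)
hottest_cell, hottest_count = grid.most_common(1)[0]
# hottest_cell == (0, 0), hottest_count == 3
```

The higher a cell’s count, the more intensely it would be shaded. That shading is what lets a designer spot patterns without watching each session individually.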

Observability (Logs, APM)

Log Forwarding

Datadog’s Log Pipelines capability within their Log Management product offering provides DevOps teams with a fully managed, centralized hub for all types of logs. Teams can ingest logs from the entire stack, then parse and enrich them with contextual information. Further, operators can add tags for usage attribution, generate metrics and quickly identify log anomalies.

However, there are cases where teams want individual logs forwarded on to third party systems for further analysis, auditing or compliance reasons. In order to make this forwarding of certain logs seamless, Datadog introduced Log Forwarding to supplement Log Pipelines. Log Forwarding allows users to distribute logs from Datadog to other destinations like Splunk, Elasticsearch and even HTTP endpoints. The tool provides in-depth filtering options and dual shipping capabilities, so that operators can send standardized logs to other systems to consume.
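Conceptually, filtered forwarding looks like the sketch below. The destination names and filter rules are hypothetical, and this is not Datadog’s configuration syntax. Each destination has a filter, and a log is delivered to every destination whose filter it matches.

```python
def forward(logs, routes):
    """Deliver each log to every destination whose filter matches it."""
    delivered = {name: [] for name in routes}
    for log in logs:
        for name, matches in routes.items():
            if matches(log):
                delivered[name].append(log)
    return delivered

# Hypothetical logs that have already been parsed into a standard shape.
logs = [
    {"service": "payments", "status": "error", "message": "card declined"},
    {"service": "web", "status": "info", "message": "GET /"},
    {"service": "payments", "status": "info", "message": "charge ok"},
]

# Hypothetical destinations, each with its own filter.
routes = {
    "audit_archive": lambda log: log["service"] == "payments",
    "error_index": lambda log: log["status"] == "error",
}

delivered = forward(logs, routes)
# audit_archive receives both payments logs, error_index receives the one error
```

Note that the payments error matches both filters and is sent to both destinations, while the routine web request is forwarded nowhere.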

Log Forwarding was introduced in limited availability. Existing Log Management customers can request access to Log Forwarding to take advantage of this new capability.

Data Streams Monitoring

As real-time data streaming technologies like Kafka, RabbitMQ and Kinesis proliferate, tracking the performance of message and event distribution is a critical component of application health. An individual application service can be functioning perfectly, but if it communicates with other services through a queue or pub/sub technology, the service’s output might not reach its destination if those streaming technologies are malfunctioning. To offer DevOps teams insight into these systems, Datadog is launching a private beta of Data Streams Monitoring.

Data Streams Monitoring provides observability for asynchronous, message-based architectures. It gives operators visibility into end-to-end event pipelines to track arrival at destination and traversal times. When delivery issues arise, they can troubleshoot message producers, consumers and queues. Finally, once the issue is addressed, operators can safely restart message flows without overwhelming downstream systems.
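One way to picture the traversal-time measurement is to stamp a message at each stage of the pipeline and compare the timestamps. The sketch below assumes exactly that. The stage names and timings are invented, and the product’s actual instrumentation is not described in the announcement.

```python
def hop_latencies(checkpoints):
    """Return (from_stage, to_stage, seconds) for each pair of consecutive checkpoints."""
    return [
        (prev[0], curr[0], curr[1] - prev[1])
        for prev, curr in zip(checkpoints, checkpoints[1:])
    ]

# Hypothetical timestamps (in seconds) recorded as one message moves through a pipeline.
message_path = [("producer", 0.0), ("queue", 0.25), ("consumer", 2.75)]

hops = hop_latencies(message_path)
end_to_end = message_path[-1][1] - message_path[0][1]
slowest = max(hops, key=lambda hop: hop[2])
# end_to_end == 2.75, slowest == ('queue', 'consumer', 2.5)
```

Here most of the delay sits between the queue and the consumer, which tells the operator where to start troubleshooting.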

Data Streams Monitoring can be activated by existing Datadog customers on request. After a period of beta testing, it will likely be released into GA as part of the APM suite. This represents another useful extension to Datadog’s observability suite as the use of data streaming technologies becomes commonplace in software infrastructure stacks.

Investor Take-Aways

Datadog Dash 2022 delivered a substantial set of new capabilities to the Datadog platform. As I highlighted earlier, they again stepped up the product delivery volume, announcing 80% more product offerings and enhancements than the prior year (18 versus 10). This is a testament to Datadog’s rapid product delivery cadence, which stems from their ability to invest more in R&D than Sales and Marketing.

While the cross-selling of new modules to the same DevSecOps buyer streamlines the sales process, a potential downside is the ever-increasing bill. Customers do complain about how much they pay Datadog, as it becomes a larger and larger line item in their budget. This is a consequence of selling more modules. I think this represents a perception issue that the Datadog GTM team has to address, primarily by underscoring the value of the platform and cost savings by consolidating away from multiple point solutions.

Datadog’s increasing product scope does not introduce a real opportunity for less expensive point solutions or open source projects to inject themselves into customer environments. While an open source project may appear less expensive on the surface, the DevOps team will regress in efficiency. They will revisit the frustration of toggling between tools and the confusion of reconciling signals over multiple data sources. It’s the shared view across all DevSecOps functions that gives Datadog its pricing power.

The evidence that their product module strategy is resonating can be found in Datadog’s published metrics for multiple module adoption. The growth of customers using 2+, 4+ and now 6+ products has been pretty consistent over time. If there were a problem with customer spend expansion, it would have been reflected in these metrics or DBNER at this point.

With that said, as we look forward to the next few quarters, macro will likely impact what is otherwise a cogent product strategy. If enterprises slow down digital transformation projects and look for ways to optimize their IT budgets, Datadog’s revenue growth would decelerate. Some of that slowdown would be offset by cross-sell of more modules, but not all of it, if enterprises really start to cut back. Hints of this cost optimization surfaced in Q2 and we will likely see more.

If we zoom out to a longer time period and assume that the current macro headwinds normalize at some point, then I do think Datadog is well positioned to remain the leading provider in the observability space and is not at risk of encroachment from below or commoditization. As long as technical personnel are in short supply and application topologies become more complex, enterprises will be willing to pay for a full-featured, all-encompassing observability platform.

NOTE: This article does not represent investment advice and is solely the author’s opinion for managing his own investment portfolio. Readers are expected to perform their own due diligence before making investment decisions. Please see the Disclaimer for more detail.

Additional Reading

Muji at Hhhypergrowth published a review of the announcements from Dash 2022 as well. He and I agree on many points and his perspective lends readers additional appreciation for the different product motions. Interested readers can check out his coverage of Datadog. Some content is for subscribers, but is well worth the cost.

4 Comments

Tom
October 26, 2022 at 1:42 am

Thanks Peter for this. Awesome update as always. Out of curiosity, have you looked into Confluent? Feels like it would be right in your wheelhouse….
- poffringa (Post author)
  October 27, 2022 at 8:43 am
  
  Thanks, Tom. I am familiar with Confluent. My hesitation with initiating coverage and opening a position revolves around a couple of concerns. First, they are monetizing a largely open source project. My experience with those companies is that they start to face headwinds as they grow larger and few have been able to maintain high growth for a longer period of time, particularly as they approach $1B in revenue. We have seen this with Elastic, Hashicorp and to some extent (although macro as well) MongoDB. Second, and more important, is that I don’t see the market as being as large for “real time data streaming” as Confluent represents. I agree that the technology is useful to transport data, but there are multiple ways to accomplish that and the use cases that really benefit from real-time streaming represent a small subset of all functionality in a modern software stack.
Michael Orwin
October 26, 2022 at 9:20 pm

Thanks for the very useful recap. Is Datadog going to run out of opportunities for new products in DevSecOps? I’m thinking of need and usefulness, rather than the attitude to the “ever-increasing bill”.
- poffringa (Post author)
  October 27, 2022 at 8:44 am
  
  Hi Michael – that’s a fair question. I can see many more directions for them, like product analytics, security and pushing further into developer operations. They haven’t run out of opportunities yet, but it would be something to watch.
