Analytics has come a very long way in recent years, with transformational developments widening its impact beyond internal stakeholders and into the hands of external users. This brings with it a range of challenges, and when building analytics applications for customers, for example, one of the first considerations will be the choice of database backend.
While the main options will probably include PostgreSQL, MySQL, or even extending a data warehouse beyond its core BI dashboards and reports, it's important to keep in mind that analytics for external users can be revenue-impacting. As a result, choosing the right tool for the job is essential if organizations are to deliver a high-quality user experience.
Analytics Performance
For many users, the most frustrating element of their analytics experience is performance, and in particular, the wait-state of queries in a processing queue. It's one thing to have an internal business analyst wait a few seconds or even several minutes for a report to load; it's entirely different when analytics functionality is being offered to external users whose tolerance of processing delays will be much lower.
This problem generally comes down to the amount of data to analyze, the processing power of the database, and the number of concurrent users and API calls. Collectively, these factors determine how well the database can keep up with the application.
However, there are a variety of approaches to building an interactive data experience with any generic OLAP database when there's a lot of data. The issue? They come at a price. Precomputing every query makes the architecture expensive and rigid, aggregating the data first limits the insights available, and restricting analysis to recent events denies users the complete picture. Each of these approaches involves a compromise.
There is, however, a no-compromise approach that can deliver an optimized architecture and data format built for interactivity at scale. This comes in the form of Apache Druid — a high-performance, real-time analytics database to power analytics applications for any number of users.
Druid employs a uniquely distributed and elastic architecture that prefetches data from a shared data layer into a near-infinite cluster of data servers. This enables faster performance than a decoupled query engine such as a cloud data warehouse, because there is no data to move at query time, and greater scalability than a scale-up database like PostgreSQL or MySQL.
Furthermore, Druid provides automatic, multi-level indexing that is built into the data format to drive more queries per core. This goes beyond the typical OLAP columnar format with the addition of a global index, data dictionary, and bitmap index. In doing so, it maximizes CPU cycles for faster crunching.
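To make this concrete, the sketch below shows how an application might issue an interactive query against Druid's SQL-over-HTTP API. The host, port, and "clickstream" datasource are assumptions for illustration, not details from any particular deployment.

```python
# Minimal sketch: an interactive query against Druid's SQL HTTP API.
# The router URL and the "clickstream" datasource are placeholders; adjust
# them for your own cluster.
import requests

DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"  # assumed local router

query = """
SELECT channel, COUNT(*) AS events
FROM "clickstream"              -- hypothetical datasource
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY channel
ORDER BY events DESC
LIMIT 10
"""

response = requests.post(DRUID_SQL_URL, json={"query": query}, timeout=30)
response.raise_for_status()

# Druid returns a JSON array of row objects by default.
for row in response.json():
    print(row["channel"], row["events"])
```

Because the data servers already hold prefetched, indexed segments, queries like this can return interactively without a separate caching or precomputation layer in front of the database.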
High Availability Should Be a High Priority
To illustrate the value of these capabilities, consider this scenario: if a dev team is building a backend for internal reporting, does it really matter if it goes down for a few minutes or even longer?
For most, the answer is probably not, which explains why there has always been a tolerance for unplanned downtime and maintenance windows in classical OLAP databases and data warehouses.
But what if the dev team then needs to build an external analytics application that customers will use?
Any outage here can impact revenue, with serious knock-on effects on everything from team resources to customer satisfaction. As a result, resilience (both high availability and data durability) must be a priority when choosing a database for external analytics applications.
Delivering resilience means asking some important design questions: Can you protect against a node failure or a cluster-wide failure? How bad would it be to lose data? And what work is involved in protecting your app and your data?
The legacy approach to achieving greater resiliency is to replicate nodes and to remember to take backups. But when dev teams are building apps for customers, the sensitivity to data loss is much higher, and as a result, occasional backups aren't fit for purpose.
In contrast, Druid's core architecture is designed to withstand downtime without losing data (even recent events) by implementing high availability (HA) and durability based on automatic, multi-level replication with shared data in S3/object storage. This not only enables the HA properties dev teams expect but also a form of continuous backup that automatically protects and restores the latest state of the database even if an entire cluster is lost.
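As a rough illustration of what monitoring that resilience can look like from the application side, the sketch below polls the /status/health endpoint that each Druid service exposes. The hostnames and ports are assumptions for a typical single-machine setup, not a prescription.

```python
# Minimal sketch: poll the /status/health endpoint exposed by each Druid
# service so an external application can detect node failures early.
# The hostnames and ports below are assumptions for a small local cluster.
import requests

SERVICES = {
    "router": "http://localhost:8888",
    "broker": "http://localhost:8082",
    "coordinator": "http://localhost:8081",
}

def check_cluster_health(services: dict[str, str]) -> dict[str, bool]:
    """Return a map of service name -> healthy flag."""
    results = {}
    for name, base_url in services.items():
        try:
            resp = requests.get(f"{base_url}/status/health", timeout=5)
            # A healthy Druid service responds with the JSON literal `true`.
            results[name] = resp.ok and resp.json() is True
        except requests.RequestException:
            results[name] = False
    return results

if __name__ == "__main__":
    for service, healthy in check_cluster_health(SERVICES).items():
        print(f"{service}: {'healthy' if healthy else 'unreachable or unhealthy'}")
```

Even when a check like this reports a failed node, segments remain safe in deep storage, so recovery is a matter of reloading data rather than restoring from an occasional backup.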
Cost-Performance Benefits
Building a database that delivers high concurrency means striking the right balance between CPU usage, scalability, and cost. Historically, addressing concurrency was a matter of allocating more hardware to the challenge, and while adding more CPUs certainly allows organizations to run more queries, it can easily become very expensive.
In contrast, databases like Apache Druid are built with an optimized storage format and query engine that drive down CPU usage. By reading only the data a query needs, the same infrastructure can serve more queries in the same timespan.
Cost-performance is also an important consideration when building external applications that must deliver the required performance and resilience both today and in the future. For organizations focused on customer retention, being able to scale infrastructure efficiently is key to remaining competitive.
Industry News
Red Hat announced the general availability of Red Hat OpenShift Virtualization Engine, a new edition of Red Hat OpenShift that provides a dedicated way for organizations to access the proven virtualization functionality already available within Red Hat OpenShift.
Contrast Security announced the release of Application Vulnerability Monitoring (AVM), a new capability of Application Detection and Response (ADR).
Red Hat announced the general availability of Red Hat Connectivity Link, a hybrid multicloud application connectivity solution that provides a modern approach to connecting disparate applications and infrastructure.
Appfire announced 7pace Timetracker for Jira is live in the Atlassian Marketplace.
SmartBear announced the availability of SmartBear API Hub featuring HaloAI, an advanced AI-driven capability being introduced across SmartBear's product portfolio, and SmartBear Insight Hub.
Azul announced that the integrated risk management practices for its OpenJDK solutions fully support the stability, resilience and integrity requirements in meeting the European Union’s Digital Operational Resilience Act (DORA) provisions.
OpsVerse announced a significantly enhanced DevOps copilot, Aiden 2.0.
Progress received multiple awards from prestigious organizations for its inclusive workplace, culture and focus on corporate social responsibility (CSR).
Red Hat has completed its acquisition of Neural Magic, a provider of software and algorithms that accelerate generative AI (gen AI) inference workloads.
Code Intelligence announced the launch of Spark, an AI test agent that autonomously identifies bugs in unknown code without human interaction.
Checkmarx announced a new generation in software supply chain security with its Secrets Detection and Repository Health solutions to minimize application risk.
SmartBear has appointed Dan Faulkner, the company’s Chief Product Officer, as Chief Executive Officer.
Horizon3.ai announced the release of NodeZero™ Kubernetes Pentesting, a new capability available to all NodeZero users.