Why do we NEED workload management for Cloud Data Warehouse products?

Pawan Powar
4 min read · Jun 26, 2021

Scaling

A cloud-only database’s focus is to let you run your workloads irrespective of resource limitations, and that focus translates into scaling the data warehouse by different means: scale up, scale out, and so on. A common use case advocated to exemplify scaling is data ingest speed. Traditionally, EDW (enterprise data warehouse) customers ingested data during idle times (weekends or overnight). But with teams working globally across time zones, finding idle time is difficult, and data ingest suffers under concurrent workloads. The ideal solution is to SCALE UP during data ingest and SCALE DOWN once the data is ingested. There are many genuine use cases for scaling, and the bulk of published content exaggerates SCALING as the silver bullet for every problem traditional databases had.

Remember, a cloud-only database vendor does NOT have any financial incentive for running workloads with optimal use of existing resources. The general strategy is to throw more and more resources (through scaling) at concurrency and performance. A poorly written workload may succeed when you throw resources at it, but at what cost? The major drawback is that customers are, in a way, encouraged not to focus on workload optimization and workload management, and to simply pay more and more for scaled-up systems, with marketing gimmicks like per-second billing used to cover up those sins.

I am NOT doubting the true use cases for SCALING or the capabilities of rich SCALING features. But I would like to bring attention to Ethical Scaling. “Ethical Scaling” is a term I coined to emphasize that basic application development and optimization, along with database workload management, should be applied alongside scaling to unearth the true potential of cloud-native data warehouse products. Databases that lack robust workload management will counter this with marketing gimmicks, but it is not easy to fool a well-informed customer base.

Concurrency

People often equate the number of connections a database can handle with concurrency. Most of the popular databases on the market are derived from Postgres, and Postgres accepts 100 connections by default (and can be configured for many more). But concurrency is a more advanced concept: the database needs to factor in its architecture, optimal use of resources, starvation avoidance, and so on, and then arrive at a concurrency configuration whose sole aim is maximum throughput. For example, Netezza with its default configuration can execute a mixed customer workload and deliver the same throughput you would get by running the workload at the maximum connection limit. In short, increasing the number of connections does not guarantee increased throughput, and the right concurrency settings depend on workload characteristics. Achieving an optimal concurrency setting therefore needs to be a continuous process, purely driven by workloads.
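
As a rough illustration of that process, here is a minimal Python sketch (the run_query() stub and the concurrency levels are placeholders, not anything product-specific) that sweeps a few concurrency levels against a representative query and records throughput, so the setting is chosen from observed workload behaviour rather than from the connection limit.

```python
# Minimal sketch: measure throughput at several concurrency levels and pick the
# best one from observation, not from the connection limit.
# Assumption: run_query() is a placeholder for executing one representative
# query against your warehouse through your driver of choice.
import time
from concurrent.futures import ThreadPoolExecutor

def run_query() -> None:
    time.sleep(0.1)  # placeholder for a real query execution

def throughput(concurrency: int, queries: int = 50) -> float:
    """Run `queries` copies of the representative query at the given
    concurrency and return queries completed per second."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(run_query) for _ in range(queries)]
        for f in futures:
            f.result()  # propagate any query failure
    return queries / (time.monotonic() - start)

if __name__ == "__main__":
    for level in (4, 8, 16, 32, 64):
        print(f"concurrency={level:3d}  throughput={throughput(level):6.1f} queries/sec")
    # Past the saturation point, higher concurrency stops paying off;
    # that knee, not max_connections, is the setting worth tuning around.
```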

Let’s discuss a framework for how customers can get maximum return on investment using workload management and scaling.

Framework for optimal resource usage and scaling

1. Optimized workloads by Application Developers

Poorly written queries are at the root of most performance problems associated with workloads. There is no escape from designing the schema, considering the database architecture, and writing optimal queries. Nowadays, there is plenty of tooling available to help develop optimized workloads.
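
As one hedged example of such tooling, here is a small Python sketch (the ODBC DSN, the workload_queries.sql file, and the plan keywords to flag are all assumptions, and EXPLAIN output formats differ between products) that pulls each candidate query’s plan and flags operations that often signal a poorly written query before it reaches production.

```python
# Minimal sketch, not a definitive tool: flag workload queries whose plans
# contain operations that often indicate a poorly written query.
# Assumptions: an ODBC DSN named "DWH" and candidate queries kept in
# workload_queries.sql; the keywords to flag depend on your engine (e.g. large
# broadcasts/redistributions on an MPP system, nested loops over huge tables),
# and the EXPLAIN output format varies by product.
import pyodbc

SUSPECT_KEYWORDS = ("broadcast", "redistribute", "nested loop")  # illustrative only

def plan_text(cursor, query: str) -> str:
    """Fetch the textual plan for a query via EXPLAIN."""
    cursor.execute("EXPLAIN " + query)
    return "\n".join(str(row[0]) for row in cursor.fetchall())

def review_workload(dsn: str = "DWH", path: str = "workload_queries.sql") -> None:
    conn = pyodbc.connect(f"DSN={dsn}")
    try:
        cursor = conn.cursor()
        for query in open(path).read().split(";"):
            query = query.strip()
            if not query:
                continue
            plan = plan_text(cursor, query).lower()
            hits = [kw for kw in SUSPECT_KEYWORDS if kw in plan]
            if hits:
                print(f"REVIEW ({', '.join(hits)}): {query[:80]}...")
    finally:
        conn.close()

if __name__ == "__main__":
    review_workload()
```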

2. Workload Management using database features

Popular databases like Netezza have time-tested workload management, which lets you set up scheduler rules, guaranteed resource allocation (GRA) for groups, short query bias, and prioritized query execution to extract MAXIMUM performance from existing resources in a highly concurrent environment. For example, an ad hoc user running an uncontrolled query can give the impression that the system is starving for resources and trigger a request for scale-up. A better approach is to cap all ad hoc users under, say, a 10% resource group allocation to prevent uncontrolled use of the system.
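
Here is a minimal sketch of that cap, assuming a Netezza-style system with GRA resource groups; the group and user names are illustrative, the ODBC DSN is an assumption, and the exact DDL should be verified against your Netezza/NPS release.

```python
# Minimal sketch: cap ad hoc users under a small resource group so one runaway
# query cannot monopolize the system (or masquerade as a need to scale up).
# Assumptions: Netezza-style GRA resource groups, an ODBC DSN named "DWH", and
# illustrative group/user names; verify the exact DDL against your release.
import pyodbc

STATEMENTS = [
    # Ad hoc users share at most ~10% of system resources and get a small
    # guaranteed minimum, leaving the rest for managed, scheduled workloads.
    "CREATE GROUP adhoc_users WITH RESOURCE MINIMUM 2 RESOURCE MAXIMUM 10",
    # Put the known ad hoc accounts into that group (illustrative user name).
    "ALTER GROUP adhoc_users ADD USER analyst_bob",
]

def apply_resource_caps(dsn: str = "DWH") -> None:
    conn = pyodbc.connect(f"DSN={dsn}")
    try:
        cursor = conn.cursor()
        for stmt in STATEMENTS:
            print("running:", stmt)
            cursor.execute(stmt)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    apply_resource_caps()
```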

3. Identify exact performance problems and fix them

Performance issues in popular MPP data warehouse products are mostly around data skew, processing skew, and network/compute bottlenecks. And generally, bottlenecks shift from one place to another once you solve the first one. With advances in database engines (especially optimizers and planners), there are many tweaks available at different levels (data ingestion, SQL, engine, workload management, resource allocation) to improve workload performance. Start by looking into historical workload data (query timings, resource allocation, plan files) to find bottlenecks. Once you find them, it becomes much easier to identify avenues to resolve them and improve performance.
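
A classic data-skew check on a Netezza-style MPP system is to compare per-data-slice row counts for a large table. Here is a minimal sketch (the DSN and table name are placeholders, and the DATASLICEID virtual column is Netezza-specific; other MPP products expose distribution information through their own system views).

```python
# Minimal sketch: flag data skew by comparing per-data-slice row counts.
# Assumptions: a Netezza-style system exposing the DATASLICEID virtual column,
# an ODBC DSN named "DWH", and a placeholder table name. Other MPP products
# expose distribution info through their own system views instead.
import pyodbc

def skew_report(table: str, dsn: str = "DWH", threshold: float = 1.5) -> None:
    conn = pyodbc.connect(f"DSN={dsn}")
    try:
        cursor = conn.cursor()
        cursor.execute(
            f"SELECT DATASLICEID, COUNT(*) FROM {table} GROUP BY DATASLICEID"
        )
        counts = [row[1] for row in cursor.fetchall()]
    finally:
        conn.close()
    if not counts:
        print(f"{table}: no rows")
        return
    avg = sum(counts) / len(counts)
    ratio = max(counts) / avg
    # A heavily loaded slice drags down every query touching this table;
    # past the (arbitrary) threshold, revisit the distribution key.
    status = "SKEWED - revisit the distribution key" if ratio > threshold else "ok"
    print(f"{table}: max/avg slice ratio = {ratio:.2f} ({status})")

if __name__ == "__main__":
    skew_report("sales_fact")  # placeholder table name
```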

4. Ethical Scaling

Scaling the data warehouse throws more resources at the problem, which caters to increased resource demand from workloads. But you will end up with out-of-control costs and an unhappy CFO if you don’t set appropriate scaling policies. Popular databases provide rudimentary controls, such as starting a cluster immediately after the first query arrives in the queue, or only after the system estimates there is enough query load to keep the cluster busy. As I mentioned earlier, most database companies lack the motivation to build solid features that would avoid cost overruns due to scaling.

What the industry needs is workload-informed, AI-based autoscaling. While the industry awaits such a feature, come up with your own scaling policies, perhaps using questions like the following:

1. Do we need to increase capacity by a factor of 2X? Will it improve workload performance by the same factor (2X)?

2. How can we scale up and scale down in a time-bound fashion so that cost predictability is achieved? (See the sketch after this list.)

3. Do we need autoscaling (especially starting a cluster immediately after a query is queued)? Wouldn’t some system resources be wasted?
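
As a starting point for question 2, here is a minimal sketch of a time-bound policy: scale up only inside a declared ingest window and scale back down outside it, so the monthly bill stays predictable. The window, cluster sizes, and the scale_to() call are placeholders; the last one must be wired to whatever resize API your platform actually exposes.

```python
# Minimal sketch of a time-bound scaling policy: scale up only inside a
# declared ingest window, scale back down outside it, so cost is predictable.
# Assumptions: the window, cluster sizes, and scale_to() are placeholders;
# scale_to() must call your platform's real resize/suspend API.
from dataclasses import dataclass
from datetime import datetime, time, timezone

@dataclass
class ScalingPolicy:
    window_start: time   # start of the ingest window (e.g. 01:00 UTC)
    window_end: time     # end of the ingest window  (e.g. 04:00 UTC)
    ingest_size: int     # cluster size during ingest
    baseline_size: int   # cluster size the rest of the day

    def target_size(self, now: datetime) -> int:
        in_window = self.window_start <= now.time() < self.window_end
        return self.ingest_size if in_window else self.baseline_size

def scale_to(size: int) -> None:
    # Placeholder: call your platform's resize/suspend API here.
    print(f"requesting cluster size {size}")

if __name__ == "__main__":
    policy = ScalingPolicy(time(1, 0), time(4, 0), ingest_size=8, baseline_size=2)
    scale_to(policy.target_size(datetime.now(timezone.utc)))
```

Run a check like this from a scheduler rather than reacting to the first queued query, and pair it with the workload-management caps from step 2 so the smaller baseline cluster still serves the mixed workload well.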

IF the underlying platform provides “Ethical Scaling”, THEN databases like Netezza, with time-tested workload management capabilities, can deliver the best overall utilization of the provided or scaled resources, even when those resources are constrained. In short, SCALE ethically.

Conclusion

Scaling is a great feature if customers design their workloads, concurrency settings, and scaling policies with a price/performance and use-case-driven approach, and don’t fall for marketing gimmicks that result in cost overruns.

