Performance Benchmarks

The proof of the pudding: benchmarks demonstrating Obsrv's scalability

Note: This page is a work in progress. The results below are from initial benchmarks; detailed results will be added once the benchmarking exercise is complete.

Cluster Size

| Config Name | Config Value |
| --- | --- |
| Number of Nodes | 4 |
| Node Size | 4 cores, 16 GB |
| PV Size | 1 TB |
| Installation Mode | Obsrv with monitoring and real-time storage |

Processing Benchmarks

Processing throughput is independent of the number of datasets created, so the strategy is to test at volume with all configurations enabled; disabling any of these configurations will only improve throughput.

Configuration 1

  1. Dedup turned on

  2. De-normalization configured on 2 master datasets

  3. Transformations configured on 2 fields

  4. Event size of 1 KB

Results

| Flink Configuration | Events per Min | Events per Hour | Events per Day |
| --- | --- | --- | --- |
| 1 CPU, 1 GB, 1 task slot, 1 parallelism | ~13k (13 MB) | ~750k (780 MB) | ~18 million (18 GB) |
| 2 CPU, 2 GB, 2 task slots, 2 parallelism | ~30k (30 MB) | ~1.8 million (1.8 GB) | ~40 million (40 GB) |
| 4 CPU, 4 GB, 4 task slots, 4 parallelism | In Progress | In Progress | In Progress |

Note: Several other scenarios with varying Flink configurations are being benchmarked and results will be added on completion.
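
The per-hour and per-day figures are straightforward projections of the per-minute rate combined with the 1 KB event size. The small sanity-check script below (plain arithmetic, no Obsrv-specific assumptions) illustrates the conversion:

```python
# Sanity-check the Flink throughput table: project hourly/daily event
# counts and data volumes from the measured per-minute rate and the
# 1 KB event size used in this benchmark.

EVENT_SIZE_KB = 1

def project(events_per_min: int) -> None:
    per_hour = events_per_min * 60
    per_day = per_hour * 24
    for label, events in (("min", events_per_min), ("hour", per_hour), ("day", per_day)):
        volume_gb = events * EVENT_SIZE_KB / 1024 / 1024  # KB -> GB
        print(f"per {label}: ~{events:,} events | ~{volume_gb:.2f} GB")

# Measured per-minute rates from the table above.
project(13_000)  # 1 CPU, 1 parallelism: ~780k/hour, ~18.7M/day
project(30_000)  # 2 CPU, 2 parallelism: ~1.8M/hour, ~43M/day
```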

Secor Backups Benchmark

To ensure there is no data loss across the Obsrv pipeline, all data is backed up to object storage (S3) using Secor. The following are benchmark results of Secor backups running in real time.

Configuration 1

  1. Total Secor processes - 7

  2. Total CPU allocated - 1.5 CPUs

  3. Event size of 1 KB

Results

| Events per Min | Events per Hour | Events per Day | Events per Day per Process |
| --- | --- | --- | --- |
| ~1.6 million (1.6 GB) | ~100 million (100 GB) | ~2.4 billion (2.4 TB) | ~300 million (300 GB) |

Note: In DIKSHA, we have observed that each Secor process with 1 CPU was able to upload 200 million events (200 GB) to Azure Blob Storage.
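
As a quick cross-check, dividing the daily total across the 7 Secor processes yields roughly the per-process figure reported above:

```python
# Cross-check the per-process column: divide the measured daily total
# across the 7 Secor processes in this configuration.

total_events_per_day = 2_400_000_000  # ~2.4 billion (from the table above)
processes = 7

per_process = total_events_per_day / processes
print(f"~{per_process / 1e6:.0f} million events/day per process")
# Prints ~343 million, broadly in line with the ~300 million reported
# above (the split across processes is not perfectly even in practice).
```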

Druid Indexing Benchmark

The Druid indexing benchmark depends on the number of datasets created and the number of aggregate tables. This benchmark was done with a minimal configuration only; indexing throughput can scale roughly linearly with the number of CPUs provided.

Minimum Configuration

| Config Name | Config Value |
| --- | --- |
| Process Name | Druid Indexer |
| CPU | 0.5 |
| Direct Memory | 2Gi |
| Heap | 9Gi |
| GlobalIngestionHeap | 8Gi |
| Workers Count | 30 |
| Pod Memory | 11Gi |

Results

| Num of Tables | Events per Min | Events per Hour | Events per Day |
| --- | --- | --- | --- |
| 1 | ~80k (80 MB) | ~4.8 million (4.8 GB) | ~110 million (110 GB) |
| 2 | ~40k (40 MB) | ~2.4 million (2.4 GB) | ~55 million (55 GB) |
| 3 | In Progress | In Progress | In Progress |
| 4 | In Progress | In Progress | In Progress |
| 5 | ~35k (35 MB) | ~2.1 million (2.1 GB) | ~50 million (50 GB) |

Note: Results on how indexing scales when more CPU resources are provided will be added once that benchmark is complete.
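
Until those results land, the linear-scaling statement above can be used for rough capacity projections from the 0.5 CPU baseline. The sketch below is illustrative only; the projected numbers are not measured results:

```python
# Illustrative projection only: assumes the linear CPU scaling stated
# above, which has not yet been benchmarked. Baseline is the measured
# single-table result of ~80k events/min on a 0.5 CPU Druid Indexer.

BASELINE_CPUS = 0.5
BASELINE_EVENTS_PER_MIN = 80_000

def projected_events_per_min(cpus: float) -> float:
    """Project indexing throughput assuming linear scaling with CPU."""
    return BASELINE_EVENTS_PER_MIN * (cpus / BASELINE_CPUS)

for cpus in (0.5, 1, 2, 4):
    print(f"{cpus} CPU -> ~{projected_events_per_min(cpus):,.0f} events/min")
```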

Query Benchmark

Similar to processing, query performance depends on the volume of data but not on the number of datasets (or tables) created. Query performance increases roughly linearly with the amount of CPU/memory assigned to the Druid Historical process.

Minimum Configuration

| Config Name | Config Value |
| --- | --- |
| Process Name | Druid Historical |
| CPU | 2 |
| Direct Memory | 4608Mi |
| Heap | 1Gi |
| Pod Memory | 5700Mi |
| Segment Size | 4.77Gi |
| No. of Rows per Segment | 5,000,000 |
| processing.numThreads | 2 |
| processing.numMergeBuffers | 6 |
| Concurrency | 100 |
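
For readers who want to reproduce numbers of this shape, the sketch below fires concurrent group-by queries at Druid's SQL API and reports throughput plus avg/min/max/90th-percentile latency. This is not the exact harness used for the results below; the endpoint URL, datasource, and query are placeholders to adapt to your cluster:

```python
# Minimal Druid query load-test sketch (not the exact harness used for
# these results). Fires group-by queries at the Druid SQL API from 100
# concurrent clients, mirroring the Concurrency setting above.
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"  # placeholder endpoint
QUERY = "SELECT channel, COUNT(*) FROM my_datasource GROUP BY channel"  # placeholder

def run_query(_: int) -> float:
    """Issue one SQL query and return its latency in milliseconds."""
    body = json.dumps({"query": QUERY}).encode()
    req = urllib.request.Request(
        DRUID_SQL_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as pool:
    latencies = list(pool.map(run_query, range(500)))
elapsed = time.perf_counter() - t0

print(f"throughput: ~{len(latencies) / elapsed:.1f} r/s")
print(f"latency ms: avg={statistics.mean(latencies):.0f} "
      f"min={min(latencies):.0f} max={max(latencies):.0f} "
      f"p90={statistics.quantiles(latencies, n=10)[-1]:.0f}")
```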

RAW Table Results

| Query | Query Interval | Throughput | Avg (ms) | Min (ms) | Max (ms) | 90th Pct (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Group by on Raw Data | 1 Day | 25 r/s | 392 | 80 | 686 | 472 |
| Group by on Raw Data | 7 Days | 4 r/s | 4933 | 1277 | 8382 | 5154 |
| Group by on Raw Data | 30 Days | In Progress | In Progress | In Progress | In Progress | In Progress |

Aggregate (Rollup) Table Results

| Query | Query Interval | Throughput | Response Times (ms) |
| --- | --- | --- | --- |
| Group by on Aggregate Data | 1 Day | In Progress | In Progress |
| Group by on Aggregate Data | 7 Days | In Progress | In Progress |
| Group by on Aggregate Data | 30 Days | In Progress | In Progress |

Note: Multiple query types across varying interval and Historical configuration combinations are actively being benchmarked; results will be updated once the activity is complete.
