Getting Started with Obsrv Deployment Using Helm
Sunbird Obsrv is a high-performance, cost-effective data stack with components for ingestion, querying, processing, backup, visualisation and monitoring. Obsrv 2.0 can be installed using either Terraform (an Infrastructure as Code tool) or Helm (the Kubernetes package manager).
Prerequisites
Obsrv runs entirely on Kubernetes, so a fully functional Kubernetes cluster is required before installation.
Hardware
With the following specifications, Obsrv can support a volume of about 5 million events per day at an average event size of around 5 KB.
Kubernetes version of 1.25 or greater
Minimum of 16 cores of CPU
Minimum of 64 GB of RAM
PersistentVolume support in the Kubernetes cluster
Support for LoadBalancer service to externally expose some of the Obsrv services. Popular implementations such as MetalLB or Traefik can be used to expose the services using external IPs.
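As a rough sizing aid, the load figures quoted above can be turned into a daily data volume and an average event rate. The sketch below uses only those two figures; the decimal conversion (1 GB = 1,000,000 KB) and the function names are assumptions made for illustration.

```shell
# Daily ingest volume in GB for an event count ($1) and average event size in KB ($2).
# Assumes decimal units: 1 GB = 1,000,000 KB.
daily_volume_gb() {
  echo $(( $1 * $2 / 1000000 ))
}

# Average events per second over a 24-hour day (integer division rounds down).
average_events_per_second() {
  echo $(( $1 / 86400 ))
}

daily_volume_gb 5000000 5            # prints 25
average_events_per_second 5000000    # prints 57
```

These are averages over a full day. Peak rates may be considerably higher, so the numbers are better read as a lower bound when sizing persistent volumes.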
Software
Helm
Helm Dependencies
Run the helm repo add command for each of the following chart repositories to make the dependencies required by Obsrv available.
monitoring - https://prometheus-community.github.io/helm-charts
redis - https://charts.bitnami.com/bitnami
loki (version 4.8.0) - https://grafana.github.io/helm-charts
promtail (version 6.9.3) - https://grafana.github.io/helm-charts
velero (version 3.1.6) - https://vmware-tanzu.github.io/helm-charts
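The repositories listed above could be registered along the following lines. This is a minimal sketch: the repository aliases in the first column are local names chosen for illustration, not names mandated by Obsrv, and loki and promtail share the single grafana repository.

```shell
# Register each chart repository, then refresh the local chart index.
# The aliases (first column) are assumptions; only the URLs come from the list above.
add_obsrv_helm_repos() {
  while read -r repo_name repo_url; do
    helm repo add "$repo_name" "$repo_url" || return 1
  done <<'EOF'
prometheus-community https://prometheus-community.github.io/helm-charts
bitnami https://charts.bitnami.com/bitnami
grafana https://grafana.github.io/helm-charts
vmware-tanzu https://vmware-tanzu.github.io/helm-charts
EOF
  helm repo update
}

# Usage: add_obsrv_helm_repos
```

When a chart is installed directly from one of these repositories, the versions noted for loki, promtail and velero can be pinned with the --version flag of helm install or helm upgrade.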
Source Code
Clone the obsrv-automation GitHub repository. The Helm charts required to deploy Obsrv are in the terraform/modules/helm directory.
Resources and Services
Note that the list of required resources differs between cloud service providers.
Common
The following buckets/containers need to be created so that the different services can store their data. This also applies to self-hosted object storage such as MinIO/Ceph.
flink-checkpoints
velero-backup
obsrv
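On AWS, the three buckets above could be created with the AWS CLI roughly as follows. The function name and the optional prefix argument are hypothetical. S3 bucket names are globally unique, so a prefix may be needed, in which case the bucket names configured in the charts have to match.

```shell
# Create the three buckets on S3 with `aws s3 mb`.
# $1 is an optional, hypothetical name prefix (empty by default).
create_obsrv_buckets() {
  prefix="${1:-}"
  for bucket in flink-checkpoints velero-backup obsrv; do
    aws s3 mb "s3://${prefix}${bucket}" || return 1
  done
}

# Usage:
#   create_obsrv_buckets            # exact names from the list above
#   create_obsrv_buckets "acme-"    # hypothetical prefix
```

For MinIO, the equivalent operation in the MinIO client is mc mb, run once per bucket against a configured alias.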
AWS
IAM role with the AmazonS3FullAccess policy. Services such as API, Druid, Flink and Secor need read and write access to the S3 buckets.
Velero is a service which provides backups of the entire Obsrv cluster state through snapshots. The Velero backup service needs a user with restricted access to upload the snapshot state to S3. An IAM policy granting that access needs to be attached to the user created for Velero backups, and access keys need to be generated for that user as well.
Service Accounts: Service accounts enable access to S3 object storage without the need for access keys. If you prefer to use keys instead, you can skip the creation of service accounts. The following service accounts are needed:
Dataset API with the name dataset-api-sa
Druid with the name druid-raw-sa
Flink with the name flink-sa
Secor with the name secor-sa
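One way the service accounts above might be created by hand is sketched below. It assumes an EKS cluster that uses IAM Roles for Service Accounts, where the eks.amazonaws.com/role-arn annotation links a service account to an IAM role. The function name, the namespace and the role ARN are placeholders, and the charts or Terraform modules may already create these accounts for you.

```shell
# Create a service account and bind it to an IAM role (EKS only).
# $1 = service account name, $2 = namespace (placeholder), $3 = IAM role ARN (placeholder)
create_s3_service_account() {
  name="$1"; namespace="$2"; role_arn="$3"
  kubectl create serviceaccount "$name" --namespace "$namespace" || return 1
  kubectl annotate serviceaccount "$name" --namespace "$namespace" \
    "eks.amazonaws.com/role-arn=${role_arn}"
}

# Usage (use the namespace each service is actually deployed to):
#   create_s3_service_account flink-sa flink \
#     "arn:aws:iam::123456789012:role/obsrv-s3-access"
```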
Deployment Instructions
The Helm package manager provides an easy way to install specific components using a generic command. Configurations can be overridden by updating the values.yaml file in the respective Helm charts.
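A generic install command of this kind might look like the sketch below. It assumes the commands are run from the root of the cloned obsrv-automation repository, so that charts resolve under terraform/modules/helm. The function name, release names, namespaces and the optional override file are placeholders rather than values prescribed by Obsrv.

```shell
# Install or upgrade a single chart from the cloned repository.
# $1 = release name, $2 = chart directory name, $3 = namespace,
# $4 = optional file with values that override the chart's values.yaml
install_obsrv_chart() {
  release="$1"; chart="$2"; namespace="$3"; values_file="${4:-}"
  if [ -n "$values_file" ]; then
    helm upgrade --install "$release" "terraform/modules/helm/$chart" \
      --namespace "$namespace" --create-namespace --values "$values_file"
  else
    helm upgrade --install "$release" "terraform/modules/helm/$chart" \
      --namespace "$namespace" --create-namespace
  fi
}

# Usage (all three names are placeholders):
#   install_obsrv_chart redis redis redis
```

Using helm upgrade --install keeps the command idempotent: the first run installs the release and later runs apply configuration changes to it.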
Prerequisites
Kubernetes Cluster Access
Helm needs access to the Kubernetes cluster. The path to the kubeconfig file needs to be exported as the KUBECONFIG environment variable, either in the current shell or in an environment configuration file such as .bashrc.
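For example, the variable can be exported as shown below. The path is an assumption: ~/.kube/config is the default kubeconfig location, and it should be replaced with the file for the target cluster.

```shell
# Point Helm and kubectl at the cluster, keeping any value that is already set.
# Replace the default path with the kubeconfig file for your cluster.
export KUBECONFIG="${KUBECONFIG:-$HOME/.kube/config}"
echo "Using kubeconfig: $KUBECONFIG"

# To persist the setting, add the export line to ~/.bashrc.
# To check access afterwards, run: kubectl cluster-info
```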
Postgres
Postgres is a relational database which is used as the metadata store.
Redis
Redis is an in-memory key-value store primarily used as a distributed cache.
Prometheus
Prometheus is a monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
Kafka
Apache Kafka is a distributed event store and stream-processing platform.
The following Kafka topics are created by default. To add more topics, append them to the provisioning.topics configuration in the values.yaml file.
dev.ingest
masterdata.ingest
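As an illustration, an extra topic could be supplied through a separate override file instead of editing the chart in place. The sketch assumes the chart follows the Bitnami Kafka chart's provisioning.topics schema (name, partitions, replicationFactor). The topic custom.ingest and the partition and replication numbers are placeholders, so copy the real default entries from the chart's values.yaml.

```shell
# Write an override file that repeats the two default topics and adds one more.
# Helm replaces list values as a whole, so the defaults must be listed again.
# $1 = path of the file to write
write_kafka_topics_override() {
  cat > "$1" <<'EOF'
provisioning:
  enabled: true
  topics:
    - name: dev.ingest
      partitions: 1
      replicationFactor: 1
    - name: masterdata.ingest
      partitions: 1
      replicationFactor: 1
    - name: custom.ingest
      partitions: 1
      replicationFactor: 1
EOF
}

# Usage: write_kafka_topics_override kafka-topics.yaml
# The file can then be passed to helm with the --values flag.
```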
Druid
Druid is a high-performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale.
Druid CRD
Druid Cluster
Druid requires the following set of configurations to be provided for specific storage systems such as AWS S3, Azure Blob Storage, GCP Storage or MinIO/Ceph
AWS
MinIO/Ceph
Azure
GCP
API
This service provides metadata APIs related to various resources such as datasets/datasources in Obsrv. The following configurations need to be specified in the values.yaml file.
AWS
Flink Streaming Jobs
Flink jobs are used to process and enrich the data ingested into Obsrv in near-realtime.
Configuration Overrides
AWS
MinIO/Ceph
Azure
GCP
Flink Merged Pipeline Job
Flink Master Data Processor Job
Backup Processes
Secor
Configuration Overrides
AWS
MinIO/Ceph
Azure
GCP
Hadoop
Secor backups are taken from the various Kafka topics which are part of the data processing pipeline. The Secor install command needs to be run once for each of the backup names listed below, substituting the backup name each time.
List of backup names
ingest-backup
extractor-duplicate-backup
extractor-failed-backup
raw-backup
failed-backup
invalid-backup
unique-backup
duplicate-backup
denorm-backup
denorm-failed-backup
system-stats
system-events
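Repeating the install for every backup name above lends itself to a loop. The sketch below is an assumption about how that could be done: it uses each backup name as the Helm release name, takes the chart from terraform/modules/helm/secor and defaults to a secor namespace. None of these are confirmed by this guide, and the actual command may need further per-backup settings.

```shell
# Install one Secor release per backup name.
# $1 = namespace (defaults to the hypothetical "secor")
install_secor_backups() {
  namespace="${1:-secor}"
  for backup in ingest-backup extractor-duplicate-backup extractor-failed-backup \
      raw-backup failed-backup invalid-backup unique-backup duplicate-backup \
      denorm-backup denorm-failed-backup system-stats system-events; do
    helm upgrade --install "$backup" terraform/modules/helm/secor \
      --namespace "$namespace" --create-namespace || return 1
  done
}

# Usage: install_secor_backups
```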
Velero
Monitoring Services
Monitoring Dashboards
Monitoring Alert Rules
Druid Exporter
Kafka Exporter
Postgres Exporter
Loki
Promtail
Ingestion
This Helm chart is used to submit the default ingestion tasks required for the system statistics events.
Visualization
Superset
Loadbalancers
The following services are exposed as LoadBalancer services.
Component | Service Name | Description |
---|---|---|
Dataset API | service/dataset-api-service | Meta APIs |
Superset | service/superset | Data Visualization Tool |
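Once the load balancers are provisioned, the external address of each service in the table can be read from the service status. In this sketch the function name and the namespaces are placeholders. The jsonpath expression prints the IP address or the hostname, whichever the load balancer implementation reports.

```shell
# Print the external IP or hostname assigned to a LoadBalancer service.
# $1 = service name, $2 = namespace (placeholder)
external_address() {
  kubectl get service "$1" --namespace "$2" \
    --output "jsonpath={.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}"
}

# Usage (namespaces are placeholders):
#   external_address dataset-api-service dataset-api
#   external_address superset superset
```

An empty result usually means that the load balancer has not been assigned an address yet.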
Post Deployment
Please find documentation related to various application level functionalities in Obsrv below