AWS Installation Guide

This guide provides step-by-step instructions for installing and configuring Obsrv on AWS using Terraform, Terragrunt, and Helm.


Infrastructure Requirements

1. System Specifications

  • CPU and Memory Requirements:

    • Minimum: 19 CPUs.

    • Optimal Configuration: 5 nodes with 4 cores and 16GB of RAM each, totaling 20 CPUs and 80GB of RAM.

The installation package includes both lakehouse and real-time OLAP storage by default. If the lakehouse component is not required, you can install only the real-time OLAP storage, which reduces the requirements to 16 CPUs and 64GB of RAM.

In that case, we recommend 2 nodes with 8 cores and 32GB of RAM each (the t2.2xlarge AWS instance type), totaling 16 CPUs and 64GB of RAM.

  • Availability Zones: All instances should be within the same availability zone to minimize cross-zone data transfer costs. The Obsrv installer will automatically create the EKS (Elastic Kubernetes Service) cluster for you.
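The sizing figures above are plain multiplication, and the same arithmetic can be reused for other node groups. The helper below is a minimal sketch; `cluster_capacity` is a hypothetical name, and the per-node figures assume the published AWS specifications for t2.xlarge (4 vCPUs, 16GB) and t2.2xlarge (8 vCPUs, 32GB).

```shell
# cluster_capacity NODES VCPUS_PER_NODE RAM_GB_PER_NODE
# Prints the total vCPUs and RAM of a homogeneous node group.
# (Hypothetical helper, not part of the Obsrv installer.)
cluster_capacity() {
  nodes=$1
  vcpus=$2
  ram_gb=$3
  echo "$((nodes * vcpus)) vCPUs, $((nodes * ram_gb))GB RAM"
}

# Lakehouse plus real-time OLAP: 5 x t2.xlarge
cluster_capacity 5 4 16    # prints: 20 vCPUs, 80GB RAM

# Real-time OLAP only: 2 x t2.2xlarge
cluster_capacity 2 8 32    # prints: 16 vCPUs, 64GB RAM
```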

2. Networking Setup

  • CIDR Block: Use a /23 CIDR range (512 IPs) for your environment.

    • Example: A VPC with 10.0.0.0/23 provides IPs from 10.0.0.0 to 10.0.1.255.

  • Subnets: Ensure subnets are created in all availability zones within your AWS region.
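The address count of a CIDR block follows from its prefix length: a /N block holds 2^(32-N) IPv4 addresses. The sketch below shows that calculation; `cidr_size` is a hypothetical helper name.

```shell
# cidr_size PREFIX_LENGTH
# Prints the number of IPv4 addresses in a block with that prefix length.
# (Hypothetical helper for illustration only.)
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

cidr_size 23    # prints 512, matching 10.0.0.0 through 10.0.1.255
cidr_size 24    # prints 256, the size of each half of a /23
```

AWS reserves five addresses in every subnet, so the number of usable addresses is slightly lower than the raw block size.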


Prerequisites

Before beginning the installation, make sure the following tools are installed on your Linux-based system:

| Tool | Version | Installation Command | Official Documentation |
|---|---|---|---|
| Terraform | 1.5.x or earlier | curl "https://releases.hashicorp.com/terraform/1.5.2/terraform_1.5.2_linux_amd64.zip" -o terraform.zip && unzip terraform.zip && sudo mv terraform /usr/local/bin/ && rm terraform.zip | Terraform Install |
| Terragrunt | 0.48 or later | curl -OL https://github.com/gruntwork-io/terragrunt/releases/download/v0.49.0/terragrunt_linux_amd64 && sudo mv terragrunt_linux_amd64 /usr/local/bin/terragrunt && sudo chmod +x /usr/local/bin/terragrunt | Terragrunt Install |
| Helm | 3.10.2 or later | curl https://get.helm.sh/helm-v3.10.2-linux-amd64.tar.gz -o helm.tar.gz && tar -zxvf helm.tar.gz && sudo mv linux-amd64/helm /usr/local/bin/ | Helm Install |
| AWS CLI | 2.10 or later | curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && unzip awscliv2.zip && sudo ./aws/install | AWS CLI Install |
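Before continuing, it can help to confirm that the tools are on the PATH and that their versions satisfy the constraints listed above. The sketch below is one way to do that. `version_at_least` is a hypothetical helper, and it relies on the version ordering of GNU `sort -V`, which is assumed to be available on the Linux host.

```shell
# version_at_least INSTALLED REQUIRED
# Succeeds when INSTALLED >= REQUIRED under GNU sort's version ordering.
# (Hypothetical helper, not shipped with Obsrv.)
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report which of the prerequisite tools are present on the PATH.
for tool in terraform terragrunt helm aws; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done

# Example comparison with literal version strings:
#   version_at_least 3.12.0 3.10.2 && echo "helm version is new enough"
```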



Installation Steps

1. Clone the Obsrv Repository

Start by cloning the Obsrv automation repository, then check out either the latest release tag or master.

git clone https://github.com/Sunbird-Obsrv/obsrv-automation.git
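The clone command leaves the working tree on the default branch. To move to a release tag, something like the sketch below may be used. `latest_tag` is a hypothetical helper that picks the highest tag by version order, which is an assumption about how the repository names its releases. Confirm the tag against the project's release notes before checking it out.

```shell
# latest_tag REPO_DIR
# Prints the highest tag in REPO_DIR according to git's version sort.
# (Hypothetical helper; the highest tag may not be the latest stable release.)
latest_tag() {
  git -C "$1" tag --sort=-v:refname | head -n 1
}

# Usage, after cloning:
#   cd obsrv-automation
#   git checkout "$(latest_tag .)"    # or: git checkout master
```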

2. Configure the Kubernetes Cluster

The following steps bring up the Kubernetes cluster in the configured AWS region.

  1. Navigate to the Configuration Directory:

    cd ./obsrv-automation/terraform/aws/vars
  2. Update Configuration Files:

    • Open cluster_overides.tf and modify the configuration values to match your environment.

    building_block = "obsrv"
    env = "dev"
    region = "us-east-2"
    availability_zones = ["us-east-2a", "us-east-2b", "us-east-2c"]
    timezone = "UTC"
    create_kong_ingress = "true"
    create_vpc = "true"
    create_velero_user = "true"
    eks_node_group_instance_type = ["t2.xlarge"] # Choose an instance type that meets the CPU requirements above
    eks_node_group_capacity_type = "ON_DEMAND"
    eks_node_group_scaling_config = { desired_size = 5, max_size = 5, min_size = 1 } # Size the node group to meet the CPU requirements above
    eks_node_disk_size = 100
  3. Configure S3 for Cluster State:

    • Open obsrv.conf and update your AWS credentials and bucket names.

    AWS_ACCESS_KEY_ID=<your_access_key_id>
    AWS_SECRET_ACCESS_KEY=<your_secret_access_key>
    AWS_DEFAULT_REGION="us-east-2"
    KUBE_CONFIG_PATH="$HOME/.kube/obsrv-kube-config.yaml"
    AWS_TERRAFORM_BACKEND_BUCKET_NAME="obsrv-tfstate"
    AWS_TERRAFORM_BACKEND_BUCKET_REGION="us-east-2"
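A quick sanity check on obsrv.conf can catch keys that are missing or still hold a placeholder such as <your_access_key_id>. The sketch below assumes the simple KEY=value layout shown above. `conf_value` and `missing_keys` are hypothetical helper names, not part of the installer.

```shell
# conf_value FILE KEY
# Prints the value assigned to KEY in FILE, with double quotes removed.
conf_value() {
  sed -n "s/^$2=//p" "$1" | tr -d '"' | tail -n 1
}

# missing_keys FILE
# Prints every expected key that is absent, empty, or still a <placeholder>.
missing_keys() {
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION \
             KUBE_CONFIG_PATH AWS_TERRAFORM_BACKEND_BUCKET_NAME \
             AWS_TERRAFORM_BACKEND_BUCKET_REGION; do
    value=$(conf_value "$1" "$key")
    case "$value" in
      ''|'<'*'>') echo "$key" ;;
    esac
  done
}

# Usage:
#   missing_keys ./obsrv.conf    # no output means every key has a value
```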

3. Run the Installation Script

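  1. Make the Script Executable (run from obsrv-automation/automation-scripts/infra-setup, the directory the Upgrade Steps use for obsrv.sh and obsrv.conf):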

    chmod +x ./obsrv.sh
  2. Run the Installation:

    • To start the installation, run the script:

    ./obsrv.sh install --config ./obsrv.conf --install_dependencies false
    • To have the installer handle dependencies automatically, pass --install_dependencies true instead.

4. Verify the Cluster

Once the installation completes, verify that your Kubernetes cluster is up and running:

kubectl get nodes

The output should list every node in your Kubernetes cluster with a STATUS of Ready.
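For a scripted check, the node listing can be counted rather than read by eye. The sketch below parses the output of `kubectl get nodes --no-headers`, where STATUS is the second column. `count_ready_nodes` is a hypothetical helper, and the usage lines assume the installer wrote the kubeconfig to the KUBE_CONFIG_PATH set in obsrv.conf.

```shell
# count_ready_nodes
# Reads `kubectl get nodes --no-headers` output on stdin and prints how many
# nodes have a STATUS of exactly "Ready". Nodes reported as NotReady or
# "Ready,SchedulingDisabled" are not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (requires kubectl):
#   export KUBECONFIG="$HOME/.kube/obsrv-kube-config.yaml"
#   kubectl get nodes --no-headers | count_ready_nodes
```

With the configuration shown earlier (desired_size = 5), the expected count is 5.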


Helm Chart Configuration

1. Navigate to the Helm Chart Directory

cd ./obsrv-automation/helmcharts/

2. Update AWS Cloud Configuration

Modify global-cloud-values-aws.yaml with the appropriate values for your environment:

global:
  cloud_storage_provider: "aws"
  cloud_store_provider: "s3"
  cloud_storage_region: "<region>"
  dataset_api_cloud_bucket: "<dataset_bucket_name>"
  config_api_cloud_bucket: "<config_bucket_name>"
  postgresql_backup_cloud_bucket: "<backup_bucket_name>"
  redis_backup_cloud_bucket: "<redis_backup_bucket_name>"
  velero_backup_cloud_bucket: "<velero_backup_bucket_name>"
  cloud_storage_bucket: "<storage_bucket_name>"
  hudi_metadata_bucket: "s3a://<hudi_bucket_name>/hudi"
  cloud_storage_config: |
    '{"identity":"<access-key>","credential":"<secret-key>","region":"<region-name>"}'

  storage_class_name: "gp2"
  checkpoint_bucket: "s3://<checkpoint-bucket-name>"
  s3_access_key: "<aws-access-key>"
  s3_secret_key: "<aws-secret-key>"

kong_annotations:
  service.beta.kubernetes.io/aws-load-balancer-type: nlb
  service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
  service.beta.kubernetes.io/aws-load-balancer-eip-allocations: "<elastic-ip>"
  service.beta.kubernetes.io/aws-load-balancer-subnets: "<subnet-id>"

service_accounts:
  enabled: true
  secor:
    eks.amazonaws.com/role-arn: "<role-arn>"
  dataset_api:
    eks.amazonaws.com/role-arn: "<role-arn>"
  config_api:
    eks.amazonaws.com/role-arn: "<role-arn>"
  druid_raw:
    eks.amazonaws.com/role-arn: "<role-arn>"
  flink:
    eks.amazonaws.com/role-arn: "<role-arn>"
  postgresql_backup:
    eks.amazonaws.com/role-arn: "<role-arn>"
  redis_backup:
    eks.amazonaws.com/role-arn: "<role-arn>"
  s3_exporter:
    eks.amazonaws.com/role-arn: "<role-arn>"
  spark:
    eks.amazonaws.com/role-arn: "<role-arn>"

velero-backup:
  credentials:
    useSecret: true
    secretContents:
      cloud: |
        [default]
        aws_access_key_id="<aws-access-key>"
        aws_secret_access_key="<aws-secret-key>"

trino:
  additionalCatalogs:
    lakehouse: |-
      connector.name=hudi
      hive.metastore.uri=thrift://hudi-hms.hms.svc:9083
      hive.s3.aws-access-key=<aws-access-key>
      hive.s3.aws-secret-key=<aws-secret-key>
      hive.s3.ssl.enabled=false
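The values file above contains many angle-bracket placeholders, and one that is left unfilled is easy to miss. The sketch below lists every line that still contains one. `list_placeholders` is a hypothetical helper, and the pattern assumes placeholders use only letters, digits, hyphens, and underscores, as they do in this guide.

```shell
# list_placeholders FILE
# Prints "line-number:text" for each line of FILE that still contains a
# <placeholder>. Exits non-zero (grep's status) when none remain.
list_placeholders() {
  grep -n '<[A-Za-z0-9_-][A-Za-z0-9_-]*>' "$1"
}

# Usage:
#   list_placeholders global-cloud-values-aws.yaml
```

No output means every placeholder in the file has been replaced.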

3. Update Domain Configuration

In global-values.yaml, replace <domain> with your actual domain or Elastic IP:

domain: "<domain>.sslip.io"
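When an Elastic IP is used, the resulting hostname is the address followed by .sslip.io, which the sslip.io service resolves back to that address. The sketch below performs the substitution in place. `set_domain` is a hypothetical helper, and 203.0.113.10 is a documentation-range address standing in for your Elastic IP.

```shell
# set_domain FILE VALUE
# Replaces the literal <domain> placeholder in FILE with VALUE.
# (Hypothetical helper; writes through a temporary file so it does not
# depend on the GNU-only `sed -i` flag.)
set_domain() {
  sed "s/<domain>/$2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage:
#   set_domain global-values.yaml 203.0.113.10
#   # the line becomes: domain: "203.0.113.10.sslip.io"
```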

4. Update Private and Public Keys

Follow these steps to generate and configure the private and public keys for the web console and dataset API:

Step 1: Generate a Private Key

Run the following command to generate a private key for the web console:

openssl genpkey -algorithm RSA -out private_key.pem -pkeyopt rsa_keygen_bits:2048
  • Open the generated private_key.pem file.

  • Copy its contents and update the USER_TOKEN_PRIVATE_KEY field in the following file: obsrv-automation/helmcharts/services/web-console/values.yaml

Example:

USER_TOKEN_PRIVATE_KEY: |-
    <paste-private-key-here>

Step 2: Generate a Public Key

Using the private key generated above, create a public key with the following command:

openssl rsa -pubout -in private_key.pem -out public_key.pem
  • Open the generated public_key.pem file.

  • Copy its contents and update the user_token_public_key field in the following file: obsrv-automation/helmcharts/services/dataset-api/values.yaml

Example:

user_token_public_key: <paste-public-key-here>
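Two small checks can help with the key steps above: confirming that public_key.pem was derived from private_key.pem, and indenting the multi-line PEM text so that it can be pasted under a YAML block scalar such as USER_TOKEN_PRIVATE_KEY: |-. The sketch below covers both. `keys_match` and `indent_pem` are hypothetical helpers, and the four-space indent is an assumption taken from the example in Step 1.

```shell
# keys_match PRIVATE_PEM PUBLIC_PEM
# Succeeds when the public key derived from PRIVATE_PEM is identical to the
# contents of PUBLIC_PEM.
keys_match() {
  derived=$(openssl pkey -in "$1" -pubout 2>/dev/null) || return 1
  [ "$derived" = "$(cat "$2")" ]
}

# indent_pem FILE
# Prints FILE with every line indented by four spaces.
indent_pem() {
  sed 's/^/    /' "$1"
}

# Usage:
#   keys_match private_key.pem public_key.pem && echo "key pair matches"
#   indent_pem private_key.pem
```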

5. Install Obsrv

Set the environment variables, make the script executable, and run the installation:

export cloud_env=aws
export AWS_ACCESS_KEY_ID=<aws-access-key>
export AWS_SECRET_ACCESS_KEY=<aws-secret-key>
export AWS_DEFAULT_REGION=<aws-region>
export KUBE_CONFIG_PATH="$HOME/.kube/obsrv-kube-config.yaml"
export KUBECONFIG="$HOME/.kube/obsrv-kube-config.yaml"
chmod +x ./kitchen/install.sh
./kitchen/install.sh all

Post-Installation Verification

After completing the installation, follow these steps to verify that all components are running correctly:

1. Check Kubernetes Components

  1. Verify all pods are running:

    kubectl get pods -A

    All pods should be in the Running state. Common namespaces to check:

    • flink: Core Pipeline

    • monitoring: Monitoring stack

    • dataset-api: Dataset APIs

    • web-console: Dataset Management console

  2. Check Services:

    kubectl get svc -A

    Verify that essential services have external IPs assigned, particularly the Kong service.

If any component fails these checks, refer to the component-specific logs:

kubectl logs -f <pod-name> -n <namespace>
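To find the pods that need attention without scanning the full listing, the output of `kubectl get pods -A --no-headers` can be filtered on its STATUS column (the fourth field). The sketch below does that. `unhealthy_pods` is a hypothetical helper, and treating Completed as healthy is an assumption that covers pods created by finished jobs.

```shell
# unhealthy_pods
# Reads `kubectl get pods -A --no-headers` output on stdin and prints
# "namespace/pod status" for every pod that is neither Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage (requires kubectl and a configured KUBECONFIG):
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Each reported pod can then be inspected with the kubectl logs command shown above.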

Upgrade Steps

  1. Pull the Latest Code:

    cd ./obsrv-automation
    git pull
    cd ./automation-scripts/infra-setup
  2. Update Configurations: Review and update configuration values as needed.

  3. Run Terraform for Upgrade:

    ./obsrv.sh install --config ./obsrv.conf --install_dependencies false
  4. Upgrade with Updated Cloud Values:

    export cloud_env=aws
    export AWS_ACCESS_KEY_ID=<aws-access-key>
    export AWS_SECRET_ACCESS_KEY=<aws-secret-key>
    export AWS_DEFAULT_REGION=<aws-region>
    export KUBE_CONFIG_PATH="$HOME/.kube/obsrv-kube-config.yaml"
    export KUBECONFIG="$HOME/.kube/obsrv-kube-config.yaml"
    chmod +x ./kitchen/install.sh
    ./kitchen/install.sh all

By following these steps, you will ensure a successful installation and configuration of Obsrv on AWS.
