Product Roadmap

Sunbird Obsrv 2.0 has been redesigned from the ground up so that ingestion, processing and querying of telemetry data are agnostic of the underlying data specification. Release 5.1.1 will be the last stable release of Sunbird Obsrv 1.0, which is tightly coupled to the Sunbird Telemetry Specification. The product roadmap for Sunbird Obsrv 2.0 is detailed below, with features logically grouped under specific functional areas.

Sunbird Obsrv ISSUE TRACKER: This is the link to the set of issues, submissions and requests being considered for development as part of the Sunbird Obsrv roadmap. You can upvote an issue if you find it relevant, or add a new issue to the list.

Obsrv AMJ-2024 Stories

  1. APIs Refactoring

    • Dataset CRUD API

    • Dataset Management API

    • Data In/Out APIs

    • Query Template APIs

  2. Automation Refactoring

    • Helm charts refactoring

    • Wrapper over helm charts

  3. Connectors

    • Connectors Framework

    • Connectors Management

    • Connectors Implementation

  4. Hudi Integration

    • Create an ingestion spec for Lakehouse tables

    • Streaming job implementation to write Raw Data to Lakehouse

    • Timestamp Based Partitioner for Apache Hudi

    • Hudi Sink Configuration Optimization (see the configuration sketch after this list)

    • Deployment Automation for Hudi Sink Connector - AWS/Local Datacenter

    • Query API Unification for the Lakehouse/Real-Time store - Design

    • Query API Unification for the Lakehouse/Real-Time store - Implementation

    • Dedup Challenges with introduction of Lakehouse - Design

    • Dedup Challenges with introduction of Lakehouse - Implementation

    • Design Rollups for the Lakehouse

    • Rollups implementation for the Lakehouse

    • Deployment Automation for Hudi Sink Connector - Azure

    • Deployment Automation for Hudi Sink Connector - GCP
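
The timestamp-based partitioner and sink-optimization items above concern the Hudi writer configuration. Below is a minimal sketch of what such a configuration might look like for a raw-telemetry Lakehouse table, expressed as a Python dict; the table name, field names and exact property keys are assumptions for illustration (Hudi's timestamp key-generator properties in particular vary across versions), not the actual Obsrv connector settings.

```python
# Illustrative sketch only: a timestamp-partitioned Hudi sink configuration
# for a raw-telemetry Lakehouse table. Property names follow Apache Hudi's
# writer configs, but exact keys differ across Hudi versions and connector
# implementations, so treat the values below as assumptions.
hudi_sink_config = {
    # Target Hudi table for the raw data written by the streaming job
    "hoodie.table.name": "raw_telemetry",               # hypothetical table name
    "hoodie.datasource.write.recordkey.field": "id",    # unique key used for upserts/dedup
    "hoodie.datasource.write.precombine.field": "ets",  # latest event wins on key collisions
    # Partition the table by event time so queries can prune by date
    "hoodie.datasource.write.partitionpath.field": "ets",
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.TimestampBasedKeyGenerator",
    "hoodie.keygen.timebased.timestamp.type": "EPOCHMILLISECONDS",
    "hoodie.keygen.timebased.output.dateformat": "yyyy/MM/dd",
    "hoodie.keygen.timebased.timezone": "UTC",
}
```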

Obsrv 2.0.1 GA Release date - 29th Feb'24

Enhancements

  1. Core pipeline enhancements

    • Handle denormalization logic for empty, text and numeric keys.

    • Moved failed-event sinking into a common base class.

    • Updated the framework to create a dynamicKafkaSink object.

    • The master dataset processor can now denormalize against another master dataset as well.

  2. Tech software upgrades - Postgres, Druid, Superset, NodeJS, Kubernetes.

  3. Dataset Management - Added timezone handling so that data is stored in Druid in the timezone specified by the dataset (see the sketch after this list).

  4. Infra reliability

    • Verification of metrics for various services.

    • Metrics instrumentation for a few of the services.

  5. Enhancements to the dataset API service - API endpoint changes.

  6. Benchmarking - Processing, Ingestion & Querying.

  7. Bug Fixes and Improvements

    • Fix unit tests and improve coverage.

    • Automation script enhancements to support new changes.
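
Below is a minimal sketch of the timezone handling mentioned in item 3, assuming events carry an epoch-millisecond `ets` timestamp and each dataset declares an IANA timezone; the field name and the `dataset_timezone` variable are illustrative, not actual Obsrv configuration keys.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical per-dataset setting; Obsrv's real configuration key may differ.
dataset_timezone = "Asia/Kolkata"

def to_dataset_timezone(ets_millis: int, tz_name: str) -> str:
    """Convert an epoch-millisecond event timestamp into the dataset's timezone,
    producing the ISO-8601 string that would be indexed into Druid."""
    utc_dt = datetime.fromtimestamp(ets_millis / 1000, tz=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name)).isoformat()

# Example: 2024-02-29T00:00:00Z rendered in the dataset's timezone
print(to_dataset_timezone(1709164800000, dataset_timezone))
```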

Features

  1. Connector Framework implementation

  2. Hudi Data Ingestion

Obsrv 2.0.0 GA Release date - 31st Dec'23

360 degree observability

  1. Enhancements in dataset management - These features enhance data management capabilities and provide flexibility in dataset maintenance.

    • Users can now delete draft datasets using the API.

    • Users can now retire live datasets using the API.

    • Ability to configure denormalization on master datasets.

  2. Core pipeline enhancements - Core pipeline changes to route all failed events, with a detailed summary, to the failed topic.

  3. Detailed Debugging with Query Store - Indexing all failed events into the query store for comprehensive and detailed debugging. This feature improves troubleshooting capabilities and accelerates issue resolution.

  4. Automated Backup Configuration - Automated the configuration of the backup system during installation, allowing users to define their preferred timezone. This ensures a seamless and customized backup setup for enhanced data protection.

  5. Aggregated and Filtered Data Sources - Introduced the option to create aggregated and filtered data sources through the API, and enhanced the API to query rollup datasources (see the sketch after this list). This feature provides users with more control over data sources, enabling customization based on specific requirements.

  6. Filtered Rollups - Ability to create a rollup on a filter using the API.

  7. Bug Fixes and Improvements

    • Fixed the auto-conversion of the timestamp property to a string.

    • Automatically convert numeric fields to integers during ingestion into the Query Store (Druid).

    • Improved the code coverage of Obsrv core to 100%.
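
As a rough illustration of the dataset-management and rollup capabilities above, the sketch below shows how such calls might look over HTTP against a locally running Obsrv API service. The base URL, endpoint paths and payload fields are assumptions for illustration only, not the documented Obsrv API contract.

```python
import requests

BASE_URL = "http://localhost:3000"  # assumed local Obsrv API service

# Retire a live dataset (hypothetical endpoint and dataset id).
requests.delete(f"{BASE_URL}/datasets/v1/retire/telemetry-events")

# Create a filtered rollup datasource on top of an existing dataset
# (hypothetical endpoint; field names are illustrative only).
rollup_spec = {
    "datasetId": "telemetry-events",
    "granularity": "day",
    "filter": {"type": "selector", "dimension": "eid", "value": "IMPRESSION"},
    "metrics": [{"type": "count", "name": "total_events"}],
}
resp = requests.post(f"{BASE_URL}/datasources/v1/rollup", json=rollup_spec)
print(resp.status_code, resp.text)
```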

Connector Ecosystem

These connectors enable data IN and OUT of the platform and expand its reach.

  1. JDBC connector - This connector supports popular databases such as MySQL and PostgreSQL.

  2. Data Stream Source Connector - This connector supports real-time streaming data sources such as Apache Kafka.

Obsrv 2.2.0 Release date - 31st Oct'23

360 degree observability

  1. Addressing major vulnerabilities to make the Obsrv BB free of known vulnerabilities.

  2. Restarting Flink to pick up new datasets. The command service with the Flink restart command needs to be open-sourced.

  3. A few bug fixes to components & deployment.

Obsrv 2.1.0 Release date - 30th Sep'23

360 degree observability

  1. Restarting Flink to pick up new datasets. The command service with the Flink restart command needs to be open-sourced.

  2. Refactor of Sunbird Ed Cache Updater Jobs

  3. Enhance the Device Register/Profile API to send transaction events to the Obsrv 2.0 system using the Data IN API or the Kafka connector (see the sketch below).
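
A minimal sketch of the Kafka route referred to above, assuming the kafka-python client and a dataset ingest topic; the broker address, topic name and event fields are illustrative assumptions.

```python
import json
from kafka import KafkaProducer  # kafka-python; assumed available

# Assumed broker address and ingest topic for the target dataset.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Illustrative transaction event emitted by the Device Register/Profile API.
event = {
    "eid": "DEVICE_PROFILE_UPDATE",
    "ets": 1695034800000,
    "device_id": "device-123",
    "state": "Karnataka",
}
producer.send("dev.telemetry.ingest", value=event)
producer.flush()
```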

Connector Ecosystem

  1. Object Storage Connectors - Cloud storages

  2. MinIO connector

  3. Connector Marketplace/Framework

  4. PostgreSQL connector for data & denormalization

Obsrv 2.1.0 Release date - 31st Aug'23

360 degree observability

  1. Data Exhaust API - Ability to download the raw data using the Data Exhaust APIs

  2. Druid Query and Ingestion Submission Wrapper APIs - Wrapper APIs for Druid query and ingestion spec submission

  3. API system to generate the Data IN/OUT metrics

  4. Integrate Obsrv Superset with the query wrapper APIs

Simplified Operations

  1. Configure Obsrv to run with the MinIO object store

  2. Support for multi-channel alerts

  3. Enabling labels for all the services

Obsrv 2.0.0 GA (Planned release date - 2nd May'23)

360 degree observability

  1. Dataset Creation via APIs

  2. Data Denormalization with API & Push to Kafka

  3. Real time querying

  4. Druid SQL & JSON query interface (see the sketch after this list)

  5. Out-of-the-box visualizations with Superset
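
To illustrate the two query interfaces listed above, the sketch below expresses the same daily event count as Druid SQL and as a native JSON (timeseries) query, posted to an assumed Druid router; the datasource name, interval and field names are illustrative.

```python
import requests

DRUID_URL = "http://localhost:8888"  # assumed Druid router address

# Druid SQL: daily event counts (datasource and column names are illustrative).
sql_query = {
    "query": """
        SELECT TIME_FLOOR(__time, 'P1D') AS "day", COUNT(*) AS total_events
        FROM "telemetry-events"
        WHERE __time >= TIMESTAMP '2023-12-01'
        GROUP BY 1
    """
}
print(requests.post(f"{DRUID_URL}/druid/v2/sql", json=sql_query).json())

# Equivalent native JSON (timeseries) query.
native_query = {
    "queryType": "timeseries",
    "dataSource": "telemetry-events",
    "granularity": "day",
    "intervals": ["2023-12-01/2024-01-01"],
    "aggregations": [{"type": "count", "name": "total_events"}],
}
print(requests.post(f"{DRUID_URL}/druid/v2", json=native_query).json())
```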

Simplified Operations

  1. One-click installation for AWS, Azure & GCP

  2. Standard Monitoring Capability

  3. Log Streaming with Grafana UI

Connector Ecosystem

  1. Kafka connector for data & denormalization

Unified Web Console

  1. General Cluster metrics

Release-5.1.0 (Planned release date - 04 Nov'22) - Projects

Project: Enabling ease of adoption

Task: One-click install enhancements

a. Include the data products (workflow summary) as part of the one-click installer package. These allow for easy generation of common summaries once raw telemetry is generated by the adopter.

b. Include a sample data set and some pre-configured charts as part of the one-click installer package so that an adopter trying it out can get a feel of the types of charts that can be generated, and the kind of data required to be generated for this purpose.

Project: Enabling ease of adoption

Task: Additional documentation

Additional pending Sunbird Obsrv documentation work

Project: Enabling ease of adoption

Task: Creation of a Learning module for Sunbird Obsrv

Compile existing learning resources and create new resources in order to put together a learning module/course that will allow a developer to get familiar with the Obsrv building block, and potentially get "Obsrv certified" by earning a certificate after taking an assessment.

4.10.0 : Planned for 06 Jun '22

1. Functional Requirements: 4.10.0

- API-level access to the data files that power the reports and charts on the portal

- Ability to configure Sunbird datasets as Public or Private at an instance level

- Knowledge transfer (KT) for indexing variables into Druid

2. Deployment and Release Processes: 4.10.0

Build, deploy and provisioning scripts: refactoring of the provisioning and deployment scripts for the BB. The Sunbird dev and staging environments will be repurposed for each BB. The deployment scripts need to be refactored to sandbox the environment on Kubernetes for the Sunbird Obsrv BB; this will be done by deploying the services and components onto a separate, configurable Kubernetes namespace for the Sunbird BB.

5.0.0 : Planned for 19 Aug'22

1. One click mini installation of data pipeline components on Kubernetes: 5.0.0

- Currently, not all components are deployed onto Kubernetes, and it takes significant effort from the adopter to get all the components up and running. This capability will allow the adopter to quickly install the required components on Kubernetes and have the entire analytics platform up and running.

2. Publish the minimum number of Kubernetes nodes required for one click installation: 5.0.0

- Analyse and publish the minimum number of nodes required to get the analytics platform up and running on Kubernetes. Also, publish the mandatory components required to get the installation up and running.

3. Migration of API Swagger documentation to Sunbird Obsrv Building block pages: 5.0.0

- Migrate the existing Sunbird Swagger documentation for the analytics API and data exhaust APIs to the Sunbird Obsrv Building Block pages.

4. Multi-cloud support for blob store: 5.0.0

- Update the framework, cloud-storage-sdk, Secor and Flink checkpoints to work with GCP as well

- Generalize the analytics framework, cloud-storage-sdk, Secor and Flink checkpoints to work with AWS S3, Azure Blob Storage and Google Cloud Storage

- Add the relevant configuration files required for the generalization

- Be able to deploy existing microservices into a different namespace (SB Ed)

1. Documentation to configure data exhaust reports using the APIs: Detailed explanation of the different sections of the configuration schema.

- Add documentation to explain the configuration schema for the various data exhaust report API

- Add detailed documentation on the usage of the APIs
