Sunbird Obsrv
Obsrv Overview

How and where Obsrv fits in the data landscape


Last updated 1 year ago


As touched upon in the Introduction section, a data value chain consists of ingesting and processing data via data pipelines, storing the processed data in a data warehouse or data lake, and querying that data for analytical purposes. Queries are served either as batch requests or in real time, depending on the underlying storage layer configured.
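The stages of the value chain can be sketched as a toy pipeline. This is only an illustration, not Obsrv code: the event fields (`device`, `temp_c`), the function names and the in-memory list standing in for a warehouse or lake are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw events, as they might arrive from an operational source.
RAW_EVENTS = [
    {"device": "sensor-1", "temp_c": "21.5"},
    {"device": "sensor-2", "temp_c": "19.0"},
    {"device": "sensor-1", "temp_c": "22.5"},
    {"device": "sensor-2", "temp_c": "bad-value"},
]


def ingest(events):
    """Ingestion: pull events from the source one at a time."""
    yield from events


def process(events):
    """Processing: validate and convert each event, dropping invalid ones."""
    for event in events:
        try:
            yield {"device": event["device"], "temp_c": float(event["temp_c"])}
        except (KeyError, ValueError):
            continue  # a real pipeline would typically route this to an error sink


def store(events):
    """Storage: a plain list stands in for the data warehouse or data lake."""
    return list(events)


def query_avg_temp(table):
    """Querying: an analytical aggregate (average temperature per device)."""
    readings = defaultdict(list)
    for row in table:
        readings[row["device"]].append(row["temp_c"])
    return {device: sum(vals) / len(vals) for device, vals in readings.items()}


table = store(process(ingest(RAW_EVENTS)))
averages = query_avg_temp(table)
print(averages)
```

Here the query runs once over everything that has been stored, which corresponds to the batch style of access; a real-time setup would instead update the aggregate as each event arrives.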

Many tools and technologies have emerged in this space (see the diagram below), each addressing a specific part of the problem:

  1. Data Integration Platforms: Data integration platforms (and tools) move data, whether structured, semi-structured or unstructured, from operational sources (such as OLTP databases, object stores and log streams) into a data store where further processing and querying can happen. Some integration platforms can also transform the data while moving it into the data platform. This movement is either batch or streaming, depending on the sources themselves.

  2. Data Warehouses: Data warehouses store data in a format that is friendly to analytical queries. Typical analytical queries crunch large amounts of data along any dimension, depending on user needs. Typical OLTP databases cannot support this kind of ad hoc, interactive querying.

  3. Lake Houses: While data warehouses handle storage, most of them support only structured data. In addition, data warehouses have strong schema affinity, which makes them slow to adapt to changing needs. For example, if new attributes are added to the data, significant engineering work is needed before they can be stored in a data warehouse. Data warehouses are also not AI/ML friendly or efficient. Lake houses are an evolved architecture pattern that addresses these limitations while providing cheap storage (as they are built on top of data lakes) and efficient, fast querying for AI/ML algorithms.

  4. Full-stack solutions: Realizing an end-to-end data value chain means stitching many tools together into a usable data platform. While each tool is scalable on its own, architecturally ensuring the reliability of each one is a challenge, and the complexity grows rapidly once they are combined. Many full-stack solutions have tried to solve this problem by providing all the capabilities of data integration tools, data warehouses and lake houses.
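The schema affinity mentioned under item 3 can be illustrated with a deliberately simplified sketch. It contrasts a store that enforces a fixed set of columns on write with one that keeps records as-is and interprets them on read. The column names, helper functions and records are hypothetical, and real warehouses and lake houses are far more capable than these stand-ins (for instance, warehouses do support schema changes, at an engineering cost).

```python
WAREHOUSE_COLUMNS = ("device", "temp_c")  # hypothetical fixed schema


def warehouse_insert(table, record):
    """Schema-on-write: reject records that do not match the fixed columns."""
    if set(record) != set(WAREHOUSE_COLUMNS):
        raise ValueError(f"schema mismatch: {sorted(record)}")
    table.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))


def lake_insert(store, record):
    """Schema-on-read: keep the record as-is; fields are interpreted at query time."""
    store.append(dict(record))


def lake_select(store, field, default=None):
    """Read one field from every record, tolerating records that lack it."""
    return [record.get(field, default) for record in store]


warehouse, lake = [], []
old = {"device": "sensor-1", "temp_c": 21.5}
new = {"device": "sensor-2", "temp_c": 19.0, "humidity": 0.4}  # new attribute added

warehouse_insert(warehouse, old)
lake_insert(lake, old)
lake_insert(lake, new)

try:
    warehouse_insert(warehouse, new)  # fails until the schema is migrated
    rejected = False
except ValueError:
    rejected = True

humidity = lake_select(lake, "humidity")
print(rejected, humidity)
```

In this sketch the record with the extra `humidity` attribute is rejected by the fixed-schema table but accepted by the flexible store, where older records simply report no value for the new field.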

While full-stack solutions do offer a complete solution, almost none of them are built as real-time solutions from the ground up, and almost none are open source.

As illustrated in the diagram below, this is why Obsrv has been built: it is fully open source, stitches together the best tools for pipelines, storage and querying, and is real-time first by design.

Capability Landscape