Component Diagram

The On Demand Druid Exhaust service generates CSV reports on user request. Because it is a generic data product, a user can request a CSV report that contains only selected columns and is narrowed down with filters.
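The Python sketch below illustrates what such a request carries: a dataset name, a list of selected columns, and a map of column filters, serialized to JSON. The class `ReportRequest` and its field names are hypothetical, and the actual exhaust API request schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReportRequest:
    # Hypothetical request shape; field names are illustrative, not the real exhaust API schema.
    dataset: str                                             # Druid datasource, e.g. "sl-project"
    columns: list[str]                                       # columns the user selected for the CSV
    filters: dict[str, str] = field(default_factory=dict)    # column -> value rows must match

    def validate(self) -> None:
        # A report with no columns, or the same column twice, cannot produce a sensible CSV.
        if not self.columns:
            raise ValueError("select at least one column")
        if len(set(self.columns)) != len(self.columns):
            raise ValueError("duplicate column in selection")

    def to_json(self) -> str:
        # Serialize the request so it can be stored as the job configuration.
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


req = ReportRequest("sl-project", ["state_name", "project_title"], {"state_name": "Karnataka"})
print(req.to_json())
```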

Database Layer:

PostgreSQL Database (job_request): Stores job requests, including each job's configuration. Requests are appended to PostgreSQL in the filter format sent from the UI.
Druid (data provider): Stores the flattened data, which is ingested using the ml-analytics ingestion specs. Data for a specific datasource is retrieved with Druid queries defined in the model config.
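The sketch below shows how job requests could be stored and then picked up. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The table layout (`request_id`, `job_id`, `status`, `request_data`), the status value `SUBMITTED`, and the job id are assumptions for illustration, not the production job_request schema.

```python
import json
import sqlite3

# sqlite3 stands in for PostgreSQL; this is a simplified, hypothetical job_request layout.
SCHEMA = """
CREATE TABLE job_request (
    request_id   TEXT PRIMARY KEY,
    job_id       TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'SUBMITTED',
    request_data TEXT NOT NULL  -- job configuration (columns, filters) as JSON
)
"""


def submit(conn, request_id, job_id, config):
    # Append a new request; the filter selection from the UI is stored as JSON.
    conn.execute(
        "INSERT INTO job_request (request_id, job_id, request_data) VALUES (?, ?, ?)",
        (request_id, job_id, json.dumps(config)),
    )


def pending(conn, job_id):
    # Fetch requests for this job that have not been processed yet, decoding their config.
    rows = conn.execute(
        "SELECT request_id, request_data FROM job_request "
        "WHERE job_id = ? AND status = 'SUBMITTED' ORDER BY request_id",
        (job_id,),
    ).fetchall()
    return [(rid, json.loads(data)) for rid, data in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
submit(conn, "r1", "druid-exhaust", {"columns": ["state_name"], "filters": {}})
print(pending(conn, "druid-exhaust"))
```

With PostgreSQL, the same statements would run through a driver such as psycopg2, which uses `%s` placeholders instead of `?`.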

The tables and datasources used in each database are:

PostgreSQL: job_request
Druid: sl-project, sl-observation, sl-observation-status, sl-survey, ml-survey-status
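The sketch below shows how a model config might be turned into a query against one of these datasources. It builds a Druid SQL request body, which would be POSTed to the Broker's `/druid/v2/sql` endpoint. This may not be the query form the data product actually builds. The config keys `datasource`, `columns` and `filters` are hypothetical. Because datasource names such as sl-project contain hyphens, they must be double-quoted as identifiers.

```python
import json


def quote_ident(name: str) -> str:
    # Druid SQL identifiers with hyphens must be double-quoted; embedded quotes are doubled.
    return '"' + name.replace('"', '""') + '"'


def build_druid_sql(datasource: str, columns: list[str], filters: dict[str, str]) -> dict:
    """Build a Druid SQL API request body from a hypothetical model config."""
    select = ", ".join(quote_ident(c) for c in columns)
    sql = f"SELECT {select} FROM {quote_ident(datasource)}"
    params = []
    if filters:
        # Filter values go in as dynamic parameters rather than being spliced into the SQL.
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{quote_ident(column)} = ?")
            params.append({"type": "VARCHAR", "value": value})
        sql += " WHERE " + " AND ".join(clauses)
    body = {"query": sql, "resultFormat": "object"}
    if params:
        body["parameters"] = params
    return body


model_config = {
    "datasource": "sl-project",
    "columns": ["state_name", "project_title"],
    "filters": {"state_name": "Karnataka"},
}
print(json.dumps(build_druid_sql(**model_config), indent=2))
```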

Data Processing Layer: Apache Spark is used to transform the data, sort columns, eliminate duplicates, and replace unknown values with null. This improves data quality and organizes the data logically before it is written to CSV.
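The cleaning steps can be sketched in plain Python to show their effect on rows returned from Druid. The actual job performs them with Spark, where they would correspond roughly to `DataFrame.select` (column order), `DataFrame.replace` (unknown values to null), `DataFrame.dropDuplicates` and `DataFrame.write.csv`. This sketch makes two assumptions: it reads "sort columns" as arranging columns in the requested order, and the set of values treated as unknown is chosen for illustration.

```python
import csv
import io

# Values treated as "unknown"; this set is an assumption, not the job's actual list.
UNKNOWN_VALUES = {"", "unknown", "Unknown", "UNKNOWN"}


def clean_rows(rows: list[dict], columns: list[str]) -> list[tuple]:
    """Order columns, replace unknown values with None, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Project onto the requested column order, mapping unknown markers to None.
        values = tuple(None if row.get(c) in UNKNOWN_VALUES else row.get(c) for c in columns)
        if values in seen:  # keep only the first occurrence of each row
            continue
        seen.add(values)
        cleaned.append(values)
    return cleaned


def to_csv(rows: list[tuple], columns: list[str]) -> str:
    # csv.writer writes None as an empty field, which is how nulls appear in the report.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


raw = [{"b": "2", "a": "unknown"}, {"a": "unknown", "b": "2"}, {"a": "x", "b": ""}]
print(to_csv(clean_rows(raw, ["a", "b"]), ["a", "b"]))
```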

User Interaction Diagram

This interaction diagram details the complete process of requesting and generating reports. A user requests a specific report from the program dashboard in SunbirdEd, and the exhaust APIs map the request to SunbirdObsrv. A scheduled cron task triggers the OnDemandDruidExhaust data product. The data product queries PostgreSQL for the job requests and Druid for the data, processes the data with Spark, and generates the report. Once the report has been created, it is returned to the user.
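A single scheduled run of this flow can be sketched as a loop over pending requests. In this sketch, `Request`, `run_exhaust`, the status values, and the three hooks (`query_druid`, `process`, `write_report`) are hypothetical. The hooks stand in for the Druid query, the Spark transformations, and writing the CSV.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    # Hypothetical in-memory view of a job_request row.
    request_id: str
    config: dict                      # datasource, columns and filters chosen by the user
    status: str = "SUBMITTED"         # assumed status values: SUBMITTED, SUCCESS, FAILED
    location: Optional[str] = None    # where the finished CSV was written
    error: Optional[str] = None       # failure reason, if any


def run_exhaust(
    requests: list[Request],
    query_druid: Callable[[dict], list],
    process: Callable[[list, dict], str],
    write_report: Callable[[str, str], str],
) -> list[Request]:
    """One scheduled (cron) run: handle each pending request and record its outcome."""
    for req in requests:
        if req.status != "SUBMITTED":
            continue                                # already handled in an earlier run
        try:
            rows = query_druid(req.config)          # fetch data from Druid
            csv_text = process(rows, req.config)    # transform (Spark in the real job)
            req.location = write_report(req.request_id, csv_text)
            req.status = "SUCCESS"
        except Exception as exc:                    # one bad request must not stop the run
            req.status = "FAILED"
            req.error = str(exc)
    return requests
```

Passing the three steps in as functions keeps scheduling and status bookkeeping separate from the storage and query details. Catching failures per request means one malformed request is marked FAILED instead of aborting the whole run.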
