Component Diagram
The On Demand Druid Exhaust service generates CSV reports on user request. Because it is a generic data-product, a user can request a CSV report that contains only selected columns and is narrowed by filters.
Database Layer:
PostgreSQL Database (job_request): This is where job requests are stored. Each request includes the job configuration. The data is written to PostgreSQL in the filter format sent from the UI.
Druid: This is where flattened data is stored, ingested with the help of the ml-analytics ingestion specs. Data for a specific datasource is retrieved with Druid queries built from the Model Config.
Data provider
Database | Table/Datasources |
---|---|
PostgreSQL | job_request |
Druid | sl-project, sl-observation, sl-observation-status, sl-survey, ml-survey-status |
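
As a rough illustration of how a request for selected columns with filters could translate into a Druid query, the sketch below builds a native scan query as JSON. The helper name `build_scan_query`, the column names, the filter value, and the interval are all assumptions made for illustration; the actual data-product derives its queries from the Model Config, which is not shown here.

```python
import json


def build_scan_query(datasource, columns, filters, interval):
    """Build a Druid native scan query for the requested columns and filters.

    Hypothetical helper: the real service builds queries from its Model Config.
    """
    # One selector filter per requested dimension/value pair.
    selectors = [
        {"type": "selector", "dimension": dim, "value": val}
        for dim, val in filters.items()
    ]
    query = {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": [interval],
        "columns": list(columns),
        "resultFormat": "list",
    }
    # A single filter is used directly; several are combined with "and".
    if len(selectors) == 1:
        query["filter"] = selectors[0]
    elif selectors:
        query["filter"] = {"type": "and", "fields": selectors}
    return query


query = build_scan_query(
    "sl-project",
    ["program_name", "district_name", "status"],  # hypothetical column names
    {"program_id": "example-program-id"},         # hypothetical filter
    "2023-01-01/2023-02-01",                      # hypothetical interval
)
print(json.dumps(query, indent=2))
```

The resulting JSON is the kind of payload that would be posted to a Druid query endpoint; how the service actually submits it is outside the scope of this sketch.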
Data Processing Layer: Apache Spark is used to apply transformations, sort columns, eliminate duplicates, and replace unknown values with null. This improves data quality and organizes the data logically before it is written to CSV.
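
The cleanup steps above can be sketched in plain Python. This is not the Spark job itself: it uses only the standard library to show the same three steps on a handful of rows. The function names, sample rows, and the set of strings treated as "unknown" are assumptions, and "sort columns" is interpreted here as writing the columns in a fixed order.

```python
import csv
import io

# Assumed spellings of an "unknown" value; the real job may use others.
UNKNOWN_MARKERS = {"unknown", ""}


def clean_rows(rows, column_order):
    """Replace unknowns with None, drop duplicate rows, and order columns."""
    seen = set()
    cleaned = []
    for row in rows:
        values = []
        for col in column_order:
            value = row.get(col)
            if value is None or str(value).strip().lower() in UNKNOWN_MARKERS:
                value = None
            values.append(value)
        key = tuple(values)
        if key in seen:  # duplicate after normalisation, skip it
            continue
        seen.add(key)
        cleaned.append(dict(zip(column_order, values)))
    return cleaned


def to_csv(rows, column_order):
    """Serialise rows to CSV text; None is written as an empty field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=column_order)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"status": "started", "district": "unknown", "program": "P1"},
    {"status": "started", "district": "Unknown", "program": "P1"},
    {"status": "submitted", "district": "D2", "program": "P1"},
]
column_order = ["program", "district", "status"]
cleaned = clean_rows(rows, column_order)
report = to_csv(cleaned, column_order)
print(report)
```

Normalising unknown values before deduplicating matters: the first two sample rows only become identical once both "unknown" spellings are replaced with null.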
User Interaction Diagram
This interaction diagram details the complete process of requesting and generating reports. The user requests a specific report through SunbirdEd from the program dashboard. The exhaust APIs map the request to SunbirdObsrv. A scheduled cron task then triggers the OnDemandDruidExhaust data-product, which queries PostgreSQL and Druid for the data and uses Spark to transform it and generate the report. Once the report has been created, the user receives it.
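
The flow described above can be outlined as one processing cycle. This is only a sketch: `run_exhaust_cycle` and the four callables passed to it are hypothetical stand-ins for the PostgreSQL lookup, the Druid query, the Spark transformation, and report delivery, and the request fields `id` and `config` are assumed names rather than the actual job_request schema.

```python
def run_exhaust_cycle(fetch_pending, query_druid, transform, publish):
    """Process every pending request once and return the ids that completed."""
    completed = []
    for request in fetch_pending():            # stand-in for the job_request lookup
        rows = query_druid(request["config"])  # stand-in for the Druid query
        publish(request["id"], transform(rows))
        completed.append(request["id"])
    return completed


# In-memory stand-ins so the cycle can be run without any external service.
published = {}
done = run_exhaust_cycle(
    fetch_pending=lambda: [{"id": "req-1", "config": {"datasource": "sl-project"}}],
    query_druid=lambda config: [{"datasource": config["datasource"], "status": "started"}],
    transform=lambda rows: rows,
    publish=lambda request_id, report: published.update({request_id: report}),
)
print(done)  # ['req-1']
```

Passing the steps in as callables keeps the outline independent of any particular client library; in the real deployment the cron task would invoke the data-product rather than a Python function.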