Deploy a ZebClient Advanced Cluster

To deploy a cluster for advanced analytics, keep in mind that the flow of information will follow a pattern like this:

  1. Fetch raw data from one or several source systems.

  2. Process the raw data to transform it into meaningful, structured data.

  3. Collect the structured data, then validate and update the information that will be consumed by output systems such as BI tools or LLMs.
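
As a rough illustration, these three steps map onto three directories, or "layers", of a shared volume (described in more detail below). This is a minimal sketch, not ZebClient-specific code; the /zebclient mount point, file names, and columns are hypothetical:

```python
# Minimal sketch of the three pipeline steps over a shared volume.
# The /zebclient mount point, file names, and columns are hypothetical.
import pandas as pd

BRONZE = "/zebclient/bronze"   # raw data, exactly as fetched
SILVER = "/zebclient/silver"   # structured, validated data
GOLD = "/zebclient/gold"       # files read by consumer systems

# 1. Raw data lands in the bronze layer as it comes from the source system.
raw = pd.read_csv(f"{BRONZE}/orders.csv")

# 2. Transform and validate, then store structured data in the silver layer.
clean = raw.dropna(subset=["order_id", "amount"])
clean["amount"] = clean["amount"].astype(float)
clean.to_parquet(f"{SILVER}/orders.parquet")

# 3. Update the files that consumer systems (BI tools, LLMs) actually read.
summary = clean.groupby("customer_id", as_index=False)["amount"].sum()
summary.to_parquet(f"{GOLD}/revenue_by_customer.parquet")
```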

To support this pipeline, the required infrastructure looks like the following (this is a suggested deployment using open-source tools, but it can vary depending on each business case):

In this infrastructure, we will need a Kubernetes cluster with four node pools:

  • default: Runs the Kubernetes system services (kube-system).

  • general: Runs the open-source services required by the suggested advanced-analytics infrastructure with ZebClient:

    • Kubeflow

    • JupyterHub

    • MinIO

    • MySQL

  • executor: Runs the pods that execute each pipeline step.

  • executor-gpu: Runs the pods that execute the pipeline steps that require GPU processing.
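
To sanity-check that the cluster exposes all four pools, a short script with the official Kubernetes Python client can group the nodes by pool. This assumes an AKS cluster, where each node carries its pool name in the agentpool label; other providers use different label keys:

```python
# Group cluster nodes by node pool (assumes AKS's "agentpool" node label).
from collections import Counter
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
nodes = client.CoreV1Api().list_node().items

pools = Counter(n.metadata.labels.get("agentpool", "<unlabeled>") for n in nodes)
for pool, count in sorted(pools.items()):
    print(f"{pool}: {count} node(s)")

# Expected pools for this deployment: default, general, executor, executor-gpu.
missing = {"default", "general", "executor", "executor-gpu"} - set(pools)
if missing:
    print("missing node pools:", ", ".join(sorted(missing)))
```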

All node pools run a pod with the CSI driver for the ZebClient storage class. This pod also runs a ZebClient agent that interacts with ZebClient's cold storage and, if enabled, with ZebClient's acceleration nodes.
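
The shared volume used by the pipeline is then an ordinary PersistentVolumeClaim provisioned through that CSI driver. A minimal sketch with the Kubernetes Python client, assuming the storage class is named zebclient and a 100Gi size (check the real name with kubectl get storageclass):

```python
# Create a shared ReadWriteMany PVC backed by the ZebClient storage class.
# The storage class name "zebclient" and the requested size are assumptions.
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="zebclient-shared"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],   # shared across pipeline pods
        storage_class_name="zebclient",
        resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```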

What happens under the hood is that a pipeline implemented with Elyra fetches the data from a source system and places it, as it comes in raw format, into what we call the "bronze layer" of information. This is a path inside a shared Kubernetes volume (PVC) backed by ZebClient.

The next step in the pipeline processes the raw data, transforms it into structured data in parquet format, and validates the quality of the information. This is the "silver layer", another path inside the shared volume managed by ZebClient. Transforming raw data into structured data might require specific hardware, such as GPU/CPU capacity or a certain amount of memory. These hardware requirements are specified in Elyra, and each pipeline step is executed in an isolated Kubernetes pod by Kubeflow with the hardware specifications defined in Elyra.

Finally, the last step of the data pipeline takes the structured data and updates the files that are actually being used by the consumer systems. This is the "gold layer".
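
How each step declares its hardware needs depends on the orchestrator. Below is a hedged sketch with the Kubeflow Pipelines v2 SDK (packages kfp and kfp-kubernetes); the PVC name zebclient-shared, the mount path, and the agentpool selector key are assumptions, and Elyra generates an equivalent pipeline definition from the resources configured in its UI:

```python
# Sketch: per-step hardware specs and shared-volume access with the
# Kubeflow Pipelines v2 SDK. PVC name, mount path, and the "agentpool"
# node-selector key are assumptions; Elyra emits an equivalent spec.
from kfp import dsl, kubernetes

@dsl.component(base_image="python:3.11", packages_to_install=["pandas", "pyarrow"])
def to_silver():
    # Same bronze -> silver transform as sketched earlier.
    import pandas as pd
    raw = pd.read_csv("/zebclient/bronze/orders.csv")
    raw.dropna().to_parquet("/zebclient/silver/orders.parquet")

@dsl.component(base_image="python:3.11", packages_to_install=["pandas", "pyarrow"])
def to_gold():
    # silver -> gold: refresh the files the consumer systems read.
    import pandas as pd
    df = pd.read_parquet("/zebclient/silver/orders.parquet")
    df.to_parquet("/zebclient/gold/orders.parquet")

@dsl.pipeline(name="advanced-analytics")
def pipeline():
    silver = to_silver()
    silver.set_cpu_request("2").set_memory_request("4Gi")  # hardware spec per step
    # Steer the pod onto the "executor" pool (AKS labels pools via "agentpool").
    kubernetes.add_node_selector(silver, label_key="agentpool", label_value="executor")
    kubernetes.mount_pvc(silver, pvc_name="zebclient-shared", mount_path="/zebclient")

    gold = to_gold().after(silver)
    # A GPU step would request an accelerator and target "executor-gpu" instead:
    # gold.set_accelerator_type("nvidia.com/gpu").set_accelerator_limit(1)
    kubernetes.mount_pvc(gold, pvc_name="zebclient-shared", mount_path="/zebclient")
```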
