Compute

The Dystr Compute system manages the execution of deterministic code and data processing tasks. It provides a secure, isolated computing environment for each project, eliminating the need for local installations or complex setup. Every task execution is automatically tracked and stored, creating a complete history of your computations without requiring any version control system.

Compute is ideal for a wide range of technical tasks, including:

  • Engineering calculations and simulations.
  • Data analysis and visualization.
  • Statistical modeling and research.
  • Image and signal processing.
  • Financial calculations and modeling.
  • Scientific computing and numerical analysis.

Creating Compute Tasks

There are several ways to create compute tasks:

  1. Direct Code Creation: Write Python code directly in the Code View.
  2. Natural Language with Compute Bot: Use natural language to describe what you want to accomplish.
  3. Through Assistants: Workers and Chat can create and modify compute tasks on your behalf as you discuss your project.

Using the Compute Bot

The Compute Bot helps you create and modify compute tasks using natural language. Simply type your request in the input box below the code window. For example:

Create a script that reads the temperature data from sensors.csv and plots it over time
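A script of the kind the Compute Bot might generate for this request could look like the sketch below. It assumes sensors.csv sits at the root of the Workspace (mounted at /files) and has columns named "timestamp" and "temperature". Those column names, the function name, and the output file name are hypothetical, and the Bot's actual code will depend on the real file.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display is assumed
import matplotlib.pyplot as plt
import pandas as pd


def plot_temperature(csv_path: Path, out_path: Path) -> Path:
    """Plot temperature over time from a CSV and save the figure as a PNG.

    Assumes hypothetical columns 'timestamp' and 'temperature'.
    """
    df = pd.read_csv(csv_path, parse_dates=["timestamp"])
    df = df.sort_values("timestamp")  # plot in chronological order

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(df["timestamp"], df["temperature"])
    ax.set_xlabel("Time")
    ax.set_ylabel("Temperature")
    ax.set_title("Sensor temperature over time")
    fig.autofmt_xdate()  # tilt date labels so they do not overlap
    fig.tight_layout()

    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # free the figure's memory
    return out_path


if __name__ == "__main__":
    src = Path("/files/sensors.csv")  # hypothetical input file
    if src.exists():
        print(plot_temperature(src, Path("/files/files/temperature.png")))
```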

The Compute Bot can:

  • Access Workspace files and understand their contents.
  • Read documentation and requirements.
  • Analyze data structures.
  • Generate appropriate code based on context.

Some example requests:

  • "Read the test requirements from Section 3.2 of requirements.pdf and create a validation script."
  • "Look at sensors.csv and create a script to calculate moving averages."
  • "Create a script that implements the equations from page 5 of standards.pdf."
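For a request like the moving-average example, the generated script might resemble this sketch. It computes a simple (unweighted) moving average with NumPy, keeping only full windows; the "temperature" column name and the window size of 5 are illustrative assumptions, not values from any real sensors.csv.

```python
from pathlib import Path

import numpy as np


def moving_average(values, window: int) -> np.ndarray:
    """Simple (unweighted) moving average over full windows only.

    Returns len(values) - window + 1 points.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    x = np.asarray(values, dtype=float)
    if window > x.size:
        raise ValueError("window is longer than the data")
    kernel = np.ones(window) / window  # equal weight for each sample
    return np.convolve(x, kernel, mode="valid")


if __name__ == "__main__":
    src = Path("/files/sensors.csv")  # hypothetical input file
    if src.exists():
        # names=True reads the header row so columns can be selected by name;
        # "temperature" is a hypothetical column name
        data = np.genfromtxt(src, delimiter=",", names=True)
        print(moving_average(data["temperature"], window=5))
```

Because the kernel is symmetric, the flip that np.convolve applies to it has no effect, so the result is the plain mean of each window.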

Math View

Every compute task includes a Math View that provides a mathematical representation of the code's logic. If you're not familiar with Python programming, this view helps you understand the underlying calculations and algorithms.

The Math View:

  • Explains each step's execution logic in engineering terms.
  • Presents mathematical expressions using LaTeX notation.
  • Includes relevant code snippets for context.
  • Updates in real-time as code changes.
  • Breaks down complex operations into understandable components.

For example, a simple RC filter calculation in code:
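A minimal sketch of such a calculation, assuming a first-order RC low-pass filter, is shown below. It computes the time constant τ = RC and the cutoff frequency f_c = 1/(2πRC); the component values are arbitrary choices. In the Math View, logic like this would be presented through those two formulas rather than through the Python statements.

```python
import math


def rc_lowpass(resistance_ohm: float, capacitance_f: float) -> tuple[float, float]:
    """Return (time constant in s, -3 dB cutoff frequency in Hz)
    for a first-order RC low-pass filter."""
    if resistance_ohm <= 0 or capacitance_f <= 0:
        raise ValueError("R and C must be positive")
    tau = resistance_ohm * capacitance_f  # tau = R * C
    f_c = 1.0 / (2.0 * math.pi * tau)     # f_c = 1 / (2 * pi * R * C)
    return tau, f_c


tau, f_c = rc_lowpass(1_000.0, 1e-6)  # 1 kOhm, 1 uF
print(f"tau = {tau:.3e} s, f_c = {f_c:.1f} Hz")  # tau = 1.000e-03 s, f_c = 159.2 Hz
```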

This dual-view approach ensures that both programmers and domain experts can collaborate effectively on the same computations.

Compute Lifecycle

Dystr automates environment lifecycle management, allowing you to focus on analysis rather than infrastructure. Virtual machines are automatically provisioned when needed and cleaned up after periods of inactivity, ensuring you always have a fresh, properly configured environment without any manual setup.

  1. Task Creation: You define a task, which includes the code to execute and any data it needs to access, within the Dystr interface. Tasks are associated with a specific Workspace.
  2. Environment Initialization: The system checks whether a compute environment is already active for your Workspace. A new compute environment is automatically provisioned if needed. The environment runs Ubuntu 22.04 LTS with pre-installed packages (see below for details).
  3. File Synchronization: The system automatically syncs your Workspace files to the compute environment. All files become available in the /files directory. Your code can access these files just like a local filesystem.
  4. Task Execution: Your code runs in the secure compute environment. Real-time output is streamed back to your interface. Results are stored and accessible through the Workspace.
  5. Inactivity Shutdown: Compute environments automatically shut down after 20 minutes of inactivity. New environments are provisioned automatically when you run your next task. This ensures efficient resource usage while maintaining availability.
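The inactivity rule can be modeled in a few lines. This is a hypothetical sketch of the documented behavior (an environment shuts down once 20 minutes pass without activity), not Dystr's implementation. The function name, the single last-activity timestamp, and treating exactly 20 minutes as expired are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Idle period after which this page says an environment shuts down.
IDLE_TIMEOUT = timedelta(minutes=20)


def needs_new_environment(
    last_activity: Optional[datetime],
    now: datetime,
    timeout: timedelta = IDLE_TIMEOUT,
) -> bool:
    """Hypothetical model: True if the next task would have to provision a
    fresh environment (none has run yet, or the idle timeout has elapsed)."""
    if last_activity is None:
        return True  # nothing has run yet, so no environment is warm
    return now - last_activity >= timeout


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(needs_new_environment(t0, t0 + timedelta(minutes=5)))   # False: still warm
print(needs_new_environment(t0, t0 + timedelta(minutes=25)))  # True: timeout passed
```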

File System Access

All Workspace files are automatically mounted at /files in your compute environment. Your code can read these files using standard Python file operations. Files written to /files/files are automatically synced back to the Workspace.
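A minimal sketch of that round trip reads an input file from /files and writes a result into /files/files so it syncs back to the Workspace. The input name sensors.csv, its "temperature" column, the function name, and the output file name are hypothetical; the two directories are the ones described on this page.

```python
import csv
from pathlib import Path
from statistics import mean

WORKSPACE_DIR = Path("/files")     # Workspace files are mounted here
OUTPUT_DIR = Path("/files/files")  # files written here sync back to the Workspace


def summarize_column(src: Path, column: str, out_dir: Path) -> Path:
    """Read one numeric CSV column and write its min/mean/max to a new CSV."""
    with src.open(newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    if not values:
        raise ValueError(f"no rows found in {src}")

    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{src.stem}_{column}_summary.csv"
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["statistic", "value"])
        writer.writerow(["min", min(values)])
        writer.writerow(["mean", mean(values)])
        writer.writerow(["max", max(values)])
    return out_path


if __name__ == "__main__":
    src = WORKSPACE_DIR / "sensors.csv"  # hypothetical input file
    if src.exists():
        print(summarize_column(src, "temperature", OUTPUT_DIR))
```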

Compute Resources

Each compute environment provides:

  • 1.5 GHz CPU allocation.
  • 1024 MB memory.
  • Internet access for package installation and external data retrieval.
  • Ubuntu 22.04 LTS base system.
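With 1024 MB of memory, it can help to estimate an array's footprint before allocating it. The sketch below computes the size from shape and dtype alone with NumPy, without allocating anything. The helper names are hypothetical, the 75% headroom factor is an arbitrary allowance for Python itself and temporary copies, and because this page does not say whether "MB" means 10^6 or 2^20 bytes, the sketch conservatively uses 10^6.

```python
import math

import numpy as np

# 1024 MB from this page, read as decimal megabytes (the smaller interpretation).
MEMORY_LIMIT_BYTES = 1024 * 10**6


def array_nbytes(shape: tuple[int, ...], dtype: str = "float64") -> int:
    """Bytes needed for a dense NumPy array of this shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize


def fits_in_memory(shape, dtype="float64", headroom: float = 0.75) -> bool:
    """True if the array would use at most `headroom` of the memory limit."""
    return array_nbytes(shape, dtype) <= headroom * MEMORY_LIMIT_BYTES


print(array_nbytes((10_000, 10_000)))                # 800000000 (800 MB)
print(fits_in_memory((10_000, 10_000)))              # False: 800 MB > 768 MB budget
print(fits_in_memory((10_000, 10_000), "float32"))   # True: 400 MB
```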

Note: For private deployments, compute resources can be customized to match your organization's specific needs and requirements.

Pre-installed Packages

The compute environment is based on Ubuntu 22.04 LTS and comes with a comprehensive set of pre-installed packages.

Python Environment

Python 3.11 is installed via Conda with these key packages:

Data Science & Analysis

Package Description
numpy Fundamental package for numerical computing
pandas Data manipulation and analysis library
scipy Scientific computing tools
statsmodels Statistical modeling and econometrics
sympy Symbolic mathematics
dask Flexible parallel computing
numexpr Fast numerical array evaluator
pytables Hierarchical dataset management
numpy_financial Financial functions

Machine Learning & Image Processing

Package Description
scikit-learn Machine learning library
scikit-image Image processing toolbox
mediapipe Cross-platform ML solutions
numba JIT compiler for numerical functions


Visualization

Package Description
matplotlib-base Comprehensive plotting library
seaborn Statistical data visualization
bokeh Interactive visualization library
altair Declarative statistical visualization
ipympl Matplotlib Jupyter integration
facets Machine learning visualization

Data Processing & Storage

Package Description
beautifulsoup4 HTML/XML parsing library
cloudpickle Extended pickling support
dill Extended serialization
h5py HDF5 binary data format interface
protobuf Data interchange format
sqlalchemy SQL toolkit and ORM

Engineering & Control Systems

Package Description
control Control systems library

Installing Additional Packages

Additional Python packages can be installed during task execution using pip or conda if needed. For example:

!pip install package-name

To prevent package installation progress output from cluttering your results, use pip's --quiet flag:

!pip install package-name --quiet
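The ! prefix is notebook (IPython) syntax. If a task runs as a plain Python script instead, a common equivalent is to call pip through the running interpreter with subprocess, as sketched below. The helper name is hypothetical, and it only installs when the module cannot already be imported. A package's import name can differ from its pip name (scikit-learn, for example, is imported as sklearn), so both can be given.

```python
import importlib
import importlib.util
import subprocess
import sys
from typing import Optional


def ensure_package(pip_name: str, import_name: Optional[str] = None, quiet: bool = True) -> bool:
    """Install `pip_name` with pip if `import_name` cannot be imported.

    Returns True if an install was performed, False if the module was already available.
    """
    module = import_name or pip_name
    if importlib.util.find_spec(module) is not None:
        return False  # already importable, nothing to do
    cmd = [sys.executable, "-m", "pip", "install", pip_name]
    if quiet:
        cmd.append("--quiet")  # same effect as `!pip install ... --quiet`
    subprocess.check_call(cmd)  # raises CalledProcessError if pip fails
    importlib.invalidate_caches()  # let this process find the new package
    return True
```

Usage, with placeholder names: ensure_package("package-name", "package_name").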

Note: To see the exact versions of installed packages in your environment, you can run !conda list within your compute environment.
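From Python, the standard-library importlib.metadata module offers a lighter check of individual package versions. The sketch below looks up some of the packages listed above by their distribution names. Distribution names can differ from conda package names (conda's matplotlib-base and pytables, for example, correspond to the matplotlib and tables distributions), and a package installed without metadata would appear as missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(
        ["numpy", "pandas", "scipy", "matplotlib", "tables", "control"]
    ).items():
        print(f"{name:12} {ver or 'not installed'}")
```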

System Packages

The compute environment comes with essential Linux packages pre-installed:

  • Build tools (build-essential)
  • Text editors (nano, vim-tiny)
  • Version control (git)
  • Process management (tini)
  • Document processing:
    • texlive-xetex
    • texlive-fonts-recommended
    • texlive-plain-generic
    • pandoc
  • System utilities:
    • wget
    • unzip
    • bzip2
    • openssh-client
    • ca-certificates
    • sudo
  • Fonts and graphics:
    • fonts-liberation
    • fontconfig
    • cm-super
    • dvipng
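The document-processing packages make it possible to produce PDFs from within a task. A minimal sketch, assuming an earlier step wrote a Markdown report (the file and function names are hypothetical), calls pandoc with the XeLaTeX engine provided by texlive-xetex:

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_pdf_command(src: Path, dst: Path) -> list[str]:
    """Build a pandoc command that converts `src` to PDF via XeLaTeX."""
    return ["pandoc", str(src), "-o", str(dst), "--pdf-engine=xelatex"]


def markdown_to_pdf(src: Path, dst: Path) -> Path:
    """Run pandoc; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc is not on PATH")
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(pandoc_pdf_command(src, dst), check=True)
    return dst


if __name__ == "__main__":
    report = Path("/files/files/report.md")  # hypothetical report file
    if report.exists():
        print(markdown_to_pdf(report, report.with_suffix(".pdf")))
```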

Known Limitations

Coming Soon:

  • Advanced compute configurations: Support for larger compute instances, GPU acceleration, and specialized hardware will be available in a future release. Current environments are limited to CPU-only workloads with fixed resource allocations.
