Files
Files in your Workspace provide centralized storage for project data, documentation, and computation inputs and outputs. They form a shared data set accessible to all Workspace components, including Compute tasks, Chat, and Workers.
Core Functionality
- Secure file storage within isolated Workspace boundaries.
- Automatic indexing and semantic search capabilities.
- Direct access from Compute tasks via the /files mount point.
- Integration with Chat and Worker assistants.
File Access
In Compute Tasks
Files are automatically mounted at /files in your compute environment, making it easy to load data, read configurations, and access documentation during analysis. This means you can directly reference project files in your calculations:
import pandas as pd

# Read data directly from Workspace files
data = pd.read_csv('/files/data/measurements.csv')
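Before loading a specific file, it can help to see what is available under the mount. The following is a minimal sketch, assuming /files behaves like an ordinary readable directory. `list_workspace_files` is a hypothetical helper written for this example, not a platform API.

```python
from pathlib import Path


def list_workspace_files(root="/files", pattern="*.csv"):
    """Return sorted paths under root that match pattern, searching recursively."""
    base = Path(root)
    if not base.is_dir():
        # Outside a compute environment the mount may not exist.
        return []
    return sorted(p for p in base.rglob(pattern) if p.is_file())


# Print every CSV file found under the mount (prints nothing if /files is absent).
for path in list_workspace_files():
    print(path)
```

Passing a different pattern, such as "*.json", narrows the listing to another file type.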
Through Chat / Workers
Assistants can read, write, and search files in your Workspace. See the Assistants page for more details.
Intelligent Search
When you add files to your Workspace, the system processes them to enable semantic search across your project's contents.
Semantic Search
Unlike simple keyword matching, semantic search understands the meaning of your questions and finds relevant information within your project's documents. This means:
- Through Assistants, you can ask natural questions like "what's the minimum pulse width for the SPI chip select signal?" and get answers from your datasheets and specifications.
- The system will find relevant information even if the exact words don't match.
- Results include related concepts (e.g., a search for "power requirements" might find sections about "voltage ratings" and "current draw" in your documentation).
For example, in Chat if you ask:
"What are the timing requirements for the reset signal?"
The system might find and return information from:
- A datasheet section titled "Reset Timing Specifications".
- A technical note discussing "RST_ pulse width".
- Design requirements mentioning "minimum reset assertion time".
Each result will include a clickable citation that takes you directly to the exact page in the source document where the information was found.
This capability is particularly useful when:
- Searching across many technical documents.
- Looking for specific requirements or specifications.
- Finding related information across multiple files.
- Answering technical questions about your project.
You can add files to your Workspace by dragging and dropping files into the Workspace UI.
The system supports adding large volumes of documentation - including datasheets, requirements documents, regulatory standards, technical references, and textbooks. Once added, all files are automatically indexed and made searchable, allowing you to instantly find relevant information across your project's entire knowledge base.
How It Works Behind the Scenes
- When files are added, the content is split into meaningful chunks.
- Each chunk is analyzed to understand its meaning and context.
- When you search, your question is analyzed the same way.
- The system finds chunks that match the meaning of your question.
- Results are returned with exact citations to the source documents - click any citation to jump directly to that page in the source material.
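The steps above can be sketched as a small pipeline: turn each chunk into a vector, turn the question into a vector the same way, rank chunks by similarity, and return each hit with its source and page. This is a toy illustration only. The `embed` function here is a word-count stand-in chosen so the example runs without dependencies; it does not capture meaning the way the platform's semantic search does, and the sample chunks, file name, and page numbers are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(question, chunks, top_k=2):
    """Rank (source, page, text) chunks against the question.

    Returns up to top_k tuples of (score, source, page), best first.
    """
    q = embed(question)
    scored = [(cosine(q, embed(text)), source, page) for source, page, text in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Invented sample chunks; a real index would hold chunks from uploaded files.
chunks = [
    ("datasheet.pdf", 12, "Reset timing: the reset signal must be held low."),
    ("datasheet.pdf", 30, "Supply voltage ratings and current draw."),
]

question = "What are the timing requirements for the reset signal?"
for score, source, page in search(question, chunks):
    print(f"{score:.2f}  {source}  page {page}")
```

Keeping the source and page alongside each chunk is what makes page-level citations possible once a chunk is matched.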
Using Search Across the Platform
This search capability is available throughout your Workspace:
In Chat
- Interactively ask questions about your project's documentation.
- Get immediate answers with source citations.
- Follow up with clarifying questions.
Through Workers
- Create automated processes that reference project documentation.
- Set up workers to answer questions from team members.
- Build automated documentation checks and requirement validation.
In Compute
- Use the Compute Bot to find relevant information while writing calculations.
- Reference specifications and requirements directly in your analysis.
- Automatically document sources in your computational notebooks.
Supported File Types
The system processes these file types with specialized handling:
Documents
- PDF files (.pdf) - Full text extraction with page-level chunking.
- Text files (.txt, .md, etc.) - Plain text processing.
- JSON files (.json) - Structured data parsing.
- Microsoft Word files (.docx) - Full text extraction.
- Microsoft PowerPoint files (.pptx) - Full text extraction with slide boundary preservation.
Data Files
- CSV files (.csv) - Automatic conversion to queryable SQLite databases.
File Processing
When files are added to a Workspace, they undergo automatic processing based on their MIME type:
- Content Extraction
  - PDFs: Full text extraction with page boundary preservation.
  - Text files: Direct content reading with encoding detection.
  - CSVs: Schema analysis and data validation.
  - JSON: Structure validation and parsing.
  - Microsoft Word: Full text extraction.
  - Microsoft PowerPoint: Full text extraction with slide boundary preservation.
- Chunking
  - Content is split into semantic chunks for efficient processing.
  - Metadata is preserved and linked to chunks.
- Indexing
  - Chunks are embedded for semantic search capabilities.
  - Full text search indices are maintained.
  - Original document structure is preserved.
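To make the idea of MIME-based dispatch concrete, here is a minimal sketch using Python's standard `mimetypes` module. The `HANDLERS` table and `pick_handler` function are hypothetical names for this example, and the handler descriptions simply restate the list above. Note that `mimetypes.guess_type` infers the type from the file extension, which may differ from how the platform detects types.

```python
import mimetypes

# Hypothetical mapping from MIME type to the extraction step described above.
HANDLERS = {
    "application/pdf": "extract text, keep page boundaries",
    "text/plain": "read content, detect encoding",
    "text/csv": "analyze schema, validate data",
    "application/json": "validate structure, parse",
}


def pick_handler(filename):
    """Return (mime_type, handler_description) for a filename."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime, HANDLERS.get(mime, "no specialized handling")


for name in ["spec.pdf", "config.json", "archive.unknownext"]:
    mime, handler = pick_handler(name)
    print(f"{name}: {mime} -> {handler}")
```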
CSV to SQLite Conversion (Advanced)
When you upload CSV files to your Workspace, they are automatically converted into SQLite databases for efficient querying and data analysis.
How It Works
- Schema Detection
  - Column types are automatically inferred (numbers, text, dates).
  - Date / time formats are standardized.
  - Missing values are handled consistently.
  - Column names are sanitized for SQL compatibility.
- Database Creation
  - A SQLite database is created with optimized table structure.
  - Appropriate indices are added for common query patterns.
  - The original CSV remains available for direct access.
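The conversion steps above can be approximated with the standard library. This is a simplified sketch, not the platform's converter: `sanitize`, `infer_type`, and `csv_to_sqlite` are hypothetical helpers, the sample data is invented, and date detection and index creation are omitted. It assumes a rectangular CSV with a header row and no duplicate column names after sanitizing.

```python
import csv
import io
import re
import sqlite3


def sanitize(name):
    """Make a column name safe to use as an SQL identifier."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    if not cleaned or cleaned[0].isdigit():
        cleaned = "col_" + cleaned
    return cleaned


def infer_type(values):
    """Pick INTEGER, REAL, or TEXT from the non-empty values of a column."""
    present = [v for v in values if v != ""]
    if not present:
        return "TEXT"
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in present:
                cast(v)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


def csv_to_sqlite(csv_text, table, conn):
    """Create a table from CSV text and return {column_name: sql_type}."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    names = [sanitize(h) for h in header]
    types = [infer_type([r[i] for r in data]) for i in range(len(names))]
    columns = ", ".join(f'"{n}" {t}' for n, t in zip(names, types))
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in names)
    # Empty cells are stored as NULL so missing values are handled uniformly.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [[v if v != "" else None for v in r] for r in data],
    )
    conn.commit()
    return dict(zip(names, types))


conn = sqlite3.connect(":memory:")
sample = "Time Stamp,Temp (C)\n2024-01-01,21.5\n2024-01-02,\n"
print(csv_to_sqlite(sample, "measurements", conn))
```

In this sketch the date column falls back to TEXT because only integer and float detection are implemented.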
Using SQL Databases in Compute
You can access your data in multiple ways:
# Direct SQL queries
import sqlite3
conn = sqlite3.connect('/files/data/measurements.db')
results = conn.execute("""
SELECT timestamp, temperature
FROM measurements
WHERE temperature > 100
ORDER BY timestamp
""").fetchall()
# Through pandas (reuses the sqlite3 connection opened above)
import pandas as pd
df = pd.read_sql("SELECT * FROM measurements", conn)
# Original CSV still accessible
df = pd.read_csv('/files/data/measurements.csv')
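The queries above assume a table named measurements with timestamp and temperature columns. If you are unsure what a converted database contains, you can inspect it first. The sketch below uses SQLite's built-in `sqlite_master` table and `PRAGMA table_info`; `describe_database` is a hypothetical helper, and the in-memory database stands in for a file such as /files/data/measurements.db.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [(column_name, declared_type), ...]}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[table] = [(col[1], col[2]) for col in info]
    return schema


# Stand-in database; in Compute you would connect to the converted .db file.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE measurements (timestamp TEXT, temperature REAL)")
print(describe_database(demo))
```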
Storage and Access Control
Files remain private to your Workspace:
- Only accessible to authorized members.
- Protected by Organization access controls.
- Isolated from other Workspaces.
- Available only to AI models running in the same Workspace.
Known Limitations
Coming Soon:
- Microsoft Office support: Processing for Excel® (.xlsx) files will be available in an upcoming release.
- Advanced file type support: Additional file type processors and specialized handlers will be added in future releases.
- Bulk file operations: Tools for managing large sets of files and automated file organization are planned for future updates.
Trademark Notice
Microsoft, Word, PowerPoint, Excel, and Office are trademarks of the Microsoft group of companies.