Installation

System Requirements

The computational requirements for running Schema-miner pro vary with the model being used. With OpenAI models such as GPT-4o and GPT-4-turbo, no specialized hardware is needed because inference is performed via API calls. A basic system with a stable internet connection is sufficient for API-based workflows.

Running open-source models such as Llama 3.1 8B or other large transformer-based models locally demands significantly more computational resources. These models can run on a CPU, but inference will be considerably slower. For efficient execution, a dedicated GPU with enough VRAM for the chosen model (see the model’s documentation) is strongly recommended.

The hardware configuration can be scaled to the model size and performance needs, but a GPU substantially reduces inference time compared to CPU-only setups.
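Before choosing a local model, it can help to check whether the machine exposes a GPU at all. The following is a minimal sketch, not part of the project: the helper name `detect_nvidia_gpus` is hypothetical, and it assumes an NVIDIA card with the `nvidia-smi` tool on the PATH. Other GPU vendors are not detected by this check.

```python
import shutil
import subprocess
from typing import List


def detect_nvidia_gpus() -> List[str]:
    """Return one line per NVIDIA GPU (name, total memory), or [] if none is found.

    Hypothetical helper: relies on the `nvidia-smi` CLI being installed.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        # Driver missing or tool failed: treat as "no usable GPU".
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = detect_nvidia_gpus()
    if gpus:
        for gpu in gpus:
            print("Found GPU:", gpu)
    else:
        print("No NVIDIA GPU detected; local models would run on the CPU.")
```

Compare the reported memory against the VRAM requirement in the documentation of the model you plan to run.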

It is best practice to install the project in a virtual environment to avoid dependency conflicts:

python -m venv .venv
source .venv/bin/activate
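The activation command above applies to Linux and macOS shells; on Windows the script is located at `.venv\Scripts\activate` instead. To confirm that the environment is active before installing anything, a small check such as the following can be used. This is an illustrative sketch, and the function name `in_virtual_environment` is an assumption, not part of the project.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style virtual environment.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; packages would install globally.")
```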

Installation with pip (PyPI)

Schema-miner pro is published on PyPI, so you can install it directly:

pip install schema-miner

This will install the latest stable release along with its dependencies.
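To verify that the installation succeeded, the installed version can be queried through the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and the distribution name `schema-miner` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "schema-miner") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("schema-miner is not installed in this environment.")
    else:
        print("schema-miner version:", found)
```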

Installation from source

To work with the development version or contribute to the project, clone the GitHub repository and install locally:

git clone https://github.com/sciknoworg/schema-miner.git
cd schema-miner
pip install -r requirements.txt

Configuration of API keys

Schema-miner pro uses large language models (LLMs) that require API access (e.g., OpenAI). API keys and other secrets are managed either via a .env file at the project root or with the EnvConfig class.

Configuration Using `.env`

Copy the example configuration file:

cp .env.example .env

Open .env in a text editor and add your keys:

OPENAI_API_KEY = 'Your OpenAI API key'
OPENAI_ORGANIZATION_ID = 'Your OpenAI Organization ID'

Schema-miner pro automatically loads these values at runtime using the provided configuration utilities.
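A common source of errors is a key that was left empty in the file. The sketch below checks a `.env` file for the two OpenAI keys shown above. It is an independent illustration and not the project's own loader: the function names `parse_env_text` and `missing_keys` are hypothetical, and the parser only handles simple `KEY = 'value'` lines.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Key names as they appear in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_ORGANIZATION_ID")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY = 'value' lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    content = env_path.read_text(encoding="utf-8") if env_path.is_file() else ""
    missing = missing_keys(parse_env_text(content))
    if missing:
        print("Missing or empty keys:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

The check does not detect placeholder text that was left unchanged, so the values should still be reviewed by hand.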

Configuration Using `EnvConfig`

from schema_miner.config.envConfig import EnvConfig

# OpenAI Keys
EnvConfig.OPENAI_api_key = '<insert-your-openai-key>'
EnvConfig.OPENAI_organization_id = '<insert-your-openai-organization-id>'

# Ollama
EnvConfig.OLLAMA_base_url = '<Ollama Base URL or empty if Ollama running locally>'

# HuggingFace
EnvConfig.HUGGINGFACE_access_token = '<Your huggingface access token>'

Next steps

Once installed and configured, head over to the Quickstart section to run your first schema extraction workflow.