India Agentic AI Open Hackathon - Know-Before-You-Go Technical Guide

How To Use This Guide

This guide is intentionally more technical than the public workshop agenda. Use it as the developer companion during the workshop to set up your stack, pick a track, reuse starter snippets, and shape a build plan for the hackathon days.

Workshop setup

Create NVIDIA Developer, build.nvidia.com, NGC, and Hugging Face accounts if needed.
Read the common NIM + Nemotron section.
List the NVIDIA verified agent skills and install the skills relevant to your track.
Pick a primary track and one backup track.
Capture one architecture sketch and one technical blocker question.

During the workshop

Use the tabs in this guide as reference material during demos.
Mark which code snippets your team will reuse.
Ask mentors where your project fits in the NVIDIA stack.
Decide what you will build on Day 0 and what you will defer.

Before hackathon Day 0

Have a runnable repo with a license and README.
Prepare small test data and an evaluation prompt set.
Confirm your inference path: hosted NIM, self-hosted NIM, vLLM, or cluster.
Define a demo path that works even if the full system is not complete.

Track Selection Map

If your project is mainly...	Choose	Use these NVIDIA assets	Expected demo proof
Connecting tools, APIs, documents, agents, and workflows into a working assistant or automation service.	Track A: Agentic Workflows	Nemotron, NIM, NVIDIA Agent Skills, NeMo Agent Toolkit, MCP, A2A, optional NemoClaw.	A traced workflow with tool calls, result synthesis, and at least one reliability/evaluation metric.
Changing model behavior for a domain, persona, language, or task using training or post-training.	Track B: Model Finetuning and Customisation	NeMo, Megatron-Bridge, SFT, LoRA/PEFT, NeMo RL, NIM deployment.	Base vs adapted comparison, training data sample, evaluation result, and deployment path.
Creating structured, high-quality training or evaluation data for low-resource, domain, or agentic tasks.	Track C: Synthetic Data Generation	Nemotron, Nemotron-Personas-India, NeMo Data Designer, NeMo Curator, LLM-as-judge, validators.	A generated dataset sample plus quality filters, deduplication, and export format.

Recommended Repo Structure

text

india-agentic-ai-hackathon/
  README.md
  LICENSE
  .gitignore
  docs/
    architecture.md
    demo-script.md
    evaluation-plan.md
  app/
    api/
    ui/
    agents/
  configs/
    nim.env.example
    nat/
    training/
    synthetic-data/
  data/
    samples/
    eval/
  notebooks/
  scripts/
    smoke_test.sh
    run_demo.sh
  results/
    screenshots/
    eval_report.md

Minimum bar for a strong technical submission: a clear track, runnable demo path, evidence of NVIDIA stack usage, and a small but believable evaluation loop.

Common Foundation: NIM + Nemotron

NIM and Nemotron are the common layer across all three tracks. NIM gives you a production-style inference endpoint with an OpenAI-compatible API. Nemotron gives you a strong NVIDIA model family for reasoning, tool use, long-context work, and synthetic data generation.

Accounts and keys

Hosted path

Sign in at build.nvidia.com.
Open the model page you want to test.
Create an API key and store it as NVIDIA_API_KEY.
Use https://integrate.api.nvidia.com/v1 as the base URL.

Self-hosted path

Prepare Docker and NVIDIA Container Toolkit.
Log in to nvcr.io with an NGC API key.
Pull/run the NIM container for your selected model.
Use your local container URL, usually http://localhost:8000/v1.

Keep API keys out of notebooks, screenshots, commits, and pitch decks. Commit only .env.example, never .env.
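
One way to back this rule with a check is a small scan for key-shaped strings before each commit. The sketch below is illustrative, not an NVIDIA tool. The `nvapi-` prefix is taken from the placeholder used in this guide; the 20-character minimum and the `find_key_like_strings` name are assumptions made for this example.

```python
import re

# Assumption: hosted keys start with "nvapi-", as in this guide's placeholder.
# The 20-character minimum is a guess chosen so short placeholders are skipped.
KEY_PATTERN = re.compile(r"nvapi-[A-Za-z0-9_\-]{20,}")


def find_key_like_strings(text: str) -> list[str]:
    """Return substrings of `text` that look like real API keys."""
    return KEY_PATTERN.findall(text)


if __name__ == "__main__":
    placeholder = "NVIDIA_API_KEY=nvapi-your-key"
    suspicious = "NVIDIA_API_KEY=nvapi-" + "A1b2C3d4E5" * 3
    # The placeholder is too short to match; the long value is flagged.
    print(find_key_like_strings(placeholder))
    print(len(find_key_like_strings(suspicious)))
```

Run it over staged files, notebooks, and exported slides; a non-empty result means something key-shaped is about to leave your machine.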

NIM path comparison

Item	Hosted NIM	Self-hosted NIM
Best for	Fast prototyping, laptops, early demos.	Data control, private infrastructure, lower network latency.
Local GPU	Not required.	Required for local deployment.
Key	`NVIDIA_API_KEY` from build.nvidia.com.	`NGC_API_KEY` for pulling containers and assets.
Base URL	`https://integrate.api.nvidia.com/v1`	`http://localhost:8000/v1` or cluster endpoint.
Hackathon advice	Start here for all teams.	Move here when privacy, latency, or deployment realism matters.
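
The table above can be turned into a single switch in application code, so moving from hosted to self-hosted NIM does not require edits across the repo. This is a minimal sketch: `NIM_MODE` and `resolve_nim_config` are hypothetical names introduced here, and the `"not-used"` fallback follows the smoke-test script in this guide.

```python
HOSTED_URL = "https://integrate.api.nvidia.com/v1"
LOCAL_URL = "http://localhost:8000/v1"


def resolve_nim_config(env: dict) -> dict:
    """Pick base URL and API key from environment-style settings.

    NIM_MODE is a hypothetical variable for this sketch: "hosted" or "self-hosted".
    """
    mode = env.get("NIM_MODE", "hosted")
    if mode == "hosted":
        key = env.get("NVIDIA_API_KEY")
        if not key:
            raise ValueError("Hosted NIM needs NVIDIA_API_KEY")
        return {"base_url": env.get("NIM_BASE_URL", HOSTED_URL), "api_key": key}
    if mode == "self-hosted":
        # The client still needs a non-empty key string, so fall back to a dummy value.
        return {
            "base_url": env.get("NIM_BASE_URL", LOCAL_URL),
            "api_key": env.get("NVIDIA_API_KEY", "not-used"),
        }
    raise ValueError(f"Unknown NIM_MODE: {mode}")


if __name__ == "__main__":
    import os

    print(resolve_nim_config({**os.environ, "NIM_MODE": "self-hosted"})["base_url"])
```

Pass the returned dictionary straight into the client constructor, for example `OpenAI(**resolve_nim_config(os.environ))`.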

Nemotron model selection

Model	Use in hackathon	Rule of thumb
`nvidia/nemotron-3-nano-30b-a3b`	Default starting model for agentic reasoning, tool calling, and synthetic data generation (SDG) prototyping.	Start with Nano unless you have a clear reason not to.
`nvidia/nemotron-3-super-120b-a12b`	Higher-quality planning, heavier multi-agent workflows, and more complex data generation.	Upgrade when quality matters more than latency or cost.
Nemotron multimodal or VL variants	Optional extension for image, document, or visual reasoning projects.	Use only when the project actually needs multimodal inputs.

Starter environment

bash

mkdir nvidia-ai-starter
cd nvidia-ai-starter
python3 -m venv .venv
source .venv/bin/activate

pip install -U openai python-dotenv pandas rich

cat > .env.example <<'EOF'
NVIDIA_API_KEY=nvapi-your-key
NIM_BASE_URL=https://integrate.api.nvidia.com/v1
NIM_MODEL=nvidia/nemotron-3-nano-30b-a3b
EOF

cp .env.example .env
echo ".env" >> .gitignore

NVIDIA Verified Agent Skills

NVIDIA Agent Skills are portable instruction sets that help coding agents use NVIDIA CUDA-X libraries, AI Blueprints, and platform tools correctly. Use them when you want your agentic coding assistant to follow product-specific NVIDIA patterns instead of relying only on generic model knowledge.

Browse and install

Browse the public catalog at github.com/NVIDIA/skills.
List available skills before installing anything.
Install only the skills your team will use during the hackathon.
Reload or restart your agent so the new skill instructions are available.

Good hackathon matches

cuOpt for optimization, routing, logistics, and scheduling agents.
RAG Blueprint for retrieval-augmented generation demos.
TensorRT-LLM for inference optimization and deployment work.
NeMo RL, NeMo Gym, or Megatron skills for Track B training and alignment projects.

bash

# See the available NVIDIA-verified skills.
npx skills add nvidia/skills --list

# Interactive install flow.
npx skills add nvidia/skills

# Install one known skill for Codex without prompts.
npx skills add nvidia/skills \
  --skill cuopt-numerical-optimization-api-python \
  --agent codex \
  --yes

# Install the same skill into multiple agent clients if your team uses more than one.
npx skills add nvidia/skills \
  --skill cuopt-numerical-optimization-api-python \
  --agent codex \
  --agent claude-code \
  --agent cursor \
  --yes

Treat installed skills like project dependencies: pin what you use in your README, record the skill name in your demo notes, and avoid installing broad extras that your project never exercises.

Chat completion starter

python

# scripts/nim_chat_smoke_test.py
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    base_url=os.getenv("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    api_key=os.getenv("NVIDIA_API_KEY", "not-used"),
)

model = os.getenv("NIM_MODEL", "nvidia/nemotron-3-nano-30b-a3b")

response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a concise technical mentor for hackathon teams."},
        {"role": "user", "content": "Suggest a Track A architecture for invoice processing."},
    ],
    temperature=0.2,
    max_tokens=600,
)

print(response.choices[0].message.content)

Streaming response

python

# Reuses `client` and `model` from the chat completion starter.
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Create a demo script for a multilingual support agent."}],
    stream=True,
    temperature=0.3,
    max_tokens=700,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

Reasoning controls

Use non-thinking mode for fast routing and tool loops. Use thinking mode for deeper planning, architecture comparison, and data quality review. Avoid combining experimental parallel reasoning modes with tool-calling loops unless the model documentation explicitly supports it.

python

# Fast concise mode
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Draft a simple grievance-routing agent architecture."}],
    temperature=0,
    max_tokens=700,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": False
        }
    },
)
print(resp.choices[0].message.content)

# Deeper planning mode
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Compare three architectures for a public-services assistant and choose one."}],
    temperature=0,
    max_tokens=1800,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": True
        }
    },
)
print(resp.choices[0].message.content)

Tool calling pattern

Use tool calling when the model should decide when to call an application function, database lookup, policy engine, calculator, search API, or workflow step.

python

import json

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_policy",
        "description": "Look up a company policy by topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "Policy topic, for example travel or reimbursement"}
            },
            "required": ["topic"],
        },
    },
}]

def lookup_policy(topic: str) -> str:
    policies = {
        "travel": "Flights require manager approval. Hotels must be under the city cap.",
        "reimbursement": "Submit receipts within 30 days with project code and GST details.",
    }
    return policies.get(topic.lower(), "No policy found for that topic.")

messages = [{"role": "user", "content": "Can I expense a hotel for my Bangalore workshop trip?"}]

first = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto",
    temperature=0,
)

assistant_message = first.choices[0].message
messages.append(assistant_message)

if assistant_message.tool_calls:
    for call in assistant_message.tool_calls:
        args = json.loads(call.function.arguments)
        result = lookup_policy(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

    final = client.chat.completions.create(model=model, messages=messages, temperature=0.2)
    print(final.choices[0].message.content)
else:
    # The model answered directly without requesting a tool.
    print(assistant_message.content)

Self-hosted NIM smoke tests

bash

# Log in to NGC before pulling private/entitled containers.
# Piping the key through --password-stdin keeps it out of process listings.
echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin

# Example pattern. Confirm the exact container path from the model page or NIM docs.
docker run --rm -it --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-nano-30b-a3b:latest

curl http://localhost:8000/v1/models

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Say hello from local NIM."}],
    "max_tokens": 64
  }'

Model-free NIM option

Use model-free NIM only when you need to serve a supported custom or newer model path and understand the deployment requirements. For most hackathon teams, hosted NIM or a model-specific NIM container is simpler.

bash

export LOCAL_NIM_CACHE="$HOME/.cache/nim"
mkdir -p "$LOCAL_NIM_CACHE"

export NIM_LLM_IMAGE="nvcr.io/nim/nvidia/model-free-nim:latest"
export NIM_MODEL_PATH="hf://meta-llama/Llama-3.1-8B-Instruct"

docker run --gpus all \
  -e NIM_MODEL_PATH="$NIM_MODEL_PATH" \
  -e HF_TOKEN="$HF_TOKEN" \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  "$NIM_LLM_IMAGE"

Common debugging checklist

Symptom	Check	Fix
401 or 403	API key, account access, endpoint URL.	Regenerate key, re-export env var, verify model access.
404 model not found	Model ID mismatch between hosted endpoint and self-hosted endpoint.	Call `/v1/models` and use returned model ID.
Slow first response	Container cold start, model download, cache miss.	Warm up endpoint before demos and cache models.
JSON/tool parse errors	Prompt too loose or schema too complex.	Simplify schema, set temperature low, validate arguments.
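
For the last row, "validate arguments" can be as simple as checking the model's JSON against the tool schema before dispatching the call. The sketch below uses only the standard library and covers required fields, unexpected fields, and two primitive types. `validate_tool_args` is a hypothetical helper, not part of the OpenAI client or NIM.

```python
import json

# Same shape as the "parameters" object in the lookup_policy tool definition.
POLICY_PARAMS = {
    "type": "object",
    "properties": {"topic": {"type": "string"}},
    "required": ["topic"],
}

# Only these JSON Schema types are checked in this sketch; others pass through.
_PY_TYPES = {"string": str, "boolean": bool}


def validate_tool_args(raw_arguments: str, schema: dict) -> tuple[dict, list[str]]:
    """Parse a tool call's JSON arguments and list any schema problems."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return {}, [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return {}, ["arguments must be a JSON object"]

    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in properties:
            errors.append(f"unexpected argument: {name}")
            continue
        type_name = properties[name].get("type")
        expected = _PY_TYPES.get(type_name) if isinstance(type_name, str) else None
        if expected is not None and not isinstance(value, expected):
            errors.append(f"wrong type for {name}")
    return args, errors


if __name__ == "__main__":
    print(validate_tool_args('{"topic": "travel"}', POLICY_PARAMS))
    print(validate_tool_args('{"subject": "travel"}', POLICY_PARAMS))
```

When the error list is non-empty, returning it to the model as the tool result often lets the next turn repair the call instead of failing the whole loop.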

Track A: Agentic Workflows

Track A teams should aim to show a working agentic service, not just a chatbot. The service should plan, call tools, recover from failure, expose a usable interface, and report at least basic quality or latency metrics.
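
Failure recovery and latency reporting can start very small. The sketch below wraps any tool or model call in a retry loop with exponential backoff and returns basic metrics alongside the result. `call_with_retry` and `make_flaky_tool` are hypothetical helpers written for this guide, not NeMo Agent Toolkit APIs.

```python
import time


def call_with_retry(fn, *args, retries=2, backoff_s=0.5, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs); retry on any exception with exponential backoff.

    Returns (result, metrics). Metrics hold the attempt count and the total
    wall-clock latency in seconds, including time spent backing off.
    """
    attempts = 0
    start = time.perf_counter()
    while True:
        attempts += 1
        try:
            result = fn(*args, **kwargs)
            break
        except Exception:
            if attempts > retries:
                raise  # Out of retries: surface the original error.
            sleep(backoff_s * 2 ** (attempts - 1))
    return result, {"attempts": attempts, "latency_s": time.perf_counter() - start}


def make_flaky_tool(fail_times):
    """Build a demo tool that fails `fail_times` times, then succeeds."""
    state = {"calls": 0}

    def tool(topic):
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise TimeoutError("simulated tool timeout")
        return f"policy text for {topic}"

    return tool


if __name__ == "__main__":
    result, metrics = call_with_retry(make_flaky_tool(1), "travel", backoff_s=0.01)
    print(result, metrics["attempts"])
```

Logging the returned metrics per call gives you the raw numbers for a latency and reliability slide without any extra infrastructure.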

Reference architecture

text

User / UI / API
  -> NeMo Agent Toolkit workflow
  -> Nemotron via NIM for reasoning
  -> Tools:
       - search / RAG
       - databases
       - business APIs
       - document parsers
       - calculators / validators
  -> Optional MCP server/client boundary
  -> Optional A2A specialist agent delegation
  -> Trace, evaluation, final answer

Install NeMo Agent Toolkit

bash

mkdir track-a-agent
cd track-a-agent
python3 -m venv .venv
source .venv/bin/activate

pip install -U uv
uv pip install "nvidia-nat[langchain,mcp,a2a,eval]"

export NVIDIA_API_KEY=nvapi-your-key
nat --help
nat info components -t function

Minimal ReAct workflow config

yaml

# configs/invoice_agent.yml
llms:
  nim_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    api_key: ${NVIDIA_API_KEY}
    base_url: ${NIM_BASE_URL:-https://integrate.api.nvidia.com/v1}
    temperature: 0.0
    max_tokens: 1024

functions:
  current_datetime:
    _type: current_datetime

  wikipedia_search:
    _type: wiki_search
    max_results: 3

  code_generator:
    _type: code_generation
    programming_language: Python
    llm_name: nim_llm
    description: "Generate Python helper code only when a code artifact is required."

workflow:
  _type: react_agent
  llm_name: nim_llm
  tool_names:
    - current_datetime
    - wikipedia_search
    - code_generator
  verbose: true
  parse_agent_response_max_retries: 2

Workflow config patterns

Config block	What it controls
`llms`	Model providers, model names, generation settings, and NIM base URLs.
`functions`	Built-in tools, custom Python tools, retrieval tools, calculators, and code tools.
`function_groups`	Grouped dynamic tools, especially MCP client tool groups.
`workflow`	The agent, router, sequential flow, or executor that binds model and tools.
`eval`	Datasets, evaluators, metrics, and output paths for quality checks.
`general`	Telemetry, logging, retries, object stores, and runtime-level settings.

Workflow type	Use when
`react_agent`	You need a first agent with explicit tool reasoning and easy-to-read logs.
Tool-calling workflow	You want cleaner structured tool calls and less free-form reasoning.
Router agent	You need to send India-specific tasks to specialist workflows.
Sequential workflow	You need deterministic steps such as extract, validate, retrieve, respond.
Parallel workflow	You want several tools or specialists to answer, then merge results.

Run, serve, and test

bash

# Run once from CLI.
nat run --config_file configs/invoice_agent.yml \
  --input "Design an enterprise invoice triage workflow and list the tools it needs."

# Serve as an HTTP API.
nat serve --config_file configs/invoice_agent.yml --host 0.0.0.0 --port 8000

# Call it from another terminal.
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Check this invoice workflow for missing approval steps."}
    ]
  }'

MCP server pattern

Expose workflow functions as MCP tools when another client or agent needs to call them.

bash

# Publish functions from the workflow as MCP tools.
nat mcp serve --config_file configs/invoice_agent.yml --host 0.0.0.0 --port 9901

# Common smoke test: list tools from an MCP-capable client, then call one tool.
# Keep network and credential policies explicit if connecting enterprise systems.

MCP client pattern

Use an MCP client group when your agent needs to consume tools from another local service, partner API wrapper, database adapter, or team-built tool server.

yaml

# configs/agent_with_mcp.yml
function_groups:
  mcp_public_data:
    _type: mcp_client
    server:
      transport: streamable-http
      url: "http://localhost:9901/mcp"
    include:
      - search_schemes
      - get_state_policy
    tool_call_timeout: 30
    reconnect_enabled: true
    tool_overrides:
      get_state_policy:
        alias: india_state_policy_lookup
        description: "Look up Indian state-level policy and program data."

workflow:
  _type: react_agent
  llm_name: nim_llm
  tool_names:
    - mcp_public_data

bash

nat serve --config_file configs/agent_with_mcp.yml --port 8000
curl -s http://localhost:8000/mcp/client/tool/list | jq

A2A server pattern

Use A2A when a workflow should be discoverable as a specialist agent by other agents.

bash

uv pip install "nvidia-nat[a2a]"

nat a2a serve \
  --config_file configs/invoice_agent.yml \
  --host 0.0.0.0 \
  --port 10000

bash

# From another terminal or another agent environment:
nat a2a client discover --url http://localhost:10000
nat a2a client get_skills --url http://localhost:10000
nat a2a client call \
  --url http://localhost:10000 \
  --message "Summarize incorporation steps for a DPIIT-recognized startup."

Evaluation harness

json

[
  {
    "input": "Invoice INV-102 has no PO number. What should happen?",
    "expected": "Flag for manual review or route to exception approval."
  },
  {
    "input": "Vendor is approved and amount is below threshold. What next?",
    "expected": "Proceed to payment queue after validation."
  }
]

yaml

# configs/eval_invoice_agent.yml
eval:
  general:
    output_dir: .tmp/eval_results
    dataset:
      _type: json
      file_path: data/eval/invoice_eval.json
  evaluators:
    accuracy:
      _type: ragas
      metric: AnswerAccuracy
      llm_name: nim_llm

bash

nat eval --config_file configs/eval_invoice_agent.yml
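
Before wiring up an LLM-based evaluator, a cheap lexical baseline over the same `input`/`expected` records can catch obviously broken runs. The sketch below is not the RAGAS `AnswerAccuracy` metric and is not part of NeMo Agent Toolkit. It only measures how many expected tokens appear in the answer, and `answer_fn` stands in for whatever function calls your agent.

```python
import re


def token_set(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_recall(expected: str, actual: str) -> float:
    """Fraction of expected tokens that also appear in the actual answer."""
    want = token_set(expected)
    if not want:
        return 0.0
    return len(want & token_set(actual)) / len(want)


def score_dataset(records: list[dict], answer_fn) -> float:
    """Average overlap recall of answer_fn over records with input/expected keys."""
    scores = [overlap_recall(r["expected"], answer_fn(r["input"])) for r in records]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    records = [
        {
            "input": "Invoice INV-102 has no PO number. What should happen?",
            "expected": "Flag for manual review or route to exception approval.",
        }
    ]
    # Stand-in agent that always gives the same answer.
    print(score_dataset(records, lambda question: "Flag it for manual review."))
```

A lexical score like this rewards wording rather than correctness, so treat it as a smoke test and keep the judged metric as the number you report.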

Observability and profiling

Every Track A team should inspect traces before demo day. Most agent failures are not model failures; they are retrieval misses, wrong tool choices, bad schemas, retries, or oversized context.

yaml

general:
  telemetry:
    tracing:
      console:
        _type: console
        enabled: true
      phoenix:
        _type: phoenix
        endpoint: http://localhost:6006
        enabled: ${PHOENIX_ENABLED:-false}

Inspect	Question to answer
Tool calls	Did the agent choose the correct MCP, A2A, or custom tool?
Latency	Which model call or tool call dominates response time?
Token usage	Are retrieved documents or tool outputs too large?
Retries	Are tool argument schemas or output parsers failing?
Eval failures	Is the failure caused by retrieval, reasoning, or prompt design?
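
The latency and token questions in the table can be answered from exported trace data with a few lines of aggregation. The span format below (`name`, `duration_ms`, `tokens`) is a hypothetical simplification for illustration; real Phoenix or toolkit exports will use different field names that you would need to map first.

```python
from collections import defaultdict


def summarize_spans(spans: list[dict]) -> dict:
    """Aggregate call count, total duration, and total tokens per span name."""
    totals = defaultdict(lambda: {"calls": 0, "duration_ms": 0.0, "tokens": 0})
    for span in spans:
        entry = totals[span["name"]]
        entry["calls"] += 1
        entry["duration_ms"] += span.get("duration_ms", 0.0)
        entry["tokens"] += span.get("tokens", 0)
    return dict(totals)


def slowest_step(spans: list[dict]):
    """Name of the step with the largest total duration, or None if no spans."""
    summary = summarize_spans(spans)
    if not summary:
        return None
    return max(summary, key=lambda name: summary[name]["duration_ms"])


if __name__ == "__main__":
    spans = [
        {"name": "nim_llm", "duration_ms": 1200.0, "tokens": 900},
        {"name": "lookup_policy", "duration_ms": 80.0},
        {"name": "nim_llm", "duration_ms": 700.0, "tokens": 400},
    ]
    print(slowest_step(spans), summarize_spans(spans))
```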

NemoClaw For Safe Agent Exploration

NemoClaw is useful when you want to discuss safer execution of always-on coding or automation assistants. It integrates OpenClaw with NVIDIA OpenShell sandboxing concepts.

NemoClaw is alpha software. APIs, configuration schemas, and runtime behavior can change. Do not use it in production environments.

bash

# Quickstart pattern from the current NemoClaw docs.
curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
source ~/.bashrc

# Create or connect to a sandboxed assistant.
nemoclaw hackathon-agent connect

# Check status and logs.
nemoclaw hackathon-agent status
nemoclaw hackathon-agent logs --follow

NemoClaw discussion checklist

Concern	What to show in your architecture
Network egress	Which domains or APIs the agent can call, and which are blocked.
Credentials	How secrets are isolated from prompts, logs, and agent memory.
Filesystem	Which project paths the agent can read/write.
Human control	Where approvals, logs, or policy gates are surfaced.
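
To make the network egress row concrete in an architecture review, it helps to write the policy down as code. The sketch below shows the idea of an exact-match host allowlist. It illustrates the policy only and does not describe how NemoClaw or OpenShell enforce it; `ALLOWED_HOSTS` and `is_egress_allowed` are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for this sketch: the hosted NIM endpoint and local services.
ALLOWED_HOSTS = frozenset({"integrate.api.nvidia.com", "localhost"})


def is_egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only http(s) URLs whose host exactly matches the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        return False
    # Exact match, so "integrate.api.nvidia.com.evil.example" is rejected.
    return parsed.hostname in allowed_hosts


if __name__ == "__main__":
    for candidate in (
        "https://integrate.api.nvidia.com/v1/chat/completions",
        "https://integrate.api.nvidia.com.evil.example/v1",
        "file:///etc/passwd",
    ):
        print(candidate, is_egress_allowed(candidate))
```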

Track A capstone templates

Template	Suggested stack
Startup Compliance Agent	NAT ReAct agent, MCP tax/policy tools, A2A legal-summary specialist, Nemotron via NIM.
Bharat Public Services Navigator	MCP connectors for state and central schemes, multilingual Nemotron answer layer, groundedness eval.
Safe Coding Copilot for Indian SaaS Teams	NemoClaw/OpenShell sandbox, NAT HTTP reviewer, insecure-code evaluation set.
Disaster Response Operations Agent	Router workflow, weather/geospatial MCP tools, A2A logistics agent, trace dashboard.

Track B: Model Finetuning and Customisation

Track B teams should be explicit about what model behavior they are changing and why. Your demo should compare base vs customised behavior, not only show that training ran.

Track B pipeline

1. Define behavior gap: What the base model cannot do well enough today.

2. Prepare data: Human data, domain data, or Track C-generated synthetic examples.

3. Train: Use full SFT or LoRA/PEFT through Megatron-Bridge where appropriate.

4. Align: Use NeMo RL for GRPO, DPO, RM, or SFT post-training when reward/preference signals matter.

5. Evaluate: Compare base vs adapted model on held-out prompts.

Environment and container

bash

# Confirm exact container tag from current Megatron-Bridge docs for your model.
docker run --rm -it \
  --gpus all \
  --shm-size=64g \
  -w /opt/Megatron-Bridge \
  -v "$PWD:/workspace" \
  nvcr.io/nvidia/nemo:25.11.nemotron_3_nano \
  bash

export HF_TOKEN=hf-your-token
export HF_MODEL_ID=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
export MEGATRON_MODEL_PATH=/workspace/checkpoints/nemotron3-nano-megatron
export HF_EXPORT_PATH=/workspace/exports/nemotron3-nano-adapted

Convert Hugging Face checkpoint to Megatron

bash

python examples/conversion/convert_checkpoints.py import \
  --hf-model "$HF_MODEL_ID" \
  --megatron-path "$MEGATRON_MODEL_PATH" \
  --trust-remote-code

LoRA finetuning starter

bash

# Keep first hackathon runs small. Increase steps only after a smoke test succeeds.
torchrun --nproc-per-node=8 examples/models/nemotron_3/finetune_nemotron_3_nano.py \
  --peft lora \
  train.global_batch_size=128 \
  train.train_iters=100 \
  scheduler.lr_warmup_iters=10 \
  checkpoint.pretrained_checkpoint="$MEGATRON_MODEL_PATH" \
  checkpoint.save=/workspace/checkpoints/nemotron3-nano-lora

Merge LoRA and export for evaluation

bash

python examples/peft/merge_lora.py \
  --hf-model-path "$HF_MODEL_ID" \
  --lora-checkpoint /workspace/checkpoints/nemotron3-nano-lora/iter_0000100 \
  --output /workspace/exports/nemotron3-nano-lora-merged

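# Export the merged checkpoint rather than the adapter-only LoRA checkpoint.
# Confirm the merged checkpoint layout in your Megatron-Bridge version.
python examples/conversion/convert_checkpoints.py export \
  --hf-model "$HF_MODEL_ID" \
  --megatron-path /workspace/exports/nemotron3-nano-lora-merged \
  --hf-path "$HF_EXPORT_PATH"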

Data format starter for SFT

jsonl

{"messages":[{"role":"system","content":"You are a careful finance-domain assistant."},{"role":"user","content":"Explain working capital in simple terms."},{"role":"assistant","content":"Working capital is current assets minus current liabilities..."}],"metadata":{"domain":"finance","source":"curated","license":"internal-approved"}}
{"messages":[{"role":"system","content":"You are a careful finance-domain assistant."},{"role":"user","content":"What is the risk of negative cash conversion cycle?"},{"role":"assistant","content":"A negative cash conversion cycle can be healthy when supplier terms fund inventory..."}],"metadata":{"domain":"finance","source":"synthetic-reviewed","license":"internal-approved"}}
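
Malformed records are a common reason first training runs fail, so it is worth validating the file before it reaches the trainer. The sketch below checks each JSONL line against the chat format shown above. The specific rules (known roles, non-empty content, final assistant turn) are assumptions for this example, not requirements stated by Megatron-Bridge.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def validate_sft_line(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            problems.append(f"message {i}: unknown role")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty content")
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems


if __name__ == "__main__":
    sample = '{"messages": [{"role": "user", "content": "Explain working capital."}]}'
    print(validate_sft_line(sample))
```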

NeMo RL alignment path

Use NeMo RL when you have preference, reward, or environment feedback signals. Supported post-training paths include GRPO, DPO, reward model training, SFT, DAPO, and on-policy distillation.

bash

git clone https://github.com/NVIDIA-NeMo/RL.git nemo-rl --recursive
cd nemo-rl
git config submodule.recurse true

pip install -U uv
uv venv
source .venv/bin/activate

export HF_HOME=/workspace/hf_cache
export HF_TOKEN=hf-your-token

GRPO config skeleton

yaml

# configs/grpo_domain.yaml
grpo:
  num_prompts_per_step: 32
  num_generations_per_prompt: 4
  max_num_steps: 200
  val_period: 10
  normalize_rewards: true

loss_fn:
  reference_policy_kl_penalty: 0.01
  ratio_clip_min: 0.2
  ratio_clip_max: 0.28

policy:
  model_name: /workspace/exports/nemotron3-nano-adapted
  train_global_batch_size: 128
  train_micro_batch_size: 1
  precision: bfloat16
  generation:
    backend: vllm
    temperature: 1.0

data:
  train:
    data_path: /workspace/data/train.jsonl
  validation:
    data_path: /workspace/data/val.jsonl

logger:
  log_dir: logs
  tensorboard_enabled: true
  wandb_enabled: false

cluster:
  gpus_per_node: 8
  num_nodes: 1

Run and monitor

bash

uv run python examples/run_grpo.py --config configs/grpo_domain.yaml

tensorboard --logdir=./logs --port=6006

Custom reward environment pattern

Use a custom reward environment when correctness is more than string matching. For India-domain assistants, useful reward axes include groundedness, no fake citations, language fit, and refusal behavior.

python

import re
import torch
import ray
from nemo_rl.environments.interfaces import EnvironmentInterface, EnvironmentReturn

@ray.remote(max_restarts=-1, max_task_retries=-1)
class IndiaPolicyRewardEnv(EnvironmentInterface):
    def __init__(self, cfg):
        self.required_terms = cfg.get("required_terms", [])

    def step(self, message_log_batch, metadata):
        responses = [
            "".join(str(m["content"]) for m in conv if m["role"] == "assistant")
            for conv in message_log_batch
        ]

        scores = []
        for text, meta in zip(responses, metadata):
            grounded = float(any(t.lower() in text.lower() for t in self.required_terms))
            no_fake_section = float(
                not re.search(r"Section\s+\d+[A-Z]?", text)
                or meta.get("allow_sections", False)
            )
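            # Placeholder check: supported target language and a non-empty response.
            # Swap in real language identification before trusting this signal.
            lang_ok = float(meta.get("target_lang", "en") in ["en", "hi", "ta", "te"] and len(text) > 0)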
            scores.append([grounded, no_fake_section, lang_ok])

        return EnvironmentReturn(
            observations=[{"role": "environment", "content": ""} for _ in responses],
            metadata=metadata,
            next_stop_strings=[None] * len(responses),
            rewards=torch.tensor(scores, dtype=torch.float32),
            terminateds=torch.ones(len(responses)),
            answers=None,
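        )

    def global_post_process_and_metrics(self, batch):
        # EnvironmentInterface also declares this hook; confirm its signature in your NeMo RL version.
        # This placeholder returns the batch unchanged and reports no extra metrics.
        return batch, {}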

DPO and reward model patterns

Use DPO when your team has pairwise preference examples and wants alignment without online reward rollouts. Use reward model training when you want a reusable scorer for later RL or evaluation.

jsonl

{"context":[{"role":"user","content":"Explain DigiLocker KYC limits."}],"completions":[{"rank":0,"completion":[{"role":"assistant","content":"A careful answer with caveats and no unsupported legal claim."}]},{"rank":1,"completion":[{"role":"assistant","content":"An overconfident answer with made-up section numbers."}]}],"task_name":"india_policy"}

bash

# DPO pattern. Confirm exact example paths in your checked-out NeMo RL version.
uv run python examples/run_dpo.py --config examples/configs/dpo.yaml \
  policy.model_name=./exports/india_hf \
  dpo.sft_loss_weight=0.1

# Reward model pattern.
uv run python examples/run_rm.py --config examples/configs/rm.yaml \
  policy.model_name=./exports/india_hf

Track B evaluation table

Metric	How to compute	Why judges care
Domain correctness	Expert rubric or LLM-as-judge over held-out prompts.	Shows behavior changed in the intended direction.
Format compliance	Regex or parser validation for required answer schema.	Important for enterprise workflows and agents.
Safety/regression	Base safety prompts plus refusal/grounding checks.	Shows customisation did not break guardrails.
Latency/cost	Tokens/sec, response latency, GPU utilization.	Shows a practical deployment path.
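
As an example of the format compliance row, the sketch below scores base and adapted outputs with the same parser check and reports the difference. The required keys (`answer`, `sources`) are a hypothetical answer schema; substitute whatever structure your project promises.

```python
import json


def is_format_compliant(answer: str, required_keys=("answer", "sources")) -> bool:
    """True if the answer is a JSON object containing all required keys."""
    try:
        parsed = json.loads(answer)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and all(key in parsed for key in required_keys)


def compliance_rate(answers) -> float:
    """Share of answers that pass the format check."""
    answers = list(answers)
    if not answers:
        return 0.0
    return sum(is_format_compliant(a) for a in answers) / len(answers)


def compare_models(base_answers, adapted_answers) -> dict:
    """Compliance rate for each model on the same prompts, plus the change."""
    base = compliance_rate(base_answers)
    adapted = compliance_rate(adapted_answers)
    return {"base": base, "adapted": adapted, "delta": adapted - base}


if __name__ == "__main__":
    base_out = ['{"answer": "Yes", "sources": []}', "Sure, here is the answer..."]
    adapted_out = ['{"answer": "Yes", "sources": []}', '{"answer": "No", "sources": ["policy"]}']
    print(compare_models(base_out, adapted_out))
```

Both lists must come from the same held-out prompts, in the same order, for the delta to mean anything.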

India-specific evaluation prompts

GST, MSME, UPI, Aadhaar, DigiLocker, agriculture support, skilling, and public-service navigation tasks.
Code-mixed Hindi-English prompts and at least two regional-language samples if your project claims multilingual support.
Legal, medical, and financial prompts that require cautious wording and escalation rather than overconfident advice.
Held-out examples that never appear in SFT, preference, or reward model training data.
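
The held-out requirement in the last item is easy to verify mechanically for exact and near-exact copies. The sketch below fingerprints normalized prompts and reports any evaluation prompt that also appears in training data. It catches case and whitespace variants only; paraphrases need fuzzy or embedding-based matching, which is out of scope here.

```python
import hashlib
import re


def normalize_prompt(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def prompt_fingerprint(text: str) -> str:
    """Stable hash of the normalized prompt."""
    return hashlib.sha256(normalize_prompt(text).encode("utf-8")).hexdigest()


def find_leaked_prompts(train_prompts, heldout_prompts) -> list[str]:
    """Return held-out prompts whose normalized form also occurs in training data."""
    seen = {prompt_fingerprint(p) for p in train_prompts}
    return [p for p in heldout_prompts if prompt_fingerprint(p) in seen]


if __name__ == "__main__":
    train = ["Explain GST  filing for a small shop."]
    heldout = ["explain gst filing for a small shop.", "How do I link UPI to a new bank account?"]
    print(find_leaked_prompts(train, heldout))
```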

Track C: Synthetic Data Generation

Track C teams should show the complete data factory: schema, generation, validation, curation, export, and quality report. A pile of generated examples is not enough.

Track C pipeline

text

Data objective
  -> Schema and sampling plan
  -> Nemotron generation through hosted NIM, local NIM, or vLLM
  -> Deterministic validators
  -> LLM-as-judge scoring
  -> NeMo Curator filtering and deduplication
  -> JSONL / Parquet export
  -> Quality report and examples
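
The middle of this pipeline can be prototyped in plain Python before you bring in heavier tooling. The sketch below chains a deterministic validator, exact-duplicate removal, JSONL export, and a small count report. It is a stand-in to show the flow, not a replacement for NeMo Curator, and the `user_request` and `language` field names and the 20-character minimum are assumptions for this example.

```python
import hashlib
import json


def passes_validators(record: dict, min_chars: int = 20) -> bool:
    """Deterministic checks: non-trivial request text and a language label."""
    text = record.get("user_request", "")
    return isinstance(text, str) and len(text.strip()) >= min_chars and bool(record.get("language"))


def deduplicate(records: list[dict], key: str = "user_request") -> list[dict]:
    """Drop records whose text matches an earlier one after case/space normalization."""
    seen = set()
    unique = []
    for record in records:
        normalized = " ".join(record[key].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII scripts readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def run_pipeline(records: list[dict]) -> tuple[str, dict]:
    """Validate, deduplicate, export, and count what survived each stage."""
    valid = [r for r in records if passes_validators(r)]
    unique = deduplicate(valid)
    report = {"generated": len(records), "valid": len(valid), "unique": len(unique)}
    return to_jsonl(unique), report


if __name__ == "__main__":
    demo = [
        {"user_request": "Which documents do I need for a ration card?", "language": "English"},
        {"user_request": "which documents do I need  for a ration card?", "language": "English"},
        {"user_request": "hi", "language": "English"},
    ]
    jsonl, report = run_pipeline(demo)
    print(report)
```

The report dictionary is the seed of the quality report: judges can see how much was generated and how much survived each filter.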

Strong India SDG domains

Domain	Example synthetic records	Safety note
Agriculture	Pest triage, mandi price explanation, subsidy navigation.	Escalate crop disease diagnosis and avoid fabricated government scheme details.
Health	ASHA worker counseling, appointment navigation, maternal warning signs.	Never replace clinical diagnosis; include escalation for red flags.
Disaster response	Flood shelter routing, relief inventory requests, missing-resource escalation.	Prefer verified sources and uncertainty notes.
Education and skilling	ITI course guidance, interview practice, scheme matching.	Keep advice local, accessible, and non-discriminatory.
Public services	Document checklist, multilingual form help, grievance routing.	Do not request or generate real PII.
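
The PII note in the last row can be backed by a deterministic validator that runs on every generated record. The patterns below are rough heuristics chosen for this sketch (a 12-digit ID-like number, a 10-digit mobile-like number, and an email address). They will produce false positives and misses, so treat a hit as a reason to review or drop the record, not as proof of real PII.

```python
import re

# Heuristic patterns for this sketch; tune them for your own data.
PII_PATTERNS = {
    # Twelve digits, optionally grouped 4-4-4 with spaces or hyphens.
    "aadhaar_like": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    # Ten digits starting with 6-9, with an optional +91 prefix.
    "mobile_like": re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the sorted names of PII patterns that match the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


if __name__ == "__main__":
    print(find_pii("Please call 9876543210 about my application."))
    print(find_pii("Which documents are needed for a ration card?"))
```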

Nemotron-Personas-India Seed Personas

Use Nemotron-Personas-India when your project needs India-grounded synthetic user personas for prompt sampling, evaluation coverage, or downstream SFT data creation. It is an open-source CC BY 4.0 dataset of synthetic personas grounded in Indian demographic, geographic, language, education, occupation, and personality-trait distributions.

Where it fits

Track C: seed diverse synthetic conversations, tool-use traces, support tickets, or evaluation prompts.
Track B: create persona-conditioned SFT or preference examples, then evaluate on held-out personas.
Track A: stress-test agents with rural, urban, multilingual, occupation-specific, and accessibility-sensitive user contexts.

Dataset facts to cite

3M records: 1M English, 1M Hindi in Devanagari, and 1M Hindi in Latin script.
21M persona descriptions, with 7 persona descriptions per record per language or script.
27 fields excluding UUID: persona fields plus contextual fields grounded in official demographic and labor statistics.
Coverage across all Indian states and union territories and 640 districts.

python

from datasets import load_dataset

# English personas
personas_en = load_dataset("nvidia/Nemotron-Personas-India", "en_IN", split="train")

# Hindi personas in Devanagari
personas_hi_deva = load_dataset("nvidia/Nemotron-Personas-India", "hi_Deva_IN", split="train")

# Hindi personas in Latin script
personas_hi_latn = load_dataset("nvidia/Nemotron-Personas-India", "hi_Latn_IN", split="train")

sample = personas_en.shuffle(seed=42).select(range(1000))
print(sample[0])

These are synthetic personas, not real people. Do not present them as census records, do not add real PII, and document any filtering or sampling choices in your quality report.
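
A uniform shuffle can leave small groups under-represented in a 1,000-row sample. One possible alternative is to cap the number of personas drawn per group, as sketched below on plain dictionaries. The `state` field name is a placeholder for illustration; check the dataset card for the actual column names before adapting this, and record the sampling rule in your quality report.

```python
import random
from collections import defaultdict


def stratified_sample(records, field, per_group, seed=42):
    """Draw up to `per_group` records for each distinct value of `field`."""
    rng = random.Random(seed)  # Fixed seed so the sample is reproducible.
    groups = defaultdict(list)
    for record in records:
        groups[str(record.get(field, "unknown"))].append(record)

    sample = []
    for key in sorted(groups):  # Sorted keys keep the draw order stable.
        members = groups[key]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


if __name__ == "__main__":
    # Toy records with a placeholder "state" field.
    personas = [{"state": "A", "id": i} for i in range(5)] + [{"state": "B", "id": 9}]
    picked = stratified_sample(personas, "state", per_group=2)
    print(len(picked), sorted({p["state"] for p in picked}))
```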

Install base packages

bash

mkdir track-c-sdg
cd track-c-sdg
python3 -m venv .venv
source .venv/bin/activate

pip install -U openai python-dotenv pandas pyarrow pydantic rich datasets
pip install -U data-designer

export NVIDIA_API_KEY=nvapi-your-key
export NIM_BASE_URL=https://integrate.api.nvidia.com/v1

Data Designer SDK-style schema

This mirrors the pattern in the Korea guide: define the model config, then sampler columns, LLM-generated columns, structured columns, validators, and judges; preview a sample; then create the full dataset. Confirm exact class names against the Data Designer version you install.

python

import data_designer.config as dd
from data_designer.interface import DataDesigner

model_configs = [
    dd.ModelConfig(
        provider="default/nvidia-build",
        model="nvidia/nemotron-3-nano-30b-a3b",
        alias="nemotron",
    )
]

builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)

builder.add_column(dd.SamplerColumnConfig(
    name="sdg_domain",
    sampler_type=dd.SamplerType.CATEGORY,
    params=dd.CategorySamplerParams(
        values=[
            "agriculture advisory",
            "maternal health navigation",
            "disaster relief coordination",
            "skills and employment guidance",
            "citizen service navigation",
        ],
        weights=[0.25, 0.20, 0.20, 0.20, 0.15],
    ),
))

builder.add_column(dd.SamplerColumnConfig(
    name="language",
    sampler_type=dd.SamplerType.CATEGORY,
    params=dd.CategorySamplerParams(
        values=["Hindi", "English", "Tamil", "Telugu", "Bengali", "Marathi", "Kannada"],
    ),
))

builder.add_column(dd.SamplerColumnConfig(
    name="user_profile",
    sampler_type=dd.SamplerType.CATEGORY,
    params=dd.CategorySamplerParams(
        values=[
            "rural first-time smartphone user",
            "urban student",
            "field health worker",
            "small business owner",
            "district officer",
        ],
    ),
))
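
As a mental model for the weighted `sdg_domain` column, the sketch below assumes a category sampler behaves like an independent weighted draw per record. It uses only the Python standard library and is not Data Designer's implementation; it is a quick way to check that a weight list is well formed and roughly yields the mix you expect.

```python
import random
from collections import Counter

values = [
    "agriculture advisory",
    "maternal health navigation",
    "disaster relief coordination",
    "skills and employment guidance",
    "citizen service navigation",
]
weights = [0.25, 0.20, 0.20, 0.20, 0.15]

# Weights should line up with values and sum to 1.
assert len(values) == len(weights)
assert abs(sum(weights) - 1.0) < 1e-9

rng = random.Random(0)  # fixed seed so the check is repeatable
draws = rng.choices(values, weights=weights, k=10_000)
counts = Counter(draws)

for value, weight in zip(values, weights):
    share = counts[value] / len(draws)
    print(f"{value:32s} target={weight:.2f} observed={share:.3f}")
```

Columns without explicit weights, such as `language` and `user_profile` above, would correspond to a uniform draw under the same assumption.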

User request generation column

python

builder.add_column(dd.LLMTextColumnConfig(
    name="user_request",
    model_alias="nemotron",
    prompt="""
Create one realistic user request for an India SDG assistant.

Domain: {{ sdg_domain }}
Language: {{ language }}
User profile: {{ user_profile }}

Requirements:
- Use natural phrasing for the selected language.
- Include concrete local constraints.
- Do not include personally identifiable information.
- Make the request useful for training an agent, not a generic chatbot.
""",
))
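
The prompt refers to earlier columns through Jinja-style `{{ name }}` placeholders. The sketch below is a regex stand-in, not Data Designer's renderer, that shows the idea of filling a prompt from one sampled row and failing loudly when the prompt names a column that does not exist. The `render` helper is hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template, row):
    """Replace each {{ name }} with row[name]; raise on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in row:
            raise KeyError(f"prompt references unknown column: {name}")
        return str(row[name])
    return PLACEHOLDER.sub(substitute, template)

template = (
    "Domain: {{ sdg_domain }}\n"
    "Language: {{ language }}\n"
    "User profile: {{ user_profile }}"
)
row = {
    "sdg_domain": "agriculture advisory",
    "language": "Hindi",
    "user_profile": "field health worker",
}
print(render(template, row))
```

Rendering a few rows by hand like this is a cheap way to catch a misspelled column name before spending tokens on a preview run.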

Schema-driven agent trace

python

agent_trace_schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string"},
        "constraints": {"type": "array", "items": {"type": "string"}},
        "plan": {"type": "array", "items": {"type": "string"}},
        "tool_calls": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "tool_name": {"type": "string"},
                    "arguments": {"type": "object"},
                    "expected_observation": {"type": "string"},
                },
                "required": ["tool_name", "arguments", "expected_observation"],
            },
        },
        "risk_flags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["intent", "constraints", "plan", "tool_calls", "risk_flags"],
}

builder.add_column(dd.LLMStructuredColumnConfig(
    name="agent_trace",
    model_alias="nemotron",
    prompt="""
Given the user request below, create a structured agent execution trace.

User request:
{{ user_request }}

The trace should show how a responsible agent would use tools, constraints,
risks, and next steps for the selected India SDG domain.
""",
    schema=agent_trace_schema,
))
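
Before trusting generated traces, it can help to check a few by hand against the schema. The sketch below is a minimal standard-library checker that understands only the keywords used above (`type`, `properties`, `required`, `items`); it is not a full JSON Schema validator. The trimmed schema, the `schema_errors` helper, and the `clinic_search` tool name are illustrative assumptions.

```python
TYPE_MAP = {"object": dict, "array": list, "string": str}

def schema_errors(value, schema, path="$"):
    """Return a list of problems; an empty list means the value fits."""
    expected = TYPE_MAP.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(schema_errors(value[key], sub, f"{path}.{key}"))
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(schema_errors(item, schema["items"], f"{path}[{i}]"))
    return errors

tool_call_schema = {
    "type": "object",
    "properties": {
        "tool_name": {"type": "string"},
        "arguments": {"type": "object"},
        "expected_observation": {"type": "string"},
    },
    "required": ["tool_name", "arguments", "expected_observation"],
}
trace_schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string"},
        "plan": {"type": "array", "items": {"type": "string"}},
        "tool_calls": {"type": "array", "items": tool_call_schema},
    },
    "required": ["intent", "plan", "tool_calls"],
}

good = {
    "intent": "find nearest clinic",
    "plan": ["look up clinics"],
    "tool_calls": [{
        "tool_name": "clinic_search",  # hypothetical tool
        "arguments": {"district": "example"},
        "expected_observation": "list of clinics",
    }],
}
bad = {
    "intent": "find nearest clinic",
    "plan": "look up clinics",  # should be a list
    "tool_calls": [{"tool_name": "clinic_search"}],
}

print(schema_errors(good, trace_schema))
print(schema_errors(bad, trace_schema))
```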

Validation and LLM-as-judge columns

python

def validate_trace(row):
    trace = row["agent_trace"]
    if not trace.get("plan"):
        return False
    if "medical diagnosis" in str(trace).lower():
        return False
    if len(trace.get("tool_calls", [])) > 4:
        return False
    return True

builder.add_column(dd.ValidationColumnConfig(
    name="trace_valid",
    validator_type=dd.ValidatorType.PYTHON,
    params={"function": validate_trace},
    required_columns=["agent_trace"],
))

builder.add_column(dd.LLMJudgeColumnConfig(
    name="quality_judge",
    model_alias="nemotron",
    prompt="""
Evaluate this synthetic agent training record.

Domain: {{ sdg_domain }}
Language: {{ language }}
User: {{ user_request }}
Trace: {{ agent_trace }}

Score strictly. Penalize unsafe advice, vague plans, invalid tool use,
poor local relevance, and language mismatch.
""",
    scores=[
        dd.Score(name="local_relevance", description="Does this fit the Indian SDG context?", options={1: "poor", 3: "acceptable", 5: "excellent"}),
        dd.Score(name="tool_use_quality", description="Are tool calls plausible and useful?", options={1: "bad", 3: "usable", 5: "strong"}),
        dd.Score(name="safety", description="Does it avoid harmful or overconfident guidance?", options={1: "unsafe", 3: "minor issues", 5: "safe"}),
        dd.Score(name="language_quality", description="Is the selected language fluent and natural?", options={1: "poor", 3: "okay", 5: "native-like"}),
    ],
))
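
Judge scores only help if they gate what reaches the final dataset. The sketch below turns the four rubric dimensions into a keep/drop decision. It assumes the judge output is a mapping from score name to either a bare value or a dict with a `score` key; confirm the actual shape in your preview output. The thresholds are illustrative choices, not recommendations from Data Designer.

```python
# Illustrative thresholds (assumption): safety must be top-rated,
# the other dimensions must be at least "acceptable".
THRESHOLDS = {
    "local_relevance": 3,
    "tool_use_quality": 3,
    "safety": 5,
    "language_quality": 3,
}

def as_int(entry):
    """Pull an integer score out of a bare value or a {"score": ...} dict."""
    value = entry.get("score") if isinstance(entry, dict) else entry
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

def passes_judge(judge_output, thresholds=THRESHOLDS):
    """True only if every dimension is present and meets its threshold."""
    for name, minimum in thresholds.items():
        score = as_int(judge_output.get(name))
        if score is None or score < minimum:
            return False
    return True

strong = {name: {"score": 5, "reasoning": "..."} for name in THRESHOLDS}
risky = {**strong, "safety": {"score": 3, "reasoning": "overconfident advice"}}
partial = {"local_relevance": 5}  # missing dimensions count as failures

print(passes_judge(strong), passes_judge(risky), passes_judge(partial))
```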

Preview and create

python

data_designer = DataDesigner()

data_designer.validate(builder)

preview = data_designer.preview(config_builder=builder, num_records=5)
preview.display_sample_record(index=0)

results = data_designer.create(
    config_builder=builder,
    num_records=500,
    dataset_name="india-sdg-agentic-synthetic-v1",
)

df = results.load_dataset()

Schema-first generation starter

The exact Data Designer APIs can evolve, so treat this as the shape to reproduce: samplers define distributions, LLM columns generate dependent fields, structured columns extract metadata, judge columns score quality, and expression/validation columns produce final training records.

python

# notebooks/sdg_multilingual_support.py
import os
import json
import random
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI(
    base_url=os.getenv("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    api_key=os.environ["NVIDIA_API_KEY"],
)
MODEL = "nvidia/nemotron-3-nano-30b-a3b"

class ConversationMeta(BaseModel):
    language: str
    domain: str
    intent: str
    difficulty: str
    safety_risk: str = Field(description="none, low, medium, high")

LANGUAGES = ["Hindi", "English", "Tamil", "Telugu", "Bengali", "Marathi"]
DOMAINS = ["banking", "healthcare", "telecom", "travel", "education"]
INTENTS = ["complaint", "question", "refund", "troubleshooting", "appointment"]

def sample_seed():
    return {
        "language": random.choice(LANGUAGES),
        "domain": random.choice(DOMAINS),
        "intent": random.choice(INTENTS),
        "difficulty": random.choice(["easy", "medium", "hard"]),
    }

def generate_conversation(seed):
    prompt = f"""
Create a realistic multilingual customer support conversation.
Language: {seed["language"]}
Domain: {seed["domain"]}
Intent: {seed["intent"]}
Difficulty: {seed["difficulty"]}

Return 6 to 10 messages, alternating user and assistant, as JSON:
{{"messages":[{{"role":"user","content":"..."}},{{"role":"assistant","content":"..."}}]}}
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=1200,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def judge_record(seed, conversation):
    rubric = f"""
Score this synthetic conversation from 1 to 5 on:
1. language_quality
2. domain_relevance
3. assistant_helpfulness
4. safety

Return JSON with integer scores and a short reason.
Seed: {json.dumps(seed)}
Conversation: {json.dumps(conversation, ensure_ascii=False)}
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": rubric}],
        temperature=0,
        max_tokens=600,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

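records = []
for i in range(20):
    seed = sample_seed()
    try:
        conversation = generate_conversation(seed)
        scores = judge_record(seed, conversation)
    except (json.JSONDecodeError, TypeError) as exc:
        # The model can return empty or malformed JSON; skip the record instead of aborting the run.
        print(f"Skipping record {i}: {exc}")
        continue
    records.append({"id": i, "seed": seed, "conversation": conversation, "scores": scores})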

with open("synthetic_support_conversations.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
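
Model replies are not always clean JSON, even when a JSON response format is requested. The sketch below is a tolerant parser that first tries the reply as-is and then falls back to the outermost `{...}` span. The `parse_json_reply` helper is hypothetical and uses only the standard library; it returns `None` instead of raising, so a generation loop can skip a bad record.

```python
import json

def parse_json_reply(text):
    """Return the JSON object found in a model reply, or None if there is none."""
    if not isinstance(text, str):
        return None
    candidates = [text]
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        # Fallback: strip any prose or markup around the outermost braces.
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None

clean = '{"messages": []}'
wrapped = 'Here is the JSON:\n{"messages": []}\nHope this helps.'

print(parse_json_reply(clean))
print(parse_json_reply(wrapped))
print(parse_json_reply("no json here"))
```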

Validation functions

python

def has_min_turns(record, min_turns=6):
    return len(record["conversation"].get("messages", [])) >= min_turns

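def alternates_roles(record):
    roles = [m.get("role") for m in record["conversation"].get("messages", [])]
    # Every role must be valid, and no role may appear twice in a row.
    valid = all(role in {"user", "assistant"} for role in roles)
    return valid and all(a != b for a, b in zip(roles, roles[1:]))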

def score_value(record, key, threshold=3):
    value = record.get("scores", {}).get(key)
    if isinstance(value, dict):
        value = value.get("score")
    try:
        return int(value) >= threshold
    except (TypeError, ValueError):
        # Missing or non-numeric scores count as failures.
        return False

def keep(record):
    return (
        has_min_turns(record)
        and alternates_roles(record)
        and score_value(record, "language_quality")
        and score_value(record, "domain_relevance")
        and score_value(record, "assistant_helpfulness")
        and score_value(record, "safety")
    )

filtered = [r for r in records if keep(r)]
print(f"Kept {len(filtered)} / {len(records)} records")
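
A single kept/total count hides which rule is doing the rejecting. The sketch below tallies rejection reasons per rule so they can go into the quality report. It is a simplified, self-contained variant of the checks above: the `rejection_reasons` helper is hypothetical and assumes bare integer scores.

```python
from collections import Counter

SCORE_KEYS = ("language_quality", "domain_relevance", "assistant_helpfulness", "safety")

def rejection_reasons(record, min_turns=6, threshold=3):
    """Return the name of every rule a record fails (empty list means keep)."""
    reasons = []
    messages = record.get("conversation", {}).get("messages", [])
    if len(messages) < min_turns:
        reasons.append("too_few_turns")
    for key in SCORE_KEYS:
        value = record.get("scores", {}).get(key)
        if not isinstance(value, int) or value < threshold:
            reasons.append(f"low_{key}")
    return reasons

def turns(n):
    """Build n alternating placeholder messages for the demo records."""
    return [{"role": "user" if i % 2 == 0 else "assistant", "content": "..."} for i in range(n)]

demo_records = [
    {"conversation": {"messages": turns(6)}, "scores": dict.fromkeys(SCORE_KEYS, 4)},
    {"conversation": {"messages": turns(2)}, "scores": {**dict.fromkeys(SCORE_KEYS, 4), "safety": 2}},
    {"conversation": {"messages": turns(6)}, "scores": {}},
]

tally = Counter(reason for r in demo_records for reason in rejection_reasons(r))
for reason, count in tally.most_common():
    print(f"{reason}: {count}")
```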

Export for SFT

python

with open("support_sft.jsonl", "w", encoding="utf-8") as f:
    for record in filtered:
        out = {
            "messages": [
                {"role": "system", "content": "You are a helpful multilingual customer support assistant."},
                *record["conversation"]["messages"],
            ],
            "metadata": {
                **record["seed"],
                "scores": record["scores"],
                "source": "synthetic-nemotron",
            },
        }
        f.write(json.dumps(out, ensure_ascii=False) + "\n")
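
It is worth reading the export back before handing it to a trainer. The sketch below checks one JSONL line at a time; the `check_sft_line` helper is hypothetical, and the rule that a conversation should end on an assistant turn is an assumption about what your SFT tooling expects.

```python
import json

def check_sft_line(line):
    """Return a list of problems found in one exported JSONL line."""
    item = json.loads(line)
    messages = item.get("messages", [])
    problems = []
    if not messages or messages[0].get("role") != "system":
        problems.append("first message is not the system prompt")
    if any(not str(m.get("content", "")).strip() for m in messages):
        problems.append("empty message content")
    if messages and messages[-1].get("role") != "assistant":
        problems.append("conversation does not end on an assistant turn")
    return problems

good = json.dumps({"messages": [
    {"role": "system", "content": "You are a helpful multilingual customer support assistant."},
    {"role": "user", "content": "My SIM is not working."},
    {"role": "assistant", "content": "Let me help you check that."},
]})
bad = json.dumps({"messages": [
    {"role": "user", "content": "My SIM is not working."},
    {"role": "assistant", "content": "  "},
    {"role": "user", "content": "Hello?"},
]})

for line in (good, bad):
    print(check_sft_line(line))
```

To check a real export, apply the same function to each line of `support_sft.jsonl` and count the lines that return a non-empty list.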

NeMo Curator Post-Processing

Use Curator once the dataset is too large to inspect manually. Curator supports text loading, filtering, quality scoring, exact/fuzzy/semantic deduplication, and synthetic data generation workflows that can connect to OpenAI-compatible endpoints.

Install Curator

bash

# Choose extras based on your environment and CUDA version.
pip install -U nemo-curator

# For GPU-accelerated text curation on CUDA 12 environments, confirm the latest extra name in docs:
pip install --extra-index-url https://pypi.nvidia.com "nemo-curator[text_cuda12]"

Prepare Parquet for curation

python

import json
import pandas as pd

rows = []
with open("support_sft.jsonl", encoding="utf-8") as f:
    for idx, line in enumerate(f):
        item = json.loads(line)
        text = "\n".join(m["content"] for m in item["messages"])
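        # Store metadata as a JSON string so Parquet sees one consistent column type,
        # even when judge scores vary in shape between records.
        metadata = json.dumps(item.get("metadata", {}), ensure_ascii=False)
        rows.append({"id": idx, "text": text, "metadata": metadata})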

df = pd.DataFrame(rows)
df.to_parquet("support_sft_for_curator.parquet", index=False)

Deduplication decision guide

Problem	Method	Use when
Exact repeated rows	Exact hash matching	Generated output repeats identical conversations.
Small edits or paraphrases	Fuzzy MinHash + LSH	Templates produce near duplicates with minor wording changes.
Meaning-level duplicates	Semantic embeddings	Different text says the same thing across languages or styles.
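
To make the first two rows of the table concrete, the sketch below shows exact matching via a hash of normalized text and near-duplicate scoring via Jaccard similarity over word shingles. It is a plain-Python stand-in for small samples, not Curator's implementation: MinHash with LSH approximates this same Jaccard similarity without comparing every pair. The helper names and example sentences are illustrative.

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())

def exact_key(text):
    """Hash of the normalized text; equal keys mean exact duplicates."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

def shingles(text, n=3):
    """Set of overlapping n-word windows from the normalized text."""
    words = normalize(text).split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Shared shingles divided by all shingles; 1.0 means identical sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

a = "My recharge failed but the money was deducted from my account"
b = "my recharge failed  but the money was deducted from my account"
c = "My recharge failed but the amount was deducted from my account"
d = "Please book an appointment with the eye doctor tomorrow"

print("exact duplicate:", exact_key(a) == exact_key(b))
print("near duplicate score:", jaccard(a, c))
print("unrelated score:", jaccard(a, d))
```

A fuzzy threshold is a judgment call; inspect pairs around your chosen cutoff before dropping rows. Semantic deduplication needs an embedding model and is not covered by this sketch.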

Curator-style pipeline sketch

python

# API names can vary across Curator releases. Use this as an implementation sketch.
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.text.io.reader.parquet import ParquetReader
from nemo_curator.stages.text.io.writer.parquet import ParquetWriter
from nemo_curator.stages.text.modules.score_filter import Filter

pipeline = Pipeline(name="support_sft_cleanup")
pipeline.add_stage(ParquetReader(file_paths="support_sft_for_curator.parquet"))

# Drop very short records.
pipeline.add_stage(
    Filter(
        filter_fn=lambda text: len(text) >= 400,
        filter_field="text",
    )
)

# Add exact/fuzzy/semantic deduplication stages based on your installed Curator version.

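pipeline.add_stage(ParquetWriter(path="curated_support_sft.parquet"))

# Adding stages only defines the pipeline; nothing executes until it is run.
# Confirm the run call and executor options for your installed Curator version.
pipeline.run()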

Quality report template

markdown

# Synthetic Dataset Quality Report

Dataset: support_sft.jsonl
Generator model: nvidia/nemotron-3-nano-30b-a3b
Records generated: 2,000
Records kept after validation: 1,420
Duplicate removal: exact + fuzzy
Languages: Hindi, English, Tamil, Telugu, Bengali, Marathi
Average LLM judge score: 4.1 / 5

Known limitations:
- Synthetic conversations still need human spot checks.
- Domain policies are examples, not legal or medical advice.
- Low-resource language quality should be reviewed by native speakers.
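
The figures in the report should come from your own run rather than from the example values in the template. The sketch below computes them from in-memory records, assuming the record shape used in the starter (`seed` and `scores` keys) and bare integer scores. The `report_numbers` helper and the demo records are illustrative.

```python
from collections import Counter
from statistics import mean

def report_numbers(records, kept):
    """Summarize generated vs kept records for the quality report."""
    scores = [
        value
        for record in kept
        for value in record["scores"].values()
        if isinstance(value, int)  # ignore free-text fields such as a reason
    ]
    return {
        "generated": len(records),
        "kept": len(kept),
        "keep_rate": round(len(kept) / len(records), 3) if records else 0.0,
        "languages": dict(Counter(r["seed"]["language"] for r in kept)),
        "avg_judge_score": round(mean(scores), 2) if scores else None,
    }

demo_records = [
    {"seed": {"language": "Hindi"}, "scores": {"language_quality": 4, "safety": 5}},
    {"seed": {"language": "Tamil"}, "scores": {"language_quality": 3, "safety": 4, "reason": "ok"}},
    {"seed": {"language": "Hindi"}, "scores": {"language_quality": 1, "safety": 2}},
]
demo_kept = demo_records[:2]

numbers = report_numbers(demo_records, demo_kept)
for name, value in numbers.items():
    print(f"{name}: {value}")
```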

Know-Before-You-Go Checklist

Team readiness

Primary contact and all team members confirmed.
At least three active participants for hackathon work.
At least two members available in Bangalore for final days if shortlisted.
Repository has README, license, and setup instructions.
Demo owner and pitch owner assigned.

Technical readiness

NVIDIA API key created and stored outside git.
Hosted NIM smoke test completed.
Dataset sample or workflow fixture prepared.
Evaluation prompts or metrics defined.
Fallback demo path works without full training or full integration.
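
A quick offline preflight can catch the most common setup mistakes before the hosted NIM smoke test. The sketch below only inspects environment variables and makes no network call. The `preflight` helper is hypothetical; the variable names, placeholder key, and default base URL are the ones used in the install step earlier in this guide.

```python
import os

def preflight(env=None):
    """Return a list of setup problems; an empty list means ready to smoke test."""
    env = os.environ if env is None else env
    problems = []
    key = env.get("NVIDIA_API_KEY", "")
    if not key:
        problems.append("NVIDIA_API_KEY is not set")
    elif key == "nvapi-your-key":
        problems.append("NVIDIA_API_KEY still holds the placeholder value")
    base_url = env.get("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1")
    if not base_url.startswith(("http://", "https://")):
        problems.append("NIM_BASE_URL is not an http(s) URL")
    return problems

issues = preflight()
print("ready" if not issues else "\n".join(issues))
```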

Track-Specific Deliverables

Track	Minimum technical artifact	Stretch artifact
A	Runnable agent workflow with at least one real tool call.	MCP/A2A exposure plus eval traces and NemoClaw safety discussion.
B	Base vs adapted model comparison on held-out prompts.	LoRA/SFT plus NeMo RL alignment or reward model experiment.
C	Generated dataset with schema, validation, and export format.	Curator dedup/filter pipeline plus quality report and downstream eval.

Demo Script Template

markdown

# Demo Script

1. Problem
   - Who has this problem?
   - What is painful today?

2. NVIDIA stack
   - Model:
   - Runtime:
   - NeMo tools:
   - Data/training path:

3. Live demo path
   - Input:
   - Intermediate tool/model/data step:
   - Output:

4. Evaluation
   - Metric:
   - Baseline:
   - Result:

5. Next step
   - What would productionize this?

Official NVIDIA and Project Links

Area	Resource	Link
Accounts	NVIDIA Developer	developer.nvidia.com
Hosted APIs	NVIDIA API Catalog	build.nvidia.com
NIM	NVIDIA NIM docs	docs.nvidia.com/nim
NIM API	NIM LLM API reference	NIM LLM API reference
Model	Nemotron 3 Nano	Nemotron 3 Nano on build.nvidia.com
Agent Skills	NVIDIA verified skills catalog	github.com/NVIDIA/skills
Track A	NeMo Agent Toolkit docs	docs.nvidia.com/nemo/agent-toolkit
Track A	NeMo Agent Toolkit GitHub	github.com/NVIDIA/NeMo-Agent-Toolkit
Track A	NemoClaw docs	docs.nvidia.com/nemoclaw
Track B	Megatron-Bridge docs	docs.nvidia.com/nemo/megatron-bridge
Track B	Nemotron 3 Nano Megatron-Bridge guide	Nemotron 3 Nano guide
Track B	NeMo RL docs	docs.nvidia.com/nemo/rl
Track B/C	Nemotron-Personas-India dataset	Hugging Face dataset card
Track C	NeMo Data Designer GitHub	github.com/NVIDIA-NeMo/DataDesigner
Track C	NeMo Curator docs	docs.nvidia.com/nemo/curator
Track C	Curator synthetic data generation	Curator SDG docs

Glossary

Term	Meaning
NIM	NVIDIA NIM, inference microservices for optimized model serving.
NGC	NVIDIA catalog and registry for containers, models, and software assets.
NVIDIA Agent Skills	Portable, verified instruction sets that guide agents through NVIDIA product-specific workflows.
MCP	Model Context Protocol for connecting agents to tools and context sources.
A2A	Agent-to-Agent protocol for exposing and calling specialist agents.
SFT	Supervised finetuning using instruction/response examples.
LoRA/PEFT	Parameter-efficient methods for adapting a model without updating all weights.
GRPO	Group Relative Policy Optimization, a reinforcement learning algorithm used in post-training workflows.
LLM-as-judge	Using a language model to score generated outputs against a rubric.