Know-Before-You-Go

India Agentic AI Open Hackathon Technical Guide

A developer-facing starter repo in document form. Use it before the workshop and hackathon to set up accounts, choose the right NVIDIA path, copy starter configs, and arrive ready to build.

For eligible applicants and developer teams. No hands-on workshop required. Primary focus: Nemotron, NIM, NemoClaw, and NeMo.

How To Use This Guide

This guide is intentionally more technical than the public workshop agenda. It is the companion material developers should use to prepare their machines, teams, repositories, and technical plan before the hackathon build days.

Before the workshop

  • Create NVIDIA Developer, build.nvidia.com, NGC, and Hugging Face accounts if needed.
  • Read the common NIM + Nemotron section.
  • Pick a primary track and one backup track.
  • Bring one architecture sketch and one technical blocker question.

During the workshop

  • Keep this guide open as reference material during demos.
  • Mark which code snippets your team will reuse.
  • Ask mentors where your project fits in the NVIDIA stack.
  • Decide what you will build on Day 0 and what you will defer.

Before hackathon Day 0

  • Have a runnable repo with a license and README.
  • Prepare small test data and an evaluation prompt set.
  • Confirm your inference path: hosted NIM, self-hosted NIM, vLLM, or cluster.
  • Define a demo path that works even if the full system is not complete.

Track Selection Map

If your project is mainly... | Choose | Use these NVIDIA assets | Expected demo proof
Connecting tools, APIs, documents, agents, and workflows into a working assistant or automation service. | Track A: Agentic Workflows | Nemotron, NIM, NeMo Agent Toolkit, MCP, A2A, optional NemoClaw. | A traced workflow with tool calls, result synthesis, and at least one reliability/evaluation metric.
Changing model behavior for a domain, persona, language, or task using training or post-training. | Track B: Model Finetuning and Customisation | NeMo, Megatron-Bridge, SFT, LoRA/PEFT, NeMo RL, NIM deployment. | Base vs adapted comparison, training data sample, evaluation result, and deployment path.
Creating structured, high-quality training or evaluation data for low-resource, domain, or agentic tasks. | Track C: Synthetic Data Generation | Nemotron, NeMo Data Designer, NeMo Curator, LLM-as-judge, validators. | A generated dataset sample plus quality filters, deduplication, and export format.

Recommended Repo Structure

text
india-agentic-ai-hackathon/
  README.md
  LICENSE
  .gitignore
  docs/
    architecture.md
    demo-script.md
    evaluation-plan.md
  app/
    api/
    ui/
    agents/
  configs/
    nim.env.example
    nat/
    training/
    synthetic-data/
  data/
    samples/
    eval/
  notebooks/
  scripts/
    smoke_test.sh
    run_demo.sh
  results/
    screenshots/
    eval_report.md
Minimum bar for a strong technical submission: a clear track, runnable demo path, evidence of NVIDIA stack usage, and a small but believable evaluation loop.

Common Foundation: NIM + Nemotron

NIM and Nemotron are the common layer across all three tracks. NIM gives you a production-style inference endpoint with an OpenAI-compatible API. Nemotron gives you a strong NVIDIA model family for reasoning, tool use, long-context work, and synthetic data generation.

Accounts and keys

Hosted path

  1. Sign in at build.nvidia.com.
  2. Open the model page you want to test.
  3. Create an API key and store it as NVIDIA_API_KEY.
  4. Use https://integrate.api.nvidia.com/v1 as the base URL.

Self-hosted path

  1. Prepare Docker and NVIDIA Container Toolkit.
  2. Log in to nvcr.io with an NGC API key.
  3. Pull/run the NIM container for your selected model.
  4. Use your local container URL, usually http://localhost:8000/v1.
Keep API keys out of notebooks, screenshots, commits, and pitch decks. Commit only .env.example, never .env.
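
Before the first commit, a quick scan can catch a key that slipped into a notebook or script. This is a minimal sketch, not an NVIDIA tool: it assumes real hosted keys start with `nvapi-` (as the `nvapi-your-key` placeholder suggests) and treats matches containing `your-key`, `example`, or `xxxx` as placeholders. The skipped directories are also assumptions.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumption: real hosted keys start with "nvapi-"; placeholders contain one of these hints.
KEY_PATTERN = re.compile(r"nvapi-[A-Za-z0-9_\-]{8,}")
PLACEHOLDER_HINTS = ("your-key", "example", "xxxx")
SKIP_DIRS = {".git", ".venv", "node_modules"}


def find_leaked_keys(root: str) -> List[Tuple[str, int]]:
    """Return (path, line_number) pairs for strings that look like real NVIDIA API keys."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or SKIP_DIRS.intersection(path.parts):
            continue
        if path.name == ".env":  # expected to hold the real key and stay untracked
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in KEY_PATTERN.finditer(line):
                if not any(hint in match.group(0) for hint in PLACEHOLDER_HINTS):
                    hits.append((str(path), lineno))
    return hits
```

Call `find_leaked_keys(".")` from a pre-commit hook or CI step and fail the run if the list is non-empty.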

NIM path comparison

Item | Hosted NIM | Self-hosted NIM
Best for | Fast prototyping, laptops, early demos. | Data control, private infrastructure, lower network latency.
Local GPU | Not required. | Required for local deployment.
Key | NVIDIA_API_KEY from build.nvidia.com. | NGC_API_KEY for pulling containers and assets.
Base URL | https://integrate.api.nvidia.com/v1 | http://localhost:8000/v1 or cluster endpoint.
Hackathon advice | Start here for all teams. | Move here when privacy, latency, or deployment realism matters.

Nemotron model selection

Model | Use in hackathon | Rule of thumb
nvidia/nemotron-3-nano-30b-a3b | Default starting model for agentic reasoning, tool calling, and SDG prototyping. | Start with Nano unless you have a clear reason not to.
nvidia/nemotron-3-super-120b-a12b | Higher-quality planning, heavier multi-agent workflows, and more complex data generation. | Upgrade when quality matters more than latency or cost.
Nemotron multimodal or VL variants | Optional extension for image, document, or visual reasoning projects. | Use only when the project actually needs multimodal inputs.

Starter environment

bash
mkdir nvidia-ai-starter
cd nvidia-ai-starter
python3 -m venv .venv
source .venv/bin/activate

pip install -U openai python-dotenv pandas rich

cat > .env.example <<'EOF'
NVIDIA_API_KEY=nvapi-your-key
NIM_BASE_URL=https://integrate.api.nvidia.com/v1
NIM_MODEL=nvidia/nemotron-3-nano-30b-a3b
EOF

cp .env.example .env
echo ".env" >> .gitignore

Chat completion starter

python
# scripts/nim_chat_smoke_test.py
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    base_url=os.getenv("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    api_key=os.getenv("NVIDIA_API_KEY", "not-used"),
)

model = os.getenv("NIM_MODEL", "nvidia/nemotron-3-nano-30b-a3b")

response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a concise technical mentor for hackathon teams."},
        {"role": "user", "content": "Suggest a Track A architecture for invoice processing."},
    ],
    temperature=0.2,
    max_tokens=600,
)

print(response.choices[0].message.content)

Streaming response

python
# Reuses client and model from scripts/nim_chat_smoke_test.py.
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Create a demo script for a multilingual support agent."}],
    stream=True,
    temperature=0.3,
    max_tokens=700,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
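
Streaming is also the easiest place to measure time-to-first-token for a latency metric. The helper below is a sketch that works with any iterable of OpenAI-style chunks, including the `stream` object above; `consume_stream` is a hypothetical name. Timing starts when iteration begins, so call it immediately after `create(...)`; it slightly understates true time-to-first-token because the request is already in flight.

```python
import time
from typing import Iterable, Optional, Tuple


def consume_stream(stream: Iterable) -> Tuple[str, Optional[float], float]:
    """Collect streamed text; return (text, seconds_to_first_token, total_seconds).

    Expects OpenAI-style chunks exposing chunk.choices[0].delta.content.
    """
    start = time.perf_counter()
    first_token_s = None
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some servers send usage-only chunks with no choices
        piece = chunk.choices[0].delta.content
        if piece:
            if first_token_s is None:
                first_token_s = time.perf_counter() - start
            parts.append(piece)
    return "".join(parts), first_token_s, time.perf_counter() - start
```

Use `text, ttft, total = consume_stream(stream)` in place of the print loop when you want numbers for your evaluation report.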

Reasoning controls

Use non-thinking mode for fast routing and tool loops. Use thinking mode for deeper planning, architecture comparison, and data quality review. Avoid combining experimental parallel reasoning modes with tool-calling loops unless the model documentation explicitly supports it.

python
# Fast concise mode
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Draft a simple grievance-routing agent architecture."}],
    temperature=0,
    max_tokens=700,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": False
        }
    },
)
print(resp.choices[0].message.content)

# Deeper planning mode
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Compare three architectures for a public-services assistant and choose one."}],
    temperature=0,
    max_tokens=1800,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": True
        }
    },
)
print(resp.choices[0].message.content)
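
If several parts of your app call the model, a small helper keeps the two modes consistent. This sketch returns keyword arguments for `client.chat.completions.create`; the task labels in `DEEP_TASKS` and the `reasoning_kwargs` name are hypothetical, and the token budgets simply mirror the two calls above.

```python
# Hypothetical task labels; align them with your own router's categories.
DEEP_TASKS = {"planning", "architecture_review", "data_quality_review"}


def reasoning_kwargs(task_type: str) -> dict:
    """Return generation kwargs: thinking mode for deep tasks, fast mode otherwise."""
    thinking = task_type in DEEP_TASKS
    return {
        "temperature": 0,
        "max_tokens": 1800 if thinking else 700,
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }
```

Usage: `client.chat.completions.create(model=model, messages=msgs, **reasoning_kwargs("routing"))`.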

Tool calling pattern

Use tool calling when the model should decide when to call an application function, database lookup, policy engine, calculator, search API, or workflow step.

python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_policy",
        "description": "Look up a company policy by topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "Policy topic, for example travel or reimbursement"}
            },
            "required": ["topic"],
        },
    },
}]

def lookup_policy(topic: str) -> str:
    policies = {
        "travel": "Flights require manager approval. Hotels must be under the city cap.",
        "reimbursement": "Submit receipts within 30 days with project code and GST details.",
    }
    return policies.get(topic.lower(), "No policy found for that topic.")

messages = [{"role": "user", "content": "Can I expense a hotel for my Bangalore workshop trip?"}]

first = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto",
    temperature=0,
)

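assistant_message = first.choices[0].message
# Store the assistant turn as a plain dict; some OpenAI-compatible servers reject null fields.
messages.append(assistant_message.model_dump(exclude_none=True))

if assistant_message.tool_calls:
    for call in assistant_message.tool_calls:
        args = json.loads(call.function.arguments)
        result = lookup_policy(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

    # Second call: the model reads the tool results and writes the final answer.
    final = client.chat.completions.create(model=model, messages=messages, temperature=0.2)
    print(final.choices[0].message.content)
else:
    # The model answered directly without calling a tool.
    print(assistant_message.content)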
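
The loop above trusts the model's arguments. Malformed JSON or a missing field is a common agent-demo failure, so it helps to validate against the same `tools` schema and return errors as tool output the model can react to. A minimal sketch; `dispatch_tool_call` is a hypothetical helper, and it checks only required and unexpected keys, not value types.

```python
import json
from typing import Callable, Dict, List


def dispatch_tool_call(name: str, raw_arguments: str,
                       registry: Dict[str, Callable[..., str]],
                       tool_schemas: List[dict]) -> str:
    """Run one model-requested tool call; return an error string instead of raising."""
    if name not in registry:
        return f"Error: unknown tool '{name}'."
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return f"Error: arguments are not valid JSON ({exc.msg})."
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object."

    # Look up the JSON schema declared for this tool in the tools list.
    schema = next(
        (t["function"].get("parameters", {}) for t in tool_schemas
         if t["function"]["name"] == name),
        {},
    )
    allowed = set(schema.get("properties", {}))
    missing = [k for k in schema.get("required", []) if k not in args]
    unexpected = [k for k in args if allowed and k not in allowed]
    if missing or unexpected:
        return f"Error: missing={missing} unexpected={unexpected}."
    return registry[name](**args)
```

Inside the loop, `result = dispatch_tool_call(call.function.name, call.function.arguments, {"lookup_policy": lookup_policy}, tools)` would replace the `json.loads` and `lookup_policy(**args)` lines.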

Self-hosted NIM smoke tests

bash
# Log in to NGC before pulling private/entitled containers.
# --password-stdin keeps the key out of the command line and process list.
echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin

# Example pattern. Confirm the exact container path from the model page or NIM docs.
docker run --rm -it --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-nano-30b-a3b:latest

curl http://localhost:8000/v1/models

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Say hello from local NIM."}],
    "max_tokens": 64
  }'
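
A first start can take a long time while weights download, so scripts and demo runbooks should wait for the endpoint instead of failing on the first connection error. A minimal standard-library sketch, assuming the OpenAI-compatible `/v1/models` route shown above returns a `data` list of objects with an `id`; the timeout and polling interval are arbitrary defaults.

```python
import json
import time
import urllib.error
import urllib.request


def wait_for_nim(base_url: str = "http://localhost:8000/v1",
                 timeout_s: float = 900.0, interval_s: float = 10.0) -> list:
    """Poll GET {base_url}/models until it answers; return the served model IDs."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
                payload = json.load(resp)
            return [m["id"] for m in payload.get("data", [])]
        except (urllib.error.URLError, ConnectionError, TimeoutError) as exc:
            # Connection refused or an HTTP error (e.g. 503 while loading): wait and retry.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"NIM not ready at {base_url}: {exc}") from exc
            time.sleep(interval_s)
```

Call `print(wait_for_nim())` before your first chat request, then use one of the returned IDs as `NIM_MODEL`.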

Model-free NIM option

Use model-free NIM only when you need to serve a supported custom or newer model path and understand the deployment requirements. For most hackathon teams, hosted NIM or a model-specific NIM container is simpler.

bash
export LOCAL_NIM_CACHE="$HOME/.cache/nim"
mkdir -p "$LOCAL_NIM_CACHE"

export NIM_LLM_IMAGE="nvcr.io/nim/nvidia/model-free-nim:latest"
export NIM_MODEL_PATH="hf://meta-llama/Llama-3.1-8B-Instruct"

docker run --gpus all \
  -e NIM_MODEL_PATH="$NIM_MODEL_PATH" \
  -e HF_TOKEN="$HF_TOKEN" \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  "$NIM_LLM_IMAGE"

Common debugging checklist

Symptom | Check | Fix
401 or 403 | API key, account access, endpoint URL. | Regenerate key, re-export env var, verify model access.
404 model not found | Model ID mismatch between hosted endpoint and self-hosted endpoint. | Call /v1/models and use the returned model ID.
Slow first response | Container cold start, model download, cache miss. | Warm up the endpoint before demos and cache models.
JSON/tool parse errors | Prompt too loose or schema too complex. | Simplify the schema, set temperature low, validate arguments.
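
For the 404 row, resolve the model ID from what the server actually reports instead of hard-coding it. This sketch takes the parsed JSON from `GET /v1/models`. Falling back to a match on the last path segment is a convenience heuristic (an assumption about how IDs may differ between endpoints), and the function refuses to guess when that match is ambiguous.

```python
def resolve_model_id(models_payload: dict, wanted: str) -> str:
    """Return the served model ID matching `wanted`, or raise with the available IDs."""
    served = [m["id"] for m in models_payload.get("data", [])]
    if wanted in served:
        return wanted
    # Heuristic: compare only the part after the last "/" (e.g. drop an org prefix).
    short = wanted.split("/")[-1].lower()
    candidates = [m for m in served if m.split("/")[-1].lower() == short]
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(f"Model {wanted!r} not served; available: {served}")
```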

Track A: Agentic Workflows

Track A teams should aim to show a working agentic service, not just a chatbot. The service should plan, call tools, recover from failure, expose a usable interface, and report at least basic quality or latency metrics.

Reference architecture

text
User / UI / API
  -> NeMo Agent Toolkit workflow
  -> Nemotron via NIM for reasoning
  -> Tools:
       - search / RAG
       - databases
       - business APIs
       - document parsers
       - calculators / validators
  -> Optional MCP server/client boundary
  -> Optional A2A specialist agent delegation
  -> Trace, evaluation, final answer

Install NeMo Agent Toolkit

bash
mkdir track-a-agent
cd track-a-agent
python3 -m venv .venv
source .venv/bin/activate

pip install -U uv
uv pip install "nvidia-nat[langchain,mcp,a2a,eval]"

export NVIDIA_API_KEY=nvapi-your-key
nat --help
nat info components -t function

Minimal ReAct workflow config

yaml
# configs/invoice_agent.yml
llms:
  nim_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    api_key: ${NVIDIA_API_KEY}
    base_url: ${NIM_BASE_URL:-https://integrate.api.nvidia.com/v1}
    temperature: 0.0
    max_tokens: 1024

functions:
  current_datetime:
    _type: current_datetime

  wikipedia_search:
    _type: wiki_search
    max_results: 3

  code_generator:
    _type: code_generation
    programming_language: Python
    llm_name: nim_llm
    description: "Generate Python helper code only when a code artifact is required."

workflow:
  _type: react_agent
  llm_name: nim_llm
  tool_names:
    - current_datetime
    - wikipedia_search
    - code_generator
  verbose: true
  parse_agent_response_max_retries: 2

Workflow config patterns

Config block | What it controls
llms | Model providers, model names, generation settings, and NIM base URLs.
functions | Built-in tools, custom Python tools, retrieval tools, calculators, and code tools.
function_groups | Grouped dynamic tools, especially MCP client tool groups.
workflow | The agent, router, sequential flow, or executor that binds model and tools.
eval | Datasets, evaluators, metrics, and output paths for quality checks.
general | Telemetry, logging, retries, object stores, and runtime-level settings.

Workflow type | Use when
react_agent | You need a first agent with explicit tool reasoning and easy logs.
Tool-calling workflow | You want cleaner structured tool calls and less free-form reasoning.
Router agent | You need to send India-specific tasks to specialist workflows.
Sequential workflow | You need deterministic steps such as extract, validate, retrieve, respond.
Parallel workflow | You want several tools or specialists to answer, then merge results.
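
Many first-run config errors are broken cross-references, such as a workflow that names an `llm_name` or tool the file never defines. The sketch below checks only those references on an already-parsed config dict (for example from `yaml.safe_load`, which assumes PyYAML is installed). It is not part of NeMo Agent Toolkit and does not validate component options.

```python
from typing import List


def lint_workflow_config(cfg: dict) -> List[str]:
    """Report names referenced in a parsed workflow config that no block defines."""
    problems = []
    llms = set(cfg.get("llms") or {})
    functions = cfg.get("functions") or {}
    tools = set(functions) | set(cfg.get("function_groups") or {})
    workflow = cfg.get("workflow") or {}

    llm_name = workflow.get("llm_name")
    if llm_name and llm_name not in llms:
        problems.append(f"workflow.llm_name '{llm_name}' is not defined under llms")
    for tool in workflow.get("tool_names") or []:
        if tool not in tools:
            problems.append(f"workflow tool '{tool}' is not defined under functions or function_groups")
    # Functions such as code_generator may reference an LLM too.
    for name, spec in functions.items():
        fn_llm = (spec or {}).get("llm_name")
        if fn_llm and fn_llm not in llms:
            problems.append(f"functions.{name}.llm_name '{fn_llm}' is not defined under llms")
    return problems
```

Running it from `scripts/smoke_test.sh` before `nat run` makes typos fail fast instead of surfacing as agent errors.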

Run, serve, and test

bash
# Run once from CLI.
nat run --config_file configs/invoice_agent.yml \
  --input "Design an enterprise invoice triage workflow and list the tools it needs."

# Serve as an HTTP API. Use a different port if a self-hosted NIM is already bound to 8000.
nat serve --config_file configs/invoice_agent.yml --host 0.0.0.0 --port 8000

# Call it from another terminal.
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Check this invoice workflow for missing approval steps."}
    ]
  }'

MCP server pattern

Expose workflow functions as MCP tools when another client or agent needs to call them.

bash
# Publish functions from the workflow as MCP tools.
nat mcp serve --config_file configs/invoice_agent.yml --host 0.0.0.0 --port 9901

# Common smoke test: list tools from an MCP-capable client, then call one tool.
# Keep network and credential policies explicit if connecting enterprise systems.

MCP client pattern

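Use an MCP client group when your agent needs to consume tools from another local service, partner API wrapper, database adapter, or team-built tool server. This example assumes a team-built MCP server at http://localhost:9901/mcp that exposes search_schemes and get_state_policy; the invoice agent published with nat mcp serve above exposes different tools, so point the URL at your own server.

yaml
# configs/agent_with_mcp.yml (also include an llms block that defines nim_llm)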
function_groups:
  mcp_public_data:
    _type: mcp_client
    server:
      transport: streamable-http
      url: "http://localhost:9901/mcp"
    include:
      - search_schemes
      - get_state_policy
    tool_call_timeout: 30
    reconnect_enabled: true
    tool_overrides:
      get_state_policy:
        alias: india_state_policy_lookup
        description: "Look up Indian state-level policy and program data."

workflow:
  _type: react_agent
  llm_name: nim_llm
  tool_names:
    - mcp_public_data
bash
nat serve --config_file configs/agent_with_mcp.yml --port 8000
curl -s http://localhost:8000/mcp/client/tool/list | jq

A2A server pattern

Use A2A when a workflow should be discoverable as a specialist agent by other agents.

bash
uv pip install "nvidia-nat[a2a]"

nat a2a serve \
  --config_file configs/invoice_agent.yml \
  --host 0.0.0.0 \
  --port 10000
bash
# From another terminal or another agent environment:
nat a2a client discover --url http://localhost:10000
nat a2a client get_skills --url http://localhost:10000
nat a2a client call \
  --url http://localhost:10000 \
  --message "Summarize incorporation steps for a DPIIT-recognized startup."

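Evaluation harness

Save a few held-out cases as data/eval/invoice_eval.json. The input and expected field names are illustrative; map them to the keys your NAT version's dataset loader expects.
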
json
[
  {
    "input": "Invoice INV-102 has no PO number. What should happen?",
    "expected": "Flag for manual review or route to exception approval."
  },
  {
    "input": "Vendor is approved and amount is below threshold. What next?",
    "expected": "Proceed to payment queue after validation."
  }
]
yaml
# configs/eval_invoice_agent.yml
# Add this eval block to a config that also defines llms (nim_llm) and the workflow under test.
eval:
  general:
    output_dir: .tmp/eval_results
    dataset:
      _type: json
      file_path: data/eval/invoice_eval.json
  evaluators:
    accuracy:
      _type: ragas
      metric: AnswerAccuracy
      llm_name: nim_llm
bash
nat eval --config_file configs/eval_invoice_agent.yml
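
Before or alongside RAGAS, a crude keyword-recall score gives a fast regression signal that needs no judge model. This is a stand-in heuristic, not NAT's evaluator: it counts how many content words from `expected` appear in the agent's answer. The stopword list, threshold, and `agent` callable are assumptions; in practice `agent` would call your `nat serve` endpoint.

```python
import re
from typing import Callable, Dict, List

# Small illustrative stopword list; extend it for your domain.
STOPWORDS = {"a", "an", "the", "to", "or", "and", "for", "of", "is", "after"}


def keywords(text: str) -> set:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_recall(prediction: str, expected: str) -> float:
    """Fraction of the expected answer's keywords that appear in the prediction."""
    exp = keywords(expected)
    return len(exp & keywords(prediction)) / len(exp) if exp else 0.0


def run_eval(dataset: List[dict], agent: Callable[[str], str],
             threshold: float = 0.5) -> Dict[str, float]:
    """Score each row's agent answer against its 'expected' text."""
    scores = [keyword_recall(agent(row["input"]), row["expected"]) for row in dataset]
    n = len(scores)
    return {
        "n": n,
        "mean_recall": sum(scores) / n if n else 0.0,
        "pass_rate": sum(s >= threshold for s in scores) / n if n else 0.0,
    }
```

With `dataset = json.load(open("data/eval/invoice_eval.json"))`, call `run_eval(dataset, agent)` and copy the numbers into `results/eval_report.md`.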

Observability and profiling

Every Track A team should inspect traces before demo day. Most agent failures are not model failures; they are retrieval misses, wrong tool choices, bad schemas, retries, or oversized context.

yaml
general:
  telemetry:
    tracing:
      console:
        _type: console
        enabled: true
      phoenix:
        _type: phoenix
        endpoint: http://localhost:6006/v1/traces
        enabled: ${PHOENIX_ENABLED:-false}

Inspect | Question to answer
Tool calls | Did the agent choose the correct MCP, A2A, or custom tool?
Latency | Which model call or tool call dominates response time?
Token usage | Are retrieved documents or tool outputs too large?
Retries | Are tool argument schemas or output parsers failing?
Eval failures | Is the failure caused by retrieval, reasoning, or prompt design?
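
To answer the latency and token-usage questions, aggregate spans by name and sort by total time. Field names differ between exporters, so this sketch assumes each span has already been mapped to a dict with `name`, `duration_ms`, and optional `tokens`; that layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_spans(spans: List[Dict]) -> List[Tuple[str, int, float, int]]:
    """Aggregate spans by name into (name, calls, total_ms, total_tokens), slowest first."""
    calls: Dict[str, int] = defaultdict(int)
    total_ms: Dict[str, float] = defaultdict(float)
    tokens: Dict[str, int] = defaultdict(int)
    for span in spans:
        name = span["name"]
        calls[name] += 1
        total_ms[name] += float(span["duration_ms"])
        tokens[name] += int(span.get("tokens", 0))
    rows = [(name, calls[name], total_ms[name], tokens[name]) for name in calls]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```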

NemoClaw For Safe Agent Exploration

NemoClaw is useful when you want to discuss safer execution of always-on coding or automation assistants. It integrates OpenClaw with NVIDIA OpenShell sandboxing concepts.

NemoClaw is alpha software. APIs, configuration schemas, and runtime behavior can change. Do not use it in production environments.
bash
# Quickstart pattern from the current NemoClaw docs.
curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
source ~/.bashrc

# Create or connect to a sandboxed assistant.
nemoclaw hackathon-agent connect

# Check status and logs.
nemoclaw hackathon-agent status
nemoclaw hackathon-agent logs --follow

NemoClaw discussion checklist

Concern | What to show in your architecture
Network egress | Which domains or APIs the agent can call, and which are blocked.
Credentials | How secrets are isolated from prompts, logs, and agent memory.
Filesystem | Which project paths the agent can read/write.
Human control | Where approvals, logs, or policy gates are surfaced.
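
When presenting the network-egress and filesystem rows, it helps to express the policy as deny-by-default rules. The sketch below is illustrative only and is not NemoClaw's or OpenShell's policy format or API; the allowed hosts and writable root are hypothetical. Pure path checks do not follow symlinks, which is one reason real sandboxes enforce these rules at the OS level.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical policy values for illustration only.
ALLOWED_HOSTS = {"integrate.api.nvidia.com", "api.github.com"}
WRITABLE_ROOTS = [PurePosixPath("/workspace/project")]


def egress_allowed(url: str) -> bool:
    """Deny by default: allow only HTTPS to an exact allow-listed hostname."""
    parts = urlsplit(url)
    return parts.scheme == "https" and (parts.hostname or "").lower() in ALLOWED_HOSTS


def write_allowed(path: str) -> bool:
    """Allow writes only to absolute paths under a writable root, with no '..' segments."""
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        return False
    return any(p == root or root in p.parents for root in WRITABLE_ROOTS)
```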

Track A capstone templates

Template | Suggested stack
Startup Compliance Agent | NAT ReAct agent, MCP tax/policy tools, A2A legal-summary specialist, Nemotron via NIM.
Bharat Public Services Navigator | MCP connectors for state and central schemes, multilingual Nemotron answer layer, groundedness eval.
Safe Coding Copilot for Indian SaaS Teams | NemoClaw/OpenShell sandbox, NAT HTTP reviewer, insecure-code evaluation set.
Disaster Response Operations Agent | Router workflow, weather/geospatial MCP tools, A2A logistics agent, trace dashboard.

Track B: Model Finetuning and Customisation

Track B teams should be explicit about what model behavior they are changing and why. Your demo should compare base vs customised behavior, not only show that training ran.

Track B pipeline

1. Define behavior gap: What the base model cannot do well enough today.
2. Prepare data: Human data, domain data, or Track C-generated synthetic examples.
3. Train: Use full SFT or LoRA/PEFT through Megatron-Bridge where appropriate.
4. Align: Use NeMo RL for GRPO, DPO, RM, or SFT post-training when reward/preference signals matter.
5. Evaluate: Compare base vs adapted model on held-out prompts.

Environment and container

bash
# Confirm exact container tag from current Megatron-Bridge docs for your model.
docker run --rm -it \
  --gpus all \
  --shm-size=64g \
  -w /opt/Megatron-Bridge \
  -v "$PWD:/workspace" \
  nvcr.io/nvidia/nemo:25.11.nemotron_3_nano \
  bash

# Run the following inside the container shell.
export HF_TOKEN=hf-your-token
export HF_MODEL_ID=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
export MEGATRON_MODEL_PATH=/workspace/checkpoints/nemotron3-nano-megatron
export HF_EXPORT_PATH=/workspace/exports/nemotron3-nano-adapted

Convert Hugging Face checkpoint to Megatron

bash
python examples/conversion/convert_checkpoints.py import \
  --hf-model "$HF_MODEL_ID" \
  --megatron-path "$MEGATRON_MODEL_PATH" \
  --trust-remote-code

LoRA finetuning starter

bash
# Keep first hackathon runs small. Increase steps only after a smoke test succeeds.
# Set --nproc-per-node to the number of GPUs visible inside the container.
torchrun --nproc-per-node=8 examples/models/nemotron_3/finetune_nemotron_3_nano.py \
  --peft lora \
  train.global_batch_size=128 \
  train.train_iters=100 \
  scheduler.lr_warmup_iters=10 \
  checkpoint.pretrained_checkpoint="$MEGATRON_MODEL_PATH" \
  checkpoint.save=/workspace/checkpoints/nemotron3-nano-lora

Merge LoRA and export for evaluation

bash
python examples/peft/merge_lora.py \
  --hf-model-path "$HF_MODEL_ID" \
  --lora-checkpoint /workspace/checkpoints/nemotron3-nano-lora/iter_0000100 \
  --output /workspace/exports/nemotron3-nano-lora-merged

# Export the merged checkpoint (not the adapter-only LoRA checkpoint) to Hugging Face format.
python examples/conversion/convert_checkpoints.py export \
  --hf-model "$HF_MODEL_ID" \
  --megatron-path /workspace/exports/nemotron3-nano-lora-merged \
  --hf-path "$HF_EXPORT_PATH"

Data format starter for SFT

jsonl
{"messages":[{"role":"system","content":"You are a careful finance-domain assistant."},{"role":"user","content":"Explain working capital in simple terms."},{"role":"assistant","content":"Working capital is current assets minus current liabilities..."}],"metadata":{"domain":"finance","source":"curated","license":"internal-approved"}}
{"messages":[{"role":"system","content":"You are a careful finance-domain assistant."},{"role":"user","content":"What is the risk of negative cash conversion cycle?"},{"role":"assistant","content":"A negative cash conversion cycle can be healthy when supplier terms fund inventory..."}],"metadata":{"domain":"finance","source":"synthetic-reviewed","license":"internal-approved"}}
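
Malformed records are a common reason SFT jobs fail late. A quick line-by-line check over the messages format above catches them early. This is a sketch: requiring a final assistant turn and a `metadata.license` field reflects the sample records here, not a documented Megatron-Bridge requirement.

```python
import json
from typing import List

VALID_ROLES = {"system", "user", "assistant"}


def validate_sft_line(line: str) -> List[str]:
    """Return problems found in one SFT JSONL record (empty list means it looks fine)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(rec, dict):
        return ["record is not a JSON object"]
    msgs = rec.get("messages")
    if not isinstance(msgs, list) or not msgs:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, m in enumerate(msgs):
        if not isinstance(m, dict):
            problems.append(f"message {i}: not an object")
            continue
        if m.get("role") not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {m.get('role')!r}")
        if not str(m.get("content", "")).strip():
            problems.append(f"message {i}: empty content")
    # The final turn is the training target, so it should come from the assistant.
    if not isinstance(msgs[-1], dict) or msgs[-1].get("role") != "assistant":
        problems.append("last message must be the assistant target")
    if "license" not in (rec.get("metadata") or {}):
        problems.append("metadata.license missing")
    return problems
```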

NeMo RL alignment path

Use NeMo RL when you have preference, reward, or environment feedback signals. Supported post-training paths include GRPO, DPO, reward model training, SFT, DAPO, and on-policy distillation.

bash
git clone https://github.com/NVIDIA-NeMo/RL.git nemo-rl --recursive
cd nemo-rl
git config submodule.recurse true

pip install -U uv
uv venv
source .venv/bin/activate

export HF_HOME=/workspace/hf_cache
export HF_TOKEN=hf-your-token

GRPO config skeleton

yaml
# configs/grpo_domain.yaml
grpo:
  num_prompts_per_step: 32
  num_generations_per_prompt: 4
  max_num_steps: 200
  val_period: 10
  normalize_rewards: true

loss_fn:
  reference_policy_kl_penalty: 0.01
  ratio_clip_min: 0.2
  ratio_clip_max: 0.28

policy:
  model_name: /workspace/exports/nemotron3-nano-adapted
  train_global_batch_size: 128
  train_micro_batch_size: 1
  precision: bfloat16
  generation:
    backend: vllm
    temperature: 1.0

data:
  train:
    data_path: /workspace/data/train.jsonl
  validation:
    data_path: /workspace/data/val.jsonl

logger:
  log_dir: logs
  tensorboard_enabled: true
  wandb_enabled: false

cluster:
  gpus_per_node: 8
  num_nodes: 1
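
The `num_generations_per_prompt` and `normalize_rewards` settings reflect GRPO's core idea: sample several responses per prompt and score each one relative to its own group, with no separate value model. In this skeleton, 32 prompts × 4 generations gives 128 rollouts per step, matching `train_global_batch_size`. The sketch below shows the textbook group-relative advantage; NeMo RL's implementation may differ in details, so treat it as an illustration of the math only.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], group_size: int,
                              normalize: bool = True, eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: reward minus its group mean, optionally divided by the group std.

    `rewards` is laid out prompt by prompt: group_size consecutive entries per prompt.
    """
    if group_size <= 0 or len(rewards) % group_size:
        raise ValueError("rewards must contain whole groups of group_size")
    advantages: List[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = mean(group)
        # eps avoids division by zero when every response in the group scored the same.
        scale = pstdev(group) + eps if normalize else 1.0
        advantages.extend((r - mu) / scale for r in group)
    return advantages
```

A group where every response earns the same reward contributes zero advantage, which is why prompts that are always solved (or never solved) teach the policy nothing.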

Run and monitor

bash
uv run python examples/run_grpo.py --config configs/grpo_domain.yaml

tensorboard --logdir=./logs --port=6006

Custom reward environment pattern

Use a custom reward environment when correctness is more than string matching. For India-domain assistants, useful reward axes include groundedness, no fake citations, language fit, and refusal behavior.

python
import re
import torch
import ray
from nemo_rl.environments.interfaces import EnvironmentInterface, EnvironmentReturn

@ray.remote(max_restarts=-1, max_task_retries=-1)
class IndiaPolicyRewardEnv(EnvironmentInterface):
    def __init__(self, cfg):
        self.required_terms = cfg.get("required_terms", [])

    def step(self, message_log_batch, metadata):
        # Concatenate each conversation's assistant turns into one response string.
        responses = [
            "".join(str(m["content"]) for m in conv if m["role"] == "assistant")
            for conv in message_log_batch
        ]

        rewards = []
        for text, meta in zip(responses, metadata):
            # Grounded: mentions at least one required term from the env config.
            grounded = float(any(t.lower() in text.lower() for t in self.required_terms))
            # Penalize statutory section numbers unless the prompt explicitly allows them.
            no_fake_section = float(
                not re.search(r"Section\s+\d+[A-Z]?", text)
                or meta.get("allow_sections", False)
            )
            # Placeholder for the language axis: only checks for a non-empty answer.
            non_empty = float(len(text.strip()) > 0)
            # GRPO expects one scalar reward per response, so average the axes.
            rewards.append((grounded + no_fake_section + non_empty) / 3.0)

        return EnvironmentReturn(
            observations=[{"role": "environment", "content": ""} for _ in responses],
            metadata=metadata,
            next_stop_strings=[None] * len(responses),
            rewards=torch.tensor(rewards, dtype=torch.float32),
            terminateds=torch.ones(len(responses)),
            answers=None,
        )

    def global_post_process_and_metrics(self, batch):
        # Required by EnvironmentInterface; this sketch adds no extra metrics.
        return batch, {}
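
A character-script check is a cheap signal for the language-fit axis: a Hindi answer written mostly in Latin letters, for example, should score low. This sketch covers only the scripts listed and counts alphabetic characters that fall in each Unicode block. It cannot tell Hindi from Marathi (both use Devanagari) and does not judge fluency, so pair it with an LLM judge for quality. `script_share` is a hypothetical helper name.

```python
# Unicode blocks for a few Indic scripts; extend the table for other languages.
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),  # Devanagari (also used by Marathi)
    "bn": (0x0980, 0x09FF),  # Bengali
    "ta": (0x0B80, 0x0BFF),  # Tamil
    "te": (0x0C00, 0x0C7F),  # Telugu
}


def script_share(text: str, lang: str) -> float:
    """Fraction of alphabetic characters written in the target language's script.

    'en' counts ASCII letters; code-mixed text scores between 0 and 1.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    if lang == "en":
        hits = sum(ch.isascii() for ch in letters)
    else:
        low, high = SCRIPT_RANGES[lang]
        hits = sum(low <= ord(ch) <= high for ch in letters)
    return hits / len(letters)
```

Inside `step`, a value such as `float(script_share(text, meta.get("target_lang", "en")) >= 0.5)` could feed the language axis; the 0.5 cutoff is arbitrary.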

DPO and reward model patterns

Use DPO when your team has pairwise preference examples and wants alignment without online reward rollouts. Use reward model training when you want a reusable scorer for later RL or evaluation.

jsonl
{"context":[{"role":"user","content":"Explain DigiLocker KYC limits."}],"completions":[{"rank":0,"completion":[{"role":"assistant","content":"A careful answer with caveats and no unsupported legal claim."}]},{"rank":1,"completion":[{"role":"assistant","content":"An overconfident answer with made-up section numbers."}]}],"task_name":"india_policy"}
bash
# DPO pattern. Confirm exact example paths in your checked-out NeMo RL version.
# policy.model_name should point at an HF-format checkpoint, such as the HF_EXPORT_PATH export above.
uv run python examples/run_dpo.py --config examples/configs/dpo.yaml \
  policy.model_name=/workspace/exports/nemotron3-nano-adapted \
  dpo.sft_loss_weight=0.1

# Reward model pattern.
uv run python examples/run_rm.py --config examples/configs/rm.yaml \
  policy.model_name=/workspace/exports/nemotron3-nano-adapted
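
Judge scores from a synthetic-data pipeline are a natural source of preference pairs: keep a high-scoring and a low-scoring answer to the same prompt. The helper below builds one line in the ranked-completions layout of the JSONL example above (rank 0 preferred). `make_preference_record` is a hypothetical name; confirm the field names against your NeMo RL version.

```python
import json


def make_preference_record(prompt: str, preferred: str, rejected: str,
                           task_name: str = "india_policy") -> str:
    """Return one JSONL line with rank 0 = preferred and rank 1 = rejected."""
    if preferred.strip() == rejected.strip():
        raise ValueError("preferred and rejected answers must differ")
    record = {
        "context": [{"role": "user", "content": prompt}],
        "completions": [
            {"rank": 0, "completion": [{"role": "assistant", "content": preferred}]},
            {"rank": 1, "completion": [{"role": "assistant", "content": rejected}]},
        ],
        "task_name": task_name,
    }
    # ensure_ascii=False keeps Hindi and other Indic text readable in the file.
    return json.dumps(record, ensure_ascii=False)
```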

Track B evaluation table

Metric | How to compute | Why judges care
Domain correctness | Expert rubric or LLM-as-judge over held-out prompts. | Shows behavior changed in the intended direction.
Format compliance | Regex or parser validation for the required answer schema. | Important for enterprise workflows and agents.
Safety/regression | Base safety prompts plus refusal/grounding checks. | Shows customisation did not break guardrails.
Latency/cost | Tokens/sec, response latency, GPU utilization. | Shows a practical deployment path.
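
Format compliance is the easiest row to automate. Assuming your target schema is a JSON object with fixed keys, the sketch below compares base and adapted outputs on the same held-out prompts. The helper names and the JSON-object assumption are illustrative; swap in a regex check if your schema is not JSON.

```python
import json
from typing import Dict, List, Set


def is_compliant(output: str, required_keys: Set[str]) -> bool:
    """True if the output parses as a JSON object that contains every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def compliance_rate(outputs: List[str], required_keys: Set[str]) -> float:
    return sum(is_compliant(o, required_keys) for o in outputs) / len(outputs) if outputs else 0.0


def compare_models(base_outputs: List[str], adapted_outputs: List[str],
                   required_keys: Set[str]) -> Dict[str, float]:
    """Compliance for base vs adapted outputs produced from the same held-out prompts."""
    base = compliance_rate(base_outputs, required_keys)
    adapted = compliance_rate(adapted_outputs, required_keys)
    return {"base": base, "adapted": adapted, "delta": adapted - base}
```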

India-specific evaluation prompts

  • GST, MSME, UPI, Aadhaar, DigiLocker, agriculture support, skilling, and public-service navigation tasks.
  • Code-mixed Hindi-English prompts and at least two regional-language samples if your project claims multilingual support.
  • Legal, medical, and financial prompts that require cautious wording and escalation rather than overconfident advice.
  • Held-out examples that never appear in SFT, preference, or reward model training data.
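
The last point is easy to violate once synthetic data is reused across tracks. A sketch of a leakage check: normalize text, then flag eval prompts that exactly match a training text or share a long word n-gram with one. The default n=8 is an arbitrary choice. Normalization keeps Unicode combining marks so Devanagari and other Indic words are not split apart.

```python
import unicodedata
from typing import List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace.

    Combining marks (Unicode category M*) are kept so Indic words stay intact.
    """
    kept = "".join(
        ch if ch.isalnum() or unicodedata.category(ch).startswith("M") else " "
        for ch in text.lower()
    )
    return " ".join(kept.split())


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    words = normalize(text).split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_leaks(eval_prompts: List[str], train_texts: List[str], n: int = 8) -> List[int]:
    """Indices of eval prompts that exactly match, or share an n-gram with, any training text."""
    train_exact = {normalize(t) for t in train_texts}
    train_grams: Set[Tuple[str, ...]] = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    return [
        i for i, prompt in enumerate(eval_prompts)
        if normalize(prompt) in train_exact or ngrams(prompt, n) & train_grams
    ]
```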

Track C: Synthetic Data Generation

Track C teams should show the complete data factory: schema, generation, validation, curation, export, and quality report. A pile of generated examples is not enough.

Track C pipeline

text
Data objective
  -> Schema and sampling plan
  -> Nemotron generation through hosted NIM, local NIM, or vLLM
  -> Deterministic validators
  -> LLM-as-judge scoring
  -> NeMo Curator filtering and deduplication
  -> JSONL / Parquet export
  -> Quality report and examples
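
NeMo Curator handles deduplication at scale. To choose a similarity threshold before setting it up, or for a dataset of a few thousand records, a small character-shingle Jaccard filter is enough. This is a stand-in for illustration, not Curator's API; the shingle size and 0.8 threshold are assumptions to tune on your own data.

```python
from typing import List, Set


def shingles(text: str, k: int = 5) -> Set[str]:
    """Character k-grams of lowercased, whitespace-collapsed text."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(len(t) - k + 1, 1))}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_dedup(texts: List[str], threshold: float = 0.8, k: int = 5) -> List[int]:
    """Greedy filter: keep a text unless it is at least `threshold` similar to one already kept.

    Pairwise O(n^2) comparison; fine for a few thousand records, not for a full corpus.
    """
    kept: List[int] = []
    kept_shingles: List[Set[str]] = []
    for i, text in enumerate(texts):
        s = shingles(text, k)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(i)
            kept_shingles.append(s)
    return kept
```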

Strong India SDG domains

Domain | Example synthetic records | Safety note
Agriculture | Pest triage, mandi price explanation, subsidy navigation. | Escalate crop disease diagnosis and avoid fabricated government scheme details.
Health | ASHA worker counseling, appointment navigation, maternal warning signs. | Never replace clinical diagnosis; include escalation for red flags.
Disaster response | Flood shelter routing, relief inventory requests, missing-resource escalation. | Prefer verified sources and uncertainty notes.
Education and skilling | ITI course guidance, interview practice, scheme matching. | Keep advice local, accessible, and non-discriminatory.
Public services | Document checklist, multilingual form help, grievance routing. | Do not request or generate real PII.

Install base packages

bash
mkdir track-c-sdg
cd track-c-sdg
python3 -m venv .venv
source .venv/bin/activate

pip install -U openai python-dotenv pandas pyarrow pydantic rich
pip install -U data-designer

export NVIDIA_API_KEY=nvapi-your-key
export NIM_BASE_URL=https://integrate.api.nvidia.com/v1

Data Designer SDK-style schema

Data Designer configs follow one pattern: define model configs, sampler columns, LLM-generated columns, structured columns, validators, and judges, then preview before you create the full dataset. Confirm exact class names against the Data Designer version you install.

python
import data_designer.config as dd
from data_designer.interface import DataDesigner

model_configs = [
    dd.ModelConfig(
        provider="default/nvidia-build",
        model="nvidia/nemotron-3-nano-30b-a3b",
        alias="nemotron",
    )
]

builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)

builder.add_column(dd.SamplerColumnConfig(
    name="sdg_domain",
    sampler_type=dd.SamplerType.CATEGORY,
    params=dd.CategorySamplerParams(
        values=[
            "agriculture advisory",
            "maternal health navigation",
            "disaster relief coordination",
            "skills and employment guidance",
            "citizen service navigation",
        ],
        weights=[0.25, 0.20, 0.20, 0.20, 0.15],
    ),
))

builder.add_column(dd.SamplerColumnConfig(
    name="language",
    sampler_type=dd.SamplerType.CATEGORY,
    params=dd.CategorySamplerParams(
        values=["Hindi", "English", "Tamil", "Telugu", "Bengali", "Marathi", "Kannada"],
    ),
))

builder.add_column(dd.SamplerColumnConfig(
    name="user_profile",
    sampler_type=dd.SamplerType.CATEGORY,
    params=dd.CategorySamplerParams(
        values=[
            "rural first-time smartphone user",
            "urban student",
            "field health worker",
            "small business owner",
            "district officer",
        ],
    ),
))
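
Category weights translate directly into how many records each domain receives, which decides whether per-domain evaluation is meaningful. Data Designer does the real sampling; this sketch only previews the expected mix with Python's `random.choices`, assuming the sampler treats weights as relative probabilities. `preview_mix` is a hypothetical helper.

```python
import random
from collections import Counter
from typing import List

DOMAINS = [
    "agriculture advisory",
    "maternal health navigation",
    "disaster relief coordination",
    "skills and employment guidance",
    "citizen service navigation",
]
WEIGHTS = [0.25, 0.20, 0.20, 0.20, 0.15]


def preview_mix(values: List[str], weights: List[float], n: int = 500, seed: int = 0) -> Counter:
    """Draw n weighted samples (reproducible via seed) to preview per-category counts."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    rng = random.Random(seed)
    return Counter(rng.choices(values, weights=weights, k=n))
```

With 500 records, a 0.15 weight means roughly 75 citizen-service records, which may be too few if that domain is your demo focus.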

User request generation column

python
builder.add_column(dd.LLMTextColumnConfig(
    name="user_request",
    model_alias="nemotron",
    prompt="""
Create one realistic user request for an India SDG assistant.

Domain: {{ sdg_domain }}
Language: {{ language }}
User profile: {{ user_profile }}

Requirements:
- Use natural phrasing for the selected language.
- Include concrete local constraints.
- Do not include personally identifiable information.
- Make the request useful for training an agent, not a generic chatbot.
""",
))

Schema-driven agent trace

python
agent_trace_schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string"},
        "constraints": {"type": "array", "items": {"type": "string"}},
        "plan": {"type": "array", "items": {"type": "string"}},
        "tool_calls": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "tool_name": {"type": "string"},
                    "arguments": {"type": "object"},
                    "expected_observation": {"type": "string"},
                },
                "required": ["tool_name", "arguments", "expected_observation"],
            },
        },
        "risk_flags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["intent", "constraints", "plan", "tool_calls", "risk_flags"],
}

builder.add_column(dd.LLMStructuredColumnConfig(
    name="agent_trace",
    model_alias="nemotron",
    prompt="""
Given the user request below, create a structured agent execution trace.

User request:
{{ user_request }}

The trace should show how a responsible agent would use tools, constraints,
risks, and next steps for the selected India SDG domain.
""",
    schema=agent_trace_schema,
))

Validation and LLM-as-judge columns

python
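import json

def validate_trace(row):
    trace = row["agent_trace"]
    # Structured columns may arrive as JSON strings depending on the version; normalize to a dict.
    if isinstance(trace, str):
        try:
            trace = json.loads(trace)
        except json.JSONDecodeError:
            return False
    if not trace.get("plan"):
        return False
    # Reject traces that drift into diagnosis; health records should escalate instead.
    if "medical diagnosis" in str(trace).lower():
        return False
    # Keep traces short enough to be useful training examples.
    if len(trace.get("tool_calls", [])) > 4:
        return False
    return True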

builder.add_column(dd.ValidationColumnConfig(
    name="trace_valid",
    validator_type=dd.ValidatorType.PYTHON,
    params={"function": validate_trace},
    required_columns=["agent_trace"],
))

builder.add_column(dd.LLMJudgeColumnConfig(
    name="quality_judge",
    model_alias="nemotron",
    prompt="""
Evaluate this synthetic agent training record.

Domain: {{ sdg_domain }}
Language: {{ language }}
User: {{ user_request }}
Trace: {{ agent_trace }}

Score strictly. Penalize unsafe advice, vague plans, invalid tool use,
poor local relevance, and language mismatch.
""",
    scores=[
        dd.Score(name="local_relevance", description="Does this fit the Indian SDG context?", options={1: "poor", 3: "acceptable", 5: "excellent"}),
        dd.Score(name="tool_use_quality", description="Are tool calls plausible and useful?", options={1: "bad", 3: "usable", 5: "strong"}),
        dd.Score(name="safety", description="Does it avoid harmful or overconfident guidance?", options={1: "unsafe", 3: "minor issues", 5: "safe"}),
        dd.Score(name="language_quality", description="Is the selected language fluent and natural?", options={1: "poor", 3: "okay", 5: "native-like"}),
    ],
))
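A judge column is only useful once its scores decide which records survive. The sketch below assumes each `quality_judge` value is a dict shaped like `{"safety": {"score": 5, "reasoning": "..."}, ...}`; check the actual output shape in your Data Designer version. It requires every dimension to meet its own minimum instead of averaging, because a high mean can hide one unsafe answer. The stricter `safety` minimum of 4 is an illustrative choice, and `passes_judge` is a hypothetical helper.

```python
# Assumed shape of one quality_judge value; verify against your Data Designer version:
# {"safety": {"score": 5, "reasoning": "..."}, "language_quality": {...}, ...}
DEFAULT_MIN_SCORE = 3
MIN_SCORES = {"safety": 4}  # illustrative stricter bar for safety


def judge_scores(judge_output: dict) -> dict:
    """Flatten {name: {"score": n, ...}} (or {name: n}) into {name: n}."""
    flat = {}
    for name, entry in judge_output.items():
        value = entry.get("score") if isinstance(entry, dict) else entry
        try:
            flat[name] = int(value)
        except (TypeError, ValueError):
            continue  # unparsable scores are treated as missing
    return flat


def passes_judge(judge_output: dict, dimensions: list) -> bool:
    """Keep a record only if every dimension meets its own minimum score."""
    scores = judge_scores(judge_output)
    return all(
        name in scores and scores[name] >= MIN_SCORES.get(name, DEFAULT_MIN_SCORE)
        for name in dimensions
    )


DIMENSIONS = ["local_relevance", "tool_use_quality", "safety", "language_quality"]
example = {
    "local_relevance": {"score": 5, "reasoning": "..."},
    "tool_use_quality": {"score": 4, "reasoning": "..."},
    "safety": {"score": 3, "reasoning": "..."},
    "language_quality": {"score": 5, "reasoning": "..."},
}
print(passes_judge(example, DIMENSIONS))  # False: safety is 3 even though the mean is 4.25
```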

Preview and create

python
data_designer = DataDesigner()

data_designer.validate(builder)

preview = data_designer.preview(config_builder=builder, num_records=5)
preview.display_sample_record(index=0)

results = data_designer.create(
    config_builder=builder,
    num_records=500,
    dataset_name="india-sdg-agentic-synthetic-v1",
)

df = results.load_dataset()

Schema-first generation starter

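The exact Data Designer APIs can evolve, so the starter below reproduces the same shape with a plain OpenAI-compatible client and no Data Designer dependency: random samplers define the seed distribution, an LLM call generates the dependent conversation, a second LLM call acts as the judge and scores quality, and validation functions decide which records become final training data. The `ConversationMeta` model shows the metadata a structured-extraction step would produce; it is not wired into the loop.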

python
# notebooks/sdg_multilingual_support.py
import os
import json
import random
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI(
    base_url=os.getenv("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    api_key=os.environ["NVIDIA_API_KEY"],
)
MODEL = "nvidia/nemotron-3-nano-30b-a3b"

class ConversationMeta(BaseModel):
    language: str
    domain: str
    intent: str
    difficulty: str
    safety_risk: str = Field(description="none, low, medium, high")

LANGUAGES = ["Hindi", "English", "Tamil", "Telugu", "Bengali", "Marathi"]
DOMAINS = ["banking", "healthcare", "telecom", "travel", "education"]
INTENTS = ["complaint", "question", "refund", "troubleshooting", "appointment"]

def sample_seed():
    return {
        "language": random.choice(LANGUAGES),
        "domain": random.choice(DOMAINS),
        "intent": random.choice(INTENTS),
        "difficulty": random.choice(["easy", "medium", "hard"]),
    }

def generate_conversation(seed):
    prompt = f"""
Create a realistic customer support conversation written in the language below.
Language: {seed["language"]}
Domain: {seed["domain"]}
Intent: {seed["intent"]}
Difficulty: {seed["difficulty"]}

Return 6 to 10 turns that alternate user and assistant, starting with the user and ending with the assistant, as JSON:
{{"messages":[{{"role":"user","content":"..."}},{{"role":"assistant","content":"..."}}]}}
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=1200,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def judge_record(seed, conversation):
    rubric = f"""
Score this synthetic conversation from 1 to 5 on:
1. language_quality
2. domain_relevance
3. assistant_helpfulness
4. safety

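Return only a JSON object with integer scores and a short reason, in this shape:
{{"language_quality": <1-5>, "domain_relevance": <1-5>, "assistant_helpfulness": <1-5>, "safety": <1-5>, "reason": "..."}}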
Seed: {json.dumps(seed)}
Conversation: {json.dumps(conversation, ensure_ascii=False)}
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": rubric}],
        temperature=0,
        max_tokens=600,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

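random.seed(42)  # make seed sampling reproducible across runs
records = []
for i in range(20):
    seed = sample_seed()
    try:
        conversation = generate_conversation(seed)
        scores = judge_record(seed, conversation)
    except (json.JSONDecodeError, TypeError) as exc:
        # json.loads fails on malformed output (JSONDecodeError) or empty content (TypeError).
        print(f"Skipping record {i}: {exc}")
        continue
    records.append({"id": i, "seed": seed, "conversation": conversation, "scores": scores})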

with open("synthetic_support_conversations.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
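With `random.choice` and only 20 records, some of the six languages can end up with one conversation or none. One alternative, sketched below on the assumption that language coverage matters most, assigns languages round-robin from a shuffled order and keeps domain, intent, and difficulty random; a seeded `random.Random` keeps runs reproducible. `balanced_seeds` is a hypothetical helper that could replace the `sample_seed()` calls in the loop above.

```python
import itertools
import random

LANGUAGES = ["Hindi", "English", "Tamil", "Telugu", "Bengali", "Marathi"]
DOMAINS = ["banking", "healthcare", "telecom", "travel", "education"]
INTENTS = ["complaint", "question", "refund", "troubleshooting", "appointment"]
DIFFICULTIES = ["easy", "medium", "hard"]


def balanced_seeds(n, rng):
    """Yield n seeds whose per-language counts differ by at most one.

    Languages cycle in a shuffled order; the other fields stay random.
    """
    languages = itertools.cycle(rng.sample(LANGUAGES, len(LANGUAGES)))
    for _ in range(n):
        yield {
            "language": next(languages),
            "domain": rng.choice(DOMAINS),
            "intent": rng.choice(INTENTS),
            "difficulty": rng.choice(DIFFICULTIES),
        }


rng = random.Random(42)
seeds = list(balanced_seeds(20, rng))
counts = {lang: sum(s["language"] == lang for s in seeds) for lang in LANGUAGES}
print(counts)  # each language appears 3 or 4 times
```

The same idea extends to language and domain pairs if both need even coverage; a fully random sampler is simpler but needs many more records to cover every combination.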

Validation functions

python
def has_min_turns(record, min_turns=6):
    return len(record["conversation"].get("messages", [])) >= min_turns

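def alternates_roles(record):
    """Require user/assistant turns that alternate, start with user, and end with assistant."""
    roles = [m.get("role") for m in record["conversation"].get("messages", [])]
    if not roles or any(role not in {"user", "assistant"} for role in roles):
        return False
    # SFT targets are assistant turns, so the conversation should end on one.
    if roles[0] != "user" or roles[-1] != "assistant":
        return False
    # Each turn must switch speaker: user, assistant, user, ...
    return all(a != b for a, b in zip(roles, roles[1:]))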

def score_value(record, key, threshold=3):
    value = record.get("scores", {}).get(key)
    if isinstance(value, dict):
        value = value.get("score")
    try:
        return int(value) >= threshold
    except (TypeError, ValueError):
        # Missing or non-numeric scores count as failures.
        return False

def keep(record):
    return (
        has_min_turns(record)
        and alternates_roles(record)
        and score_value(record, "language_quality")
        and score_value(record, "domain_relevance")
        and score_value(record, "assistant_helpfulness")
        and score_value(record, "safety")
    )

filtered = [r for r in records if keep(r)]
print(f"Kept {len(filtered)} / {len(records)} records")
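The LLM judge scores `language_quality`, but a deterministic check cheaply catches a common failure: the model replies in English or Romanized text when asked for an Indic language. The sketch below measures what fraction of letters fall in the expected Unicode script block. The 0.5 threshold is an assumption to tune, and code-mixed or Romanized Hindi can be legitimate for some user profiles, so treat a failure as a flag for review rather than proof of a bad record. `matches_script` is a hypothetical helper that could be added to `keep()`.

```python
# Unicode blocks for the scripts of the sample Indic languages.
SCRIPT_BLOCKS = {
    "Hindi": (0x0900, 0x097F),    # Devanagari
    "Marathi": (0x0900, 0x097F),  # Devanagari
    "Bengali": (0x0980, 0x09FF),  # Bengali
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
}


def script_ratio(text: str, language: str):
    """Fraction of letters in the expected script, or None if no script is mapped."""
    block = SCRIPT_BLOCKS.get(language)
    if block is None:
        return None  # e.g. English: nothing to check
    letters = [ch for ch in text if ch.isalpha()]  # vowel signs are marks, not letters
    if not letters:
        return 0.0
    low, high = block
    return sum(low <= ord(ch) <= high for ch in letters) / len(letters)


def matches_script(record, min_ratio: float = 0.5) -> bool:
    """Check the whole conversation against the seed language's script."""
    messages = record["conversation"].get("messages", [])
    text = " ".join(m.get("content", "") for m in messages)
    ratio = script_ratio(text, record["seed"]["language"])
    return ratio is None or ratio >= min_ratio


hindi = {"seed": {"language": "Hindi"},
         "conversation": {"messages": [{"role": "user", "content": "मेरा खाता बंद हो गया है"}]}}
romanized = {"seed": {"language": "Hindi"},
             "conversation": {"messages": [{"role": "user", "content": "Mera khata band ho gaya hai"}]}}
print(matches_script(hindi), matches_script(romanized))  # True False
```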

Export for SFT

python
with open("support_sft.jsonl", "w", encoding="utf-8") as f:
    for record in filtered:
        out = {
            "messages": [
                {"role": "system", "content": "You are a helpful multilingual customer support assistant."},
                *record["conversation"]["messages"],
            ],
            "metadata": {
                **record["seed"],
                "scores": record["scores"],
                "source": "synthetic-nemotron",
            },
        }
        f.write(json.dumps(out, ensure_ascii=False) + "\n")
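Track B asks for a comparison on held-out prompts, so it helps to carve out an evaluation split before training. The sketch below (hypothetical helpers `split_of` and `split_records`) assigns each record by hashing its messages, so the assignment is stable across reruns and exact duplicates always land in the same split instead of leaking from train into eval. The 10% evaluation fraction is an illustrative default.

```python
import hashlib
import json


def split_of(key: str, eval_fraction: float = 0.1) -> str:
    """Map a key to 'train' or 'eval' with a stable hash (same key, same split)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return "eval" if bucket < eval_fraction else "train"


def split_records(records, eval_fraction: float = 0.1):
    """Split SFT records by hashing their messages; returns (train, held_out)."""
    train, held_out = [], []
    for record in records:
        key = json.dumps(record["messages"], sort_keys=True, ensure_ascii=False)
        target = held_out if split_of(key, eval_fraction) == "eval" else train
        target.append(record)
    return train, held_out


keys = [f"record-{i}" for i in range(1000)]
n_eval = sum(split_of(k) == "eval" for k in keys)
print(f"{n_eval} of {len(keys)} keys fall in the eval split")
```

Write `train` and `held_out` to separate JSONL files with the same loop used for `support_sft.jsonl`, and keep the held-out file away from training.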

NeMo Curator Post-Processing

Use Curator once the dataset is too large to inspect by hand. Curator supports text loading, filtering, quality scoring, exact/fuzzy/semantic deduplication, and synthetic data generation workflows that can connect to OpenAI-compatible endpoints.

Install Curator

bash
# Choose extras based on your environment and CUDA version.
pip install -U nemo-curator

# For GPU-accelerated text curation on CUDA 12 environments, confirm the latest extra name in docs:
pip install --extra-index-url https://pypi.nvidia.com "nemo-curator[text_cuda12]"

Prepare Parquet for curation

python
import json
import pandas as pd

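rows = []
with open("support_sft.jsonl", encoding="utf-8") as f:
    for idx, line in enumerate(f):
        item = json.loads(line)
        # Skip the shared system prompt: identical text in every row inflates similarity during fuzzy dedup.
        text = "\n".join(m["content"] for m in item["messages"] if m["role"] != "system")
        # Store metadata as a JSON string so Parquet schema inference does not trip over nested, mixed-type scores.
        metadata = json.dumps(item.get("metadata", {}), ensure_ascii=False)
        rows.append({"id": idx, "text": text, "metadata": metadata})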

df = pd.DataFrame(rows)
df.to_parquet("support_sft_for_curator.parquet", index=False)

Deduplication decision guide

| Problem | Method | Use when |
| --- | --- | --- |
| Exact repeated rows | Exact hash matching | Generated output repeats identical conversations. |
| Small edits or light rewording | Fuzzy MinHash + LSH | Templates produce near duplicates with minor wording changes. |
| Meaning-level duplicates | Semantic embeddings | Different text says the same thing across languages or styles. |
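To see what the first two rows of the table mean in practice before wiring up Curator, here is a small pure-Python sketch: SHA-256 of normalized text for exact matches, and MinHash over character 5-grams for near duplicates. It compares every pair, which is fine for a few thousand records; Curator's fuzzy deduplication adds LSH banding so it scales beyond that. The helper names, shingle size, signature length, and thresholds are illustrative assumptions.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting-only differences disappear."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_dedup(texts):
    """Keep the first occurrence of each normalized text (exact hash matching)."""
    seen, kept = set(), []
    for text in texts:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept


def shingles(text: str, k: int = 5) -> set:
    """Character k-grams; these work for Indic scripts without a tokenizer."""
    t = normalize(text)
    return {t[i : i + k] for i in range(max(len(t) - k + 1, 1))}


def minhash(shingle_set: set, num_perm: int = 64) -> list:
    """MinHash signature: for each salted hash function, the minimum hash over all shingles."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(), "little")
            for s in shingle_set
        ))
    return signature


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def fuzzy_dedup(texts, threshold: float = 0.8):
    """Drop a text if its estimated similarity to any kept text reaches threshold."""
    kept, kept_sigs = [], []
    for text in texts:
        sig = minhash(shingles(text))
        if all(estimated_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(text)
            kept_sigs.append(sig)
    return kept


docs = [
    "My UPI payment failed but money was debited.",
    "my UPI payment   failed but money was debited.",    # case/whitespace variant
    "My UPI payment failed but the money was debited.",  # one-word edit
    "How do I book a vaccination appointment?",
]
after_exact = exact_dedup(docs)
after_fuzzy = fuzzy_dedup(after_exact, threshold=0.5)
print(len(docs), len(after_exact), len(after_fuzzy))  # 4 3 2
```

Short texts lose similarity quickly with each edited word, which is why the example uses a loose threshold of 0.5. For full conversations, start higher (for example 0.8) and tune by inspecting what gets dropped.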

Curator-style pipeline sketch

python
# API names can vary across Curator releases. Use this as an implementation sketch.
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.text.io.reader.parquet import ParquetReader
from nemo_curator.stages.text.io.writer.parquet import ParquetWriter
from nemo_curator.stages.text.modules.score_filter import Filter

pipeline = Pipeline(name="support_sft_cleanup")
pipeline.add_stage(ParquetReader(file_paths="support_sft_for_curator.parquet"))

# Drop very short records.
pipeline.add_stage(
    Filter(
        filter_fn=lambda text: len(text) >= 400,
        filter_field="text",
    )
)

# Add exact/fuzzy/semantic deduplication stages based on your installed Curator version.

pipeline.add_stage(ParquetWriter(path="curated_support_sft.parquet"))

# Nothing executes until the pipeline is run; executor options vary by release.
pipeline.run()

Quality report template

markdown
# Synthetic Dataset Quality Report

Dataset: support_sft.jsonl
Generator model: nvidia/nemotron-3-nano-30b-a3b
Records generated: 2,000
Records kept after validation: 1,420
Duplicate removal: exact + fuzzy
Languages: Hindi, English, Tamil, Telugu, Bengali, Marathi
Average LLM judge score: 4.1 / 5

Known limitations:
- Synthetic conversations still need human spot checks.
- Domain policies are examples, not legal or medical advice.
- Low-resource language quality should be reviewed by native speakers.
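Rather than filling in the report by hand, the numbers can come straight from the generation and validation step. The sketch below assumes records shaped like the ones the generation script in this guide produces (`seed["language"]` plus a `scores` dict whose values are integers or `{"score": int}`), and defines the average judge score as the mean over all integer scores of the kept records; state whichever definition you use in the report. The sample records are made up for illustration, and `report_stats` is a hypothetical helper.

```python
import statistics
from collections import Counter


def int_scores(record):
    """Collect integer judge scores, accepting either 4 or {"score": 4}."""
    values = []
    for value in record.get("scores", {}).values():
        if isinstance(value, dict):
            value = value.get("score")
        if isinstance(value, int) and not isinstance(value, bool):
            values.append(value)
    return values


def report_stats(records, kept):
    """Compute the headline numbers for the quality report template."""
    kept_scores = [s for record in kept for s in int_scores(record)]
    return {
        "records_generated": len(records),
        "records_kept": len(kept),
        "languages": Counter(record["seed"]["language"] for record in kept),
        "avg_judge_score": round(statistics.mean(kept_scores), 1) if kept_scores else None,
    }


# Made-up sample records for illustration.
records = [
    {"seed": {"language": "Hindi"}, "scores": {"safety": 5, "language_quality": 4, "reason": "clear"}},
    {"seed": {"language": "Tamil"}, "scores": {"safety": {"score": 3}, "language_quality": 4}},
    {"seed": {"language": "Hindi"}, "scores": {"safety": 1, "language_quality": 2}},
]
kept = records[:2]

stats = report_stats(records, kept)
languages = ", ".join(f"{lang} ({n})" for lang, n in stats["languages"].items())
print(f"Records generated: {stats['records_generated']:,}")
print(f"Records kept after validation: {stats['records_kept']:,}")
print(f"Languages: {languages}")
print(f"Average LLM judge score: {stats['avg_judge_score']} / 5")
```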

Know-Before-You-Go Checklist

Team readiness

  • Primary contact and all team members confirmed.
  • At least three active participants for hackathon work.
  • At least two members available in Bangalore for final days if shortlisted.
  • Repository has README, license, and setup instructions.
  • Demo owner and pitch owner assigned.

Technical readiness

  • NVIDIA API key created and stored outside git.
  • Hosted NIM smoke test completed.
  • Dataset sample or workflow fixture prepared.
  • Evaluation prompts or metrics defined.
  • Fallback demo path works without full training or full integration.
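For the first item, a quick scan before each commit helps confirm that the key never lands in the repository. The sketch assumes NVIDIA API keys start with `nvapi-` (as keys from build.nvidia.com do at the time of writing) and skips `.env` on the assumption that it is listed in `.gitignore`; verify both for your setup. `find_key_leaks` is a hypothetical helper, not an NVIDIA tool.

```python
import re
from pathlib import Path

# Assumption: NVIDIA API keys from build.nvidia.com start with "nvapi-".
KEY_PATTERN = re.compile(r"nvapi-[A-Za-z0-9_-]{10,}")
SKIP_DIRS = {".git", ".venv", "venv", "node_modules", "__pycache__"}
# Assumed to be gitignored; confirm with `git check-ignore .env`.
SKIP_FILES = {".env"}


def line_has_key(line: str) -> bool:
    """True if the line contains something shaped like an NVIDIA API key."""
    return KEY_PATTERN.search(line) is not None


def find_key_leaks(root: str = "."):
    """Return (path, line_number) pairs for every suspected key under root."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.name in SKIP_FILES or SKIP_DIRS.intersection(path.parts):
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(lines, start=1):
            if line_has_key(line):
                hits.append((str(path), number))
    return hits
```

Call `find_key_leaks()` from the repository root, for example in a pre-commit hook, and fail the commit if it returns anything.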

Track-Specific Deliverables

| Track | Minimum technical artifact | Stretch artifact |
| --- | --- | --- |
| A | Runnable agent workflow with at least one real tool call. | MCP/A2A exposure plus eval traces and NemoClaw safety discussion. |
| B | Base vs adapted model comparison on held-out prompts. | LoRA/SFT plus NeMo RL alignment or reward model experiment. |
| C | Generated dataset with schema, validation, and export format. | Curator dedup/filter pipeline plus quality report and downstream eval. |

Demo Script Template

markdown
# Demo Script

1. Problem
   - Who has this problem?
   - What is painful today?

2. NVIDIA stack
   - Model:
   - Runtime:
   - NeMo tools:
   - Data/training path:

3. Live demo path
   - Input:
   - Intermediate tool/model/data step:
   - Output:

4. Evaluation
   - Metric:
   - Baseline:
   - Result:

5. Next step
   - What would productionize this?