A developer-facing starter repo in document form. Use it before the workshop and hackathon to set up accounts, choose the right NVIDIA path, copy starter configs, and arrive ready to build.
For eligible applicants and developer teams
No hands-on workshop required
Primary focus: Nemotron, NIM, NemoClaw, and NeMo
How To Use This Guide
This guide is intentionally more technical than the public workshop agenda. It is the companion material developers should use to prepare their machines, teams, repositories, and technical plan before the hackathon build days.
Before the workshop
Create NVIDIA Developer, build.nvidia.com, NGC, and Hugging Face accounts if needed.
Read the common NIM + Nemotron section.
Pick a primary track and one backup track.
Bring one architecture sketch and one technical blocker question.
During the workshop
Use tabs as reference material during demos.
Mark which code snippets your team will reuse.
Ask mentors where your project fits in the NVIDIA stack.
Decide what you will build on Day 0 and what you will defer.
Before hackathon Day 0
Have a runnable repo with a license and README.
Prepare small test data and an evaluation prompt set.
Confirm your inference path: hosted NIM, self-hosted NIM, vLLM, or cluster.
Define a demo path that works even if the full system is not complete.
Track Selection Map
| If your project is mainly... | Choose | Use these NVIDIA assets | Expected demo proof |
| --- | --- | --- | --- |
| Connecting tools, APIs, documents, agents, and workflows into a working assistant or automation service. | | | |
Minimum bar for a strong technical submission: a clear track, runnable demo path, evidence of NVIDIA stack usage, and a small but believable evaluation loop.
Common Foundation: NIM + Nemotron
NIM and Nemotron are the common layer across all three tracks. NIM gives you a production-style inference endpoint with an OpenAI-compatible API. Nemotron gives you a strong NVIDIA model family for reasoning, tool use, long-context work, and synthetic data generation.
python
# scripts/nim_chat_smoke_test.py
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(
    base_url=os.getenv("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    api_key=os.getenv("NVIDIA_API_KEY", "not-used"),
)
model = os.getenv("NIM_MODEL", "nvidia/nemotron-3-nano-30b-a3b")
response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a concise technical mentor for hackathon teams."},
        {"role": "user", "content": "Suggest a Track A architecture for invoice processing."},
    ],
    temperature=0.2,
    max_tokens=600,
)
print(response.choices[0].message.content)
Streaming response
python
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Create a demo script for a multilingual support agent."}],
    stream=True,
    temperature=0.3,
    max_tokens=700,
)
for chunk in stream:
    # Some chunks carry no choices or an empty delta, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
Reasoning controls
Use non-thinking mode for fast routing and tool loops. Use thinking mode for deeper planning, architecture comparison, and data quality review. Avoid combining experimental parallel reasoning modes with tool-calling loops unless the model documentation explicitly supports it.
python
# Fast concise mode
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Draft a simple grievance-routing agent architecture."}],
    temperature=0,
    max_tokens=700,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": False
        }
    },
)
print(resp.choices[0].message.content)

# Deeper planning mode
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Compare three architectures for a public-services assistant and choose one."}],
    temperature=0,
    max_tokens=1800,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": True
        }
    },
)
print(resp.choices[0].message.content)
Tool calling pattern
Use tool calling when the model should decide when to call an application function, database lookup, policy engine, calculator, search API, or workflow step.
python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_policy",
        "description": "Look up a company policy by topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "Policy topic, for example travel or reimbursement"}
            },
            "required": ["topic"],
        },
    },
}]


def lookup_policy(topic: str) -> str:
    policies = {
        "travel": "Flights require manager approval. Hotels must be under the city cap.",
        "reimbursement": "Submit receipts within 30 days with project code and GST details.",
    }
    return policies.get(topic.lower(), "No policy found for that topic.")


messages = [{"role": "user", "content": "Can I expense a hotel for my Bangalore workshop trip?"}]
first = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto",
    temperature=0,
)
assistant_message = first.choices[0].message
messages.append(assistant_message)

# Run every tool the model asked for and feed each result back as a "tool" message.
if assistant_message.tool_calls:
    for call in assistant_message.tool_calls:
        args = json.loads(call.function.arguments)
        result = lookup_policy(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })

final = client.chat.completions.create(model=model, messages=messages, temperature=0.2)
print(final.choices[0].message.content)
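The loop above passes model-produced arguments straight into `lookup_policy`. A minimal sketch of a guard that checks them first is shown below. The helper name `validate_tool_args` and the constant `LOOKUP_POLICY_PARAMS` are hypothetical, and the checks cover only JSON parsing, required keys, unknown keys, and basic types, not full JSON Schema.

```python
import json

# Copy of the "parameters" object from the lookup_policy tool definition.
LOOKUP_POLICY_PARAMS = {
    "type": "object",
    "properties": {"topic": {"type": "string"}},
    "required": ["topic"],
}

# Maps JSON Schema type names to Python types (a deliberately small subset).
_TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_tool_args(raw_arguments, schema):
    """Return (args, error). Exactly one of the two is None."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(args, dict):
        return None, "arguments must be a JSON object"
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            return None, f"missing required argument: {key}"
    for key, value in args.items():
        if key not in properties:
            return None, f"unexpected argument: {key}"
        type_name = properties[key].get("type")
        expected = _TYPE_MAP.get(type_name)
        if expected and not isinstance(value, expected):
            return None, f"argument {key} should be {type_name}"
    return args, None
```

When `error` is not `None`, one option is to send the error text back as the `tool` message content so the model can retry with corrected arguments.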
Self-hosted NIM smoke tests
bash
# Log in to NGC before pulling private/entitled containers.
docker login nvcr.io -u '$oauthtoken' -p "$NGC_API_KEY"
# Example pattern. Confirm the exact container path from the model page or NIM docs.
docker run --rm -it --gpus all \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-nano-30b-a3b:latest
# The container runs in the foreground, so run the checks below from a second terminal.
curl http://localhost:8000/v1/models
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Say hello from local NIM."}],
    "max_tokens": 64
  }'
Model-free NIM option
Use model-free NIM only when you need to serve a supported custom or newer model path and understand the deployment requirements. For most hackathon teams, hosted NIM or a model-specific NIM container is simpler.
| | | Regenerate key, re-export env var, verify model access. |
| 404 model not found | Model ID mismatch between hosted endpoint and self-hosted endpoint. | Call /v1/models and use returned model ID. |
| Slow first response | Container cold start, model download, cache miss. | Warm up endpoint before demos and cache models. |
| JSON/tool parse errors | Prompt too loose or schema too complex. | Simplify schema, set temperature low, validate arguments. |
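For the 404 case, the advice to call /v1/models and use the returned model ID can be sketched as a small pure function. It assumes the OpenAI-compatible response shape (a `data` list of objects with an `id` field); the helper name `pick_model_id` and the suffix-matching fallback are illustrative choices, not part of NIM.

```python
def pick_model_id(models_payload, preferred):
    """Choose a model ID from the parsed JSON body of GET /v1/models."""
    ids = [m["id"] for m in models_payload.get("data", []) if "id" in m]
    if not ids:
        raise LookupError("endpoint returned no models")
    if preferred in ids:
        return preferred
    # Hosted and self-hosted endpoints may use different prefixes,
    # so fall back to matching on the final path segment.
    short_name = preferred.split("/")[-1]
    for model_id in ids:
        if model_id.split("/")[-1] == short_name:
            return model_id
    raise LookupError(f"{preferred!r} not served; available: {ids}")
```

Failing loudly with the list of available IDs is a deliberate choice here: silently picking the first model can hide a misconfigured endpoint until demo time.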
Track A: Agentic Workflows
Track A teams should aim to show a working agentic service, not just a chatbot. The service should plan, call tools, recover from failure, expose a usable interface, and report at least basic quality or latency metrics.
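One way to make "recover from failure" and "report latency" concrete is to wrap every tool call in a small retry helper that also records timing. This is a framework-independent sketch; `call_with_retry` and its parameters are hypothetical names, and it does not use any NeMo Agent Toolkit API.

```python
import time


def call_with_retry(tool_fn, *args, retries=2, backoff_s=0.0, **kwargs):
    """Call a tool, retrying on any exception. Returns (result, stats)."""
    start = time.perf_counter()
    attempts = 0
    last_error = None
    while attempts <= retries:
        attempts += 1
        try:
            result = tool_fn(*args, **kwargs)
            stats = {
                "attempts": attempts,
                "latency_s": time.perf_counter() - start,
                "error": None,
            }
            return result, stats
        except Exception as exc:  # broad on purpose: tools fail in many ways
            last_error = exc
            if attempts <= retries:
                # Linear backoff between attempts.
                time.sleep(backoff_s * attempts)
    stats = {
        "attempts": attempts,
        "latency_s": time.perf_counter() - start,
        "error": repr(last_error),
    }
    return None, stats
```

Logging the returned `stats` per tool call gives a basic latency and retry metric to show during the demo.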
Reference architecture
text
User / UI / API
-> NeMo Agent Toolkit workflow
-> Nemotron via NIM for reasoning
-> Tools:
     - search / RAG
     - databases
     - business APIs
     - document parsers
     - calculators / validators
-> Optional MCP server/client boundary
-> Optional A2A specialist agent delegation
-> Trace, evaluation, final answer
Install NeMo Agent Toolkit
bash
mkdir track-a-agent
cd track-a-agent
python3 -m venv .venv
source .venv/bin/activate
pip install -U uv
uv pip install "nvidia-nat[langchain,mcp,a2a,eval]"
export NVIDIA_API_KEY=nvapi-your-key
nat --help
nat info components -t function
| | Grouped dynamic tools, especially MCP client tool groups. |
| workflow | The agent, router, sequential flow, or executor that binds model and tools. |
| eval | Datasets, evaluators, metrics, and output paths for quality checks. |
| general | Telemetry, logging, retries, object stores, and runtime-level settings. |
| Workflow type | Use when |
| --- | --- |
| react_agent | You need a first agent with explicit tool reasoning and easy logs. |
| Tool-calling workflow | You want cleaner structured tool calls and less free-form reasoning. |
| Router agent | You need to send India-specific tasks to specialist workflows. |
| Sequential workflow | You need deterministic steps such as extract, validate, retrieve, respond. |
| Parallel workflow | You want several tools or specialists to answer, then merge results. |
Run, serve, and test
bash
# Run once from CLI.
nat run --config_file configs/invoice_agent.yml \
  --input "Design an enterprise invoice triage workflow and list the tools it needs."
# Serve as an HTTP API.
nat serve --config_file configs/invoice_agent.yml --host 0.0.0.0 --port 8000
# Call it from another terminal.
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Check this invoice workflow for missing approval steps."}
    ]
  }'
MCP server pattern
Expose workflow functions as MCP tools when another client or agent needs to call them.
bash
# Publish functions from the workflow as MCP tools.
nat mcp serve --config_file configs/invoice_agent.yml --host 0.0.0.0 --port 9901
# Common smoke test: list tools from an MCP-capable client, then call one tool.
# Keep network and credential policies explicit if connecting enterprise systems.
MCP client pattern
Use an MCP client group when your agent needs to consume tools from another local service, partner API wrapper, database adapter, or team-built tool server.
# From another terminal or another agent environment:
nat a2a client discover --url http://localhost:10000
nat a2a client get_skills --url http://localhost:10000
nat a2a client call \
  --url http://localhost:10000 \
  --message "Summarize incorporation steps for a DPIIT-recognized startup."
Evaluation harness
json
[
  {
    "input": "Invoice INV-102 has no PO number. What should happen?",
    "expected": "Flag for manual review or route to exception approval."
  },
  {
    "input": "Vendor is approved and amount is below threshold. What next?",
    "expected": "Proceed to payment queue after validation."
  }
]
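A dataset in this shape can be scored with a very small loop before any framework evaluator is wired in. The sketch below uses a crude keyword-overlap score as a stand-in for a real evaluator; `keyword_overlap`, `run_eval`, and the `agent_fn` callable are hypothetical names, and the 0.5 threshold is an arbitrary starting point.

```python
def keyword_overlap(expected, actual):
    """Fraction of expected words (longer than 3 letters) found in the answer."""
    words = set()
    for raw in expected.split():
        word = raw.strip(".,").lower()
        if len(word) > 3:
            words.add(word)
    if not words:
        return 0.0
    text = actual.lower()
    return sum(word in text for word in words) / len(words)


def run_eval(cases, agent_fn, threshold=0.5):
    """Run agent_fn on every case and return (pass_rate, per-case results)."""
    results = []
    for case in cases:
        answer = agent_fn(case["input"])
        score = keyword_overlap(case["expected"], answer)
        results.append({
            "input": case["input"],
            "answer": answer,
            "score": score,
            "passed": score >= threshold,
        })
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return pass_rate, results
```

Keyword overlap only catches gross misses. It is useful as a smoke test, while rubric-based or LLM-as-judge scoring is a better fit for the final numbers.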
Every Track A team should inspect traces before demo day. Most agent failures are not model failures; they are retrieval misses, wrong tool choices, bad schemas, retries, or oversized context.
| | Did the agent choose the correct MCP, A2A, or custom tool? |
| Latency | Which model call or tool call dominates response time? |
| Token usage | Are retrieved documents or tool outputs too large? |
| Retries | Are tool argument schemas or output parsers failing? |
| Eval failures | Is the failure caused by retrieval, reasoning, or prompt design? |
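The latency, token, and retry questions above can be answered from any list of trace spans. The sketch below assumes a hypothetical span format (dicts with `name`, `latency_ms`, and optional `tokens` and `retries`); real trace exports will use different field names, so treat this as the shape of the analysis rather than a parser for a specific tool.

```python
def summarize_trace(spans):
    """Summarize one request's spans: total time, slowest step, tokens, retries."""
    if not spans:
        return {
            "total_latency_ms": 0,
            "slowest_step": None,
            "slowest_share": 0.0,
            "total_tokens": 0,
            "total_retries": 0,
        }
    total_latency = sum(s["latency_ms"] for s in spans)
    slowest = max(spans, key=lambda s: s["latency_ms"])
    share = slowest["latency_ms"] / total_latency if total_latency else 0.0
    return {
        "total_latency_ms": total_latency,
        "slowest_step": slowest["name"],
        "slowest_share": round(share, 2),
        "total_tokens": sum(s.get("tokens", 0) for s in spans),
        "total_retries": sum(s.get("retries", 0) for s in spans),
    }
```

A step that accounts for most of the latency, or a tool output that dominates token usage, is usually the first thing to trim before demo day.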
NemoClaw For Safe Agent Exploration
NemoClaw is useful when you want to discuss safer execution of always-on coding or automation assistants. It integrates OpenClaw with NVIDIA OpenShell sandboxing concepts.
NemoClaw is alpha software. APIs, configuration schemas, and runtime behavior can change. Do not use it in production environments.
bash
# Quickstart pattern from the current NemoClaw docs.
curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
source ~/.bashrc
# Create or connect to a sandboxed assistant.
nemoclaw hackathon-agent connect
# Check status and logs.
nemoclaw hackathon-agent status
nemoclaw hackathon-agent logs --follow
NemoClaw discussion checklist
| Concern | What to show in your architecture |
| --- | --- |
| Network egress | Which domains or APIs the agent can call, and which are blocked. |
| Credentials | How secrets are isolated from prompts, logs, and agent memory. |
| Filesystem | Which project paths the agent can read/write. |
| Human control | Where approvals, logs, or policy gates are surfaced. |
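To make the egress and credentials rows concrete in an architecture discussion, a team can prototype the policy as plain functions. The sketch below is not NemoClaw or OpenShell configuration; `is_egress_allowed`, `redact`, and the `nvapi-` key pattern (taken from the placeholder key format used in this guide) are illustrative assumptions.

```python
import re
from urllib.parse import urlparse

# Assumed key format, based on the "nvapi-your-key" placeholder in this guide.
SECRET_PATTERN = re.compile(r"nvapi-[A-Za-z0-9_\-]+")


def is_egress_allowed(url, allowed_domains):
    """Allow a URL only if its host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


def redact(text):
    """Mask API keys before text reaches logs, prompts, or agent memory."""
    return SECRET_PATTERN.sub("nvapi-***", text)
```

Matching on the parsed hostname rather than on the raw URL string avoids lookalike hosts such as `api.nvidia.com.evil.io` slipping past a naive substring check.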
Track B teams should be explicit about what model behavior they are changing and why. Your demo should compare base vs customized behavior, not only show that training ran.
Track B pipeline
1. Define behavior gap: What the base model cannot do well enough today.
2. Prepare data: Human data, domain data, or Track C-generated synthetic examples.
3. Train: Use full SFT or LoRA/PEFT through Megatron-Bridge where appropriate.
4. Align: Use NeMo RL for GRPO, DPO, RM, or SFT post-training when reward/preference signals matter.
5. Evaluate: Compare base vs adapted model on held-out prompts.
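Step 5 can be sketched as a paired comparison over the same held-out prompts. The function below assumes each model has already been scored per prompt (for example on a 1 to 5 rubric); `compare_models` and the score lists are hypothetical, and how the scores are produced is left to the team.

```python
def compare_models(base_scores, adapted_scores):
    """Compare per-prompt scores for the base and adapted model (same prompt order)."""
    if not base_scores or len(base_scores) != len(adapted_scores):
        raise ValueError("score lists must be non-empty and the same length")
    n = len(base_scores)
    pairs = list(zip(base_scores, adapted_scores))
    wins = sum(adapted > base for base, adapted in pairs)
    losses = sum(adapted < base for base, adapted in pairs)
    return {
        "n": n,
        "base_mean": sum(base_scores) / n,
        "adapted_mean": sum(adapted_scores) / n,
        "wins": wins,
        "losses": losses,
        "ties": n - wins - losses,
    }
```

Reporting wins, losses, and ties alongside the means makes regressions visible even when the average improves.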
Environment and container
bash
# Confirm exact container tag from current Megatron-Bridge docs for your model.
docker run --rm -it \
  --gpus all \
  --shm-size=64g \
  -w /opt/Megatron-Bridge \
  -v "$PWD:/workspace" \
  nvcr.io/nvidia/nemo:25.11.nemotron_3_nano \
  bash
# Inside the container shell, set the variables used by the later steps.
export HF_TOKEN=hf-your-token
export HF_MODEL_ID=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
export MEGATRON_MODEL_PATH=/workspace/checkpoints/nemotron3-nano-megatron
export HF_EXPORT_PATH=/workspace/exports/nemotron3-nano-adapted
{"messages":[{"role":"system","content":"You are a careful finance-domain assistant."},{"role":"user","content":"Explain working capital in simple terms."},{"role":"assistant","content":"Working capital is current assets minus current liabilities..."}],"metadata":{"domain":"finance","source":"curated","license":"internal-approved"}}
{"messages":[{"role":"system","content":"You are a careful finance-domain assistant."},{"role":"user","content":"What is the risk of negative cash conversion cycle?"},{"role":"assistant","content":"A negative cash conversion cycle can be healthy when supplier terms fund inventory..."}],"metadata":{"domain":"finance","source":"synthetic-reviewed","license":"internal-approved"}}
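Records in this chat format are worth linting before a training run, since one malformed line can stop a job late. The checker below is a minimal sketch; `check_sft_line` is a hypothetical helper, and the rules (known roles, non-empty content, final turn from the assistant) are assumptions about what a typical SFT loader expects rather than a documented Megatron-Bridge requirement.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def check_sft_line(line):
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing messages list"]
    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index}: not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index}: unknown role")
        if not str(message.get("content", "")).strip():
            problems.append(f"message {index}: empty content")
    last = messages[-1]
    if not isinstance(last, dict) or last.get("role") != "assistant":
        problems.append("last message must be from the assistant")
    return problems
```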
NeMo RL alignment path
Use NeMo RL when you have preference, reward, or environment feedback signals. Supported post-training paths include GRPO, DPO, reward model training, SFT, DAPO, and on-policy distillation.
uv run python examples/run_grpo.py --config configs/grpo_domain.yaml
tensorboard --logdir=./logs --port=6006
Custom reward environment pattern
Use a custom reward environment when correctness is more than string matching. For India-domain assistants, useful reward axes include groundedness, no fake citations, language fit, and refusal behavior.
python
import re
import torch
import ray
from nemo_rl.environments.interfaces import EnvironmentInterface, EnvironmentReturn


@ray.remote(max_restarts=-1, max_task_retries=-1)
class IndiaPolicyRewardEnv(EnvironmentInterface):
    def __init__(self, cfg):
        self.required_terms = cfg.get("required_terms", [])

    def step(self, message_log_batch, metadata):
        # Concatenate all assistant turns of each conversation into one string.
        responses = [
            "".join(str(m["content"]) for m in conv if m["role"] == "assistant")
            for conv in message_log_batch
        ]
        scores = []
        for text, meta in zip(responses, metadata):
            grounded = float(any(t.lower() in text.lower() for t in self.required_terms))
            no_fake_section = float(
                not re.search(r"Section\s+\d+[A-Z]?", text)
                or meta.get("allow_sections", False)
            )
            lang_ok = float(meta.get("target_lang", "en") in ["en", "hi", "ta", "te"] or len(text) > 0)
            scores.append([grounded, no_fake_section, lang_ok])
        return EnvironmentReturn(
            observations=[{"role": "environment", "content": ""} for _ in responses],
            metadata=metadata,
            next_stop_strings=[None] * len(responses),
            rewards=torch.tensor(scores, dtype=torch.float32),
            terminateds=torch.ones(len(responses)),
            answers=None,
        )
DPO and reward model patterns
Use DPO when your team has pairwise preference examples and wants alignment without online reward rollouts. Use reward model training when you want a reusable scorer for later RL or evaluation.
jsonl
{"context":[{"role":"user","content":"Explain DigiLocker KYC limits."}],"completions":[{"rank":0,"completion":[{"role":"assistant","content":"A careful answer with caveats and no unsupported legal claim."}]},{"rank":1,"completion":[{"role":"assistant","content":"An overconfident answer with made-up section numbers."}]}],"task_name":"india_policy"}
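Preference records in this shape can be built and checked from simple chosen/rejected pairs. The sketch below assumes, as the example above suggests, that rank 0 marks the preferred completion; confirm that convention against the NeMo RL version you use. `make_preference_record` and `is_valid_preference` are hypothetical helper names.

```python
def make_preference_record(prompt, chosen, rejected, task_name):
    """Build one record in the context/completions layout shown above."""
    return {
        "context": [{"role": "user", "content": prompt}],
        "completions": [
            # Assumption: rank 0 is the preferred answer, rank 1 the rejected one.
            {"rank": 0, "completion": [{"role": "assistant", "content": chosen}]},
            {"rank": 1, "completion": [{"role": "assistant", "content": rejected}]},
        ],
        "task_name": task_name,
    }


def is_valid_preference(record):
    """Check that a record has exactly ranks 0 and 1 with different answers."""
    completions = record.get("completions", [])
    ranks = sorted(c.get("rank") for c in completions)
    if ranks != [0, 1]:
        return False
    texts = [c["completion"][0]["content"] for c in completions]
    return texts[0] != texts[1]
```

Writing each record with `json.dumps(record, ensure_ascii=False)` on its own line yields a JSONL file in the same layout as the example.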
bash
# DPO pattern. Confirm exact example paths in your checked-out NeMo RL version.
uv run python examples/run_dpo.py --config examples/configs/dpo.yaml \
  policy.model_name=./exports/india_hf \
  dpo.sft_loss_weight=0.1
# Reward model pattern.
uv run python examples/run_rm.py --config examples/configs/rm.yaml \
  policy.model_name=./exports/india_hf
Track B evaluation table
| Metric | How to compute | Why judges care |
| --- | --- | --- |
| Domain correctness | Expert rubric or LLM-as-judge over held-out prompts. | Shows behavior changed in the intended direction. |
| Format compliance | Regex or parser validation for required answer schema. | Important for enterprise workflows and agents. |
| Safety/regression | Base safety prompts plus refusal/grounding checks. | |
Code-mixed Hindi-English prompts and at least two regional-language samples if your project claims multilingual support.
Legal, medical, and financial prompts that require cautious wording and escalation rather than overconfident advice.
Held-out examples that never appear in SFT, preference, or reward model training data.
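The requirement that held-out examples never appear in training data is easy to check mechanically. The sketch below flags held-out prompts whose normalized text also occurs in the training prompts; `find_leaks` is a hypothetical helper, and it only catches exact matches after lowercasing and whitespace collapsing, not paraphrases.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def find_leaks(train_prompts, heldout_prompts):
    """Return held-out prompts that also occur in the training prompts."""
    seen = {normalize(prompt) for prompt in train_prompts}
    return [prompt for prompt in heldout_prompts if normalize(prompt) in seen]
```

Run it against the combined prompts from SFT, preference, and reward model data; any returned prompt should be removed from one side before reporting results.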
Track C: Synthetic Data Generation
Track C teams should show the complete data factory: schema, generation, validation, curation, export, and quality report. A pile of generated examples is not enough.
Track C pipeline
text
Data objective
-> Schema and sampling plan
-> Nemotron generation through hosted NIM, local NIM, or vLLM
-> Deterministic validators
-> LLM-as-judge scoring
-> NeMo Curator filtering and deduplication
-> JSONL / Parquet export
-> Quality report and examples
Strong India SDG domains
| Domain | Example synthetic records | Safety note |
| --- | --- | --- |
| Agriculture | Pest triage, mandi price explanation, subsidy navigation. | Escalate crop disease diagnosis and avoid fabricated government scheme details. |
This mirrors the Korea guide pattern: define model config, sampler columns, LLM-generated columns, structured columns, validators, judges, preview, then create. Confirm exact class names against the Data Designer version you install.
builder.add_column(dd.LLMTextColumnConfig(
    name="user_request",
    model_alias="nemotron",
    prompt="""
Create one realistic user request for an India SDG assistant.
Domain: {{ sdg_domain }}
Language: {{ language }}
User profile: {{ user_profile }}
Requirements:
- Use natural phrasing for the selected language.
- Include concrete local constraints.
- Do not include personally identifiable information.
- Make the request useful for training an agent, not a generic chatbot.
""",
))
Schema-driven agent trace
python
agent_trace_schema = {
    "type": "object",
    "properties": {
        "intent": {"type": "string"},
        "constraints": {"type": "array", "items": {"type": "string"}},
        "plan": {"type": "array", "items": {"type": "string"}},
        "tool_calls": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "tool_name": {"type": "string"},
                    "arguments": {"type": "object"},
                    "expected_observation": {"type": "string"},
                },
                "required": ["tool_name", "arguments", "expected_observation"],
            },
        },
        "risk_flags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["intent", "constraints", "plan", "tool_calls", "risk_flags"],
}

builder.add_column(dd.LLMStructuredColumnConfig(
    name="agent_trace",
    model_alias="nemotron",
    prompt="""
Given the user request below, create a structured agent execution trace.
User request:
{{ user_request }}
The trace should show how a responsible agent would use tools, constraints,
risks, and next steps for the selected India SDG domain.
""",
    schema=agent_trace_schema,
))
Validation and LLM-as-judge columns
python
def validate_trace(row):
    trace = row["agent_trace"]
    # Reject traces with no plan, with diagnosis language, or with too many tool calls.
    if not trace.get("plan"):
        return False
    if "medical diagnosis" in str(trace).lower():
        return False
    if len(trace.get("tool_calls", [])) > 4:
        return False
    return True


builder.add_column(dd.ValidationColumnConfig(
    name="trace_valid",
    validator_type=dd.ValidatorType.PYTHON,
    params={"function": validate_trace},
    required_columns=["agent_trace"],
))

builder.add_column(dd.LLMJudgeColumnConfig(
    name="quality_judge",
    model_alias="nemotron",
    prompt="""
Evaluate this synthetic agent training record.
Domain: {{ sdg_domain }}
Language: {{ language }}
User: {{ user_request }}
Trace: {{ agent_trace }}
Score strictly. Penalize unsafe advice, vague plans, invalid tool use,
poor local relevance, and language mismatch.
""",
    scores=[
        dd.Score(name="local_relevance", description="Does this fit the Indian SDG context?", options={1: "poor", 3: "acceptable", 5: "excellent"}),
        dd.Score(name="tool_use_quality", description="Are tool calls plausible and useful?", options={1: "bad", 3: "usable", 5: "strong"}),
        dd.Score(name="safety", description="Does it avoid harmful or overconfident guidance?", options={1: "unsafe", 3: "minor issues", 5: "safe"}),
        dd.Score(name="language_quality", description="Is the selected language fluent and natural?", options={1: "poor", 3: "okay", 5: "native-like"}),
    ],
))
The exact Data Designer APIs can evolve, so treat this as the shape to reproduce: samplers define distributions, LLM columns generate dependent fields, structured columns extract metadata, judge columns score quality, and expression/validation columns produce final training records.
python
# notebooks/sdg_multilingual_support.py
import os
import json
import random
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI(
    base_url=os.getenv("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    api_key=os.environ["NVIDIA_API_KEY"],
)
MODEL = "nvidia/nemotron-3-nano-30b-a3b"


class ConversationMeta(BaseModel):
    language: str
    domain: str
    intent: str
    difficulty: str
    safety_risk: str = Field(description="none, low, medium, high")


LANGUAGES = ["Hindi", "English", "Tamil", "Telugu", "Bengali", "Marathi"]
DOMAINS = ["banking", "healthcare", "telecom", "travel", "education"]
INTENTS = ["complaint", "question", "refund", "troubleshooting", "appointment"]


def sample_seed():
    return {
        "language": random.choice(LANGUAGES),
        "domain": random.choice(DOMAINS),
        "intent": random.choice(INTENTS),
        "difficulty": random.choice(["easy", "medium", "hard"]),
    }


def generate_conversation(seed):
    prompt = f"""
Create a realistic multilingual customer support conversation.
Language: {seed["language"]}
Domain: {seed["domain"]}
Intent: {seed["intent"]}
Difficulty: {seed["difficulty"]}
Return exactly 6 to 10 turns as JSON:
{{"messages":[{{"role":"user","content":"..."}},{{"role":"assistant","content":"..."}}]}}
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=1200,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


def judge_record(seed, conversation):
    rubric = f"""
Score this synthetic conversation from 1 to 5 on:
1. language_quality
2. domain_relevance
3. assistant_helpfulness
4. safety
Return JSON with integer scores and a short reason.
Seed: {json.dumps(seed)}
Conversation: {json.dumps(conversation, ensure_ascii=False)}
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": rubric}],
        temperature=0,
        max_tokens=600,
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


records = []
for i in range(20):
    seed = sample_seed()
    conversation = generate_conversation(seed)
    scores = judge_record(seed, conversation)
    records.append({"id": i, "seed": seed, "conversation": conversation, "scores": scores})

with open("synthetic_support_conversations.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
Validation functions
python
def has_min_turns(record, min_turns=6):
    return len(record["conversation"].get("messages", [])) >= min_turns


def alternates_roles(record):
    roles = [m.get("role") for m in record["conversation"].get("messages", [])]
    # Only user and assistant turns are allowed in generated conversations.
    valid = all(role in {"user", "assistant"} for role in roles)
    # Adjacent turns must come from different speakers.
    alternating = all(a != b for a, b in zip(roles, roles[1:]))
    return valid and alternating


def score_value(record, key, threshold=3):
    value = record.get("scores", {}).get(key)
    # Some judges return {"score": n, "reason": "..."} instead of a bare integer.
    if isinstance(value, dict):
        value = value.get("score")
    try:
        return int(value) >= threshold
    except (TypeError, ValueError):
        return False


def keep(record):
    return (
        has_min_turns(record)
        and alternates_roles(record)
        and score_value(record, "language_quality")
        and score_value(record, "domain_relevance")
        and score_value(record, "assistant_helpfulness")
        and score_value(record, "safety")
    )


filtered = [r for r in records if keep(r)]
print(f"Kept {len(filtered)} / {len(records)} records")
Export for SFT
python
with open("support_sft.jsonl", "w", encoding="utf-8") as f:
    for record in filtered:
        out = {
            "messages": [
                {"role": "system", "content": "You are a helpful multilingual customer support assistant."},
                *record["conversation"]["messages"],
            ],
            "metadata": {
                **record["seed"],
                "scores": record["scores"],
                "source": "synthetic-nemotron",
            },
        }
        f.write(json.dumps(out, ensure_ascii=False) + "\n")
NeMo Curator Post-Processing
Use Curator when the dataset becomes large enough that manual inspection is no longer enough. Curator supports text loading, filtering, quality scoring, exact/fuzzy/semantic deduplication, and synthetic data generation workflows that can connect to OpenAI-compatible endpoints.
Install Curator
bash
# Choose extras based on your environment and CUDA version.
pip install -U nemo-curator
# For GPU-accelerated text curation on CUDA 12 environments, confirm the latest extra name in docs:
pip install --extra-index-url https://pypi.nvidia.com "nemo-curator[text_cuda12]"
Prepare Parquet for curation
python
import json
import pandas as pd

rows = []
with open("support_sft.jsonl", encoding="utf-8") as f:
    for idx, line in enumerate(f):
        item = json.loads(line)
        # Flatten the conversation into one text field for curation.
        text = "\n".join(m["content"] for m in item["messages"])
        rows.append({"id": idx, "text": text, "metadata": item.get("metadata", {})})

df = pd.DataFrame(rows)
df.to_parquet("support_sft_for_curator.parquet", index=False)
Deduplication decision guide
| Problem | Method | Use when |
| --- | --- | --- |
| Exact repeated rows | Exact hash matching | Generated output repeats identical conversations. |
| Small edits or paraphrases | Fuzzy MinHash + LSH | Templates produce near duplicates with minor wording changes. |
| Meaning-level duplicates | Semantic embeddings | Different text says the same thing across languages or styles. |
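The first two rows of the guide can be tried on a small sample in pure Python before moving to Curator. The sketch below does exact deduplication by hashing and near-duplicate removal with brute-force Jaccard similarity over word shingles. This is a simple stand-in for MinHash + LSH, not an implementation of it, and it scales quadratically, so it suits only a few thousand records. All function names and the 0.8 threshold are illustrative.

```python
import hashlib


def exact_dedup(texts):
    """Keep the first occurrence of each text, compared by SHA-256 of normalized text."""
    seen, kept = set(), []
    for text in texts:
        digest = hashlib.sha256(" ".join(text.lower().split()).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept


def shingles(text, n=3):
    """Return the set of n-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fuzzy_dedup(texts, threshold=0.8):
    """Greedy near-duplicate removal: drop a text if it is too similar to a kept one."""
    kept, kept_shingles = [], []
    for text in texts:
        current = shingles(text)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(text)
            kept_shingles.append(current)
    return kept
```

Running `fuzzy_dedup(exact_dedup(texts))` on a sample gives a quick estimate of how much duplication the generator produces and whether the threshold needs tuning.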
Curator-style pipeline sketch
python
# API names can vary across Curator releases. Use this as an implementation sketch.
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.text.io.reader.parquet import ParquetReader
from nemo_curator.stages.text.io.writer.parquet import ParquetWriter
from nemo_curator.stages.text.modules.score_filter import Filter

pipeline = Pipeline(name="support_sft_cleanup")
pipeline.add_stage(ParquetReader(file_paths="support_sft_for_curator.parquet"))
# Drop very short records.
pipeline.add_stage(
    Filter(
        filter_fn=lambda text: len(text) >= 400,
        filter_field="text",
    )
)
# Add exact/fuzzy/semantic deduplication stages based on your installed Curator version.
pipeline.add_stage(ParquetWriter(path="curated_support_sft.parquet"))
Quality report template
markdown
# Synthetic Dataset Quality Report
Dataset: support_sft.jsonl
Generator model: nvidia/nemotron-3-nano-30b-a3b
Records generated: 2,000
Records kept after validation: 1,420
Duplicate removal: exact + fuzzy
Languages: Hindi, English, Tamil, Telugu, Bengali, Marathi
Average LLM judge score: 4.1 / 5
Known limitations:
- Synthetic conversations still need human spot checks.
- Domain policies are examples, not legal or medical advice.
- Low-resource language quality should be reviewed by native speakers.
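The headline numbers in a report like this can be computed from the generated and filtered record lists rather than typed by hand. The sketch below assumes records shaped like those produced by the generation script (a `seed` dict with `language` and a `scores` dict of integer scores); `build_quality_report` is a hypothetical helper and covers only part of the template.

```python
def build_quality_report(dataset_name, generated, kept, score_keys):
    """Render the count, language, and average-score lines of the report."""
    values = [
        int(record["scores"][key])
        for record in kept
        for key in score_keys
        if key in record.get("scores", {})
    ]
    average = sum(values) / len(values) if values else 0.0
    languages = sorted({record["seed"]["language"] for record in kept})
    lines = [
        "# Synthetic Dataset Quality Report",
        f"Dataset: {dataset_name}",
        f"Records generated: {len(generated):,}",
        f"Records kept after validation: {len(kept):,}",
        f"Languages: {', '.join(languages)}",
        f"Average LLM judge score: {average:.1f} / 5",
    ]
    return "\n".join(lines)
```

The known-limitations section is best written by hand, since it depends on what the team actually reviewed.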
Know-Before-You-Go Checklist
Team readiness
Primary contact and all team members confirmed.
At least three active participants for hackathon work.
At least two members available in Bangalore for final days if shortlisted.
Repository has README, license, and setup instructions.
Demo owner and pitch owner assigned.
Technical readiness
NVIDIA API key created and stored outside git.
Hosted NIM smoke test completed.
Dataset sample or workflow fixture prepared.
Evaluation prompts or metrics defined.
Fallback demo path works without full training or full integration.
Track-Specific Deliverables
| Track | Minimum technical artifact | Stretch artifact |
| --- | --- | --- |
| A | Runnable agent workflow with at least one real tool call. | MCP/A2A exposure plus eval traces and NemoClaw safety discussion. |
| B | Base vs adapted model comparison on held-out prompts. | LoRA/SFT plus NeMo RL alignment or reward model experiment. |
| C | Generated dataset with schema, validation, and export format. | Curator dedup/filter pipeline plus quality report and downstream eval. |
Demo Script Template
markdown
# Demo Script
1. Problem
   - Who has this problem?
   - What is painful today?
2. NVIDIA stack
   - Model:
   - Runtime:
   - NeMo tools:
   - Data/training path:
3. Live demo path
   - Input:
   - Intermediate tool/model/data step:
   - Output:
4. Evaluation
   - Metric:
   - Baseline:
   - Result:
5. Next step
   - What would productionize this?