Cory Trimm
3/12/2026

TLDR: I’ve been building a lot of small AI experiments — MCP servers that connect Claude to government data, benefits navigation agents, a policy rules engine, AWS Bedrock guardrail testing, and local model fine-tuning. None of them are “products” yet. They’re bets. Here’s what I’ve been building and why.

The Philosophy

Not every project needs to be a startup. Some of the most interesting work happens at the microproject level — focused experiments that answer one specific question, build one specific skill, or solve one specific problem. This is the post about the little ones.

The common thread across all of them: AI should make government more accessible. Not replace caseworkers or automate away human judgment — but cut through the complexity that currently excludes people from systems they’re entitled to use.

I’ve worked at USDS and the VA. I know how government works from the inside. I also know how AI works. These projects live at that intersection.

MCP Servers: AI-Native Integrations

The Model Context Protocol has been my primary building block this year. It’s a way to connect Claude (and other models) to external services and data sources — think plugins, but more composable.

A few I’ve shipped or prototyped:

USWDS MCP Server — The U.S. Web Design System is how government teams are supposed to build websites. An MCP server that gives Claude full context on USWDS components, patterns, and accessibility guidance makes it dramatically faster to build compliant government UIs. I’ve been testing a fine-tuned local model on top of this for a specialized USWDS coding agent.

SNAP Policy Navigation — SNAP eligibility rules vary by state and are written for administrators, not applicants. This MCP lets Claude answer questions like “Am I eligible? What documentation do I need? What are my state’s income limits?” It turns out there’s real demand for this — most people can’t parse 40-page policy PDFs.

Amtrak Route Search — A simpler one. Help people find train routes conversationally, faster than the Amtrak website. Unexpectedly useful for accessibility reasons too.

Threads Integration — A custom Threads MCP for Claude Desktop that lets me draft and schedule social content via conversation. Speak a thought, get a post. Way less friction than opening the app.
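To make the pattern concrete, here is a minimal sketch of the tool logic a USWDS-style MCP server might expose. The component summaries are illustrative stand-ins, not the real USWDS documentation, and the SDK registration is shown only in comments:

```python
# Sketch of the tool logic behind a USWDS MCP server. The component
# summaries here are illustrative placeholders, not real USWDS docs.
USWDS_COMPONENTS = {
    "banner": "Official-site banner; required at the top of federal sites.",
    "accordion": "Collapsible content sections; manage aria-expanded state.",
    "alert": "Status messages; use the role and color tokens for each variant.",
}

def lookup_component(name: str) -> str:
    """Return guidance for a USWDS component, or a not-found message."""
    guidance = USWDS_COMPONENTS.get(name.lower().strip())
    return guidance or f"No guidance found for component '{name}'."

# With the official `mcp` Python SDK installed, exposing this to Claude
# looks roughly like:
#
#   from mcp.server.fastmcp import FastMCP
#   app = FastMCP("uswds")
#
#   @app.tool()
#   def uswds_component(name: str) -> str:
#       return lookup_component(name)
#
#   app.run()  # serves over stdio for Claude Desktop
```

The real servers wrap far richer data sources, but the shape is the same: a plain function, registered as a tool, that Claude can call mid-conversation.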

Government AI Prototypes

Beyond MCP integrations, I’ve been building fuller prototypes — agents that actually walk users through complex government processes.

Agentic SNAP Benefits Navigator — A conversational agent that determines eligibility, explains what documentation you need, and clarifies state-specific rules in plain language. The insight that drives this: SNAP documentation is written for program administrators. An AI intermediary that translates it into plain English solves a real problem for millions of people.
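The core of an eligibility screener is a handful of checks like the one below. This is a toy sketch; the income limits are illustrative placeholders, not real SNAP figures, which vary by state, year, and household circumstances:

```python
# Toy gross-income screen of the kind a SNAP navigator performs.
# These monthly limits are ILLUSTRATIVE placeholders, not real SNAP
# figures; actual limits vary by state, year, and household details.
ILLUSTRATIVE_MONTHLY_GROSS_LIMITS = {1: 1580, 2: 2137, 3: 2694, 4: 3250}

def screen_gross_income(household_size: int, monthly_gross: float) -> str:
    """Translate a raw limit comparison into a plain-English answer."""
    limit = ILLUSTRATIVE_MONTHLY_GROSS_LIMITS.get(household_size)
    if limit is None:
        return "Household size not covered by this sketch."
    if monthly_gross <= limit:
        return f"May be eligible: ${monthly_gross:.0f}/mo is under the ${limit} limit."
    return f"Likely ineligible on gross income: ${monthly_gross:.0f}/mo exceeds ${limit}."
```

The agent's job is the last-mile translation: turning that comparison, plus deductions and state variations, into an answer a person can act on.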

Medicaid Work Requirements PoC — As states add work requirements to Medicaid eligibility, understanding what counts as “work,” what exemptions exist (age, disability, caregiving), and how to report your status has gotten genuinely confusing. This PoC walks users through it as a conversation instead of a policy document.

Colorado Benefits Portal Chrome Extension — Prototyped using WebMCP, Chrome’s protocol for AI agents. The idea: Claude sits on a benefits portal, answers real-time questions about what a form field means, and helps fill it out correctly. You’re confused about what “household income” means on a form — you ask Claude, it explains, the form auto-fills. That’s the vision.

Deeper Technical Experiments

Ralph Policy Engine — Rules extracted from legislation, SOPs, and regulatory documents are almost always ambiguous when you try to make them executable. Ralph uses local models to extract policy rules from natural language documents, generates executable rules code, and automatically tests them against edge cases. When a rule fails, it surfaces exactly where the policy interpretation broke down. The system displays the full pipeline — extracted rules, generated code, test results, failures — via a Node.js server. Useful for government teams trying to operationalize complex regulations without guessing.
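The extract → generate → test loop can be sketched in a few lines. Here everything the real system produces with local models is hard-coded, and the policy rule is hypothetical:

```python
# Sketch of Ralph's extract -> generate -> test loop. The rule text,
# generated function, and edge cases are all hypothetical examples of
# what the real pipeline produces with local models.

# 1. A rule "extracted" from a policy document.
rule_text = "Applicants under 18 are exempt from the work requirement."

# 2. The "generated" executable form of that rule.
def is_exempt(age: int) -> bool:
    return age < 18

# 3. Edge cases probing where the interpretation might break down.
edge_cases = [
    {"age": 17, "expect": True},
    {"age": 18, "expect": False},  # is "under 18" inclusive? The text is silent.
    {"age": 0,  "expect": True},
]

def run_edge_cases() -> list:
    """Return the cases where the generated code disagrees with the expectation."""
    failures = []
    for case in edge_cases:
        got = is_exempt(case["age"])
        if got != case["expect"]:
            failures.append((case, got))
    return failures
```

When a case fails, the pipeline points back at the clause in `rule_text` the generated code disagreed with — that's the "exactly where the interpretation broke down" part.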

AWS Bedrock Guardrails with Automated Reasoning — Spent time with AWS Bedrock guardrails and the JavaScript SDK, implementing automated reasoning policies — mathematical verification that validates model outputs against formal logic, not just content filters. The key finding: automated reasoning catches factual errors that basic content filtering misses entirely. For government AI where accuracy has real consequences, this matters a lot.
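The distinction is easiest to see in a toy contrast. This is an analogy for what automated reasoning checks do, not the Bedrock API, and the exemption rule is made up:

```python
# Toy contrast between a keyword content filter and a formal logic
# check. This is an ANALOGY for automated reasoning, not Bedrock's API;
# the age-65 exemption rule below is purely illustrative.

def content_filter(answer: str, banned=("guaranteed", "always")) -> bool:
    """Passes unless the answer contains a banned phrase."""
    return not any(word in answer.lower() for word in banned)

def logic_check(claimed_age: int, claimed_exempt: bool) -> bool:
    """Validates a structured claim against a formal rule:
    exemption holds if and only if age >= 65 (illustrative rule)."""
    return claimed_exempt == (claimed_age >= 65)

# A fluent but factually wrong answer sails past the content filter...
answer = "At age 40 you qualify for the age exemption."
passes_filter = content_filter(answer)  # no banned words, so it passes
# ...but the claim it encodes fails formal validation.
passes_logic = logic_check(claimed_age=40, claimed_exempt=True)
```

A keyword filter inspects surface text; the logic check inspects what the answer asserts. That gap is where the factual errors hide.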

Fine-tuning Local Models — Explored several approaches: QLoRA fine-tuning with Unsloth and Ollama (75% memory reduction on consumer GPUs), cloud fine-tuning via Replicate and SageMaker, and synthetic data generation using Meta’s synthetic-data-kit. The goal in each case: can you take a specific government domain and fine-tune a model to be genuinely useful, then deploy it locally for privacy and cost reasons? The USWDS coding agent mentioned above is the concrete output.
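The deployment end of that pipeline is pleasantly small. A hypothetical Ollama Modelfile for serving a fine-tuned adapter locally looks like this (base model name and adapter path are placeholders):

```
# Hypothetical Ollama Modelfile for serving a QLoRA adapter locally.
# The base model and adapter path are placeholders.
FROM llama3.1:8b
ADAPTER ./uswds-lora
PARAMETER temperature 0.2
SYSTEM "You are a coding assistant specializing in USWDS components."
```

Then `ollama create uswds-agent -f Modelfile` builds a local model you can run entirely offline — which is the whole point for privacy-constrained government work.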

Smaller Side Projects

FEREX — Federal employee retirement is genuinely byzantine (FERS, CSRS, TSP, pension calculations, FEHB). FEREX uses AI to help federal employees model their retirement options conversationally. “If I retire at 60 with 30 years of service, what does my pension look like?” is a question with a non-obvious answer.
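The "non-obvious answer" mostly comes from eligibility rules and edge cases layered on a simple core formula. A simplified sketch of the FERS basic annuity, ignoring survivor elections, the FERS supplement, sick-leave credit, and much else:

```python
# Simplified FERS basic-annuity sketch. Real calculations involve
# survivor elections, the FERS supplement, sick-leave credit, and
# more; this covers only the core multiplier rule.

def fers_basic_annuity(high3_salary: float, years: float, age: int) -> float:
    """Annual FERS basic annuity before reductions and elections."""
    # The 1.1% multiplier applies at age 62+ with 20+ years of service;
    # otherwise the multiplier is 1%.
    multiplier = 0.011 if (age >= 62 and years >= 20) else 0.01
    return high3_salary * multiplier * years

# Retiring at 60 with 30 years on a $100,000 high-3:
# 100,000 * 1% * 30 = $30,000/year
```

The interesting part FEREX handles conversationally is everything around this: whether waiting two more years for the 1.1% multiplier is worth it, how TSP withdrawals interact, what FEHB continuation requires.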

RepoPass — What if developers could charge for access to useful private GitHub repos? An experiment in marketplace dynamics and incentivizing public infrastructure. More of a thought experiment that became a prototype.

Local LLM Research — Spent time with Ollama and llama.cpp specifically for government use cases where data residency requirements make cloud APIs a non-starter. On-device AI is increasingly viable for these constraints.

Why This Approach

I’m inspired by practitioners like Dave Guarino who build small, focused experiments rather than waiting for the perfect grand vision. Each little project teaches something. They compound.

It’s also why I’ve been writing about practical AI applications and studying the federal AI use case landscape — there’s a feedback loop between understanding what’s already being attempted and knowing where the gaps are.

The internet has plenty of posts about billion-dollar startup ideas. There’s less written about the small, weird, useful projects that nobody will hear about but solve real problems for real people. These projects matter. They compound. And they’re more fun to build than most things.


If you’re working on similar experiments — government AI, civic tech, MCP servers, benefits navigation — I’d genuinely love to compare notes.
