I have been doing genealogy research as a hobby for the past several years. Like most people who get serious about it, I eventually accumulated a working family tree large enough that the only sensible way to share it with other tools is the GEDCOM file format. GEDCOM has been the de-facto interchange format for genealogy data since 1984, and although newer formats have been proposed over the years, every piece of genealogy software I have ever used can still read and write some flavor of it.
Earlier this year I started using Claude Code for a number of programming projects, including the vDB database library I wrote about previously. Recently I wanted to dig into some details across my own tree - things like getting a clean list of every ancestor who had immigrated from overseas, or working out which branches went back to colonial-era America. Doing that kind of analysis by clicking around in a desktop genealogy program is tedious, and writing one-off scripts every time I had a new question was not much better. It seemed obvious that Claude would be much more helpful with this kind of work if it could read and reason about my GEDCOM files directly. But Claude had no real tools for doing so. I wanted Claude to be able to open a .ged file, find ancestors and descendants, fix bad records, and generate reports, all without me having to walk it through the file format every time. That meant building a plugin.
This post is about how I built gedcom-skills, the underlying gedcom-lite Python library that grew out of it, and the higher-level gedcom-reports package that came later. Along the way the project turned into a useful case study in how to build agentic tooling: where to draw API boundaries, how to keep the parser honest, and how using the tool for a real piece of work surfaces gaps that are essentially invisible from the outside.
Documentation Before Code
The first thing I did was tell Claude not to write any code.
GEDCOM is one of those formats with a long history and several incompatible variants, and the difference between them matters. GEDCOM 5.5.1 is the 1999 spec that almost every genealogy program implements. GEDCOM 5.5.5, released in 2019, is a cleanup that nobody really adopted. FamilySearch GEDCOM 7.0+, released in 2021 and actively maintained, is the modern path. A useful tool needs to be able to read all three, and Claude needs documentation it can refer back to in order to keep them straight.
So I started by having Claude pull down the official sample files from gedcom.org and gedcom.io. This included the GEDCOM 5.5.5 sample set, the full FamilySearch test suite (which contains the wonderfully named maximal70.ged that exercises every standard tag), and the .gdz GEDZIP archives that 7.0 introduced. Then we wrote a small set of reference docs covering the line grammar, the differences between versions, the most common tags, parser gotchas, and a catalog of which fixture exercised which feature. These docs went into docs/ in the repo and stayed there.
This phase was important because the documentation became the contract that the rest of the project had to honor. "Round-trip fidelity" wasn't a vague aspiration - it was written down with specifics. Preserve the byte-order mark. Preserve line endings. Preserve sibling order. Preserve the exact CONC (concatenation) / CONT (continuation) choice the source file made. Preserve the original ANSEL bytes if the file is ANSEL-encoded. When I later asked Claude to make architectural decisions, we could trace each decision back to a stated principle rather than relitigating them from scratch.
There was also a small comic interlude in this phase. Late one evening I fell asleep at the keyboard and apparently entered "0.3" into the prompt before I woke up. Claude obligingly took this as a directive and produced a plan file with "0.3" baked into its name. Small things like this are a reminder that you are working with a system that takes everything you type at face value, even things you typed by accident.
Planning the Parser
Once the documentation was in place I switched into "plan mode" and we worked through four design decisions before writing any Python.
The first was scope. I decided that the plugin needed to do three things: read a file (and produce a structured summary or JSON dump), search the file (find people, families, dates, places, ancestors, descendants), and update the file (change a single value, add a record, delete a record). Schema validation and conversion between GEDCOM versions are useful, but they could wait.
The second decision was the parser strategy. There is an existing Python library called ged4py that I considered using, but I decided against it. ged4py is built for one-shot reading: it reorders siblings, drops formatting, and only partially supports GEDCOM 7.0. None of those tradeoffs are wrong for its intended use, but they are incompatible with a credible update skill. If a user asks Claude to change a single name in a GEDCOM file, the right behavior is to rewrite exactly that one line and leave every other byte of the file alone. That is impossible if the parser silently rearranges the file on read. I also looked at gedcomtools, which goes the other direction - a large class hierarchy with a lot of functionality I would never use - and it felt too heavy for the kind of thin, agentic CLI tooling I was trying to build. So we wrote a custom parser, about 400 lines, designed around a "dirty bit" model: every parsed Structure remembers its original line text, and the writer emits the original verbatim unless the structure has been modified. Round-trip fidelity falls out of this for free.
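A minimal sketch of the dirty-bit idea (the names here are illustrative, not the actual gedcom-lite API):

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    level: int
    tag: str
    value: str
    original: str                # the exact source line, terminator stripped
    dirty: bool = False          # set whenever a field is mutated
    children: list = field(default_factory=list)

    def set_value(self, value: str) -> None:
        self.value = value
        self.dirty = True


def emit(node: Structure) -> str:
    # Unmodified structures are written back verbatim, so formatting
    # quirks in untouched lines survive the round trip for free.
    if node.dirty:
        return f"{node.level} {node.tag} {node.value}"
    return node.original


line = "1 NAME John /Smith/"
s = Structure(1, "NAME", "John /Smith/", line)
assert emit(s) == line                     # untouched: byte-identical
s.set_value("Jane /Smith/")
assert emit(s) == "1 NAME Jane /Smith/"    # touched: regenerated
```

Because serialization falls back to the stored original line, every byte of an untouched file survives a read-modify-write cycle by construction.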
The third decision was about the update default. Genealogy data is irreplaceable - in many cases it represents years of careful research - and a tool that overwrites a file is a tool that will eventually destroy a tree. I made gedcom-update write to standard output by default. An explicit --in-place flag is required to overwrite the source file, and even then we recommend -o NEW.ged for the cautious case.
The fourth decision was about ANSEL. Many older 5.5.1 files are ANSEL-encoded, which is an 8-bit character set that extends ASCII with combining marks for diacritics. None of Python's standard codecs handle ANSEL. Rather than depend on a third-party codec, I bundled a self-contained ANSEL-to-Unicode table inside the library. Users who need ANSEL get it for free; users who don't carry only a small data table.
We also spent a useful side conversation on what read-gedcom and search-gedcom should do operationally before committing to any API. Splitting them turned out to matter for the way Claude triggers skills. "Show me this file" and "find people born in Boston" hit very different SKILL.md descriptions, and a single umbrella skill would have done a worse job at both. After seeing the plan, I added "ancestor" and "descendant" as keywords on the search skill, because real users don't ask Claude to find an INDI (individual) record - they ask it for their great-grandfather's siblings.
What the Test Fixtures Caught
The first cut shipped as a single repo with the parser in lib/gedcom_core.py, the ANSEL codec in lib/ansel_table.py, three skills under skills/*/scripts/, and one round-trip test that ran against every official fixture. That single test caught three real bugs that hand-written test cases would never have surfaced.
The first was that the ANSEL encoder put combining marks in the wrong place. Decoding the German name "Müller" came back as "Mul̈ler" - the diaeresis had drifted onto the wrong letter. ANSEL's convention for combining marks is the opposite of Unicode's: the mark precedes the base character rather than following it. The initial encoder had been written assuming Unicode order. The fix was to walk the NFD-normalized text in base + marks clusters and emit the marks first.
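The reordering step can be sketched in a few lines of Python. This is an illustration of the NFD-cluster walk, not the library's actual encoder, and it stops short of mapping each codepoint through the ANSEL byte table:

```python
import unicodedata


def ansel_order(text: str) -> list[str]:
    """Return codepoints in ANSEL order: combining marks precede their base.

    A real encoder would then map each codepoint through the ANSEL table;
    this sketch shows only the reordering step.
    """
    out: list[str] = []
    cluster: list[str] = []      # [base, mark, mark, ...]
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and cluster:
            cluster.append(ch)
        else:
            if cluster:
                out.extend(cluster[1:] + cluster[:1])  # marks first, then base
            cluster = [ch]
    if cluster:
        out.extend(cluster[1:] + cluster[:1])
    return out


# "ü" decomposes to "u" + U+0308; ANSEL wants the diaeresis first.
assert ansel_order("Mü") == ["M", "\u0308", "u"]
```

Walking the text in NFD form guarantees every precomposed character is split into base + marks before the swap happens, which is what the buggy first version skipped.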
The second was a greedy regex. The year-extraction code was returning 1 when given 1 JAN 2000, because the regex was matching the leading "1" rather than the four-digit token at the end. The fix was to prefer the last 4-digit token in the date string, which is the only one that can possibly be a year.
The third was that the UTF-16-without-BOM detection was wrong. UTF-16-LE-encoded ASCII looks like a sequence of <ascii_byte>\x00 pairs, and each of those pairs is a perfectly valid UTF-8 sequence. So a UTF-8 decode of UTF-16 ASCII succeeds, returns garbage that looks vaguely text-shaped, and the parser happily treats the result as GEDCOM. The fix was a null-byte heuristic: if more than 25% of the decoded text is null characters, retry as UTF-16.
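The heuristic can be sketched like this. It is simplified: the retry assumes little-endian byte order, where a fuller sniffer would check which half of each byte pair is null:

```python
def sniff_decode(data: bytes) -> str:
    """Decode GEDCOM bytes, catching BOM-less UTF-16 that masquerades as UTF-8.

    UTF-16-LE ASCII is <byte>, NUL pairs, and each pair is valid UTF-8,
    so the UTF-8 decode 'succeeds' with text that is half null characters.
    That is what the 25% threshold detects.
    """
    text = data.decode("utf-8", errors="replace")
    if text and text.count("\x00") / len(text) > 0.25:
        return data.decode("utf-16-le", errors="replace")
    return text


sample = "0 HEAD\n1 CHAR ASCII\n"
assert sniff_decode(sample.encode("utf-16-le")) == sample
assert sniff_decode(sample.encode("utf-8")) == sample
```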
These are the kinds of bugs that hand-written tests rarely catch, because nobody thinks to write the edge case. The official sample files are a much harder critic. After the fixes were in, all 26 fixtures round-tripped byte-identical, and a separate test verified the dirty-bit model by mutating one NAME (name) payload in maximal70.ged and confirming the resulting diff was exactly one line.
The Portability Pivot
About halfway through, I looked at the architecture and noticed something I didn't like. All three skills were importing the parser via sys.path.insert(0, parents[3] / "lib") - a hack that worked fine as long as the three skills shipped together inside one plugin, but which fell apart the instant anyone tried to extract just read-gedcom for use in a different plugin. The skills weren't really portable. They were a single tightly-coupled artifact pretending to be three things.
The fix was to publish the shared parser as its own package. We split the codebase into two repos. The new repo, gedcom-lite, holds the parser, the ANSEL codec, the date utilities, the traversal helpers, and a small set of CLI tools (gedcom-read, gedcom-search, gedcom-update) that wrap them. The "lite" in the name advertises the scope: pure-stdlib Python, no schema validation, no domain modeling beyond GEDCOM itself. It's a parser and a query layer, and that's all. The old repo, gedcom-skills, became a thin Claude Code plugin: three SKILL.md files plus the GEDCOM-domain documentation, with no Python code of its own. Each skill invokes its corresponding CLI tool via uvx --from gedcom-lite gedcom-{read,search,update}, which means the skills work with no setup as long as the user has uv installed. (uv is installed by default in Claude Cowork environments, so for users running the plugin inside Cowork there is no setup at all.)
The split forced me to write a real test suite instead of a single round-trip script. The new suite ended up being 11 files and 158 tests, organized by area: tokenizer, encoding sniff, ANSEL codec, dates, round-trip, mutation, traversal, plus a CLI test file per script that invokes main(argv=[...]) directly with capsys. Writing those tests caught two more bugs. One was a ridiculous one-liner: _predominant_terminator([""]) was returning \r\n instead of the default \n, because the dictionary used to count terminators iterated in insertion order and \r\n happened to be inserted first. The other was a test that had been silently assuming doc.records[0] was an INDI record - a fine assumption for the file we had been hand-testing against, and an embarrassingly wrong assumption for maximal70.ged, where the first record is a FAM record. The test was wrong, but the kind of bug - implicit assumptions about file layout - is exactly what unit tests are supposed to flush out.
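A sketch of how such a helper can make the default explicit rather than an accident of insertion order (illustrative, not gedcom-lite's actual code):

```python
from collections import Counter


def predominant_terminator(lines: list[str]) -> str:
    """Return the most common line terminator, defaulting to '\\n'.

    Counting only lines that actually end in a terminator, and returning
    the default explicitly when nothing was counted, avoids the bug where
    an empty entry let the first-inserted key win.
    """
    counts: Counter[str] = Counter()
    for line in lines:
        if line.endswith("\r\n"):
            counts["\r\n"] += 1
        elif line.endswith("\n"):
            counts["\n"] += 1
        elif line.endswith("\r"):
            counts["\r"] += 1
    if not counts:
        return "\n"          # explicit default for empty or terminator-less input
    return counts.most_common(1)[0][0]


assert predominant_terminator([""]) == "\n"
assert predominant_terminator(["a\r\n", "b\r\n", "c\n"]) == "\r\n"
```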
After the split, gedcom-skills shrank to 16 files: three SKILL.md wrappers, the GEDCOM-domain docs, four demo fixtures, and a plugin manifest. No Python code. No imports. Each skill is a few paragraphs of prose explaining when Claude should use it and a couple of uvx invocation examples. This is the shape I think a Claude Code plugin should usually take: the heavy lifting lives in a normal Python package on PyPI, and the plugin is just the piece of metadata that teaches the agent how to call it.
Putting It to Work on My Own Family Tree
Once the plugin was working, I pointed Claude at my own family tree. This is where the project stopped being a software exercise and became a research tool.
My tree is centered on me - Andrew Cleburne Young. My initial request was simple: "List all of Andrew Cleburne Young's direct ancestors up to five generations." Claude found the right INDI record, ran --ancestors-of with --depth 5, and produced a list of 62 ancestors in roughly breadth-first order.
That output revealed two things at once. The first would have been a real research finding if I weren't already aware of it: my ancestor Benjamin Franklin Young had two FAMC (child-in-family link) entries in the file, one pointing at Thomas Lawrence Young and Sarah Jane Patton, the other pointing at James Madison Young and Susanna Hicks. Both sets were real candidate parents - I'm still trying to figure out which set is correct - but my GEDCOM had no preferred-parent indicator, so the traversal silently followed both, and Benjamin appeared to have four parents and a tangled set of grandparents. This was useful information about my tree, but it was hidden inside a flat list that gave no hint about which side of the split any given ancestor belonged to.
The second thing the output revealed was a problem with my own tools. Mapping each ancestor back to a generation required running a separate --parents-of query for every individual and reconstructing the parent-child edges by hand. Asking for birth places required parsing each INDI block again with custom Python. And the FAMC ambiguity was being silently flattened, which is exactly the wrong default for genealogy work. The flat list was a textbook case of a tool that does what you literally asked for and nothing else.
I iterated on the report shape over several rounds. Round by round it grew: seven generations instead of five, then ten; structured tables for foreign-born ancestors and for ancestors born in the colonies; further splits by region of origin; explicit Ahnentafel numbering (subject = #1, father = 2N, mother = 2N+1) to give every ancestor a stable identifier across re-runs; a separate "Alternate Family Tree" section for the unresolved Young branches; date and place of death added everywhere; categorical tables for Colonial Immigrants, Ancestors Born in the Colonies, and Early US Immigrants. The final report ran to ten generations, 218 ancestors in the main tree, plus alternate-tree sections for the two contested branches.
The Report Exposed Gaps in the Skill
After the seventh round of iteration, I asked Claude a deliberate question: would this report have been easier to generate if search-gedcom knew how to do more of this work directly?
That question turned out to be the single highest-leverage question I asked in the entire project. Claude identified seven concrete pain points from the work we had just finished:
- --ancestors-of returned a flat list with no per-ancestor generation field, and the BFS ordering broke down whenever a FAMC conflict produced multiple paths to the same person.
- There was no structured way to extract the facts I cared about (name, birth, death, parents) - the only options were --show-record for raw GEDCOM text or one-line summaries that omitted places.
- The traversal silently followed all FAMC links when there were multiple, with no flag to constrain to the primary one or to query the conflicts separately.
- There was no built-in support for Ahnentafel/Sosa-Stradonitz numbering, despite this being the standard genealogy numbering scheme.
- The existing date and place filters didn't compose: I couldn't ask for "born outside the US AND died in the US AND died before 1776" in a single query.
- --xref only accepted a single ID at a time.
- GEDCOM place strings are messy, and even a basic country-level normaliser would have helped.
Six of those seven turned into direct upgrades to gedcom-lite and gedcom-search. The new feature set includes:
- An --ahnentafel traversal mode with a sosa field on every result.
- A generation field on --ancestors-of and --descendants-of results.
- A --facts JSON output mode with parsed birth, death, and parents substructures.
- --primary-famc-only and --famc-conflicts flags that turn the silent multi-path traversal into a deliberate user choice.
- Combinable --born-in, --died-in, --born-between, and --died-between filters that compose with each other and with the relationship traversals.
- A bulk form of --xref that accepts multiple IDs in one call.
- New --cousins-of and --siblings-of traversals (added because they fell out naturally once the rest was in place).
The seventh feature - place classification - I deliberately did not add to gedcom-lite. I'll look at why I excluded that feature in more detail later on.
The ancestor-report Skill and Three-Tier Architecture
After the gedcom-lite improvements, the report-generation Python script was still doing a lot of work that was not really about parsing GEDCOM. It was walking the tree using user-chosen parents, emitting alternate family trees for contested branches, classifying place strings into regions, and rendering everything as Markdown. None of that is parser-shaped work; it is editorial work. And I wanted other people to be able to use it.
So I generalized the script into a new skill: ancestor-report. And rather than fold it into gedcom-lite as another console script, I made it a separate package: gedcom-reports.
When I first floated this idea, my own instinct was to keep everything in one package. Claude pushed back with a very logical argument. gedcom-lite is the parser and the low-level query layer - the equivalent of awk and grep over GEDCOM. An ancestor report is the opposite kind of thing. It bundles a region taxonomy. It picks Ahnentafel as the numbering scheme. It draws a 1776 line for "colonial." It decides that Switzerland and Alsace go under the German section (which is a project-specific cultural choice that an Italian-American user might override). These are editorial choices, and packaging them inside the parser would conflate "what is the data" with "how should we tell its story." Other report styles I might add later - descendant reports, timelines, lineage-society applications - would each have their own opinions, none of which need to live inside the parser. Plus, folding the report layer into gedcom-lite would drag in PyYAML (needed to parse the YAML region configuration file), forcing a dependency on users who only want to lint a GEDCOM file.
The result was a three-tier architecture:
The bottom tier is gedcom-lite: a 158-test pure-stdlib Python package with three CLI tools. It knows about GEDCOM, encodings, dates, and tree traversal, and nothing else.
The middle tier is gedcom-reports: a package that depends on gedcom-lite and produces formatted reports that make specific editorial choices. It bundles a default regions.yaml taxonomy that users can override with --regions PATH. It ships a gedcom-ancestor-report console script and a --print-default-regions flag that lets users cp $(gedcom-ancestor-report --print-default-regions) my.yaml to seed a customization without grepping through the install path.
The top tier is gedcom-skills: a thin Claude Code plugin (16 files, no Python) that consists of SKILL.md wrappers plus the GEDCOM domain documentation. Its only job is to teach Claude when and how to invoke the CLI tools from the two tiers below it. The ancestor-report SKILL.md is just a thin wrapper that points Claude at the gedcom-ancestor-report script in gedcom-reports; the other three skills point at the CLI tools in gedcom-lite.
The first version of the report skill had been doing its own GEDCOM parsing rather than calling gedcom-search, mostly because the original one-off script had grown organically without ever being refactored. When we refactored it to import gedcom_lite directly as a Python library (rather than shelling out to gedcom-search and parsing JSON), Claude removed about 150 lines of hand-rolled parsing code in exchange for a single import gedcom_lite as gl and a few thin wrappers over Structure.find() and doc.xrefs. The byte-output of the report stayed identical - I verified this with diff against the prior canonical report - but the implementation was now correctly layered. For sibling packages in the same ecosystem, the library-import boundary is the right boundary; subprocess wrappers belong at the user-facing CLI level, not between two Python packages that ship together.
--override vs --alternate
One small design decision in the ancestor-report skill turned out to be more interesting than I expected, so I want to call it out specifically.
The first version of the skill auto-generated an "Alternate Family Tree" section every time a user passed --override to redirect a FAMC link. This seemed sensible: if you're overriding the tree, presumably you want to see the alternative. But when I used the flag against my real tree, it became clear that I was trying to use it for two completely different purposes.
In the case of Benjamin Franklin Young and James Madison Young, the un-chosen parent set is a real research question - the lineage is contested, and I want both candidate sets to appear in the report so I (or someone reading my work later) can keep investigating. But in the case of Peter A. Staudt and Wilhelmine Dannecker, the un-chosen parents turned out to be step-parents that someone had recorded as additional family links. And in the case of Jonathan Griffin, the second family link was an outright data-entry error: two records that should have been one, with one record holding only the mother and the other holding only the father. In neither of those last two cases did I want the un-chosen parents to appear anywhere in the report.
So I split the flag. --override is now silent by default - it just redirects the parent link and produces no alternate-tree section. --alternate @CHILD@ is a separate, explicit opt-in that says "I want this branch's un-chosen parents rendered as an Alternate Family Tree section." The two flags compose: the Young branches use both, the Staudt and Griffin overrides use only --override.
The original behavior was conflating two genuinely different user intents - "this is a data error, fix it" and "this is an unresolved question, document it" - into the same code path. Once you give those two intents different names, the rest of the design falls out: the silent override is the right default, the explicit alternate is the opt-in, and the user gets to decide which case applies for each ancestor independently.
Takeaways
A few things stood out from this project that I think generalize beyond GEDCOM:
Use cases drive feature requests. Every one of the seven gedcom-lite features added in the second pass came directly from a painful loop in the report-building work. None of them were theoretical. This is, I think, the strongest argument for building real things with your own tools as soon as they are minimally functional. The gaps in your design only become visible when you try to live inside them.
Documentation before code saves time later on. When I pushed back on the architecture mid-build, we could trace the pushback to a stated principle ("portability of the skills") rather than relitigate from scratch. When Claude needed to make decisions about edge cases later, it could re-read the docs rather than re-invent the contract.
Comprehensive tests - the kind that go beyond exercising basic functionality - save time and find important bugs. Five separate bugs in gedcom-lite were caught by structured tests after the smoke tests had all reported "looks fine." Three were exposed by the official sample files; two by per-module unit tests. Claude makes it easy to generate comprehensive test suites that a single engineer might otherwise cut corners on.
Don't pack opinions into the parser. Region taxonomies, era boundaries, numbering schemes, "did Switzerland count as German?" - these are editorial choices that belong in a sibling package, not in the GEDCOM-mechanics layer. The fact that I almost got this wrong, and would have if Claude hadn't pushed back, is a great example of how an AI agent like Claude can help improve code quality. Claude takes on the role of a second engineer when you're working by yourself.
The plugin layer should be thin. gedcom-skills is 16 files with no Python. Its job is to teach Claude that a tool exists and how to call it. The actual capability lives in a normal package on PyPI, where it can be tested, versioned, and used by people who aren't running Claude Code at all. I think this is the right shape for most Claude Code plugins, and I would be cautious about any plugin that tries to be more than a few SKILL.md files plus a manifest.
I ended up with three small repos that compose cleanly: gedcom-lite for parsing and querying, gedcom-skills for teaching Claude how to use them, and gedcom-reports for opinionated higher-level documents like the Ahnentafel-numbered ancestor report I started this project trying to build. Each one is small enough to read in an afternoon, and each one has a single clear job. That ended up being, I think, the most important thing I learned from the project: when you build for an agent, the boundaries between layers matter more than the layers themselves.
What Else Can It Do?
So far I've shown how Claude, with my plugin, can help you search your genealogy database and make corrections, but Claude can do much more than that. AI agents like Claude are trained on vast amounts of historical data, which means we can ask questions that require knowledge beyond what can be found in the GEDCOM file itself.
For example, I asked Claude the following question:
Looking at Andrew's ancestors, what is the story of his family's
immigration to and life in the United States? Given the locations
of his ancestors, what inferences can be drawn about the communities
they were a part of? Feel free to look farther back in his family
tree as well. Be sure to check for notes and other records on each
of his ancestors as well. I've put some useful information in there
as I've found it. Save your report to a file so I can reference
it again later.
Claude generated a comprehensive ancestry report that made connections between the people and the times in which they lived. Much of the report was familiar to me, but there were also several connections that I hadn't made before, like the source of my middle name and the family connection to both the Moravians and the Pennsylvania Dutch.
Overall I'm very pleased with how these skills turned out and what they've allowed me to accomplish in my genealogical research. I hope they will be helpful to others as well.
If you'd like to use these skills in your own Claude instance, or in another AI agent, you can find them here: https://github.com/vaelen/gedcom-skills.
If you'd like to read the documents that Claude generated, they can be found here: