Freedmen's Bureau Labor Contracts

A digitized corpus of 6,102 labor contracts between formerly enslaved people and Southern employers, 1864–1868. Transcribed by volunteers at the Smithsonian, structured by machine.

6,102 Contracts
3.2M Words
5 States
19,901 Pages Scraped

Data Sources

These contracts come from the Freedmen's Bureau Records held at the National Archives (NARA) and digitized by the Smithsonian Transcription Center. Thousands of volunteers transcribed handwritten contract pages from microfilmed records spanning the Reconstruction era.

The Bureau of Refugees, Freedmen, and Abandoned Lands (the "Freedmen's Bureau") operated from 1865 to 1872. One of its central functions was overseeing labor contracts between freed people and their employers—often their former enslavers. Bureau agents witnessed and approved these contracts, which specified wages, rations, working hours, and penalties.

We scraped all transcribed labor contract pages from the Transcription Center, covering 6 NARA microfilm publications across 5 states. Four additional states (Alabama, Georgia, South Carolina, Virginia) have been digitized but not yet transcribed by volunteers.

Table: Contracts and words by microfilm publication and state (columns: Microfilm, State, Contracts, Words)
Not yet transcribed: Alabama (M1900), Georgia (M798), South Carolina (M869), Virginia (M1913) — digitized images exist but no volunteer transcriptions.

Extraction Methodology

We built a five-step Python pipeline to go from the Smithsonian's raw HTML transcription pages to a structured dataset. The key challenge: each "page" on the Transcription Center is a single scanned image, and a single labor contract typically spans 2–4 pages. Our pipeline detects contract boundaries, then extracts structured fields using regular expressions.

1. Enumerate collections: 72 collection entries found via the Smithsonian API.
2. Discover projects: 87 Transcription Center (TC) projects mapped via EDAN (the Smithsonian's Enterprise Digital Asset Network) and ID probing.
3. Scrape pages: 19,901 pages at 1 request per second (~6 hours).
4. Detect contracts: 6,102 contracts identified by header patterns.
5. Extract fields: dates, names, and locations via regular expressions.
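Step 3 throttles requests to one per second. At that rate the 19,901 pages set a floor of 19,901 seconds (about 5.5 hours), in line with the ~6 hours reported. The sketch below shows one way to enforce such a rate limit. It is not the project's scraper: the `fetch` callable (for example, a wrapper around an HTTP GET for one Transcription Center page) and the page identifiers are hypothetical, and the clock and sleep functions are injectable so the throttle can be tested without actually waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def scrape_pages(
    page_ids: Iterable[str],
    fetch: Callable[[str], str],
    min_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Yield (page_id, html) pairs, starting requests at least `min_interval` s apart.

    `fetch` is a hypothetical callable that downloads one transcription page.
    """
    last_start: Optional[float] = None
    for page_id in page_ids:
        if last_start is not None:
            # Sleep only for the part of the interval that has not already elapsed.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        yield page_id, fetch(page_id)
```

If a single fetch takes longer than the interval, no extra sleep is added. Because the function is a generator, each page could be saved as it arrives, so an interrupted multi-hour run can resume from the last stored ID.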

Contract Boundary Detection

Each transcription page is checked for header patterns that signal the start of a new contract. Pages without a header are treated as continuations of the previous contract. The trigger patterns include:

- "This Agreement, made and entered into..."
- "Articles of Agreement..."
- "Contract made and entered into..."
- "This Indenture made..."
- "Know all men by these presents..."
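A minimal sketch of this boundary rule, assuming each page's transcription is available as a string and the pages are in order. The regular expression is our own approximation of the five openings listed above (case-insensitive, tolerant of extra whitespace or a missing comma), and searching only the first 300 characters of each page is a hypothetical heuristic, not necessarily what the pipeline does.

```python
import re
from typing import List

# Approximation of the five contract openings; case-insensitive.
HEADER_RE = re.compile(
    r"this\s+agreement,?\s+made\s+and\s+entered\s+into"
    r"|articles\s+of\s+agreement"
    r"|contract\s+made\s+and\s+entered\s+into"
    r"|this\s+indenture\s+made"
    r"|know\s+all\s+men\s+by\s+these\s+presents",
    re.IGNORECASE,
)

# Hypothetical heuristic: only look for a header near the top of a page.
HEAD_CHARS = 300


def group_pages(pages: List[str]) -> List[List[str]]:
    """Split page transcriptions (in page order) into contracts."""
    contracts: List[List[str]] = []
    for text in pages:
        starts_contract = HEADER_RE.search(text[:HEAD_CHARS]) is not None
        if starts_contract or not contracts:
            # A header opens a new contract; a headerless first page also
            # starts one, since there is no previous contract to continue.
            contracts.append([text])
        else:
            # No header: treat the page as a continuation.
            contracts[-1].append(text)
    return contracts
```

Grouping on page order alone means a misread or missing header silently merges two contracts, which is one reason the page count per contract is worth checking afterwards.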

Entity Extraction

From the opening paragraph of each contract, we extract five structured fields using regular expressions: date, location, employer, workers, and Bureau agent.
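A simplified illustration of regex-based extraction for four of these fields (date, county, employer, workers), assuming an opening worded like the invented `SAMPLE` text in the code. The patterns are hypothetical and much less tolerant than a production set would need to be: they expect phrases such as "day of", "County", "between ... of", and "of the first part, and ... freedmen". The Bureau agent field is left out because agent names tend to appear outside the opening paragraph.

```python
import re
from typing import Dict, List, Union

# Invented example of a typical opening; not a real transcription.
SAMPLE = (
    "This Agreement, made and entered into this 15th day of January, 1866, "
    "between John W. Smith of Hinds County, Mississippi, of the first part, "
    "and Peter Jones, Sarah Jones, and Moses Brown, freedmen, of the second part"
)

# Hypothetical patterns, one per field.
DATE_RE = re.compile(
    r"(\d{1,2})(?:st|nd|rd|th|d)?\s+day\s+of\s+([A-Z][a-z]+),?\s+"
    r"(?:A\.?\s*D\.?,?\s*)?(18[67]\d)"
)
COUNTY_RE = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)\s+County")
EMPLOYER_RE = re.compile(r"\bbetween\s+(.+?),?\s+of\s+")
WORKERS_RE = re.compile(
    r"of\s+the\s+first\s+part,?\s+and\s+(.+?),\s+"
    r"(?:freedm[ae]n|freedwom[ae]n|freedpeople)"
)

Field = Union[str, List[str], None]


def extract_fields(opening: str) -> Dict[str, Field]:
    """Return date, county, employer, and workers; None when a pattern misses."""
    fields: Dict[str, Field] = {"date": None, "county": None,
                                "employer": None, "workers": None}
    if m := DATE_RE.search(opening):
        day, month, year = m.groups()
        fields["date"] = f"{day} {month} {year}"
    if m := COUNTY_RE.search(opening):
        fields["county"] = m.group(1)
    if m := EMPLOYER_RE.search(opening):
        fields["employer"] = m.group(1)
    if m := WORKERS_RE.search(opening):
        # Split "A, B, and C" into individual names.
        names = re.split(r",\s*(?:and\s+)?|\s+and\s+", m.group(1))
        fields["workers"] = [n.strip() for n in names if n.strip()]
    return fields
```

Each pattern fails independently, so a damaged page can still yield a date and county even when the employer or worker list is unreadable.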

Extraction Success Rates

The regex approach works well for standardized contract forms (especially Mississippi's "Agreement with Freedmen") but struggles with informal or heavily damaged documents. An LLM-based extraction pass is planned as a follow-up.

Data Analysis

Contracts by State

Tennessee and Mississippi dominate the corpus, together accounting for 95% of all contracts.

Contracts by Year

1865 was the peak year for contracting, as the Bureau oversaw the first full season of free labor after the war.

Top 15 Counties

Shelby (Memphis), Robertson, and Madison counties in Tennessee lead, followed by Hinds (Jackson) in Mississippi.

Contract Length (words)

Most contracts are 200–600 words. The long tail includes multi-page plantation contracts with dozens of named workers.

Extraction Confidence

Confidence is based on how many of the 4 core fields (date, county, employer, workers) were successfully extracted.
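One way to compute this score, sketched under assumptions: a field counts as extracted when it is present and non-empty, the score is the number of core fields found (0 to 4), and the `high`/`medium`/`low` cutoffs are hypothetical illustrations rather than the project's own thresholds.

```python
from typing import Mapping

CORE_FIELDS = ("date", "county", "employer", "workers")


def extraction_confidence(fields: Mapping[str, object]) -> int:
    """Count how many core fields were extracted (present and non-empty): 0-4."""
    return sum(1 for name in CORE_FIELDS if fields.get(name))


def confidence_label(score: int) -> str:
    """Hypothetical cutoffs for display; not the project's documented thresholds."""
    if score == len(CORE_FIELDS):
        return "high"
    if score >= 2:
        return "medium"
    return "low"
```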

Pages per Contract

The modal contract spans 2 pages (front and back of a form). Single-page entries are often cover sheets or brief agreements.

Most Frequent Bureau Agents

Agent names are the hardest field to extract (18.8% success rate) because they appear in less standardized positions—sometimes at the bottom as a witness, sometimes on a separate approval page. After normalizing name variants, these are the most frequently identified agents:
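A rough sketch of name normalization and counting, assuming raw agent strings have already been pulled from each contract. The list of military ranks and office abbreviations to strip is a hypothetical sample, and `top_agents` with its default of 10 is an illustrative helper; reconciling real variants (misspellings, initials versus full first names) would take more than this.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical sample of rank and office prefixes to strip.
TITLE_RE = re.compile(
    r"^(?:(?:bvt|brevet|capt|captain|lieut|lt|maj|major|col|gen|"
    r"asst|supt|sub-asst)\.?\s+)+",
    re.IGNORECASE,
)


def normalize_agent(raw: str) -> str:
    """Collapse whitespace, trim stray punctuation, drop leading titles."""
    name = re.sub(r"\s+", " ", raw).strip(" .,")
    name = TITLE_RE.sub("", name)
    # str.title() is a crude casing step (it turns "McKee" into "Mckee").
    return name.title()


def top_agents(raw_names: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count normalized agent names and return the n most frequent."""
    counts = Counter(normalize_agent(r) for r in raw_names if r.strip())
    return counts.most_common(n)
```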

Table: Most frequently identified agents (columns: #, Agent Name, Contracts)

Extraction by State

Table: Extraction results by state (columns: State, Contracts, Words, Dates, Counties, Employers, Agents)

Example Contracts

Below are three representative contracts from different states and years, showing the range of formats and terms.

Future OCR Expansion

The Smithsonian Transcription Center covers only a fraction of the surviving Freedmen's Bureau labor contract records. FamilySearch hosts a curated collection of digitized Bureau images from 12+ states; using its Digital Folder Number List, we identified 132,084 labor contract images in that collection.

Our transcription-based dataset covers 19,901 of these pages, about 15% of the total. The remaining ~112,000 images are digitized scans of the microfilmed handwritten originals but have no text transcriptions; they could be processed with optical character recognition (OCR) or handwritten text recognition (HTR) models.

The largest gaps are in South Carolina (~46,000 untranscribed images), Arkansas (~17,500), Tennessee (~13,000), Louisiana (~11,500), and Virginia (~11,000). Six states have zero transcription coverage: South Carolina, Louisiana, Kentucky, Georgia, Alabama, and Virginia.

All images are freely accessible on FamilySearch (free account required) and from the Smithsonian's IDS image service. A future OCR pipeline could expand this dataset from ~6,000 contracts to potentially tens of thousands.
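A back-of-the-envelope check of these figures. The projection rests on a loose assumption: that untranscribed images divide into contracts at the same average rate as the transcribed pages (about 3.26 pages per contract, a crude mean; the modal contract is shorter). Because Tennessee's microfilm sets may include duplicate filmings, the result is better read as an order of magnitude than a forecast. It comes to roughly 34,000 additional contracts, consistent with "tens of thousands".

```python
DIGITIZED_IMAGES = 132_084  # labor contract images identified on FamilySearch
TRANSCRIBED_PAGES = 19_901  # pages scraped from the Transcription Center
CONTRACTS = 6_102           # contracts detected in those pages

untranscribed = DIGITIZED_IMAGES - TRANSCRIBED_PAGES  # 112,183
coverage = TRANSCRIBED_PAGES / DIGITIZED_IMAGES       # about 0.151 (15%)
pages_per_contract = TRANSCRIBED_PAGES / CONTRACTS    # about 3.26

# Assumption: untranscribed images yield contracts at the same average rate.
projected_new_contracts = round(untranscribed / pages_per_contract)

print(f"untranscribed={untranscribed:,} coverage={coverage:.1%} "
      f"projected additional contracts={projected_new_contracts:,}")
```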

Figure: 19,901 transcribed vs. ~112,000 untranscribed images.

What the Source Documents Look Like

Below are three examples of the original handwritten contract pages—scanned from NARA microfilm. Volunteer transcribers at the Smithsonian converted these into the text we extracted. The untranscribed pages look similar but have not yet been converted to text.

Digitized Images by State

Table: Digitized labor contract images by state (columns: State, Digitized Images, Transcribed, Untranscribed, Coverage)
Source: FamilySearch Digital Folder Number List. Counts reflect specific image ranges identified as labor contracts, indentures, and apprenticeships. Tennessee includes images from M999, T142, and M1911; M999 and T142 cover overlapping counties and may include some duplicate filmings of the same records.