# pdf-files

> Work with PDFs safely and repeatably: extract text/tables, convert pages to images, inspect/fill forms, and produce verifiable outputs (markdown/json/images/filled pdf). Use when a task involves PDF documents.

- Author: Daniel Montero
- Repository: dmonteroh/curated-agent-skills
- Version: 20260208022942
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-08
- Source: https://github.com/dmonteroh/curated-agent-skills
- Web: https://mule.run/skillshub/@@dmonteroh/curated-agent-skills~pdf-files:20260208022942

---

---
name: pdf-files
description: "Work with PDFs safely and repeatably: extract text/tables, convert pages to images, inspect/fill forms, and produce verifiable outputs (markdown/json/images/filled pdf). Use when a task involves PDF documents."
category: docs
---
# PDF Files

Provides deterministic, verifiable workflows for extracting text or tables, converting pages to images, and filling PDF forms. Produces traceable artifacts and explicit verification notes.

## Use this skill when

- Extracting text or tables from PDFs
- Rendering pages to images for review, OCR, or coordinate work
- Inspecting or filling PDF forms (fillable fields or visual placement)
- Verifying that a filled PDF renders correctly

## Do not use this skill when

- Inputs are not PDF files
- Layout or typography editing is required (use a design tool instead)
- A task only needs plain text already provided

## Required inputs

- PDF file path(s)
- Desired output artifacts (text, tables, images, filled PDF)
- Output directory or file names
- Form field values (if filling)
- Constraints (read-only, no network, retention limits)

## Path conventions

Commands assume the working directory is the skill root (`pdf-files/`). Adjust paths if running from another directory.

## Workflow

### 1) Intake and safety

- Actions: confirm PDF paths, create output paths, preserve originals.
- Output: input list, output plan, and working copy locations (if needed).

### 2) Inspect and classify

- Actions: determine whether the PDF is text-based or scanned and check for fillable fields.
- Command: `python3 ./scripts/check_fillable_fields.py input.pdf`
- Output: classification (text vs scanned; fillable vs non-fillable) and chosen path.
- Decision:
  - If fillable fields exist, follow `references/forms-fillable-fields.md`.
  - If no fillable fields and the task is to fill a form, follow `references/forms-visual-annotations.md`.

### 3) Extract or render

- Actions: extract text/tables with available local tools, or render pages to images.
- Command: `python3 ./scripts/convert_pdf_to_images.py input.pdf output_dir/`
- Output: extracted text/tables or `page_*.png` images with recorded paths.

### 4) Fill forms (if needed)

- Actions: use the appropriate form workflow to create field JSON and an output PDF.
- Output: `field_values.json` or `fields.json`, plus filled PDF.

### 5) Verify outputs

- Actions: open rendered images or filled PDF and confirm expected content/placement.
- Output: verification notes (viewer used, pages checked, pass/fail).

## Scripts and dependencies

Dependencies: Python 3, `pypdf`, `pdf2image`, `Pillow`. `pdf2image` requires Poppler binaries available on `PATH`.

- `scripts/check_fillable_fields.py`
  - Usage: `python3 ./scripts/check_fillable_fields.py input.pdf`
  - Output: stdout indicates whether fields exist.
  - Verification: include stdout in the report.

- `scripts/extract_form_field_info.py`
  - Usage: `python3 ./scripts/extract_form_field_info.py input.pdf fields.json`
  - Output: `fields.json` with field metadata.
  - Verification: spot-check page numbers and field IDs.

- `scripts/fill_fillable_fields.py`
  - Usage: `python3 ./scripts/fill_fillable_fields.py input.pdf field_values.json output.pdf`
  - Output: filled `output.pdf`.
  - Verification: open the output PDF and confirm field values.

- `scripts/convert_pdf_to_images.py`
  - Usage: `python3 ./scripts/convert_pdf_to_images.py input.pdf output_dir/`
  - Output: `page_*.png` images.
  - Verification: open at least one page image.

- `scripts/create_validation_image.py`
  - Usage: `python3 ./scripts/create_validation_image.py page_number fields.json input.png output.png`
  - Output: validation image with bounding boxes.
  - Verification: confirm red/blue boxes align with intended areas.

- `scripts/check_bounding_boxes.py`
  - Usage: `python3 ./scripts/check_bounding_boxes.py fields.json`
  - Output: success/failure messages.
  - Verification: require `SUCCESS` before continuing.

- `scripts/fill_pdf_form_with_annotations.py`
  - Usage: `python3 ./scripts/fill_pdf_form_with_annotations.py input.pdf fields.json output.pdf`
  - Output: filled `output.pdf` with annotations.
  - Verification: open the output PDF and confirm placement.

## Common pitfalls

- Empty text extraction indicates a scanned PDF; switch to image conversion or OCR.
- Field IDs or page numbers mismatch; regenerate `fields.json` and recheck.
- Bounding boxes intersect or misalign; regenerate validation images and rerun checks.
- Filled values appear blank in some viewers; verify in another viewer.

## Examples

### Example 1: Text extraction

Input: `contract.pdf`
Output artifacts: `contract.md`
Verification: preview `contract.md` for completeness.

### Example 2: Fillable form

Input: `form.pdf`
Output artifacts: `fields.json`, `field_values.json`, `filled-form.pdf`
Verification: open `filled-form.pdf` and confirm values render.

## Output contract

Provide results using this format:

```
Summary:
Inputs:
Decisions:
Outputs:
Verification:
Notes:
```

## Resources

- Playbook: `resources/implementation-playbook.md`
- References index: `references/README.md`