# landingai-ade

> Parse, extract, and analyze documents using LandingAI's ADE Python SDK. Handles PDFs and images with visual grounding, table extraction, and structured data output.

- Author: avaxia888
- Repository: avaxia8/ade_claude_skills
- Version: 20260205191958
- Last Updated: 2026-02-06
- Source: https://github.com/avaxia8/ade_claude_skills
- Web: https://mule.run/skillshub/@@avaxia8/ade_claude_skills~landingai-ade:20260205191958

---

---
name: landingai-ade
description: Parse, extract, and analyze documents using LandingAI's ADE Python SDK. Handles PDFs and images with visual grounding, table extraction, and structured data output.
---

# LandingAI ADE (Agentic Document Extraction)

## Quick Start

Parse a document and extract structured data in 3 steps:

```python
from landingai_ade import LandingAIADE
from pydantic import BaseModel
from pathlib import Path

# 1. Initialize client (uses VISION_AGENT_API_KEY env var)
client = LandingAIADE()

# 2. Parse document to get markdown and chunks
response = client.parse(document=Path("invoice.pdf"))

# 3. Extract structured data with a schema
class Invoice(BaseModel):
    invoice_number: str
    total_amount: float

extracted = client.extract(
    markdown=response.markdown,
    schema=Invoice.model_json_schema()
)

print(f"Invoice #{extracted.extraction['invoice_number']}")
print(f"Total: ${extracted.extraction['total_amount']}")
```

## Core Workflow

ADE follows a three-step workflow:

1. **Parse** → Convert documents (PDF/images) to markdown with visual grounding
2. **Split** (optional) → Classify mixed documents by type
3. **Extract** → Get structured data using schemas

### Visual Grounding

Each parsed chunk carries a grounding record that maps it back to the original document: the page it appears on and its bounding box on that page.

```python
# Access location of any chunk
chunk = response.chunks[0]
print(f"Page: {chunk.grounding.page}")
print(f"Position: {chunk.grounding.box}")  # Normalized 0-1 coordinates
```
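
Because the box coordinates are normalized, drawing them on a rendered page needs a scaling step. The following is a minimal sketch, assuming the box exposes `left`, `top`, `right`, and `bottom` attributes (an assumption; check REFERENCE.md for the actual field names). `box_to_pixels` is a hypothetical helper, not part of the SDK, and the example box is a stand-in for `chunk.grounding.box`:

```python
from types import SimpleNamespace


def box_to_pixels(box, page_width, page_height):
    """Scale a normalized (0-1) bounding box to integer pixel coordinates.

    Assumes `box` has `left`, `top`, `right`, `bottom` attributes.
    """
    return (
        round(box.left * page_width),
        round(box.top * page_height),
        round(box.right * page_width),
        round(box.bottom * page_height),
    )


# Stand-in for chunk.grounding.box so the sketch runs without an API call
example_box = SimpleNamespace(left=0.1, top=0.25, right=0.5, bottom=0.75)
pixels = box_to_pixels(example_box, page_width=1000, page_height=2000)
print(pixels)
```

The page width and height would come from whatever renders the page image, for example the size of a rasterized PDF page.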

## Key Features

- **Document Parsing**: PDFs and images to structured markdown
- **Table Extraction**: Individual cell access with position data
- **Visual Grounding**: Precise bounding boxes for all content
- **Schema-based Extraction**: Use Pydantic models for structured output
- **Async Support**: Process multiple documents concurrently
- **Large File Handling**: Parse jobs API for documents >50MB
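
Concurrent processing usually benefits from a cap on in-flight requests, so a batch does not trip rate limits. The sketch below shows one way to bound concurrency with `asyncio`. The async client is not shown in this document, so `parse_many` (a hypothetical helper) accepts any coroutine function, and `fake_parse` stands in for the real awaitable parse call:

```python
import asyncio


async def parse_many(parse_one, paths, max_concurrency=4):
    """Run `parse_one(path)` for every path, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(path):
        async with semaphore:
            return await parse_one(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(bounded(p) for p in paths))


async def fake_parse(path):
    # Stand-in for an awaitable parse call; swap in the async client's method
    await asyncio.sleep(0)
    return f"parsed:{path}"


results = asyncio.run(parse_many(fake_parse, ["a.pdf", "b.pdf", "c.pdf"]))
print(results)
```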

## Common Use Cases

### Parse with Page Splitting
```python
response = client.parse(
    document=Path("document.pdf"),
    split="page",  # Split by pages
    save_to="./output"  # Save JSON output
)
```

### Extract Tables with Cell Positions
```python
# List every table cell with its row and column index
for gid, grounding in response.grounding.items():
    if grounding.type == "tableCell":
        pos = grounding.position
        print(f"Cell at row {pos.row}, col {pos.col}")
```
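
To work with a table as a whole, the cell records can be regrouped into rows. This sketch relies only on the `type`, `position.row`, and `position.col` attributes used above. `cells_by_row` is a hypothetical helper, and the sample dictionary (including the `"table"` type) is made-up stand-in data for `response.grounding`:

```python
from collections import defaultdict
from types import SimpleNamespace


def cells_by_row(grounding):
    """Group table-cell grounding IDs by row, each row ordered by column."""
    rows = defaultdict(list)
    for gid, item in grounding.items():
        if item.type == "tableCell":
            rows[item.position.row].append((item.position.col, gid))
    # Sort rows by index, and cells within each row by column
    return {row: [gid for _, gid in sorted(cells)] for row, cells in sorted(rows.items())}


def _cell(row, col):
    # Builds a stand-in for one tableCell grounding entry
    return SimpleNamespace(type="tableCell", position=SimpleNamespace(row=row, col=col))


sample = {
    "c2": _cell(0, 1),
    "c1": _cell(0, 0),
    "t0": SimpleNamespace(type="table", position=None),  # non-cell entries are skipped
    "c3": _cell(1, 0),
}
table = cells_by_row(sample)
print(table)
```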

### Handle Large Files
```python
# For files >50MB, create an asynchronous parse job instead of calling parse()
job = client.parse_jobs.create(document=Path("large.pdf"))
# Look the job up by ID to check its progress
status = client.parse_jobs.get(job.job_id)
```
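
A parse job has to be polled until it finishes. The loop below is a generic sketch of that pattern, not the SDK's own mechanism: `wait_for_job` is a hypothetical helper, the state names `"completed"` and `"failed"` are assumptions, and a fake status source replaces the real `client.parse_jobs.get` call so the example runs offline:

```python
import time


def wait_for_job(get_status, job_id, done_states=("completed", "failed"),
                 interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll `get_status(job_id)` until it returns a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)


# Fake status source: reports two intermediate states, then a terminal one
_fake_states = iter(["pending", "processing", "completed"])
final = wait_for_job(lambda job_id: next(_fake_states), "job-123", sleep=lambda s: None)
print(final)
```

With the real client, `get_status` would wrap `client.parse_jobs.get(job_id)` and return whichever field of the result reports job state; see REFERENCE.md for its name.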

## Resources

- **[REFERENCE.md](REFERENCE.md)** - Complete API reference with all parameters
- **[scripts/](scripts/)** - Runnable examples for common tasks:
  - `parse_document.py` - Parsing examples
  - `extract_data.py` - Schema-based extraction
  - `split_documents.py` - Document classification
  - `visualize_chunks.py` - Visualization with bounding boxes
  - `handle_tables.py` - Table and cell processing

## Installation

```bash
pip install landingai-ade
export VISION_AGENT_API_KEY="v2_..."
```

## Models

- Parse: `dpt-2-latest`
- Extract: `extract-latest`
- Split: `split-latest`

## Error Handling

```python
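import time
from pathlib import Path

from landingai_ade import APITimeoutError, LandingAIADE, RateLimitError

client = LandingAIADE()
doc = Path("doc.pdf")

try:
    response = client.parse(document=doc)
except RateLimitError:
    # Rate limited: wait, then retry once
    time.sleep(10)
    response = client.parse(document=doc)
except APITimeoutError:
    # Timed out: fall back to an asynchronous parse job, suited to large files
    job = client.parse_jobs.create(document=doc)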
```