# imessage-extraction > Extract, decode, and query iMessage conversations from macOS chat.db. Use when user needs to access iMessage history, export conversations, search messages, decode attributedBody fields, convert Apple timestamps, or create clean SQLite databases from iMessage data. Handles NSKeyedArchiver decoding, artifact cleanup, and schema navigation. - Author: Marcus Vorwaller - Repository: marcus/marcus-skills - Version: 20260120170157 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/marcus/marcus-skills - Web: https://mule.run/skillshub/@@marcus/marcus-skills~imessage-extraction:20260120170157 --- --- name: imessage-extraction description: Extract, decode, and query iMessage conversations from macOS chat.db. Use when user needs to access iMessage history, export conversations, search messages, decode attributedBody fields, convert Apple timestamps, or create clean SQLite databases from iMessage data. Handles NSKeyedArchiver decoding, artifact cleanup, and schema navigation. --- # iMessage Extraction Extract and process iMessage conversations from the macOS Messages database. ## Database Location ``` ~/Library/Messages/chat.db ``` **Full Disk Access required**: Grant Full Disk Access to Terminal in System Preferences > Security & Privacy > Privacy > Full Disk Access. **Setup steps:** 1. Go to **System Preferences > Security & Privacy > Privacy > Full Disk Access** 2. Click the `+` button and add `/Applications/Utilities/Terminal.app` (or your terminal app) 3. Restart Terminal for changes to take effect 4. Test by running: `sqlite3 ~/Library/Messages/chat.db "SELECT COUNT(*) FROM chat;"` ## Quick Start ### 1. List All Conversations ```python import sqlite3 import os conn = sqlite3.connect(os.path.expanduser('~/Library/Messages/chat.db')) cursor = conn.cursor() cursor.execute(""" SELECT c.ROWID, c.chat_identifier, c.display_name, COUNT(cmj.message_id) as message_count FROM chat c LEFT JOIN chat_message_join cmj ON c.ROWID = cmj.chat_id GROUP BY c.ROWID ORDER BY message_count DESC """) for row in cursor.fetchall(): print(f"Chat {row[0]}: {row[1]} ({row[2] or 'No name'}) - {row[3]} messages") ``` ### 2. Extract Messages from a Conversation ```python chat_id = 123 # Replace with actual chat ID cursor.execute(""" SELECT m.ROWID, m.date, m.is_from_me, m.text, m.attributedBody, h.id as sender FROM message m JOIN chat_message_join cmj ON m.ROWID = cmj.message_id LEFT JOIN handle h ON m.handle_id = h.ROWID WHERE cmj.chat_id = ? ORDER BY m.date ASC """, (chat_id,)) ``` ## Apple Timestamp Conversion iMessage uses Apple's Core Data timestamp: **nanoseconds since 2001-01-01**. ```python from datetime import datetime APPLE_EPOCH = 978307200 # Unix timestamp of 2001-01-01 def apple_to_datetime(apple_ns): """Convert Apple nanosecond timestamp to datetime.""" if not apple_ns: return None unix_ts = (apple_ns / 1_000_000_000) + APPLE_EPOCH return datetime.fromtimestamp(unix_ts) def apple_to_unix(apple_ns): """Convert Apple nanosecond timestamp to Unix timestamp.""" return int(apple_ns / 1_000_000_000 + APPLE_EPOCH) if apple_ns else None def datetime_to_apple(dt): """Convert datetime to Apple nanosecond timestamp.""" unix_ts = dt.timestamp() return int((unix_ts - APPLE_EPOCH) * 1_000_000_000) ``` **SQL conversion**: ```sql datetime(m.date/1000000000 + 978307200, 'unixepoch', 'localtime') as readable_date ``` ## Decoding attributedBody Many messages store content in `attributedBody` (NSKeyedArchiver format) rather than `text`. See `references/decoding.md` for the full decoder. **Quick decode with PyObjC** (if installed): ```python from Foundation import NSData, NSKeyedUnarchiver def decode_attributed_body(data): if not data: return None try: ns_data = NSData.dataWithBytes_length_(data, len(data)) unarchiver = NSKeyedUnarchiver.alloc().initForReadingWithData_(ns_data) unarchiver.setRequiresSecureCoding_(False) obj = unarchiver.decodeObjectForKey_("root") return str(obj.string()) if obj and hasattr(obj, 'string') else None except: return None ``` **Install PyObjC**: `pip install pyobjc-framework-Cocoa` **Fallback without PyObjC**: See `references/decoding.md` for manual binary parsing. ## Cleaning Message Artifacts Decoded messages often contain artifacts from the typedstream encoding: ```python import re def clean_message(text): """Clean decoding artifacts from message text.""" if not text: return text # Replace object replacement character (marks images) text = text.replace('\ufffc', '[image]') # Remove non-printable characters text = ''.join(c for c in text if c.isprintable() or c in '\n\r\t') # Remove leading format codes text = re.sub(r'^[\+\*\!\.\,\;\#\%\`\~\^\&\@\$\s]+', '', text) # Remove trailing NSDictionary artifacts text = re.sub(r'\s*NSDictionary\s*$', '', text) text = re.sub(r'\s*NSMutable[A-Za-z]+\s*$', '', text) # Remove trailing iMessage metadata (e.g., &__kIMBaseWritingDirection...) text = re.sub(r'\s*&?__kIM[^\s]*.*$', '', text) # Remove trailing iMessage metadata text = re.sub(r'[ij]I[^\s]*$', '', text) # Remove leading digit artifact text = re.sub(r'^[0-9](?=[a-zA-Z])', '', text) # Remove leading single-letter artifacts from iMessage encoding # Patterns: "dOMG" -> "OMG", "Ohttps://" -> "https://", "Ci believe" -> "i believe" # BUT NOT: "OMG" -> "MG" (uppercase + uppercase = real word) if len(text) > 2 and text[0].isalpha(): second_char = text[1] rest = text[1:] # Case 1: lowercase followed by uppercase (e.g., "dOMG") if text[0].islower() and second_char.isupper(): text = rest # Case 2: any letter followed by "http" (e.g., "Ohttps://") elif rest[:4].lower() == 'http': text = rest # Case 3: any letter followed by "i " or "i'" (e.g., "Ci believe") elif second_char == 'i' and len(text) > 2 and text[2] in " '": text = rest return text.strip() ``` ## Detecting Reactions and Quotes iMessage reactions (tapbacks) and quoted messages require special handling. **Reaction format**: `ReactionWord + space + curly_quote + original_message + curly_quote` - Example: `Loved "When I first heard this song..."` - Uses Unicode curly quotes: `"` (U+201C) and `"` (U+201D), not straight quotes **Important**: Use regex matching, not `startswith()`. The curly quotes vary and cause issues with string prefix matching. ```python # Quote characters (ASCII and Unicode curly quotes) QUOTE_CHARS = '"\'""\u201c\u201d\u2018\u2019' def is_reaction_message(text): """Check if message is a reaction (Loved, Laughed at, etc.). Uses regex instead of startswith() because reaction messages use Unicode curly quotes (U+201C, U+201D) which vary and cause matching issues. """ if not text: return False return bool(re.match(r'^(Reacted|Loved|Laughed|Emphasized|Disliked|Questioned|Liked)\s', text)) def is_quoted_message(text): """Check if message ends with a quote (usually quoting someone).""" if not text: return False stripped = text.rstrip() return stripped[-1] in QUOTE_CHARS if stripped else False ``` Use these to filter analysis: ```python # Skip reactions and quotes when analyzing conversation flow meaningful_messages = [ m for m in messages if not is_reaction_message(m['text']) and not is_quoted_message(m['text']) ] ``` ## Creating a Clean Database For easier querying, create a new database with decoded messages. Run `scripts/create_clean_db.py` or use this schema: ```sql CREATE TABLE messages ( id INTEGER PRIMARY KEY, guid TEXT UNIQUE, text TEXT, decoded_text TEXT, date TEXT, date_unix INTEGER, is_from_me INTEGER, handle_id INTEGER, has_attachments INTEGER, service TEXT ); CREATE TABLE handles ( id INTEGER PRIMARY KEY, contact_id TEXT, service TEXT ); CREATE VIEW messages_readable AS SELECT m.id, m.guid, m.date, CASE WHEN m.is_from_me = 1 THEN 'You' ELSE h.contact_id END as sender, m.decoded_text as message, m.has_attachments FROM messages m LEFT JOIN handles h ON m.handle_id = h.id ORDER BY m.date_unix; ``` ## Common Queries **Search messages:** ```sql SELECT * FROM messages_readable WHERE message LIKE '%keyword%'; ``` **Messages by date range:** ```sql SELECT * FROM messages_readable WHERE date BETWEEN '2023-01-01' AND '2023-12-31'; ``` **Count by sender:** ```sql SELECT sender, COUNT(*) FROM messages_readable GROUP BY sender; ``` **Messages with attachments:** ```sql SELECT m.*, a.filename, a.mime_type FROM message m JOIN message_attachment_join maj ON m.ROWID = maj.message_id JOIN attachment a ON maj.attachment_id = a.ROWID; ``` ## Workflow Summary 1. **Find chat ID**: Query `chat` table to find conversation 2. **Extract messages**: Join `message` + `chat_message_join` + `handle` 3. **Decode text**: Check `text` first, then decode `attributedBody` 4. **Clean artifacts**: Apply cleaning patterns 5. **Convert timestamps**: Apply Apple epoch conversion 6. **Export**: Save to CSV, new SQLite, or other format ## Reference Files - `references/decoding.md` - Full attributedBody decoder with fallback methods - `references/schema.md` - Complete chat.db schema reference - `scripts/create_clean_db.py` - Create queryable clean database