csctf

Install

npx skills add https://github.com/dicklesworthstone/agent_flywheel_clawdbot_skills_and_integrations --skill csctf

CSCTF — Chat Shared Conversation To File

A Bun-native CLI that turns public ChatGPT, Gemini, Grok, and Claude share links into clean Markdown + HTML transcripts with preserved code fences, stable filenames, and optional GitHub Pages publishing.

Why This Exists

Copy/pasting AI share links often:

  • Breaks fenced code blocks: loses formatting and structure
  • Loses language hints: no syntax highlighting
  • Produces messy filenames: random or unreadable names
  • Requires manual cleanup: inconsistent formatting

CSCTF fixes this with:

  • Stable slugs: deterministic, collision-proof filenames
  • Language-preserving fences: code blocks retain syntax hints
  • Normalized whitespace: clean, consistent output
  • Static HTML twin: no JS, ready for hosting/archiving
  • One-command GitHub Pages: instant shareable microsite

Quick Start

Install

curl -fsSL https://raw.githubusercontent.com/Dicklesworthstone/chat_shared_conversation_to_file/main/install.sh | bash

Convert any share link

csctf https://chatgpt.com/share/69343092-91ac-800b-996c-7552461b9b70
csctf https://gemini.google.com/share/66d944b0e6b9
csctf https://grok.com/share/bGVnYWN5_d5329c61-f497-40b7-9472-c555fa71af9c
csctf https://claude.ai/share/549c846d-f6c8-411c-9039-a9a14db376cf

Output:

  • .md: Clean Markdown with preserved code fences
  • .html: Styled static HTML (zero JavaScript)

Supported Providers

| Provider | URL Pattern | Method | Notes |
|---|---|---|---|
| ChatGPT | chatgpt.com/share/ | Headless Chromium | Public shares only |
| Gemini | gemini.google.com/share/ | Headless Chromium | Public shares only |
| Grok | grok.com/share/ | Headless Chromium | Public shares only |
| Claude | claude.ai/share/ | Your Chrome session | Requires login |

Claude.ai Special Handling

Claude.ai uses Cloudflare protection that blocks standard browser automation. CSCTF handles this automatically:

1. Copies your Chrome session cookies to a temporary profile
2. Launches Chrome with remote debugging enabled
3. Connects via Chrome DevTools Protocol to extract the conversation
4. If Chrome is running, offers to save tabs, restart, and restore afterward

Requirements: Chrome installed + logged into claude.ai in your regular Chrome session.
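The provider table and the Claude.ai special case can be read as a hostname dispatch. The following is a minimal Python sketch of that idea, not csctf's own source: the `PROVIDERS` mapping, the method labels, and the `detect_provider` name are assumptions made for illustration, and only the hostnames and the `/share/` path come from the table above.

```python
from typing import Tuple
from urllib.parse import urlparse

# Hostname -> (provider, capture method), copied from the provider table.
# The method labels are illustrative names, not identifiers used by csctf.
PROVIDERS = {
    "chatgpt.com": ("ChatGPT", "headless-chromium"),
    "gemini.google.com": ("Gemini", "headless-chromium"),
    "grok.com": ("Grok", "headless-chromium"),
    "claude.ai": ("Claude", "chrome-session"),
}


def detect_provider(url: str) -> Tuple[str, str]:
    """Return (provider, method) for a share URL, or raise ValueError."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Only URLs under /share/ on a known host are treated as share links.
    if host not in PROVIDERS or not parsed.path.startswith("/share/"):
        raise ValueError(f"unsupported share URL: {url}")
    return PROVIDERS[host]
```

Keeping the capture method next to the hostname makes the Claude.ai exception a data difference rather than a separate code path at the call site.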

Design Principles

| Principle | Implementation |
|---|---|
| Determinism | Explicit slugging and collision handling |
| Minimal network | Only share URL fetched (update checks/publish opt-in) |
| Safety | Static HTML (inline CSS/HLJS), no scripts emitted |
| Clarity | Colorized step-based logging, confirmation gates |
| Atomicity | Temp+rename writes prevent partial files |

How It Works

ChatGPT, Gemini, Grok (End-to-End)

1. Launch headless Playwright Chromium with stealth config (spoofed navigator properties, realistic headers)
2. Navigate twice (domcontentloaded → networkidle) for late-loading assets
3. Detect provider from URL hostname
4. Wait for provider-specific selectors with retry/fallback
5. Extract each role's inner HTML (assistant/user), traverse Shadow DOM
6. Clean pills/metadata, run Turndown with fenced-code rule
7. Normalize whitespace and newlines
8. Write Markdown to temp file, rename atomically
9. Render HTML twin with inline CSS/TOC/HLJS
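Step 8 (temp file, then atomic rename) is a general pattern that can be sketched in a few lines. This Python version only illustrates the technique and is not taken from csctf; the `write_atomic` name and the `.tmp` suffix are assumptions.

```python
import os
import tempfile


def write_atomic(path: str, text: str) -> None:
    """Write text to a temp file next to `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one
    # filesystem, which is what makes os.replace atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind if the write or rename fails.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader of `path` sees either the previous complete file or the new complete file, never a half-written one.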

Claude.ai

1. Copy Chrome session cookies to temporary profile
2. Launch Chrome with remote debugging
3. Connect via Chrome DevTools Protocol
4. Extract conversation HTML
5. Process through same Turndown/normalization pipeline
6. Clean up temporary profile

Processing Algorithms

Selector Strategy

Provider-specific selectors with fallback chains:

  • ChatGPT: article [data-message-author-role]
  • Gemini: Custom web components (share-turn-viewer, response-container)
  • Grok: Flexible data-testid patterns
  • Claude: [data-testid="user-message"] and streaming indicators

Each has multiple fallbacks tried with short timeouts.
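A fallback chain with short timeouts can be sketched independently of any browser library. In this Python sketch, `probe` is a stand-in for whatever call waits for a selector in the page, and `first_matching_selector`, the 2000 ms timeout, and every chain entry after the first are hypothetical; only `article [data-message-author-role]` comes from the list above.

```python
from typing import Callable, Optional, Sequence


def first_matching_selector(
    selectors: Sequence[str],
    probe: Callable[[str, int], bool],
    timeout_ms: int = 2000,
) -> Optional[str]:
    """Try selectors in order; return the first one the probe finds, else None.

    probe(selector, timeout_ms) should wait up to timeout_ms for the selector
    and report whether it appeared.
    """
    for selector in selectors:
        if probe(selector, timeout_ms):
            return selector
    return None


# Hypothetical chain: only the first entry appears in the text above; the
# others are made-up fallbacks for illustration.
CHATGPT_CHAIN = [
    "article [data-message-author-role]",
    "[data-message-author-role]",
    "main article",
]

# Fake probe standing in for a real page: pretend only one selector exists.
present = {"[data-message-author-role]"}
print(first_matching_selector(CHATGPT_CHAIN, lambda sel, _ms: sel in present))
```

With a real browser, the probe would wrap the library's wait-for-selector call and return False on timeout, so a layout change costs one short timeout per stale selector rather than the full navigation timeout.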

Turndown Customization

  • Injects fenced code blocks
  • Detects language via class="language-*"
  • Strips citation pills and data-start/data-end attributes

Normalization

  • Normalizes line endings to \n
  • Removes Unicode LS/PS characters (U+2028, U+2029)
  • Collapses excessive blank lines

Slugging Algorithm

Title → lowercase → non-alphanumerics → "_" → trim → max 120 chars → Windows reserved-name suffix → collision suffix (_2, _3, ...)
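The normalization and slugging rules above are compact enough to sketch directly. This Python version is an illustration under stated assumptions, not csctf's implementation: the function names, the `_file` reserved-name suffix, the `conversation` fallback for empty titles, and the reading of "excessive" as three or more consecutive newlines are all guesses.

```python
import re

# Windows device names that cannot be used as file stems.
WINDOWS_RESERVED = (
    {"con", "prn", "aux", "nul"}
    | {f"com{i}" for i in range(1, 10)}
    | {f"lpt{i}" for i in range(1, 10)}
)


def normalize_markdown(text: str) -> str:
    """Normalize line endings, drop Unicode LS/PS, collapse blank-line runs."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\u2028", "").replace("\u2029", "")
    # Assumption: "excessive" means three or more consecutive newlines.
    return re.sub(r"\n{3,}", "\n\n", text)


def slugify(title: str, max_len: int = 120) -> str:
    """Lowercase, turn non-alphanumeric runs into "_", trim, cap the length."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    slug = slug[:max_len].rstrip("_")
    if slug in WINDOWS_RESERVED:
        slug += "_file"  # assumed suffix; the text does not say which is used
    return slug or "conversation"  # assumed fallback for empty titles


def unique_name(slug: str, taken: set, ext: str = ".md") -> str:
    """Append _2, _3, ... until the filename is not already taken."""
    candidate, n = f"{slug}{ext}", 2
    while candidate in taken:
        candidate = f"{slug}_{n}{ext}"
        n += 1
    return candidate
```

Because every step is a pure function of the title and the set of existing names, the same conversation always maps to the same filename, which is the determinism the design principles call for.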

HTML Rendering

  • Markdown-it + highlight.js
  • Heading slug de-dupe for TOC
  • Inline CSS for light/dark/print
  • Zero JavaScript

Command Reference

csctf <share-url> [options]

Output Options

| Flag | Default | Description |
|---|---|---|
| --outfile | auto | Override output path |
| --no-html / --md-only | off | Skip HTML output |
| --html-only | off | Skip Markdown output |
| --quiet | off | Minimal logging |
| --timeout-ms | 60000 | Navigation + selector timeout (ms) |

GitHub Pages Publishing

| Flag | Default | Description |
|---|---|---|
| --publish-to-gh-pages | off | Publish to GitHub Pages |
| --gh-pages-repo | my_shared_conversations | Target repo |
| --gh-pages-branch | gh-pages | Target branch |
| --gh-pages-dir | csctf | Subdirectory in repo |
| --remember | off | Save GH settings |
| --forget-gh-pages | off | Clear saved settings |
| --dry-run | off | Simulate publish (build index, no push) |
| --yes / --no-confirm | off | Skip PROCEED confirmation prompt |
| --gh-install | off | Auto-install gh CLI |

Other

| Flag | Description |
|---|---|
| --check-updates | Print latest release tag |
| --version | Print version and exit |

Output Format

Markdown Structure

Conversation: <Title>

Source: https://chatgpt.com/share/...
Retrieved: 2026-01-08T15:30:00Z

User

How do I sort an array in Python?

Assistant

Here's how to sort an array in Python:

```python
# Sort in place
my_list.sort()

# Return new sorted list
sorted_list = sorted(my_list)
```

HTML Features

  • Standalone — No external dependencies
  • Zero JavaScript — Safe for any hosting
  • Inline CSS — Light/dark mode via prefers-color-scheme
  • Syntax highlighting — highlight.js themes inline
  • Table of contents — Auto-generated from headings
  • Language badges — Code block language indicators
  • Print-friendly — Optimized print styles

Filename Generation

"How to Build a REST API" → how_to_build_a_rest_api.md
"Python Tips & Tricks!" → python_tips_tricks.md
"File exists already" → file_exists_already_2.md

Rules:

  • Lowercase
  • Non-alphanumerics → _
  • Trimmed leading/trailing _
  • Max 120 characters
  • Windows reserved names suffixed
  • Collisions: _2, _3, ...

GitHub Pages Publishing

Quick Recipe

```bash
# Publish with defaults
csctf <share-url> --publish-to-gh-pages --yes

# Creates: /my_shared_conversations repo
# Branch: gh-pages
# Directory: csctf/
```

Remembered Settings

First time: save settings

csctf <share-url> --publish-to-gh-pages --remember --yes

Subsequent: just use --yes

csctf <share-url> --yes

Clear remembered settings

csctf --forget-gh-pages

Custom Configuration

csctf <share-url> --publish-to-gh-pages \
  --gh-pages-repo myuser/my-chats \
  --gh-pages-branch main \
  --gh-pages-dir exports \
  --yes

Requirements

  • GitHub CLI (gh) installed and authenticated
  • Verify with: gh auth status

Publish Flow

1. Resolve repo/branch/dir (use remembered or defaults)
2. Clone (or create via gh)
3. Copy MD + HTML files
4. Regenerate manifest.json and index.html
5. Commit + push
6. Print viewer URL

Recipes

Quiet CI Scrape (MD only)

csctf <share-url> --md-only --quiet --outfile /tmp/chat.md

HTML-only for Embedding

csctf <share-url> --html-only --outfile site/chat.html

Slow/Large Conversations

csctf <share-url> --timeout-ms 90000

Custom Browser Cache

PLAYWRIGHT_BROWSERS_PATH=/opt/ms-playwright csctf <share-url>

Batch Archive

for url in $URLS; do
  csctf "$url" --outfile ~/archive/ --quiet
done

Security & Privacy

Network Behavior

  • Only fetches: the share URL itself
  • Opt-in: update checks, GitHub publish flows
  • Auth: GitHub CLI (gh) for publishing; no tokens stored

HTML Safety

  • Zero JavaScript in output
  • Inline styles only
  • Citation pills and data attributes stripped
  • highlight.js used statically

Filesystem

  • Temp+rename write pattern (atomic)
  • Collision-proof naming
  • Config: ~/.config/csctf/config.json

Claude.ai Cookies

  • Copied to temporary directory only
  • Used for single scraping session
  • Original Chrome profile never modified

Performance

| Phase | Time |
|---|---|
| First run (Chromium download) | 30-60s |
| Subsequent runs | 5-15s |
| Claude.ai (uses local Chrome) | 5-10s |

Characteristics

  • Playwright browsers cached after first run
  • 60s default timeout, 3-attempt backoff
  • Single page/context, linear processing
  • Atomic writes prevent partial outputs

Failure Modes & Remedies

| Symptom | Fix |
|---|---|
| "No messages found" | Link is private or layout changed; verify public share, retry with --timeout-ms 90000 |
| Bot detection / challenge page | Stealth techniques used; retry or verify link in browser |
| Timeout or blank page | Raise --timeout-ms, verify connectivity |
| Publish fails (auth) | Ensure gh auth status passes |
| Publish fails (branch/dir) | Pass --gh-pages-branch / --gh-pages-dir; use --remember |
| Filename collisions | Expected; tool appends _2, _3, ... |
| Claude.ai Cloudflare challenge | Complete verification in Chrome window, press Enter |
| Claude.ai won't load | Ensure logged into claude.ai in Chrome; close Chrome if prompted |

Environment Variables

Runtime

| Variable | Description |
|---|---|
| PLAYWRIGHT_BROWSERS_PATH | Reuse cached Chromium bundle |

Installer

| Variable | Description | Default |
|---|---|---|
| VERSION | Pin release tag | latest |
| DEST | Install directory | ~/.local/bin |
| CHECKSUM_URL | Override checksum location | (none) |

File Locations

| Path | Purpose |
|---|---|
| ~/.local/bin/csctf | Binary |
| ~/.config/csctf/config.json | GitHub Pages settings |
| ~/.cache/ms-playwright/ | Playwright Chromium cache |

Installation

One-liner (recommended)

curl -fsSL https://raw.githubusercontent.com/Dicklesworthstone/chat_shared_conversation_to_file/main/install.sh | bash

Pin version

curl -fsSL .../install.sh | VERSION=v1.0.0 bash

Custom directory

curl -fsSL .../install.sh | DEST=/opt/bin bash

Verify checksum

curl -fsSL .../install.sh | bash -s -- --verify

From Source

bun install
bun run build

Binary at dist/csctf

Comparison

| Feature | Copy/Paste | csctf |
|---|---|---|
| Code blocks preserved | Often broken | Always preserved |
| Language hints | Lost | Detected and kept |
| Filenames | Random/messy | Deterministic slugs |
| HTML output | None | Styled, no-JS twin |
| GitHub Pages | Manual | One command |
| Collision handling | Overwrite | Auto-suffix |

Limitations

  • Requires public share links (except Claude.ai, which uses your session)
  • Provider layouts may change (selectors maintained with fallbacks)
  • Markdown/HTML exports require the share to be available at scrape time
  • Claude.ai requires Chrome installed with an active login session
  • First run downloads Playwright Chromium (~200MB)

Integration with Flywheel

| Tool | Integration |
|---|---|
| CASS | Archive conversations for session search |
| CM | Extract procedural memory from exported chats |
| Agent Mail | Attach conversation exports to agent messages |
| NTM | Export multi-agent session transcripts |
