Release v0.7.2 produces a corrupted CLAUDE.md file #1

Closed
opened 2026-05-19 04:42:13 +00:00 by foravo_admin · 2 comments

Imported from GitHub issue ggrace519/claude-code-dev-studio#9.

Source: https://github.com/ggrace519/claude-code-dev-studio/issues/9
Original author: @ggrace519
Original state: closed


The latest release seems to add hundreds of KB of corrupted character encoding.


Imported from GitHub issue comment ggrace519/claude-code-dev-studio#9:4362208526.

Source: https://github.com/ggrace519/claude-code-dev-studio/issues/9#issuecomment-4362208526
Original author: @ggrace519


@claude please review this issue


Imported from GitHub issue comment ggrace519/claude-code-dev-studio#9:4362209083.

Source: https://github.com/ggrace519/claude-code-dev-studio/issues/9#issuecomment-4362209083
Original author: @claude[bot]


Claude finished @ggrace519's task in 7m 12s · View job: https://github.com/ggrace519/claude-code-dev-studio/actions/runs/25238345111 · Branch: claude/issue-9-20260501-2358


Issue Review: Corrupted CLAUDE.md in Release v0.7.2

Tasks:

  • Gather context and understand the issue
  • Inspect the CLAUDE.md file and installer scripts for encoding issues
  • Identify root cause(s)
  • Implement fix for Get-Content encoding issues
  • Extend null-byte check in build-release.ps1 to cover .md files
  • Commit and push

Summary

Note: v0.7.2 does not appear in this repo's CHANGELOG (the latest tracked release is v0.6.1, from Session 17). The root-cause analysis below is based on the current codebase, which contains the same latent encoding bug, so any release built from it would be affected.


Root Cause

Install-Playbook.ps1: Get-Content without explicit encoding (four functions)

Set-ClaudePlaybookBlock, Remove-ClaudePlaybookBlock, Install-Completions, and Remove-CompletionBlock all read files with PowerShell's Get-Content -Raw without specifying -Encoding:

# BEFORE (buggy)
$blockContent = (Get-Content -LiteralPath $JitBlockPath -Raw).TrimEnd(...)
$existing     = if (Test-Path $claudeMd) { Get-Content -LiteralPath $claudeMd -Raw } else { '' }

In PowerShell 5.1 (Windows PowerShell, still the default shell shipped with Windows), Get-Content without -Encoding decodes BOM-less files using the system ANSI code page (typically Windows-1252). Any non-ASCII character stored as a multi-byte UTF-8 sequence (accented letters, smart quotes (“ ”), em-dashes (—), arrows (→)) is therefore decoded as several wrong characters, one per byte. WriteAllText(..., UTF8Encoding::new($false)) then writes those wrong characters back as valid but garbled UTF-8.
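A minimal sketch of that misread, using only .NET's built-in encodings: it encodes a single → as UTF-8, decodes those bytes as Windows-1252 (what Windows PowerShell 5.1's Get-Content does for a BOM-less file), and re-encodes the result as UTF-8 as the installer's write path would. The arrow is built from its code point so the script's own file encoding cannot interfere; the catch block assumes PowerShell 7 may need the code-pages provider registered before code page 1252 is available.

```powershell
# Model of the pre-fix read path: UTF-8 bytes decoded as Windows-1252, re-encoded as UTF-8.
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
try {
    $ansi = [System.Text.Encoding]::GetEncoding(1252)
} catch {
    # .NET (Core), used by PowerShell 7, ships legacy code pages in a separate provider.
    [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    $ansi = [System.Text.Encoding]::GetEncoding(1252)
}

$original = [string][char]0x2192            # U+2192 RIGHTWARDS ARROW
$bytesIn  = $utf8NoBom.GetBytes($original)  # E2 86 92
$misread  = $ansi.GetString($bytesIn)       # one wrong character per byte
$bytesOut = $utf8NoBom.GetBytes($misread)   # what WriteAllText would store

$codePoints = ($misread.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '
'{0} bytes -> {1} bytes; decoded as {2}' -f $bytesIn.Length, $bytesOut.Length, $codePoints
# 3 bytes -> 8 bytes; decoded as U+00E2 U+2020 U+2019
```

U+00E2, U+2020 and U+2019 are â, † and ’, so the arrow comes back as â†’.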

Why this produces "hundreds of KB of corrupted character encoding":

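The CLAUDE.md written by the installer contains content injected from jit-claude.md. Read through the ANSI code page, every byte of a multi-byte UTF-8 character becomes its own Windows-1252 character: the arrow → (bytes E2 86 92) is decoded as the three characters â†’, and the right single quote ’ (E2 80 99) as ’. Written back as UTF-8, each of those characters takes 2 or 3 bytes, so a 3-byte arrow becomes 8 bytes. The JIT block is re-read from jit-claude.md on each run, so it is garbled the same way every time; the user content outside the markers, however, is read and rewritten on every run, even with the marker-stripping idempotency guard, so its corruption compounds. Because every non-ASCII byte turns into 2 or 3 non-ASCII bytes, the non-ASCII portion of that content at least doubles with each reinstall, which is how a handful of special characters can grow into hundreds of KB.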
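To check the compounding claim, the sketch below repeats the misread-and-rewrite cycle five times on an illustrative line of user content (11 ASCII bytes plus one →). The loop is a model of the pre-fix behavior under the same Windows-1252 assumption, not the installer itself.

```powershell
# Model N pre-fix installer runs: read as Windows-1252, write back as BOM-less UTF-8.
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
try {
    $ansi = [System.Text.Encoding]::GetEncoding(1252)
} catch {
    # PowerShell 7 / .NET: legacy code pages come from this provider.
    [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)
    $ansi = [System.Text.Encoding]::GetEncoding(1252)
}

# Illustrative user content outside the markers: 'use ' + arrow + ' always'.
$text = 'use ' + [char]0x2192 + ' always'

$results = foreach ($run in 1..5) {
    # One pre-fix run: the misread, then the UTF-8 write.
    $text  = $ansi.GetString($utf8NoBom.GetBytes($text))
    $bytes = $utf8NoBom.GetBytes($text)
    [pscustomobject]@{
        Run           = $run
        TotalBytes    = $bytes.Length
        NonAsciiBytes = @($bytes | Where-Object { $_ -ge 0x80 }).Count
    }
}
$results | Format-Table -AutoSize
```

With this input the non-ASCII byte count goes 8, 19, 41, 88 over the first four runs while the 11 ASCII bytes never change: the growth is exponential in the number of reinstalls and confined to the non-ASCII text.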


Secondary Finding

build-release.ps1 — null-byte check excludes .md and .json

The preflight null-byte check only covers .ps1 and .sh files:

# BEFORE (incomplete coverage)
$scriptFiles = Get-ChildItem -LiteralPath $stageDir -Recurse -File -Include '*.ps1', '*.sh'

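Agent files (.md) and catalog.json were not checked, so a binary-corrupted agent or config file that reached the stage directory would have shipped in the release ZIP undetected. A null-byte check catches binary or UTF-16 content, but not the UTF-8 mojibake described above, which contains no NUL bytes; it hardens the build but would not have flagged this particular corruption.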


Changes Made

Install-Playbook.ps1: every Get-Content -Raw call replaced with [System.IO.File]::ReadAllText($path, [System.Text.Encoding]::UTF8):

  • Set-ClaudePlaybookBlock lines 371–372 (reads jit-claude.md and CLAUDE.md)
  • Remove-ClaudePlaybookBlock line 495 (reads CLAUDE.md for block removal)
  • Install-Completions line 444 / Remove-CompletionBlock line 469 (reads PS profile)

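ReadAllText with [System.Text.Encoding]::UTF8 handles both cases: it detects and strips a UTF-8 BOM when one is present (as written by, for example, Windows PowerShell 5.1's Set-Content -Encoding UTF8 or an editor configured to save "UTF-8 with BOM"), and otherwise decodes the bytes as UTF-8, which is what the installer itself writes. For reading, this is equivalent to the UTF8Encoding::new($false) encoding already used on the write path throughout the file.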
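A sketch of the replacement read path handling both BOM variants. Read-Utf8File is a hypothetical helper, not a function in Install-Playbook.ps1; it resolves the path through PowerShell first because [System.IO.File] methods resolve relative paths against the process working directory ([Environment]::CurrentDirectory), which Set-Location does not change.

```powershell
# Hypothetical helper: read a text file as UTF-8 (BOM optional), resolving
# relative paths against PowerShell's current location rather than the process CWD.
function Read-Utf8File {
    param([Parameter(Mandatory)][string]$LiteralPath)
    $full = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)
    [System.IO.File]::ReadAllText($full, [System.Text.Encoding]::UTF8)
}

# Write the same text with and without a BOM into a scratch directory.
$sample = 'use {0} always' -f [char]0x2192
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('ccds-enc-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
$withBom    = Join-Path $dir 'with-bom.md'
$withoutBom = Join-Path $dir 'without-bom.md'
[System.IO.File]::WriteAllText($withBom, $sample, [System.Text.UTF8Encoding]::new($true))
[System.IO.File]::WriteAllText($withoutBom, $sample, [System.Text.UTF8Encoding]::new($false))

$bomPrefix = [System.IO.File]::ReadAllBytes($withBom)[0..2]  # EF BB BF on disk
$readBom   = Read-Utf8File -LiteralPath $withBom
$readNoBom = Read-Utf8File -LiteralPath $withoutBom
Remove-Item -LiteralPath $dir -Recurse -Force

$readBom -ceq $sample    # BOM detected and stripped, text intact
$readNoBom -ceq $sample  # decoded as UTF-8, text intact
```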

build-release.ps1 — null-byte preflight extended to include .md and .json:

# AFTER
$scriptFiles = Get-ChildItem -LiteralPath $stageDir -Recurse -File -Include '*.ps1', '*.sh', '*.md', '*.json'
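One way the preflight's core test could look, as a sketch: Find-NullByteFile and the demo file names are hypothetical, and the extension filter mirrors the extended list above. The demo also shows the limits of the check: a UTF-16LE file (for example, output of Windows PowerShell 5.1's > redirection) is flagged because every ASCII character carries a NUL byte, while UTF-8 mojibake contains no NUL bytes and passes.

```powershell
# Hypothetical helper: list staged files of the checked types that contain a NUL byte.
function Find-NullByteFile {
    param([Parameter(Mandatory)][string]$StageDir)
    Get-ChildItem -LiteralPath $StageDir -Recurse -File |
        Where-Object { $_.Extension -in '.ps1', '.sh', '.md', '.json' } |
        Where-Object { [Array]::IndexOf([System.IO.File]::ReadAllBytes($_.FullName), [byte]0) -ge 0 } |
        ForEach-Object { $_.FullName }
}

# Demo staging directory with three agent files.
$stage = Join-Path ([System.IO.Path]::GetTempPath()) ('ccds-stage-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $stage | Out-Null
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
[System.IO.File]::WriteAllText((Join-Path $stage 'clean.md'), '# Agent', $utf8NoBom)
# UTF-16LE stores a NUL after every ASCII character.
[System.IO.File]::WriteAllText((Join-Path $stage 'utf16.md'), '# Agent', [System.Text.Encoding]::Unicode)
# Mojibake of the CLAUDE.md kind: valid UTF-8 with no NUL bytes.
$mojibake = '# use ' + [char]0x00E2 + [char]0x2020 + [char]0x2019 + ' always'
[System.IO.File]::WriteAllText((Join-Path $stage 'mojibake.md'), $mojibake, $utf8NoBom)

$hits = @(Find-NullByteFile -StageDir $stage)
$hits  # only the utf16.md path
Remove-Item -LiteralPath $stage -Recurse -Force
```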

Testing Recommendations

To verify the fix on Windows PowerShell 5.1:

  1. Create a ~/.claude/CLAUDE.md containing non-ASCII content (e.g., My instructions — use "smart quotes" → always).
  2. Run the installer. Confirm CLAUDE.md retains the original non-ASCII content unchanged.
  3. Run the installer a second time. Confirm the file size stays stable (idempotent).
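These steps can be scripted. Because the installer's command line and marker strings are not shown in this issue, the sketch below substitutes a hypothetical Set-ExampleBlock function (with made-up markers) that mimics the fixed read, strip, append, write cycle; to test the real installer, replace the Set-ExampleBlock call with an actual Install-Playbook.ps1 run.

```powershell
# Hypothetical stand-in for the installer's CLAUDE.md update, using the fixed read path.
function Set-ExampleBlock {
    param([string]$Path, [string]$Block)
    $begin = '<!-- BEGIN example-block -->'  # hypothetical markers
    $end   = '<!-- END example-block -->'
    $existing = if (Test-Path -LiteralPath $Path) {
        [System.IO.File]::ReadAllText($Path, [System.Text.Encoding]::UTF8)
    } else { '' }
    # Remove any earlier copy of the block so reruns replace rather than append.
    $pattern  = '(?s)\r?\n?' + [regex]::Escape($begin) + '.*?' + [regex]::Escape($end)
    $userPart = [regex]::Replace($existing, $pattern, '').TrimEnd()
    $newText  = $userPart + "`n`n" + $begin + "`n" + $Block + "`n" + $end + "`n"
    [System.IO.File]::WriteAllText($Path, $newText, [System.Text.UTF8Encoding]::new($false))
}

# Step 1: a CLAUDE.md with non-ASCII user content, built from code points.
$user = 'My instructions {0} use {1}smart quotes{2} {3} always' -f [char]0x2014, [char]0x201C, [char]0x201D, [char]0x2192
$claudeMd = Join-Path ([System.IO.Path]::GetTempPath()) ('CLAUDE-' + [guid]::NewGuid() + '.md')
[System.IO.File]::WriteAllText($claudeMd, $user + "`n", [System.Text.UTF8Encoding]::new($false))

# Steps 2 and 3: run the update repeatedly and record the file hash after each run.
$hashes = foreach ($run in 1..3) {
    Set-ExampleBlock -Path $claudeMd -Block ('JIT block ' + [char]0x2192)
    (Get-FileHash -LiteralPath $claudeMd -Algorithm SHA256).Hash
}
$final = [System.IO.File]::ReadAllText($claudeMd, [System.Text.Encoding]::UTF8)
Remove-Item -LiteralPath $claudeMd
```

All three hashes should match, and $final should still begin with the original user text.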

The bash installer (ccds-user-setup.sh) is not affected: it uses awk and cat, which pass UTF-8 bytes through without transcoding, and the default locale on macOS/Linux is UTF-8 in any case.


Reference
foravo/claude-code-dev-studio-comment-proof-20260519044203#1