Convert Disk Usage to Hierarchical Markdown

Objective: to create a well-organized, nested Markdown report showing disk usage of directories and files.

Workflow
Markdown Structure
Generic Example
Full Execution Plan
Popular Markdown Editors & Viewers

Workflow

Data Collection
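From your chosen base directory, run du -h on each top-level folder to generate a recursive disk usage listing, saving each folder’s output as a .du.txt file. Plain du -h lists directories only; add -a if individual files should appear in the report.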
Conversion
The Python script duDirs2MD.py reads each .du.txt file and converts it into a clean, structured .md file with proper nesting.
Final Output
All .md files are merged in alphabetical order into one master Markdown document.

Markdown Structure

# → Top-level folder (highest level)
## → Level 2 subdirectories
### → Level 3 subdirectories
#### → Level 4 and deeper
- [ ] → Leaf items (files or deepest folders)
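
The mapping above can be sketched as a small function. This is an illustration only, not part of duDirs2MD.py: the name `markdown_prefix` is hypothetical, and capping headings at four hashes is an assumption taken from the "Level 4 and deeper" rule.

```python
def markdown_prefix(depth: int, is_leaf: bool) -> str:
    """Return the Markdown prefix for an entry (depth 1 = top-level folder)."""
    if is_leaf:
        return "- [ ]"  # leaf items become task-list checkboxes
    return "#" * min(depth, 4)  # level 4 and deeper all share ####


# Print the prefix chosen for a few depth / leaf combinations.
for depth, is_leaf in [(1, False), (3, False), (6, False), (4, True)]:
    print(depth, is_leaf, markdown_prefix(depth, is_leaf))
```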

Generic Example

Sample input (from du):

120G subdir1/
45G subdir1/subdir2/
15G subdir1/subdir2/subdir3/
8G subdir1/subdir2/subdir3/fileA
5G subdir1/subdir2/subdir3/fileB
30G subdir1/subdir2/subdir4/
75G subdir1/subdir5/

Output Markdown:

# 120G subdir1

## 45G subdir1/subdir2

### 15G subdir1/subdir2/subdir3
- [ ] 8G subdir1/subdir2/subdir3/fileA
- [ ] 5G subdir1/subdir2/subdir3/fileB

- [ ] 30G subdir1/subdir2/subdir4

- [ ] 75G subdir1/subdir5
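
The nesting level of each entry follows from the number of path components. A minimal sketch of that step, using the sample listing above; `parse_entries` is a hypothetical helper written for this illustration, not a function from the conversion script.

```python
SAMPLE = """\
120G subdir1/
45G subdir1/subdir2/
15G subdir1/subdir2/subdir3/
8G subdir1/subdir2/subdir3/fileA
5G subdir1/subdir2/subdir3/fileB
30G subdir1/subdir2/subdir4/
75G subdir1/subdir5/
"""


def parse_entries(text):
    """Return (size, path, depth) for each non-empty line of a du-style listing."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The size is the first whitespace-delimited field; the rest is the path.
        size, path = line.split(maxsplit=1)
        # Filtering empty pieces drops the one left by a trailing slash.
        parts = [p for p in path.split("/") if p]
        entries.append((size, "/".join(parts), len(parts)))
    return entries


for size, path, depth in parse_entries(SAMPLE):
    print(depth, size, path)
```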

Full Execution Plan

Select base directory and generate the du script:

cd BaseDir; mkdir Z-GetSizes
ls | awk -F/ '{print "du -h \""$1"\" > \"Z-GetSizes/"$1".du.txt\""}' > Z-GetSizes/duSubDirs.sh

Execute the script. This is the single most time-consuming step:
```
chmod 755 Z-GetSizes/duSubDirs.sh
./Z-GetSizes/duSubDirs.sh
```
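
The same collection step could also be driven from Python, which passes folder names as argument lists so no shell quoting is involved. This is a sketch under assumptions: `du_jobs` and `run_jobs` are hypothetical names, and unlike the `ls` pipeline it skips plain files and the Z-GetSizes folder itself.

```python
import subprocess
from pathlib import Path


def du_jobs(base_dir, out_name="Z-GetSizes"):
    """List (argv, output_file) pairs, one per top-level folder of base_dir."""
    base = Path(base_dir)
    jobs = []
    for entry in sorted(base.iterdir()):
        # Skip plain files and the output folder itself.
        if not entry.is_dir() or entry.name == out_name:
            continue
        out_file = base / out_name / f"{entry.name}.du.txt"
        jobs.append((["du", "-h", entry.name], out_file))
    return jobs


def run_jobs(base_dir, out_name="Z-GetSizes"):
    """Run each du job from inside base_dir so the listed paths stay relative."""
    base = Path(base_dir)
    (base / out_name).mkdir(exist_ok=True)
    for argv, out_file in du_jobs(base, out_name):
        with open(out_file, "w", encoding="utf-8") as fh:
            # check=False: du exits non-zero on unreadable folders but still lists the rest.
            subprocess.run(argv, cwd=base, stdout=fh, check=False)
```

Calling `run_jobs("BaseDir")` would then write one .du.txt file per top-level folder into BaseDir/Z-GetSizes.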

Create the script that converts the du listing to Markdown format:

cat > duDirs2MD.py << 'EOF'
import re
import sys
from pathlib import Path

def parse_line(line):
    """Parse a du line: size and path"""
    line = line.strip()
    if not line:
        return None, None
    # Split on first whitespace
    match = re.match(r'^(\S+)\s+(.+)$', line)
    if match:
        size = match.group(1)
        path = match.group(2)
        return size, path
    return None, None

def build_tree(lines):
    """Build a tree from path entries with sizes"""
    tree = {}
    path_to_size = {}
    all_paths = []
    
    for line in lines:
        size, path = parse_line(line)
        if not size or not path:
            continue
        components = [c for c in path.split('/') if c]
        if not components:
            continue
        all_paths.append(components)
        path_to_size[tuple(components)] = size
    
    # Build nested dict
    for components in all_paths:
        current = tree
        for i, comp in enumerate(components):
            if comp not in current:
                current[comp] = {
                    'size': path_to_size.get(tuple(components[:i+1]), ''),
                    'children': {},
                    'is_leaf': False
                }
            current = current[comp]['children']
    
    # Mark leaves (nodes with no children)
    def mark_leaves(node):
        if not node['children']:
            node['is_leaf'] = True
        else:
            for child in node['children'].values():
                mark_leaves(child)
    
    for root_node in tree.values():
        mark_leaves(root_node)
    
    return tree, path_to_size

def generate_markdown(tree, output_file):
    """Generate Markdown with proper nesting"""
    md_lines = []
    
    def write_node(node_dict, name, depth, full_path):
        size = node_dict.get('size', '')
        header = f"{size} {full_path}" if size else full_path
        
        if node_dict.get('is_leaf', False):
            # Use a checkbox for leaves (files or deepest folders)
            md_lines.append(f"- [ ] {header}")
        else:
            # Use a heading for directories; level 4 and deeper share ####
            hashes = '#' * min(depth, 4)
            md_lines.append(f"{hashes} {header}")
        
        # Sort children for consistent output
        children = sorted(node_dict['children'].items())
        for child_name, child_node in children:
            new_path = f"{full_path}/{child_name}" if full_path else child_name
            write_node(child_node, child_name, depth + 1, new_path)
    
    # Assume a single root (the top-level folder)
    for root_name, root_node in tree.items():
        write_node(root_node, root_name, 1, root_name)
        break  # only one root expected
    
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write('\n'.join(md_lines))
    
    print(f"Generated: {output_file}")

def main():
    if len(sys.argv) < 2:
        print("Usage: python duDirs2MD.py <file.du.txt>")
        sys.exit(1)
    
    input_file = sys.argv[1]
    input_path = Path(input_file)
    
    if not input_path.exists():
        print(f"Error: File {input_file} not found")
        sys.exit(1)
    
    # Output filename: replace a trailing .du.txt with .md (or just change the extension)
    if input_path.name.endswith('.du.txt'):
        output_name = input_path.name[:-len('.du.txt')] + '.md'
    else:
        output_name = input_path.stem + '.md'
    
    output_file = input_path.parent / output_name
    
    with open(input_file, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    
    # du lists each directory's total after its contents; reverse the lines so parents come first
    tree, _ = build_tree(reversed(lines))
    
    generate_markdown(tree, output_file)

if __name__ == "__main__":
    main()
EOF
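
The leaf rule in the script (an entry is a leaf when nothing is listed beneath it) can be seen in isolation with a reduced sketch. This is not the script itself: `render` and the `LISTING` data are hypothetical, and the sketch assumes every parent folder appears in the listing, as it does in du output.

```python
# A made-up listing in du's native order: contents first, totals last.
LISTING = [
    "4.0K\tproj/docs",
    "8.0K\tproj/src/util",
    "20K\tproj/src",
    "28K\tproj",
]


def render(lines):
    """Render du-style lines as Markdown headings and checkboxes."""
    sizes = {}
    for line in lines:
        size, path = line.split(maxsplit=1)
        sizes[tuple(p for p in path.split("/") if p)] = size
    # Any path that is the parent of another listed path is a directory with children.
    parents = {key[:-1] for key in sizes}
    out = []
    # Sorting the component tuples puts each parent directly before its children.
    for key in sorted(sizes):
        label = f"{sizes[key]} {'/'.join(key)}"
        if key in parents:
            out.append("#" * min(len(key), 4) + " " + label)
        else:
            out.append("- [ ] " + label)
    return out


print("\n".join(render(LISTING)))
```

Because the entries are sorted before rendering, the input order does not affect the result. A folder with nothing listed beneath it, such as proj/docs here, is rendered as a checkbox rather than a heading.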

Test convert one du text file to MarkDown:

python duDirs2MD.py Z-GetSizes/TestDir.du.txt
=> Z-GetSizes/TestDir.md
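
The output name is derived from the input name and written next to it. A small sketch of that naming rule, useful for checking where a result will land before running a long conversion; `markdown_name` is a hypothetical helper, not a function from the script.

```python
from pathlib import Path


def markdown_name(du_file):
    """Map name.du.txt to name.md in the same folder; otherwise swap the last extension."""
    p = Path(du_file)
    if p.name.endswith(".du.txt"):
        # Strip only the trailing .du.txt so dots elsewhere in the name survive.
        return p.with_name(p.name[: -len(".du.txt")] + ".md")
    return p.with_suffix(".md")


print(markdown_name("Z-GetSizes/TestDir.du.txt"))
```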

Create script to convert all directories:

ls Z-GetSizes/*.du.txt | awk '{print "time python duDirs2MD.py \""$0"\""}' | sh
or:
ls Z-GetSizes/*.du.txt | awk '{print "time python duDirs2MD.py \""$0"\""}' > Do.All.SubDirs.sh

Execute script:

chmod 755 Do.All.SubDirs.sh
./Do.All.SubDirs.sh

Join all the Markdown output into one file:

printf '%s\n' Z-GetSizes/*.md | sort | while IFS= read -r file; do
    cat "$file"
    echo ""          # ensures clean separation between top-level folders
done > All.SubDirs.md
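
The merge could equally be done in Python. The sketch below is an assumption-laden alternative, not the author's method: `merge_markdown` is a hypothetical name, it sorts by Python's case-sensitive path ordering (which may differ from the shell's locale-dependent `sort`), and it skips the master file if that file lives in the same folder.

```python
from pathlib import Path


def merge_markdown(src_dir, master):
    """Concatenate every .md file in src_dir, sorted by name, into master."""
    master = Path(master)
    parts = []
    for md in sorted(Path(src_dir).glob("*.md")):
        if md.resolve() == master.resolve():
            continue  # never feed the master file back into itself
        # Normalize each part to end with exactly one newline.
        parts.append(md.read_text(encoding="utf-8").rstrip("\n") + "\n")
    # Joining with a newline leaves one blank line between documents.
    master.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

A call such as `merge_markdown("Z-GetSizes", "All.SubDirs.md")` returns the number of files merged.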

Open the final document, All.SubDirs.md, in your favourite Markdown viewer/editor (see options below).

Popular Markdown Editors & Viewers

Obsidian — Excellent for adding content, knowledge base features, and plugins
Zettlr — Great for viewing and academic/long-form work
Typora — Beautiful distraction-free WYSIWYG experience
Visual Studio Code — Powerful, free, with excellent Markdown support
iA Writer — Minimalist, focus-oriented writing app
MarkText — Free, open-source, clean interface
Bear — Beautiful Markdown app (Apple ecosystem)
Logseq — Outliner-style knowledge base
Dillinger — Online Markdown editor
StackEdit — Powerful browser-based Markdown editor

soratoraservices.com