Convert Disk Usage to Hierarchical Markdown

Objective: to create a well-organized, nested Markdown report showing disk usage of directories and files.

Table of Contents

  1. Workflow
  2. Markdown Structure
  3. Generic Example
  4. Full Execution Plan
  5. Popular Markdown Editors & Viewers

Workflow

  1. Data Collection
    From your chosen base directory, run du -h on each top-level folder (du descends into subdirectories by default; add -a to list individual files as well), saving each folder’s output as a .du.txt file.
  2. Conversion
    The Python script duDirs2MD.py reads each .du.txt file and converts it into a clean, structured .md file with proper nesting.
  3. Final Output
    All .md files are merged in alphabetical order into one master Markdown document.
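
As a rough illustration of the data-collection step, the sketch below builds one du command per top-level entry and quotes each name so that spaces survive the shell. The helper name build_du_commands and the example folder names are assumptions made for illustration; Z-GetSizes is the report folder used in the execution plan.

```python
import shlex


def build_du_commands(names, out_dir="Z-GetSizes"):
    """Return one 'du -h' shell command per top-level entry (hypothetical helper)."""
    commands = []
    for name in sorted(names):
        if name == out_dir:
            continue  # do not measure the folder that holds the reports
        target = shlex.quote(name)
        report = shlex.quote(f"{out_dir}/{name}.du.txt")
        commands.append(f"du -h {target} > {report}")
    return commands


if __name__ == "__main__":
    # Example folder names are made up for this sketch.
    for cmd in build_du_commands(["Music", "My Photos", "Z-GetSizes"]):
        print(cmd)
```

The function only produces the command text; writing it to a script and running it is left to the shell steps in the execution plan.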

Markdown Structure

  • # → Top-level folder (highest level)
  • ## → Level 2 subdirectories
  • ### → Level 3 subdirectories
  • #### → Level 4 and deeper
  • - [ ] → Leaf items (files or deepest folders)
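
The mapping above can be sketched as a small function. This is an illustrative reading of the list, not code taken from duDirs2MD.py: the names markdown_prefix and render are hypothetical, and capping headings at four hashes is how this sketch interprets "Level 4 and deeper".

```python
def markdown_prefix(depth, is_leaf, max_heading=4):
    """Map a position in the tree to the marker listed above (hypothetical helper)."""
    if is_leaf:
        return "- [ ]"  # leaves become checkbox items at any depth
    # Non-leaf directories become headings; deeper levels share the last style.
    return "#" * min(max(depth, 1), max_heading)


def render(size, path, depth, is_leaf):
    """Format one report line: marker, then size, then full path."""
    return f"{markdown_prefix(depth, is_leaf)} {size} {path}"


if __name__ == "__main__":
    print(render("45G", "subdir1/subdir2", 2, False))
    print(render("8G", "subdir1/subdir2/subdir3/fileA", 4, True))
```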

Generic Example

Sample input (from du, shown top-down for readability; du itself prints each directory after its contents):

120G subdir1/
45G subdir1/subdir2/
15G subdir1/subdir2/subdir3/
8G subdir1/subdir2/subdir3/fileA
5G subdir1/subdir2/subdir3/fileB
30G subdir1/subdir2/subdir4/
75G subdir1/subdir5/

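Output Markdown (blank lines added for readability). subdir4 and subdir5 have no entries beneath them, so they are leaves and become checkboxes:

# 120G subdir1

## 45G subdir1/subdir2

### 15G subdir1/subdir2/subdir3
- [ ] 8G subdir1/subdir2/subdir3/fileA
- [ ] 5G subdir1/subdir2/subdir3/fileB
- [ ] 30G subdir1/subdir2/subdir4
- [ ] 75G subdir1/subdir5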
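
du normally prints a directory's total after the entries inside it, so a raw listing runs bottom-up rather than in the top-down order of the sample input. As a small sketch, sorting on path components recovers the top-down order. The du_order list is an assumed raw ordering of the same sample lines, and path_key is a hypothetical helper.

```python
def path_key(line):
    """Sort key: the path's components, ignoring the size column."""
    _size, path = line.split(None, 1)
    return tuple(part for part in path.strip().split("/") if part)


# Assumed raw du ordering of the sample lines: contents first, totals last.
du_order = [
    "8G subdir1/subdir2/subdir3/fileA",
    "5G subdir1/subdir2/subdir3/fileB",
    "15G subdir1/subdir2/subdir3/",
    "30G subdir1/subdir2/subdir4/",
    "45G subdir1/subdir2/",
    "75G subdir1/subdir5/",
    "120G subdir1/",
]

# A parent's key is a prefix of its children's keys, so it sorts first.
top_down = sorted(du_order, key=path_key)

if __name__ == "__main__":
    print("\n".join(top_down))
```

Sorting is an alternative to simply reversing the file: reversal also puts parents before children, but lists siblings in reverse order.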

Full Execution Plan

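  1. Select the base directory and generate the du script (names are quoted so that spaces survive, and Z-GetSizes itself is skipped):
    cd BaseDir; mkdir Z-GetSizes
    ls | grep -vx 'Z-GetSizes' | awk '{print "du -h \"" $0 "\" > \"Z-GetSizes/" $0 ".du.txt\""}' > Z-GetSizes/duSubDirs.sh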
  2. Execute the script. This is the most time-consuming step:
    chmod 755 Z-GetSizes/duSubDirs.sh
    ./Z-GetSizes/duSubDirs.sh
  3. Create the script that converts a du listing to Markdown:
    cat > duDirs2MD.py << 'EOF'
    import sys
    from pathlib import Path
    import re
    
    def parse_line(line):
        """Parse a du line: size and path"""
        line = line.strip()
        if not line:
            return None, None
        # Split on first whitespace
        match = re.match(r'^(\S+)\s+(.+)$', line)
        if match:
            size = match.group(1)
            path = match.group(2)
            return size, path
        return None, None
    
    def build_tree(lines):
        """Build a tree from path entries with sizes"""
        tree = {}
        path_to_size = {}
        all_paths = []
        
        for line in lines:
            size, path = parse_line(line)
            if not size or not path:
                continue
            components = [c for c in path.split('/') if c]
            if not components:
                continue
            all_paths.append(components)
            path_to_size[tuple(components)] = size
        
        # Build nested dict
        for components in all_paths:
            current = tree
            for i, comp in enumerate(components):
                if comp not in current:
                    current[comp] = {
                        'size': path_to_size.get(tuple(components[:i+1]), ''),
                        'children': {},
                        'is_leaf': False
                    }
                current = current[comp]['children']
        
        # Mark leaves (nodes with no children)
        def mark_leaves(node):
            if not node['children']:
                node['is_leaf'] = True
            else:
                for child in node['children'].values():
                    mark_leaves(child)
        
        for root_node in tree.values():
            mark_leaves(root_node)
        
        return tree, path_to_size
    
    def generate_markdown(tree, output_file):
        """Generate Markdown with proper nesting"""
        md_lines = []
        
        def write_node(node_dict, name, depth, full_path):
            size = node_dict.get('size', '')
            header = f"{size} {full_path}" if size else full_path
            
            if node_dict.get('is_leaf', False):
                # Use a checkbox for leaves (files or deepest folders)
                md_lines.append(f"- [ ] {header}")
            else:
                # Use a heading for directories; level 4 and deeper share ####
                hashes = '#' * min(depth, 4)
                md_lines.append(f"{hashes} {header}")
            
            # Sort children for consistent output
            children = sorted(node_dict['children'].items())
            for child_name, child_node in children:
                new_path = f"{full_path}/{child_name}" if full_path else child_name
                write_node(child_node, child_name, depth + 1, new_path)
        
        # Assume a single root (the top-level folder passed to du)
        for root_name, root_node in tree.items():
            write_node(root_node, root_name, 1, root_name)
            break  # only one root expected
        
        with open(output_file, 'w', encoding='utf-8') as f:
            f.write('\n'.join(md_lines))
        
        print(f"Generated: {output_file}")
    
    def main():
        if len(sys.argv) < 2:
            print("Usage: python duDirs2MD.py <file.du.txt>")
            sys.exit(1)
        
        input_file = sys.argv[1]
        input_path = Path(input_file)
        
        if not input_path.exists():
            print(f"Error: File {input_file} not found")
            sys.exit(1)
        
        # Output filename: replace the .du.txt suffix with .md (otherwise just change the extension)
        if input_path.name.endswith('.du.txt'):
            output_name = input_path.name[:-len('.du.txt')] + '.md'
        else:
            output_name = input_path.stem + '.md'
        
        output_file = input_path.parent / output_name
        
        with open(input_file, 'r', encoding='utf-8') as f:
            lines = f.readlines()
        
        # du prints a directory's total after its contents, so reverse the
        # lines to put the top-level folder first (it becomes the root)
        tree, _ = build_tree(reversed(lines))
        
        generate_markdown(tree, output_file)
    
    if __name__ == "__main__":
        main()
    EOF
    
  4. Test the conversion on one du file (the .md file is written next to the input):
    python duDirs2MD.py Z-GetSizes/TestDir.du.txt
    => Z-GetSizes/TestDir.md
  5. Convert all directories, either directly:
    ls Z-GetSizes/*.du.txt | awk '{print "time python duDirs2MD.py \"" $0 "\""}' | sh
    or by writing the commands to a script first:
    ls Z-GetSizes/*.du.txt | awk '{print "time python duDirs2MD.py \"" $0 "\""}' > Do.All.SubDirs.sh
  6. If you wrote the script, execute it:
    chmod 755 Do.All.SubDirs.sh
    ./Do.All.SubDirs.sh
  7. Join all the Markdown output into one file:
    printf '%s\n' Z-GetSizes/*.md | sort | while IFS= read -r file; do
        cat "$file"
        echo ""          # ensures clean separation between top-level folders
    done > All.SubDirs.md
  8. Open the final document, All.SubDirs.md, in your favourite Markdown viewer/editor (see options below).
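
For systems without a POSIX shell, the merge in step 7 could be approximated in Python. This is a minimal sketch under stated assumptions: merge_markdown is a hypothetical helper, files are ordered by Python's default string comparison (which can differ from a locale-aware sort), and the merged file is skipped if it sits in the same folder as its inputs.

```python
from pathlib import Path


def merge_markdown(src_dir, merged_path):
    """Concatenate every .md file in src_dir, sorted by name, into merged_path."""
    merged_path = Path(merged_path)
    parts = []
    for md_file in sorted(Path(src_dir).glob("*.md"), key=lambda p: p.name):
        if md_file.resolve() == merged_path.resolve():
            continue  # never merge a previous output into itself
        parts.append(md_file.read_text(encoding="utf-8").rstrip("\n"))
    # One blank line between reports keeps the sections visually separate.
    merged_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return merged_path


if __name__ == "__main__":
    # Only run the merge when the report folder from the plan exists.
    if Path("Z-GetSizes").is_dir():
        print(merge_markdown("Z-GetSizes", "All.SubDirs.md"))
```

Collecting the parts before writing means a re-run never reads a half-written output file.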

Popular Markdown Editors & Viewers

  • Obsidian — Excellent for adding content, knowledge base features, and plugins
  • Zettlr — Great for viewing and academic/long-form work
  • Typora — Beautiful distraction-free WYSIWYG experience
  • Visual Studio Code — Powerful, free, with excellent Markdown support
  • iA Writer — Minimalist, focus-oriented writing app
  • MarkText — Free, open-source, clean interface
  • Bear — Beautiful Markdown app (Apple ecosystem)
  • Logseq — Outliner-style knowledge base
  • Dillinger — Online Markdown editor
  • StackEdit — Powerful browser-based Markdown editor