Documentation 2: Electric Boogaloo

18 months ago, I wrote a blog post about my process of documenting my home server's containerized applications.

The very next day, OpenAI announced ChatGPT. I immediately started playing with the new shiny technology that could generate text . My well-intentioned plan to manually document each application according to a formatted template seemed less exciting. Therefore, my plan to have perfect documentation slowed. I focused more on doing more fun things with my server - adding family photo viewing capabilities to my Plex, improving my local DNS situation, and moving some more items from my HDD array to a new SSD.

But my desire for documentation still loomed over me. There had to be an easier solution for this. For a bit, I kept trucking, occasionally writing some docs when I mustered up the motivation. I also began to note some pain points with my plans for my server's docs.

Pain Points

Bookstack is great, for other people

Bookstack is a phenomenal piece of software, and I really enjoyed using it at first. It has plenty of handy features, but it’s very overbuilt for what I need. The workflow also had plenty of physical clicks compared to the Obsidian workflow I had started using for my personal notes. The version control and user authentication features really didn't provide any value to me when I'm the only one updating the app.

Mkdocs is really good

Markdown is an awesome format, and using the Obsidian editor to write documentation (and blog posts) is a great experience. I already used markdown-based documentation for my internal automations at work using Mkdocs, and the workflow of running one command to turn the folder of markdown files into a beautiful documentation website is incredibly smooth.

The Mkdocs Material theme also adds a bunch of handy features and looks incredible.

Writing docs takes time

When I have free time to work on my homelab, do you think I would prefer to implement something new, fix something that is broken, or write documentation for something that may be working just fine? Even with the template, I still had to look things up and reference the official docs for the containers.

ChatGPT API Comes Out

I had been playing with ChatGPT since it came out. I had learned the limitations - the core one being that LLMs have no sense of truth. As a language processor, however, they are great at following templates and outputting something that synthesizes source information into the proper format.

The idea to have my documentation automatically generated by an LLM based upon the files for my server was attractive to me. Not only would I have the chance to build something based on the new shiny thing in tech, but I could solve a practical problem. Hence …

The New Documentation Plan

Parse my core Docker Compose file into snippets

I have all of my containers in one massive docker compose file right now. I keep telling myself I will change that, and I for sure will … later. However, this made the process of splitting the containers for my entire setup into snippets very simple. I wrote some python code to make this happen.

import yaml

def split_compose(compose_file: str) -> list:
    “””
    `split_compose` accepts a docker-compose filepath.
    The compose file is split into individual yaml snippets per container.
    Output is an array of YAML snippet strings.
    “””

    with open(compose_file, ‘r’) as file:
        compose_data = yaml.safe_load(file)

    # Extract services from the loaded compose data
    services = compose_data.get(‘services’, {})

    yaml_snippets = []
    for service_name, service_data in services.items():
        # Construct the yaml for individual service
        snippet = yaml.safe_dump({‘services’: {service_name: service_data}, ‘version’: compose_data[‘version’]})
        yaml_snippets.append(snippet)

    return yaml_snippets

Send the snippet and the template to the LLM API

I used the same template that I had previously written the few docs I’d written before my automation project. Using this a baseline, I created a prompt that had ChatGPT acting as a technical writer.

import openai
import json

def generate_gpt_documentation(content):
    response = openai.ChatCompletion.create(
    model=“gpt-3.5-turbo”,
    messages=[
    {“role”:”system”,”content”:”You are a technical documentation writer. \
    The user provides a docker-compose YAML file and you write documentation about the container using the provided template.”},
    {“role”:”system”,”content”:”””Here is the Markdown formatted template that the documentation should follow:

# [[App Name]]

[[App Basic Description]]

## URL:

[https://URL.example.com](https://URL.example.com)

## Mount Points:

[[[Mount Points here]]]

## Ports:

-[[Ports Here]]

## Complete Docker Compose:

yaml
    sample-app:
        container_name : sample app
        ports:
         - 1201:1201


## App Documentation:

[https://appdocs.github.io](https://appdocs.github.io)

## Credentials:

[[Relevant Credentials]]

## Additional Notes:

    [[Any additional context]]

    “””},
    {“role”: “user”, “content”: f”{content}”}
        ]
    )

    print(json.dumps(response,indent=2))

    documentation = response.get(‘choices’)[0].get(‘message’).get(‘content’)
    print(documentation)
    return documentation

Output the synthesized documentation to a markdown file

I wrote a function that took the content generated earlier and saved it to a markdown file.

def write_markdown_file(container_name:str,content:str) -> None:

    with open(f”docs/{container_name}.md”,’w’) as file:
        file.write(content)

Create functions to orchestrate this

I also wrote a function that accepted the individual YAML snippets and ran the GPT generation function on just that snippet, outputting it to a markdown file.

import yaml

def process_snippets_and_generate_docs(snippets: list) -> None:
    “””
    Accepts a list of YAML snippets.
    Grabs the name of each container from the snippets.
    Runs the `generate_gpt_documentation` function on each snippet.
    “””
    for snippet in snippets:
        parsed_data = yaml.safe_load(snippet)
        service_name = list(parsed_data[‘services’].keys())[0]  # Assuming one service per snippet

        # Call the documentation generation function
        doc = generate_gpt_documentation(snippet)
        write_markdown_file(
            container_name=service_name,
            content = doc
        )

        print(f”Generated documentation for {service_name}.”)

All I needed to get the whole process going was a way to call that function from a split compose file.

def make_docs():
    compose_file_location = input(“Enter the file path of your docker compose yaml file”)
    snippets = split_compose(compose_file_location)
    process_snippets_and_generate_docs(snippets)

From there, I ran the make_docs function and waited a few minutes for ChatGPT to work its magic.

Review and add necessary information and human insights

The nice part of having ChatGPT generate documentation is that the LLM did all the formatting and syntax on my behalf. I had a great start to the documentation. However, just putting that site up without giving it the human element doesn’t provide a whole lot of value. An AI doesn’t understand the why behind the infrastructure. The way I did the documentation automation also didn’t give the full compose file as context - only the one container - so it didn’t get the bigger picture of how the container fit into the home lab.

I certainly needed to review every doc for factual accuracy. LLMs boldly lie to your face. The link to the documentation site was right about 60% of the time; the other 40% it made up something that felt close but didn’t go anywhere.

For containers that are more niche, it took more liberties and guessed what they did. Those took the most time to tweak the output.

Use Mkdocs to turn folder of markdown files into beautiful static site

I created a mkdocs project in the same folder as my documentation_automation.py. Mkdocs is very simple to create the site - just onemkdocs build command and the site is ready to go.

This documentation above was almost entirely generated by AI. I had to add the context that 6123 is the port for the server map powered by dynmap, but GPT 3.5 knew that 25565 was the standard Minecraft server port. It linked the correct documentation to the dockerized Minecraft server container I use. Fun note - the official docs for this container also use Material for Mkdocs and look great!

The Minecraft server documentation was one of the best example of AI generated docs. One of the worst items was a dashboard container I tried out called Flame.

GPT-3.5 didn’t identify the correct container and made the assumption that Flame created CPU flame graphs and needed Docker socket access to get performance information. Neither is accurate - Flame is a homepage generator for your home lab, and it can be set up to dynamically add links to your dashboard based on container labels.

Use Cloudflare Access to make this documentation private

One of things Bookstack handled was authentication. I didn't want my docs to my entirely public so I needed to figure out an alternative. I first tried a home-hosted authentication and identity provider application called Authentik. It worked but it was difficult to configure and manage, so I continued to look for an alternative. I also looked into switching my reverse proxy from Nginx Proxy Manager to traefik but that’s the scope creep monster rearing its ugly head once again. Certainly a project I'm looking forward into doing in the future. However, for the time beingm I wanted a solution that I could quickly deploy for my documentation project but still keep security at the forefront.

Cloudflare Access seemed like a great option. I already used Cloudflare for my DNS provider, so adding their access control was a great fit for me. Setting it up was also very simple.

For work, I have the pleasure of dealing with the Microsoft Office 365 environment. I set up a personal 365 tenant to have a playground to test things for my homelab. I set up Azure AD / Entra ID as my identity provider in Cloudflare Access. Then I set up a rule with Cloudflare to allow only those in a specific AD group into my documentation.

Finally, it all came together. I had complete documentation for all my web apps, available behind authentication on the public internet. My project to document was complete - you never need to update docs, right? ;)

As always, if you have any questions, reach out! I’m always glad to chat about these kinds of things.