<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Prompt-Engineering on Adur</title><link>https://adurrr.github.io/en/tags/prompt-engineering/</link><description>Recent content in Prompt-Engineering on Adur</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sun, 15 Jun 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://adurrr.github.io/en/tags/prompt-engineering/index.xml" rel="self" type="application/rss+xml"/><item><title>LLMOps: integrating LLMs into DevOps workflows</title><link>https://adurrr.github.io/en/p/llmops-integrating-llms-into-devops-workflows/</link><pubDate>Sun, 15 Jun 2025 00:00:00 +0000</pubDate><guid>https://adurrr.github.io/en/p/llmops-integrating-llms-into-devops-workflows/</guid><description>&lt;p&gt;LLMs have moved beyond chatbots. They&amp;rsquo;re now embedded in engineering workflows where they automate tedious tasks, speed incident response, and boost developer productivity. But deploying an LLM into a production DevOps pipeline is fundamentally different from using ChatGPT in a browser.&lt;/p&gt;
&lt;p&gt;This guide covers what LLMOps means in practice, where LLMs fit into DevOps, architecture patterns that work, and pitfalls to avoid.&lt;/p&gt;
&lt;h2 id="what-is-llmops"&gt;What is LLMOps?
&lt;/h2&gt;&lt;p&gt;LLMOps is the set of practices, tools, and infrastructure needed to operationalize LLMs. It extends MLOps but addresses challenges unique to language models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model selection vs. model training&lt;/strong&gt;: Most teams consume pre-trained models (via APIs or self-hosted inference) rather than training from scratch. The operational focus shifts to prompt engineering, fine-tuning, and retrieval-augmented generation (RAG).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost management&lt;/strong&gt;: LLM inference is expensive. Token-based pricing means costs scale with usage in ways that are harder to predict than traditional compute.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-determinism&lt;/strong&gt;: LLMs produce variable outputs for the same input, which complicates testing, validation, and reproducibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Latency&lt;/strong&gt;: Response times of seconds (not milliseconds) require different architectural patterns than traditional microservices.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;LLMOps is not a separate discipline. It is an extension of your existing DevOps and MLOps practices, adapted for the specific operational characteristics of language models.&lt;/p&gt;
&lt;h2 id="practical-use-cases-in-devops"&gt;Practical use cases in DevOps
&lt;/h2&gt;&lt;p&gt;Here is where LLMs are delivering real value in DevOps workflows today:&lt;/p&gt;
&lt;h3 id="automated-code-review"&gt;Automated code review
&lt;/h3&gt;&lt;p&gt;LLMs can provide a first-pass review of pull requests, catching common issues like missing error handling, security anti-patterns, inconsistent naming, or missing tests. They do not replace human reviewers but reduce the burden of repetitive feedback.&lt;/p&gt;
&lt;h3 id="incident-summarization"&gt;Incident summarization
&lt;/h3&gt;&lt;p&gt;When an incident fires at 3 AM, the on-call engineer needs context fast. An LLM can ingest alert data, recent deployment logs, related runbooks, and previous incident reports to produce a concise summary of what is likely going wrong and what was done last time.&lt;/p&gt;
&lt;h3 id="log-analysis"&gt;Log analysis
&lt;/h3&gt;&lt;p&gt;LLMs are surprisingly effective at pattern recognition in unstructured log data. Feed them a block of error logs and they can often surface the likely root cause faster than a manual grep session, especially in unfamiliar systems.&lt;/p&gt;
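&lt;p&gt;As a minimal sketch of this workflow, the snippet below sends a block of error logs to a locally running Ollama instance (covered later in this post) and asks for likely root causes. The model name and log file path are placeholders.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import json
import urllib.request

# A block of error logs, e.g. read from a file or piped in.
error_logs = open(&amp;#34;errors.log&amp;#34;).read()

prompt = (
    &amp;#34;You are a DevOps assistant. Identify the most likely root cause &amp;#34;
    &amp;#34;in these error logs and list the supporting evidence:\n\n&amp;#34; + error_logs
)

# Ollama exposes a simple generate endpoint on localhost:11434.
request = urllib.request.Request(
    &amp;#34;http://localhost:11434/api/generate&amp;#34;,
    data=json.dumps({&amp;#34;model&amp;#34;: &amp;#34;llama3&amp;#34;, &amp;#34;prompt&amp;#34;: prompt, &amp;#34;stream&amp;#34;: False}).encode(),
    headers={&amp;#34;Content-Type&amp;#34;: &amp;#34;application/json&amp;#34;},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())[&amp;#34;response&amp;#34;])
&lt;/code&gt;&lt;/pre&gt;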
&lt;h3 id="documentation-generation"&gt;Documentation generation
&lt;/h3&gt;&lt;p&gt;Generating draft documentation from code, API schemas, or Terraform modules. The output needs human review, but it eliminates the blank-page problem and keeps docs closer to current state.&lt;/p&gt;
&lt;h3 id="infrastructure-as-code-generation"&gt;Infrastructure as Code generation
&lt;/h3&gt;&lt;p&gt;Given a natural language description of desired infrastructure, LLMs can generate Terraform, Ansible, or Kubernetes manifests as a starting point. Useful for scaffolding, but the output is not production-ready without review.&lt;/p&gt;
&lt;h2 id="architecture-patterns-for-llm-integration"&gt;Architecture patterns for LLM integration
&lt;/h2&gt;&lt;h3 id="pattern-1-api-gateway-to-external-llm"&gt;Pattern 1: API gateway to external LLM
&lt;/h3&gt;&lt;p&gt;The simplest approach. Your application calls an external LLM API (OpenAI, Anthropic, etc.) through a centralized gateway that handles authentication, rate limiting, logging, and cost tracking.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;[CI/CD Pipeline] --&amp;gt; [API Gateway] --&amp;gt; [External LLM API]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; [Logging &amp;amp; Metrics]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; [Cost Tracking]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;: No infrastructure to manage, access to the most capable models, fast to implement.
&lt;strong&gt;Cons&lt;/strong&gt;: Data leaves your network, vendor lock-in, variable latency, ongoing API costs.&lt;/p&gt;
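&lt;p&gt;A rough sketch of the gateway idea, not a production implementation: a thin wrapper that centralizes logging and per-request cost estimation around an OpenAI-compatible client. The pricing figure and model name are placeholders.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import logging
import os

from openai import OpenAI

logging.basicConfig(level=logging.INFO)

# Placeholder price per 1K tokens; check your provider&amp;#39;s current pricing.
PRICE_PER_1K_TOKENS = 0.005

client = OpenAI(api_key=os.environ[&amp;#34;LLM_API_KEY&amp;#34;])

def gateway_chat(prompt, model=&amp;#34;gpt-4o&amp;#34;):
    # Call the external LLM and log token usage plus an estimated cost.
    response = client.chat.completions.create(
        model=model,
        messages=[{&amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;, &amp;#34;content&amp;#34;: prompt}],
    )
    usage = response.usage
    cost = (usage.total_tokens / 1000) * PRICE_PER_1K_TOKENS
    logging.info(&amp;#34;model=%s tokens=%d est_cost_usd=%.4f&amp;#34;,
                 model, usage.total_tokens, cost)
    return response.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In practice the same wrapper is also where rate limiting and request/response logging would live, so every pipeline goes through one audited path.&lt;/p&gt;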
&lt;h3 id="pattern-2-self-hosted-inference"&gt;Pattern 2: Self-hosted inference
&lt;/h3&gt;&lt;p&gt;Run open-weight models (Llama, Mistral, etc.) on your own infrastructure using inference servers like vLLM or Ollama.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;[CI/CD Pipeline] --&amp;gt; [Load Balancer] --&amp;gt; [vLLM / Ollama Instance(s)]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; [GPU Node Pool]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Data stays internal, predictable costs at scale, no vendor dependency, full control over model versions.
&lt;strong&gt;Cons&lt;/strong&gt;: Requires GPU infrastructure, operational overhead, smaller models may be less capable.&lt;/p&gt;
&lt;h3 id="pattern-3-rag-enhanced-pipeline"&gt;Pattern 3: RAG-enhanced pipeline
&lt;/h3&gt;&lt;p&gt;Combine an LLM with a retrieval system that provides relevant context from your own knowledge base (runbooks, documentation, past incidents). This dramatically improves response quality for domain-specific tasks.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;[Query] --&amp;gt; [Embedding Model] --&amp;gt; [Vector DB Search] --&amp;gt; [Context + Query] --&amp;gt; [LLM] --&amp;gt; [Response]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; [Your Knowledge Base]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; (runbooks, docs, etc.)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This pattern is particularly powerful for incident response and documentation tasks where the LLM needs your organization&amp;rsquo;s specific context.&lt;/p&gt;
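&lt;p&gt;A minimal sketch of the retrieve-then-ask flow, assuming a Chroma vector store and the OpenAI client; the collection name and runbook snippets are illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import chromadb
from openai import OpenAI

# Index a few runbook snippets; Chroma applies a default embedding model.
store = chromadb.Client()
runbooks = store.create_collection(&amp;#34;runbooks&amp;#34;)
runbooks.add(
    ids=[&amp;#34;rb-1&amp;#34;, &amp;#34;rb-2&amp;#34;],
    documents=[
        &amp;#34;If the payments service returns 502s, check the upstream connection pool.&amp;#34;,
        &amp;#34;High Kafka consumer lag usually means the checkout workers are stuck.&amp;#34;,
    ],
)

def answer(query):
    # Retrieve the most relevant snippets, then ground the LLM answer in them.
    hits = runbooks.query(query_texts=[query], n_results=2)
    context = &amp;#34;\n&amp;#34;.join(hits[&amp;#34;documents&amp;#34;][0])
    llm = OpenAI()
    response = llm.chat.completions.create(
        model=&amp;#34;gpt-4o&amp;#34;,
        messages=[{
            &amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;,
            &amp;#34;content&amp;#34;: f&amp;#34;Context:\n{context}\n\nQuestion: {query}&amp;#34;,
        }],
    )
    return response.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;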
&lt;h2 id="key-considerations"&gt;Key considerations
&lt;/h2&gt;&lt;h3 id="cost"&gt;Cost
&lt;/h3&gt;&lt;p&gt;LLM API costs can be surprisingly high. A code review pipeline that processes 50 PRs per day with large diffs can easily run hundreds of dollars per month. Strategies to control costs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set token limits per request&lt;/li&gt;
&lt;li&gt;Cache common queries and responses (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Use smaller models for simpler tasks (triage with a small model, escalate to a larger one)&lt;/li&gt;
&lt;li&gt;Monitor token usage per pipeline and set alerts&lt;/li&gt;
&lt;/ul&gt;
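&lt;p&gt;A sketch of the first two strategies combined: cap output tokens per request and reuse answers for identical prompts. The cache here is in-memory; a real pipeline would use something like Redis.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import hashlib

from openai import OpenAI

client = OpenAI()
_cache = {}

def cached_completion(prompt, max_tokens=500):
    # Identical prompts are served from the cache and cost nothing.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    response = client.chat.completions.create(
        model=&amp;#34;gpt-4o-mini&amp;#34;,  # a smaller model is enough for simple triage
        messages=[{&amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;, &amp;#34;content&amp;#34;: prompt}],
        max_tokens=max_tokens,   # hard cap on output tokens per request
    )
    _cache[key] = response.choices[0].message.content
    return _cache[key]
&lt;/code&gt;&lt;/pre&gt;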
&lt;h3 id="latency"&gt;Latency
&lt;/h3&gt;&lt;p&gt;LLM responses take seconds, not milliseconds. Design your integrations as asynchronous processes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Post code review comments after the fact; do not block the PR on them&lt;/li&gt;
&lt;li&gt;Process incident data in the background, push results to a Slack channel&lt;/li&gt;
&lt;li&gt;Use streaming responses where possible to improve perceived performance (see the sketch below)&lt;/li&gt;
&lt;/ul&gt;
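&lt;p&gt;A small sketch of streaming with an OpenAI-compatible client: printing tokens as they arrive improves perceived latency even though the total generation time is unchanged.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model=&amp;#34;gpt-4o&amp;#34;,
    messages=[{&amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;, &amp;#34;content&amp;#34;: &amp;#34;Summarize the latest deployment log.&amp;#34;}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a small delta of the response text.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end=&amp;#34;&amp;#34;, flush=True)
&lt;/code&gt;&lt;/pre&gt;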
&lt;h3 id="hallucinations"&gt;Hallucinations
&lt;/h3&gt;&lt;p&gt;LLMs will confidently generate plausible-sounding but incorrect information. This is a critical concern for DevOps tasks where bad advice can cause outages.&lt;/p&gt;
&lt;p&gt;Mitigations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Always present LLM output as suggestions, never as authoritative actions&lt;/li&gt;
&lt;li&gt;Require human approval before any LLM-generated change is applied&lt;/li&gt;
&lt;li&gt;Use RAG to ground responses in verified documentation&lt;/li&gt;
&lt;li&gt;Implement output validation (e.g., lint generated IaC before presenting it; a sketch follows this list)&lt;/li&gt;
&lt;/ul&gt;
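&lt;p&gt;For the last mitigation, a sketch of validating LLM-generated Terraform before anyone sees it. It assumes the terraform CLI is installed; the file name is illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import subprocess
import tempfile
from pathlib import Path

def validate_terraform(generated_hcl):
    # Write the generated config to a scratch directory and let
    # terraform init + terraform validate catch obvious problems.
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, &amp;#34;main.tf&amp;#34;).write_text(generated_hcl)
        subprocess.run([&amp;#34;terraform&amp;#34;, &amp;#34;init&amp;#34;, &amp;#34;-backend=false&amp;#34;],
                       cwd=workdir, check=True, capture_output=True)
        result = subprocess.run([&amp;#34;terraform&amp;#34;, &amp;#34;validate&amp;#34;],
                                cwd=workdir, capture_output=True)
        return result.returncode == 0
&lt;/code&gt;&lt;/pre&gt;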
&lt;h3 id="security"&gt;Security
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data exposure&lt;/strong&gt;: Anything you send to an external LLM API may be used for training or stored. Never send secrets, credentials, or sensitive customer data (a redaction sketch follows this list).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prompt injection&lt;/strong&gt;: Malicious content in code, logs, or user input can manipulate LLM behavior. Sanitize inputs and validate outputs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Supply chain&lt;/strong&gt;: LLM-generated code may introduce vulnerabilities. Run all generated code through your existing security scanning pipeline.&lt;/li&gt;
&lt;/ul&gt;
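&lt;p&gt;One concrete guardrail for the data exposure point is a redaction pass before anything leaves your network. This is only a sketch; the patterns below are illustrative and far from exhaustive.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import re

# Rough patterns for strings that should never reach an external API.
SECRET_PATTERNS = [
    re.compile(r&amp;#34;AKIA[0-9A-Z]{16}&amp;#34;),                    # AWS access key IDs
    re.compile(r&amp;#34;-----BEGIN [A-Z ]*PRIVATE KEY-----&amp;#34;),  # private key blocks
    re.compile(r&amp;#34;(?i)(api[_-]?key|password|token)\s*[:=]\s*\S+&amp;#34;),
]

def redact(text):
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(&amp;#34;[REDACTED]&amp;#34;, text)
    return text

# Apply to any diff, log block, or prompt before it is sent to the LLM API.
&lt;/code&gt;&lt;/pre&gt;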
&lt;h2 id="tools-and-platforms"&gt;Tools and platforms
&lt;/h2&gt;&lt;h3 id="langchain"&gt;LangChain
&lt;/h3&gt;&lt;p&gt;A framework for building LLM-powered applications. Useful for orchestrating multi-step chains (e.g., retrieve context, format prompt, call LLM, parse output). Supports many LLM providers and has good tooling for RAG pipelines.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Review this code diff for security issues and suggest fixes:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{diff}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;gpt-4o&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;diff&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;code_diff&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id="vllm"&gt;vLLM
&lt;/h3&gt;&lt;p&gt;A high-throughput inference engine for self-hosted models. It uses PagedAttention for efficient KV-cache memory management and continuous batching to keep GPU utilization high.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Start a vLLM server&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;python -m vllm.entrypoints.openai.api_server &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --model mistralai/Mistral-7B-Instruct-v0.2 &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --port &lt;span class="m"&gt;8000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Exposes an OpenAI-compatible API, so you can swap between self-hosted and external APIs with minimal code changes.&lt;/p&gt;
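&lt;p&gt;For example, pointing the standard OpenAI Python client at the local server; vLLM ignores the API key value, but the client requires one. This is a sketch against the Mistral model started above.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from openai import OpenAI

# The same client code works against vLLM&amp;#39;s OpenAI-compatible endpoint.
client = OpenAI(base_url=&amp;#34;http://localhost:8000/v1&amp;#34;, api_key=&amp;#34;unused&amp;#34;)

response = client.chat.completions.create(
    model=&amp;#34;mistralai/Mistral-7B-Instruct-v0.2&amp;#34;,
    messages=[{&amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;, &amp;#34;content&amp;#34;: &amp;#34;Summarize this error log: [paste log]&amp;#34;}],
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;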
&lt;h3 id="ollama"&gt;Ollama
&lt;/h3&gt;&lt;p&gt;The easiest way to run LLMs locally for development and testing. Great for prototyping pipelines before committing to infrastructure.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Pull and run a model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ollama pull llama3
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ollama run llama3 &lt;span class="s2"&gt;&amp;#34;Summarize this error log: [paste log]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Serve as an API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ollama serve
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Then call http://localhost:11434/api/generate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id="example-automated-pr-review-pipeline"&gt;Example: Automated PR review pipeline
&lt;/h2&gt;&lt;p&gt;Here is a conceptual pipeline for automated PR review using an LLM:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;span class="lnt"&gt;15
&lt;/span&gt;&lt;span class="lnt"&gt;16
&lt;/span&gt;&lt;span class="lnt"&gt;17
&lt;/span&gt;&lt;span class="lnt"&gt;18
&lt;/span&gt;&lt;span class="lnt"&gt;19
&lt;/span&gt;&lt;span class="lnt"&gt;20
&lt;/span&gt;&lt;span class="lnt"&gt;21
&lt;/span&gt;&lt;span class="lnt"&gt;22
&lt;/span&gt;&lt;span class="lnt"&gt;23
&lt;/span&gt;&lt;span class="lnt"&gt;24
&lt;/span&gt;&lt;span class="lnt"&gt;25
&lt;/span&gt;&lt;span class="lnt"&gt;26
&lt;/span&gt;&lt;span class="lnt"&gt;27
&lt;/span&gt;&lt;span class="lnt"&gt;28
&lt;/span&gt;&lt;span class="lnt"&gt;29
&lt;/span&gt;&lt;span class="lnt"&gt;30
&lt;/span&gt;&lt;span class="lnt"&gt;31
&lt;/span&gt;&lt;span class="lnt"&gt;32
&lt;/span&gt;&lt;span class="lnt"&gt;33
&lt;/span&gt;&lt;span class="lnt"&gt;34
&lt;/span&gt;&lt;span class="lnt"&gt;35
&lt;/span&gt;&lt;span class="lnt"&gt;36
&lt;/span&gt;&lt;span class="lnt"&gt;37
&lt;/span&gt;&lt;span class="lnt"&gt;38
&lt;/span&gt;&lt;span class="lnt"&gt;39
&lt;/span&gt;&lt;span class="lnt"&gt;40
&lt;/span&gt;&lt;span class="lnt"&gt;41
&lt;/span&gt;&lt;span class="lnt"&gt;42
&lt;/span&gt;&lt;span class="lnt"&gt;43
&lt;/span&gt;&lt;span class="lnt"&gt;44
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c"&gt;# .github/workflows/llm-review.yml&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;LLM Code Review&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;on&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;pull_request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;types&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="l"&gt;opened, synchronize]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;llm-review&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;runs-on&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;ubuntu-latest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Checkout&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;uses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;actions/checkout@v4&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;with&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;fetch-depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Get diff&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;diff&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="sd"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; git diff origin/${{ github.base_ref }}...HEAD &amp;gt; diff.txt&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Run LLM review&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;LLM_API_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;${{ secrets.LLM_API_KEY }}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="sd"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; python scripts/llm_review.py \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; --diff diff.txt \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; --model gpt-4o \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; --max-tokens 2000 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; --output review.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Post review comments&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;uses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;actions/github-script@v7&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;with&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;script&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="sd"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; const review = require(&amp;#39;./review.json&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; await github.rest.pulls.createReview({
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; owner: context.repo.owner,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; repo: context.repo.repo,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; pull_number: context.issue.number,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; body: review.summary,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; event: &amp;#39;COMMENT&amp;#39;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; comments: review.line_comments
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="sd"&gt; });&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The review script (sketched after this list) would:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Read the diff&lt;/li&gt;
&lt;li&gt;Split large diffs into chunks that fit within the model&amp;rsquo;s context window&lt;/li&gt;
&lt;li&gt;For each chunk, construct a prompt asking for security issues, bugs, and style problems&lt;/li&gt;
&lt;li&gt;Aggregate results and format as GitHub review comments&lt;/li&gt;
&lt;li&gt;Include confidence scores and always mark output as AI-generated&lt;/li&gt;
&lt;/ol&gt;
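&lt;p&gt;A minimal sketch of steps 1-3; the chunk size and prompt are illustrative, and a real scripts/llm_review.py would also aggregate and format the findings as in steps 4-5.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from openai import OpenAI

CHUNK_CHARS = 12000  # rough stand-in for the model&amp;#39;s context window

def review_diff(diff_path, model=&amp;#34;gpt-4o&amp;#34;):
    client = OpenAI()
    diff = open(diff_path).read()
    # Step 2: naive chunking; a real script would split on file boundaries.
    chunks = [diff[i:i + CHUNK_CHARS] for i in range(0, len(diff), CHUNK_CHARS)]
    findings = []
    for chunk in chunks:
        # Step 3: ask for security issues, bugs, and style problems per chunk.
        response = client.chat.completions.create(
            model=model,
            messages=[{
                &amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;,
                &amp;#34;content&amp;#34;: &amp;#34;Review this diff for security issues, bugs, and &amp;#34;
                           &amp;#34;style problems. Respond as a bullet list.\n\n&amp;#34; + chunk,
            }],
        )
        findings.append(response.choices[0].message.content)
    return findings
&lt;/code&gt;&lt;/pre&gt;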
&lt;h2 id="guardrails-and-responsible-use"&gt;Guardrails and responsible use
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Label all LLM output clearly&lt;/strong&gt; as AI-generated. Engineers should know when they are reading machine output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Never auto-merge or auto-apply&lt;/strong&gt; LLM suggestions. Keep a human in the loop for all changes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Log all prompts and responses&lt;/strong&gt; for debugging and audit purposes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Set spending limits&lt;/strong&gt; and alerts on LLM API usage.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Review prompt templates regularly&lt;/strong&gt; to ensure they do not leak sensitive information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test for bias and errors&lt;/strong&gt; with representative samples before deploying to production workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="getting-started-recommendations"&gt;Getting started recommendations
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Pick one use case&lt;/strong&gt; - Don&amp;rsquo;t try to LLM-enable everything at once. Start low-risk: documentation drafts, commit message suggestions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Start with an external API&lt;/strong&gt; - Don&amp;rsquo;t invest in GPU infrastructure until you&amp;rsquo;ve validated the use case. Use OpenAI or Anthropic to prototype.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Measure everything&lt;/strong&gt; - Track cost per invocation, latency, user satisfaction, error rates from day one.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build an evaluation framework&lt;/strong&gt; - Create a test suite of known-good inputs and expected outputs. Run it against every prompt change or model update (a minimal sketch follows this list).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Plan your data strategy&lt;/strong&gt; - Decide early what data you&amp;rsquo;ll and won&amp;rsquo;t send to external APIs. Document clearly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Iterate on prompts&lt;/strong&gt; - Prompt engineering is iterative. Version-control your prompts and treat them as code.&lt;/li&gt;
&lt;/ol&gt;
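&lt;p&gt;For the evaluation framework, even a tiny harness beats nothing. The sketch below uses hypothetical cases and a keyword check; real suites usually score outputs more carefully.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Known-good inputs paired with keywords the answer must mention.
CASES = [
    (&amp;#34;Summarize: connection pool exhausted on payments-api&amp;#34;, [&amp;#34;connection pool&amp;#34;]),
    (&amp;#34;Summarize: OOMKilled in checkout deployment&amp;#34;, [&amp;#34;memory&amp;#34;]),
]

def run_eval(generate):
    # generate() is whatever function wraps your prompt and model call.
    passed = 0
    for prompt, expected_keywords in CASES:
        output = generate(prompt).lower()
        if all(keyword in output for keyword in expected_keywords):
            passed += 1
    return passed / len(CASES)

# Run this against every prompt change or model update and track the score.
&lt;/code&gt;&lt;/pre&gt;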
&lt;p&gt;LLMs are a powerful tool for DevOps automation, but they&amp;rsquo;re exactly that: a tool. They work best when thoughtfully integrated into existing workflows, with clear boundaries on what they can and cannot do autonomously.&lt;/p&gt;</description></item></channel></rss>