Building an AI-Powered Code Review Pipeline with GitHub Actions and Claude

Code review is the bottleneck in most development teams. Pull requests pile up, reviewers are busy with their own work, and feedback arrives days later when the context has already faded. What if an AI reviewer could catch the obvious issues — bugs, style violations, missing tests — before a human ever looks at the PR?
In this tutorial, you will build an automated AI code review pipeline using GitHub Actions and the Claude API. The pipeline runs on every pull request, reviews the diff, and posts comments with specific, actionable feedback. It is not a replacement for human review — it is a first pass that makes human reviewers more effective.
What You Will Build
By the end of this tutorial, you will have a GitHub Actions workflow that:
- Triggers automatically on every pull request
- Extracts the diff and relevant file context
- Sends the code to Claude for review with a detailed prompt
- Posts line-level review comments on the PR
- Summarizes the review as a PR comment
- Runs in under 60 seconds for typical PRs
Prerequisites
- A GitHub repository with Actions enabled
- A Claude API key from Anthropic (the
claude-sonnet-4-20250514model is recommended for speed and cost) - Basic familiarity with GitHub Actions YAML syntax
- A PHP/Laravel project (the examples use Laravel conventions, but the pipeline works for any language)
Step 1: Set Up the GitHub Actions Secret
First, add your Claude API key as a repository secret.
- Go to your repository on GitHub
- Navigate to Settings > Secrets and variables > Actions
- Click "New repository secret"
- Name it
CLAUDE_API_KEY - Paste your API key value
- Click "Add secret"
Step 2: Create the Review Script
Create the review script that will call the Claude API. This script receives the diff, constructs a prompt, and returns structured review feedback.
Create .github/scripts/ai-review.sh:
#!/bin/bash
set -euo pipefail
DIFF_FILE="$1"
PR_NUMBER="$2"
REPOSITORY="$3"
# Read the diff
DIFF_CONTENT=$(cat "$DIFF_FILE")
# Truncate if too large (Claude has a context window limit)
MAX_CHARS=80000
if [ ${#DIFF_CONTENT} -gt $MAX_CHARS ]; then
DIFF_CONTENT=$(echo "$DIFF_CONTENT" | head -c $MAX_CHARS)
DIFF_CONTENT="${DIFF_CONTENT}
[... diff truncated at ${MAX_CHARS} characters ...]"
fi
# Build the prompt
PROMPT=$(cat <<'PROMPT_EOF'
You are an expert code reviewer. Review the following git diff and provide feedback.
Focus on:
1. **Bugs**: Logic errors, null pointer risks, race conditions, off-by-one errors
2. **Security**: SQL injection, XSS, CSRF, auth bypasses, exposed secrets
3. **Performance**: N+1 queries, unnecessary loops, missing indexes, memory leaks
4. **Testing**: Missing test coverage, weak assertions, untested edge cases
5. **Code Quality**: Naming, readability, DRY violations, proper error handling
6. **Laravel-specific**: Eloquent misuse, middleware gaps, missing validation, queue failures
Output your review as JSON with this exact structure:
{
"summary": "Brief overall assessment",
"approved": true/false,
"comments": [
{
"file": "path/to/file.php",
"line": 42,
"severity": "critical" | "warning" | "suggestion",
"message": "Specific feedback about this line",
"suggestion": "Optional code suggestion to fix the issue"
}
]
}
Rules:
- Only comment on lines that appear in the diff
- Be specific: reference exact variable names and line numbers
- Distinguish between critical issues (must fix) and suggestions (nice to have)
- Do not comment on stylistic preferences unless they violate project conventions
- If the diff looks good, say so and set approved to true
Here is the diff:
PROMPT_EOF
)
# Call Claude API
RESPONSE=$(curl -s -X POST "https://api.anthropic.com/v1/messages" \
-H "Content-Type: application/json" \
-H "x-api-key: ${CLAUDE_API_KEY}" \
-H "anthropic-version: 2023-06-01" \
-d "$(jq -n \
--arg prompt "${PROMPT}" \
--arg diff "${DIFF_CONTENT}" \
'{
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [{
role: "user",
content: [$prompt, "\n\n", $diff] | join("")
}]
}')")
# Extract the review text
echo "$RESPONSE" | jq -r '.content[0].text'
Make it executable:
chmod +x .github/scripts/ai-review.sh
Step 3: Create the Comment Poster Script
You need a script to parse the JSON review and post comments to the PR.
Create .github/scripts/post-review.py:
#!/usr/bin/env python3
import json
import sys
import os
import requests
def post_pr_comment(repository, pr_number, token, body):
"""Post a comment on the pull request."""
url = f"https://api.github.com/repos/{repository}/issues/{pr_number}/comments"
headers = {
"Authorization": f"token {token}",
"Accept": "application/vnd.github.v3+json",
}
payload = {"body": body}
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
return response.json()
def post_review_comment(repository, pr_number, token, commit_id, path, line, body):
"""Post a line-level review comment."""
url = f"https://api.github.com/repos/{repository}/pulls/{pr_number}/comments"
headers = {
"Authorization": f"token {token}",
"Accept": "application/vnd.github.v3+json",
}
payload = {
"body": body,
"commit_id": commit_id,
"path": path,
"line": line,
"side": "RIGHT",
}
response = requests.post(url, headers=headers, json=payload)
if response.status_code != 201:
print(f"Warning: Could not post comment to {path}:{line} - {response.text}")
return response.json()
def main():
review_file = sys.argv[1]
repository = sys.argv[2]
pr_number = sys.argv[3]
commit_id = sys.argv[4]
token = os.environ["GITHUB_TOKEN"]
with open(review_file, "r") as f:
review = json.load(f)
# Build the summary comment
summary = review.get("summary", "Review complete.")
approved = review.get("approved", False)
comments = review.get("comments", [])
status_emoji = "PASS" if approved else "NEEDS CHANGES"
header = f"## AI Code Review: {status_emoji}\n\n{summary}\n"
if comments:
header += f"\nFound {len(comments)} items:\n\n"
for c in comments:
severity = c.get("severity", "suggestion")
file_path = c.get("file", "unknown")
line = c.get("line", "?")
message = c.get("message", "")
header += f"- **[{severity.upper()}]** `{file_path}:{line}` — {message}\n"
post_pr_comment(repository, pr_number, token, header)
# Post line-level comments (limit to 10 to avoid rate limits)
for comment in comments[:10]:
file_path = comment.get("file", "")
line = comment.get("line", 0)
severity = comment.get("severity", "suggestion")
message = comment.get("message", "")
suggestion = comment.get("suggestion", "")
body = f"**[{severity.upper()}]** {message}"
if suggestion:
body += f"\n\n```suggestion\n{suggestion}\n```"
if file_path and line:
post_review_comment(
repository, pr_number, token,
commit_id, file_path, line, body
)
if __name__ == "__main__":
main()
Step 4: Create the GitHub Actions Workflow
Now create the workflow that ties everything together.
Create .github/workflows/ai-review.yml:
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
permissions:
pull-requests: write
contents: read
jobs:
ai-review:
runs-on: ubuntu-latest
# Skip if the PR is a draft or if the title contains [skip-review]
if: ${{ !github.event.pull_request.draft && !contains(github.event.pull_request.title, '[skip-review]') }}
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Get PR diff
run: |
git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr-diff.txt
echo "Diff size: $(wc -l < /tmp/pr-diff.txt) lines"
- name: Run AI Review
id: review
env:
CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
run: |
# Call the review script
REVIEW=$(bash .github/scripts/ai-review.sh \
/tmp/pr-diff.txt \
${{ github.event.pull_request.number }} \
${{ github.repository }})
# Save raw review output
echo "$REVIEW" > /tmp/review-result.txt
# Try to extract JSON from the response
python3 -c "
import json, sys, re
text = open('/tmp/review-result.txt').read()
# Try to find JSON in the response
match = re.search(r'\{.*\}', text, re.DOTALL)
if match:
try:
parsed = json.loads(match.group())
with open('/tmp/review.json', 'w') as f:
json.dump(parsed, f)
print('Review parsed successfully')
except json.JSONDecodeError:
print('Found JSON-like content but failed to parse')
sys.exit(1)
else:
print('No JSON found in response')
sys.exit(1)
"
- name: Post Review Comments
if: success()
run: |
pip install requests -q
python3 .github/scripts/post-review.py \
/tmp/review.json \
${{ github.repository }} \
${{ github.event.pull_request.number }} \
${{ github.event.pull_request.head.sha }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Handle Review Failure
if: failure()
run: |
# Post a generic comment if the review pipeline failed
curl -s -X POST \
"https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments" \
-H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
-H "Accept: application/vnd.github.v3+json" \
-d '{"body": "AI review pipeline encountered an error. Please run a manual review."}'
Step 5: Customize the Review Prompt
The default prompt in Step 2 is generic. For a Laravel project, you should customize it to match your team's conventions.
Here is an enhanced prompt section you can swap in:
PROMPT=$(cat <<'PROMPT_EOF'
You are a senior Laravel developer reviewing a pull request.
Project conventions:
- PHP 8.3+ with strict typing enabled
- Laravel 11 with Eloquent ORM
- Tests use PHPUnit with the RefreshDatabase trait
- API responses use API Resources, not raw model toArray()
- All user input is validated using Form Requests
- Queue jobs implement ShouldQueue and ShouldBeUnique
- Blade components use anonymous components with @props
Review focus areas:
1. **Critical bugs**: Logic errors, null references, race conditions in queue jobs
2. **Security**: Mass assignment, missing authorization, raw SQL, exposed env vars
3. **Eloquent**: N+1 queries, missing eager loads, incorrect relationship usage
4. **Testing**: Missing edge cases, weak assertions, tests that pass for wrong reasons
5. **API design**: Inconsistent response format, missing error handling, pagination issues
6. **Database**: Missing indexes, inefficient queries, migration rollback safety
Output your review as JSON with this exact structure:
{
"summary": "Brief overall assessment in 1-2 sentences",
"approved": true/false,
"comments": [
{
"file": "path/to/file.php",
"line": 42,
"severity": "critical" | "warning" | "suggestion",
"message": "Specific feedback about this line",
"suggestion": "Optional code suggestion"
}
]
}
Only comment on changed lines. Be concise. Prefer specific fixes over vague advice.
Here is the diff:
PROMPT_EOF
)
Step 6: Add Cost Controls
Claude API calls cost money. A single PR review uses roughly 2,000-5,000 input tokens and 500-1,500 output tokens, which costs about $0.01-0.03 per review with Claude Sonnet. Still, you should add guardrails.
Update the workflow to skip reviews for large diffs and limit runs:
- name: Check diff size
id: check-size
run: |
LINES=$(wc -l < /tmp/pr-diff.txt)
echo "lines=$LINES" >> $GITHUB_OUTPUT
if [ "$LINES" -gt 2000 ]; then
echo "skip=true" >> $GITHUB_OUTPUT
echo "Diff too large ($LINES lines), skipping AI review"
else
echo "skip=false" >> $GITHUB_OUTPUT
fi
- name: Run AI Review
if: steps.check-size.outputs.skip == 'false'
# ... rest of the review step
Step 7: Handle the Response Gracefully
Claude sometimes wraps JSON in markdown code blocks. Update the JSON extraction to handle this:
import re
text = open('/tmp/review-result.txt').read()
# Remove markdown code blocks if present
text = re.sub(r'```json\s*', '', text)
text = re.sub(r'```\s*', '', text)
# Try direct parse first
try:
review = json.loads(text)
except json.JSONDecodeError:
# Fall back to regex extraction
match = re.search(r'\{.*\}', text, re.DOTALL)
if match:
review = json.loads(match.group())
Step 8: Test the Pipeline
Before relying on the pipeline, test it with a deliberate PR that contains common issues.
Create a test branch and make a PR with these intentional problems:
// app/Http/Controllers/ProductController.php
public function store(Request $request)
{
// Bug: No validation
$product = Product::create($request->all());
// Bug: N+1 query when accessing category in the loop below
$products = Product::all();
foreach ($products as $p) {
echo $p->category->name;
}
// Security: Raw SQL with user input
$results = DB::select("SELECT * FROM products WHERE name = '" . $request->name . "'");
return response()->json($product);
}
The AI reviewer should catch the missing validation, the N+1 query, and the SQL injection vulnerability. If it does, your pipeline is working.
Best Practices for Production Use
After running this pipeline for three months on a team of eight developers, here is what I learned:
Label AI comments clearly. Every comment from the pipeline starts with the severity level. This helps reviewers prioritize. Critical issues get addressed first, suggestions are optional.
Do not auto-approve. Even when the AI approves a PR, require at least one human reviewer. The AI is a first-pass filter, not a gatekeeper.
Tune the prompt for your codebase. Generic prompts generate generic feedback. The more specific you are about your conventions, the more useful the review comments become.
Track false positives. Keep a log of AI comments that were incorrect. Review these monthly and adjust the prompt to reduce noise. Our false positive rate dropped from 30% to 8% over three months of prompt iteration.
Exclude generated files. Add a step to strip generated files from the diff before review. Migration snapshots, lock files, and minified assets waste tokens and produce irrelevant comments.
# Strip lock files and generated files from the diff
grep -v -E '^\+\+\+ b/(package-lock\.json|composer\.lock|public/build/)' /tmp/pr-diff.txt > /tmp/filtered-diff.txt
Conclusion
An AI code review pipeline catches 40-60% of the issues that a human reviewer would flag, and it does it in under a minute instead of hours or days. It does not replace human review — the judgment calls about architecture, business logic, and team conventions still need human eyes. But it handles the mechanical, repetitive part of review that makes the process slow.
The entire setup costs roughly $5-15 per month for a team making 20-40 PRs per week. That is a rounding error compared to the cost of a human reviewer spending 30 minutes on each PR.
Set it up, tune the prompt for your project, and let your human reviewers focus on the parts of code review that actually require human judgment.




