Code reviews are among the highest-value, yet most time-consuming, aspects of the modern development process. While it cannot, and should not, replace human developers in the code review chain, AI can be a valuable resource for handling much of the mechanical work of the flow – style, naming, bugs, null checks, and so forth – so that human reviewers can focus on architecture, intent, correctness, and nuance. When applied prudently, AI can be a critical augmentation to the humans in the process.
What AI Is Good At (And Isn’t)
Let’s start by reviewing the things that AI does well and how those can be part of the code review process. First, AI excels at catching rule-based and pattern-based issues: incorrect naming, missing comments and documentation, async pitfalls, unused variables, circular references, null-related issues, obvious bugs and security issues, and so forth. Anything for which there are clear, defined rules in place: that’s what AI does well. And sometimes we, as human developers, don’t do those things so well. We should take advantage of the capabilities of AI to handle those portions of the code review process.
But there are things that AI doesn’t do well. For instance, evaluating the correctness of business logic. Does the code match the description provided in the work ticket? Does it produce the expected output for each set of inputs?
AI also struggles with architecture-related decisions. Does the new or changed code fit properly into the overall scheme of the broader system? AI has to work within a highly limited context. As such, it struggles with overarching reviews and anything requiring an understanding of the entire codebase at once. It’s why AI code assistants do great with small, light proof-of-concept tasks, but consistently struggle with enterprise-level application work.
Tooling Options
For development teams, there is a wide array of tools available to assist with code reviews. Anyone who uses GitHub will by now have seen the option of having Copilot do code reviews in the repository.

Click the little “Request” button, and typically within a few minutes you’ll get a code review performed by Copilot, complete with comments and suggestions that you can accept or reject just like any code review in GitHub.

If you use Azure DevOps instead of GitHub, there isn’t anything “out of the box” to do the same thing, but there are a number of custom extensions and workflows you can implement to use a tool like OpenAI or Claude to automate the code review process in a similar fashion. A quick search of the Azure DevOps Marketplace will yield a plethora of plugins to facilitate that process.
While those two options are useful, a more interesting exercise might be to delve into a third option: Rolling your own PR review bot.
Building a Lightweight PR Review Bot
In our previous post about Testing AI-Powered Features in .NET, we briefly touched on creating a chat service to connect to an LLM to send prompts and get answers. We’re going to enhance that a bit now to build a lightweight tool that will automatically review our GitHub pull requests.
We’ll start with a service to connect to the GitHub API to get the file diffs for the PR.
public class GitHubPrClient(HttpClient http)
{
    // Returns the raw unified diff for a pull request.
    // Virtual so the method can be substituted in unit tests.
    public virtual async Task<string> GetPullRequestDiffAsync(
        string owner, string repo, int prNumber)
    {
        // GitHub returns a diff when you request this Accept header.
        // Set it per-request rather than mutating DefaultRequestHeaders,
        // which would accumulate duplicate headers across calls.
        using var request = new HttpRequestMessage(HttpMethod.Get,
            $"https://api.github.com/repos/{owner}/{repo}/pulls/{prNumber}");
        request.Headers.Add("Accept", "application/vnd.github.v3.diff");

        var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    // Posts a top-level review comment on the PR.
    public virtual async Task PostReviewCommentAsync(
        string owner, string repo, int prNumber, string body)
    {
        var payload = JsonSerializer.Serialize(new
        {
            body,
            // COMMENT = non-blocking; APPROVE / REQUEST_CHANGES block the PR.
            // The @ escapes the C# keyword; it still serializes as "event".
            @event = "COMMENT"
        });
        var response = await http.PostAsync(
            $"https://api.github.com/repos/{owner}/{repo}/pulls/{prNumber}/reviews",
            new StringContent(payload, Encoding.UTF8, "application/json"));
        response.EnsureSuccessStatusCode();
    }
}
Let’s register that service in our Program.cs file:
builder.Services.AddHttpClient<GitHubPrClient>(client =>
{
    client.DefaultRequestHeaders.Add("User-Agent", "NimblePros-PR-Bot");
    client.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer",
            builder.Configuration["GitHub:Token"]);
});
Consult the GitHub API documentation for the process of getting a token. You’ll need a token with read and write access to pull requests. Either a GitHub App token or a fine-grained personal access token will do.
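For reference, the two configuration values the samples in this post read (`GitHub:Token` here, and `GitHub:WebhookSecret` later for the webhook endpoint) might look like this in appsettings.json. In a real deployment, keep these in user secrets or a key vault rather than in the file itself:

```json
{
  "GitHub": {
    "Token": "<your fine-grained PAT or app token>",
    "WebhookSecret": "<random shared secret for webhook validation>"
  }
}
```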
Now, let’s add a review service. The key here is to keep the prompt-related details separate from the GitHub-related code so they’re independent of one another. This also makes it easier to iterate on the prompt until you get one that gives you good, reliable results.
public class PullRequestReviewService(
    GitHubPrClient github,
    IChatService chat,
    ILogger<PullRequestReviewService> logger)
{
    public async Task ReviewPullRequestAsync(
        string owner, string repo, int prNumber)
    {
        logger.LogInformation("Fetching diff for PR #{PrNumber}", prNumber);
        var diff = await github.GetPullRequestDiffAsync(owner, repo, prNumber);
        if (string.IsNullOrWhiteSpace(diff))
        {
            logger.LogWarning("PR #{PrNumber} has no diff — skipping", prNumber);
            return;
        }

        // Truncate very large diffs to avoid token limit issues
        const int maxDiffLength = 12_000;
        if (diff.Length > maxDiffLength)
        {
            diff = diff[..maxDiffLength] + "\n\n[diff truncated — only first 12,000 characters reviewed]";
            logger.LogWarning("Diff truncated for PR #{PrNumber}", prNumber);
        }

        var prompt = BuildReviewPrompt(diff);
        var review = await chat.GetCompletionAsync(prompt);

        var comment = $"## AI Code Review\n\n> ⚠️ This is an automated review. " +
            $"Human review is still required before merging.\n\n{review}";
        await github.PostReviewCommentAsync(owner, repo, prNumber, comment);
        logger.LogInformation("Review posted for PR #{PrNumber}", prNumber);
    }

    private static string BuildReviewPrompt(string diff) => $"""
        You are a .NET code reviewer. Review the following pull request diff and provide
        concise, actionable feedback.

        Focus on:
        - Correctness: null reference risks, unhandled exceptions, async/await misuse
        - Security: input validation, sensitive data exposure, injection risks
        - .NET conventions: naming, SOLID principles, proper use of DI
        - Performance: unnecessary allocations, N+1 patterns, missing cancellation tokens

        Do NOT comment on:
        - Code you cannot see (missing context from other files)
        - Stylistic preferences with no correctness impact
        - Things that are already handled by Roslyn analyzers or StyleCop

        Format your response as a markdown list grouped by category.
        If you find no issues in a category, omit that category entirely.
        If the diff looks good overall, say so briefly.

        DIFF:
        {diff}
        """;
}
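The truncation rule is simple enough that it’s worth seeing in isolation. Here’s a minimal sketch of that logic pulled out into a standalone helper (the `TruncateDiff` name is ours; the service above inlines the same logic), which also makes the boundary behavior easy to test:

```csharp
using System;

// Minimal sketch of the diff-truncation rule from ReviewPullRequestAsync.
// The helper name is illustrative, not part of the service itself.
string TruncateDiff(string diff, int maxLength = 12_000) =>
    diff.Length <= maxLength
        ? diff
        : diff[..maxLength] + "\n\n[diff truncated — only first 12,000 characters reviewed]";

var small = TruncateDiff("+ var x = 1;");
var large = TruncateDiff(new string('+', 20_000));

Console.WriteLine(small);                         // short diffs pass through unchanged
Console.WriteLine(large.EndsWith("reviewed]"));   // truncated diffs carry the marker
```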
Next, we need to wire up an endpoint and tie it into our GitHub repo’s PR webhook so that a review kicks off whenever a PR is created. Here’s an example using a minimal API endpoint.
app.MapPost("/webhook/github", async (
    HttpRequest request,
    PullRequestReviewService reviewer,
    IConfiguration config) =>
{
    // Validate the webhook signature (never skip this in production)
    if (!await IsValidGitHubSignature(request, config["GitHub:WebhookSecret"]!))
        return Results.Unauthorized();

    using var reader = new StreamReader(request.Body);
    var body = await reader.ReadToEndAsync();
    var payload = JsonDocument.Parse(body).RootElement;

    var action = payload.GetProperty("action").GetString();
    if (action is not ("opened" or "synchronize"))
        return Results.Ok("Ignored"); // Only review on new/updated PRs

    var owner = payload
        .GetProperty("repository").GetProperty("owner").GetProperty("login").GetString()!;
    var repo = payload
        .GetProperty("repository").GetProperty("name").GetString()!;
    var prNumber = payload
        .GetProperty("number").GetInt32();

    // Fire and forget — webhook must return quickly or GitHub retries
    _ = reviewer.ReviewPullRequestAsync(owner, repo, prNumber);
    return Results.Accepted();
});
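One caveat on the fire-and-forget call: `reviewer` is resolved from the request’s DI scope, which is torn down once the response returns, so a long-running review can end up touching disposed services. A more robust pattern is to enqueue the request on a channel that a background worker drains in its own scope. Here’s a minimal, self-contained sketch of the queue mechanics using System.Threading.Channels; the tuple shape is ours, and in the real app a `BackgroundService` would read the channel and resolve `PullRequestReviewService` from a fresh scope per item:

```csharp
using System;
using System.Threading.Channels;

// Sketch: a queue the webhook handler writes to instead of fire-and-forget.
// In the real app, register the channel as a singleton and drain it from a
// BackgroundService that creates a fresh DI scope per review.
var reviewQueue = Channel.CreateUnbounded<(string Owner, string Repo, int PrNumber)>();

// Webhook handler side: enqueue and return immediately.
reviewQueue.Writer.TryWrite(("org", "repo", 42));

// Background worker side: dequeue and process at its own pace.
var (owner, repo, prNumber) = await reviewQueue.Reader.ReadAsync();
Console.WriteLine($"Reviewing {owner}/{repo} PR #{prNumber}");
// prints "Reviewing org/repo PR #42"
```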
static async Task<bool> IsValidGitHubSignature(HttpRequest request, string secret)
{
    request.Headers.TryGetValue("X-Hub-Signature-256", out var signature);
    if (string.IsNullOrEmpty(signature)) return false;

    request.EnableBuffering();
    var body = await new StreamReader(request.Body).ReadToEndAsync();
    request.Body.Position = 0;

    var key = Encoding.UTF8.GetBytes(secret);
    var hash = HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(body));
    var expected = "sha256=" + Convert.ToHexString(hash).ToLowerInvariant();
    return CryptographicOperations.FixedTimeEquals(
        Encoding.UTF8.GetBytes(expected),
        Encoding.UTF8.GetBytes(signature!));
}
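To exercise the endpoint locally without a real GitHub delivery, you can generate the signature header yourself. This sketch mirrors the validator above; the payload and secret are placeholders:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Generate the X-Hub-Signature-256 value GitHub would send for a payload,
// so you can replay webhook requests locally (e.g. from curl or Postman).
var secret = "my-webhook-secret"; // must match GitHub:WebhookSecret
var body = """{"action":"opened","number":1}""";

var hash = HMACSHA256.HashData(
    Encoding.UTF8.GetBytes(secret), Encoding.UTF8.GetBytes(body));
var header = "sha256=" + Convert.ToHexString(hash).ToLowerInvariant();
Console.WriteLine(header); // "sha256=" followed by 64 hex characters
```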
Lastly, let’s add a couple of unit tests.
public class PullRequestReviewServiceTests
{
    [Fact]
    public async Task ReviewPullRequest_PostsFormattedComment()
    {
        // GitHubPrClient's methods must be virtual (or moved behind an
        // interface) for NSubstitute to intercept them, and the class
        // substitute needs a constructor argument for its HttpClient.
        var github = Substitute.For<GitHubPrClient>(new HttpClient());
        var chat = Substitute.For<IChatService>();
        github.GetPullRequestDiffAsync("org", "repo", 42)
            .Returns("+ var x = null;\n+ x.ToString();");
        chat.GetCompletionAsync(Arg.Any<string>())
            .Returns("**Correctness**\n- Potential null reference on line 2");

        var sut = new PullRequestReviewService(github, chat, NullLogger<PullRequestReviewService>.Instance);
        await sut.ReviewPullRequestAsync("org", "repo", 42);

        await github.Received(1).PostReviewCommentAsync(
            "org", "repo", 42,
            Arg.Is<string>(s =>
                s.Contains("AI Code Review") &&
                s.Contains("automated review") &&
                s.Contains("Correctness")));
    }

    [Fact]
    public async Task ReviewPullRequest_TruncatesLargeDiffs()
    {
        var github = Substitute.For<GitHubPrClient>(new HttpClient());
        var chat = Substitute.For<IChatService>();
        github.GetPullRequestDiffAsync(Arg.Any<string>(), Arg.Any<string>(), Arg.Any<int>())
            .Returns(new string('+', 20_000)); // Massive diff
        chat.GetCompletionAsync(Arg.Any<string>()).Returns("Looks fine.");

        var sut = new PullRequestReviewService(github, chat, NullLogger<PullRequestReviewService>.Instance);
        await sut.ReviewPullRequestAsync("org", "repo", 1);

        await chat.Received(1).GetCompletionAsync(
            Arg.Is<string>(s => s.Contains("diff truncated")));
    }
}
Once you’ve deployed the service and configured GitHub to call the webhook on PR creation, the service will perform a code review and add its findings as comments on the PR. By no means should it be your only code reviewer, but it’s a good first pass. You should always have human developers in the loop for the code review process.
Prompt Engineering For a Code Review
A generic, naive “review this code” prompt will produce generic, nearly useless feedback. Spend some time experimenting with the prompt. The example prompt in our sample code above could be a good starting point, but you should tailor the prompt to the needs of your organization and project. What works for a .NET web API project won’t be entirely correct for a .NET Blazor application, and will probably be entirely wrong for a React SPA. A good next step could be to add some customization so you can use different prompts for different projects.
An example of a weak prompt:
Review this code and tell me if there are any problems.
And an example of a strong prompt:
You are reviewing a C# pull request for an ASP.NET Core 8 web API.
The codebase uses:
- Minimal APIs with vertical slice architecture
- EF Core 8 with SQL Server
- MediatR for command/query handling
- xUnit + NSubstitute for testing
Review the diff below. Focus only on:
1. Correctness — null risks, unhandled exceptions, broken async patterns
2. Security — input that reaches the database or HTTP response without validation
3. Missing test coverage — new public methods or branches with no corresponding test
Do NOT flag:
- Formatting or whitespace
- Naming style (we use a StyleCop ruleset for that)
- Anything you can't see in the diff (no guessing about other files)
Return a markdown list grouped by the three categories above.
If a category has no issues, omit it.
If the diff looks good overall, say so in one sentence.
Don’t be afraid to be clear and detailed about what you want the LLM to review and flag. You might even want to set up different prompts to review different specific aspects of the code instead of having one overall review prompt. For example, if you wanted a “reviewer” to focus on async/await related issues:
You are a senior .NET developer reviewing a pull request specifically for async/await correctness.
Flag any of the following in the diff:
- async void methods (except event handlers — those are acceptable)
- Calls to .Result or .GetAwaiter().GetResult() that could deadlock
- Missing CancellationToken parameters on async methods that accept I/O
- Tasks that are created but not awaited
- ConfigureAwait(false) missing in library code (not application code)
For each issue found:
- Quote the relevant line(s) from the diff
- Explain the specific risk
- Suggest the corrected code
If none of the above are present, respond with: "No async issues found."
Another excellent way to organize things is to use a model that supports a system prompt separate from the PR-related prompt. You could then split your prompt code as follows:
private const string SystemPrompt = """
    You are a .NET code reviewer working on a team that values:
    - Explicit over implicit (prefer clarity over cleverness)
    - Failing fast (validate inputs early, throw meaningful exceptions)
    - Testability (code should be injectable and mockable)

    Your tone is direct and collegial — like a peer, not a gatekeeper.
    You acknowledge good decisions as well as flagging problems.
    You never suggest changes outside the scope of the diff.
    """;

private static string BuildDiffPrompt(string diff) => $"""
    Review this pull request diff. Apply the standards from your system instructions.

    Return your feedback as:

    ## Summary
    One sentence overall assessment.

    ## Issues
    Bulleted list of specific problems with line references. Omit if none.

    ## Suggestions
    Optional improvements that aren't blocking. Omit if none.

    DIFF:
    {diff}
    """;
Another good customization is to alter the prompt when the PR is created as a draft. In that case, your review approach can be different.
This is a DRAFT pull request — the author has flagged it as a work in progress.
Do not flag:
- TODOs or placeholder comments (these are expected)
- Missing tests (test coverage comes later)
- Incomplete error handling
DO flag:
- Architectural decisions that will be hard to change later
- Data model choices that look problematic at this stage
- Any security concerns that need to be designed in from the start, not bolted on later
Keep feedback high-level and directional.
The goal is early course correction, not a full review.
There are four main keys to writing a strong prompt for a PR review:
- Stack context: include the language, framework, version, and architecture pattern
- Explicit include list: spell out which categories of issues to review
- Explicit exclude list: spell out what NOT to review
- Output format: explain how the results should be formatted when returned from the LLM
Provide these details and you’ll usually get excellent results.
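Those four pieces compose naturally into a template. Here’s a small sketch that assembles a prompt from stack context, include list, exclude list, and output format; the helper name and sample values are ours, purely for illustration:

```csharp
using System;

// Assemble a review prompt from the four building blocks above.
// The helper name and sample inputs are illustrative.
string BuildPrompt(string stack, string include, string exclude, string format, string diff) =>
    $"""
    {stack}

    Focus only on:
    {include}

    Do NOT flag:
    {exclude}

    {format}

    DIFF:
    {diff}
    """;

var prompt = BuildPrompt(
    stack: "You are reviewing a C# pull request for an ASP.NET Core 8 web API.",
    include: "1. Correctness\n2. Security\n3. Missing test coverage",
    exclude: "- Formatting or whitespace\n- Naming style",
    format: "Return a markdown list grouped by the categories above.",
    diff: "+ var x = 1;");

Console.WriteLine(prompt);
```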
Keep Humans In The Loop
I’ll say it again. AI is not a substitute for human code reviewers. AI is an augmentation tool, not a replacement. Take the humans out of the loop and things go wrong really quickly.
There are many well-documented cases out there of AI-generated code causing production incidents when shipped without human review. AI should never be left alone to work by itself. Like any developer, the work it generates should be reviewed by other developers. That’s not to suggest human developers don’t make mistakes. Of course we do. But we each bring a different perspective. Humans will spot things AI won’t, just as AI will spot things humans won’t.
Humans and AI share one weakness: confidently wrong suggestions. Don’t get in the habit of rubber-stamping the output of the AI reviews. Blindly accepting the feedback our AI agent provides will have an overall negative impact on our code review efforts.
View it with a critical eye, just as you should any feedback. This takes us back to the critical practice of logging the results, which we’ve touched on repeatedly in previous posts. Good telemetry, and tracking which prompts generate good results and which don’t, goes a long way toward turning an automated review process like this into a useful tool in our team’s toolbelt.
Conclusion
AI is a tool, just like your linter or your testing suite or your CI/CD pipeline application. You wouldn’t ship your code without running linting or unit tests, and we’re at the point now where we shouldn’t ship code without including an AI-agent code review process either. It’s just another part of the development arsenal. Used properly, it can go a long way toward ensuring the quality of our code only gets better over time.