Imagine a Roomba that only told you your floors were dirty, but didn’t actually clean them for you. Helpful? Debatable. Annoying? Very.
When ChatGPT first arrived, that was about where things stood. It could describe how to do math problems and discuss theory endlessly, but it couldn’t reliably handle a simple arithmetic question. Connecting it to an external application (like an online calculator), however, significantly improved its abilities, just as connecting a Roomba’s sensors to its robot body makes it capable of actually cleaning your floor.
That simple discovery was a precursor to an evolution that’s now occurring in generative AI, where large language models (LLMs) power AI agents that can pursue complex goals with limited direct supervision.
In these systems, the LLM serves as the brain while additional algorithms and tools are layered on top to accomplish key tasks ranging from generating software development plans to booking plane tickets. Proofs of concept like AutoGPT offer examples, such as a marketing agent that looks for Reddit comments with questions about a given product and then answers them autonomously. At their best, these agents hold the promise of removing toil and mundane linear tasks so that we can focus on higher-level thinking. And when you connect AI agents with other AI agents to make multi-agent systems, like we’re doing with GitHub Copilot Workspace, the realm of possibility grows exponentially.
All this is to say, if you’re a developer you’ll likely start encountering more and more instances of agentic AI in the tools you use (including on GitHub) and in the news you read. So, this feels like as good a time as any to dive into exactly what agentic AI and AI agents are, how they work on a technical level, some of the technical challenges, and what this means for software development.
What are AI agents and agentic AI?
Agentic AI refers to artificial intelligence capable of making decisions, planning, and adapting to new information in real time. AI agents learn and enhance their performance through feedback, utilizing advanced algorithms and sensory inputs to execute tasks and engage with their environments.
According to Lilian Weng, the head of safety systems at OpenAI and their former head of applied AI research, an AI agent features three key characteristics:
- Planning: an AI agent is capable of creating a step-by-step plan with discrete milestone goals from a prompt while learning from mistakes via a reward system to improve future outputs.
- Memory: an AI agent combines short-term memory, which it uses to process chat-based prompts and follow-up prompts, with longer-term data retention and recall (often via retrieval-augmented generation, or RAG).
- Tool use: an agent can query APIs to request additional information or execute an action based on an end user’s request.
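To make these three characteristics concrete, here is a minimal sketch in Python. It is not how any production agent is built: the `Agent` class, the `TOOLS` registry, the `PREV` placeholder, and the hard-coded plan are all assumptions made for illustration, and the `plan` method stands in for what would normally be a call to an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable

PREV = "$prev"  # placeholder meaning "the result of the previous step"

# Tool use: a hypothetical registry of callable tools (illustrative names).
TOOLS: dict[str, Callable[..., float]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


@dataclass
class Agent:
    # Memory: a running record of every step taken so far.
    memory: list[str] = field(default_factory=list)

    def plan(self, goal: str) -> list[tuple[str, tuple]]:
        # Planning: a real agent would ask an LLM to break the goal into
        # steps. This stub only knows one hard-coded goal.
        if goal == "compute (2 + 3) * 4":
            return [("add", (2, 3)), ("multiply", (PREV, 4))]
        raise ValueError(f"no plan available for goal: {goal!r}")

    def run(self, goal: str) -> float:
        result = None
        for tool_name, args in self.plan(goal):
            # Substitute the previous step's result where the plan asks for it.
            resolved = [result if arg == PREV else arg for arg in args]
            result = TOOLS[tool_name](*resolved)
            self.memory.append(f"{tool_name}{tuple(resolved)} -> {result}")
        return result


if __name__ == "__main__":
    agent = Agent()
    print(agent.run("compute (2 + 3) * 4"))
    print(agent.memory)
```

The arithmetic is delegated to tools rather than left to the model, which mirrors the calculator example from the introduction. The memory list doubles as a simple audit trail of what the agent did.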
What are the different types of AI agents?
AI agents range from simple reflex agents to sophisticated learning agents, and each has its strengths and weaknesses.
As this field continues to evolve, more types of AI agents will likely emerge. Whether you’re looking to build your own AI agent or understand a bit more about how GitHub uses AI to improve developer tools, here’s a list of the different types of AI agents you’ll most commonly encounter:
Type | Characteristics | Examples |
---|---|---|
Reflex agent | Uses a model of the world to make decisions. It can remember some past states and make decisions based on both current and past experiences. | Linting tools like ESLint or Pylint that apply a set of predefined rules to evaluate code. |
Goal-based agent | Achieves specific goals using its knowledge and the stated goal (or prompt) to make decisions. | Advanced IDEs with AI-powered code completion such as GitHub Copilot. |
Utility-based agent | Aims to achieve a goal in the best way possible, as determined by evaluating different possible approaches. | Tools that prioritize and assign bugs based on severity, impact, and developer workloads. |
Learning agent | Improves performance over time by learning from experiences. It consists of a learning element that makes improvements to the AI agent’s outputs based on user feedback and a performance element that uses the learned knowledge. | Code completion tools, such as GitHub Copilot, that improve over time. |
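The difference between applying fixed rules and weighing alternatives can be sketched in a few lines of Python. This is an illustration only: `reflex_lint` is a rule-based check in the spirit of the linting example, `best_assignment` is a toy version of the bug triage example, and the rules, the `Bug` fields, and the utility weights are all assumptions rather than anything a real tool uses.

```python
from dataclasses import dataclass


def reflex_lint(line: str, max_len: int = 79) -> list[str]:
    """Rule-based check: each predefined rule either fires or it doesn't."""
    warnings = []
    if len(line) > max_len:
        warnings.append("line too long")
    if line.rstrip() != line:
        warnings.append("trailing whitespace")
    return warnings


@dataclass
class Bug:
    title: str
    severity: int  # 1 (low) to 5 (critical), hypothetical scale
    impact: int    # 1 (few users) to 5 (all users), hypothetical scale


def utility(bug: Bug, workload: int) -> int:
    # Hypothetical weighting: severity counts double, and a busy
    # developer makes the assignment less attractive.
    return 2 * bug.severity + bug.impact - workload


def best_assignment(bugs: list[Bug], workloads: dict[str, int]) -> tuple[str, str]:
    """Utility-based choice: score every (bug, developer) pair, pick the best."""
    bug, dev = max(
        ((b, d) for b in bugs for d in workloads),
        key=lambda pair: utility(pair[0], workloads[pair[1]]),
    )
    return bug.title, dev


if __name__ == "__main__":
    print(reflex_lint("x = 1   "))
    bugs = [Bug("typo in docs", 1, 1), Bug("crash on login", 5, 4)]
    print(best_assignment(bugs, {"ana": 3, "ben": 1}))
```

The rule-based function never compares options, while the utility-based one enumerates every possible assignment and ranks them, which is the distinction the table draws.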
Common technical challenges with AI agents today
While there’s a lot of promise in agentic AI, there are two core industry-wide technical challenges when developing agentic AI systems today:
- We can’t deterministically predict what an AI model will say or do next, and that makes it challenging to explain what their inputs are and how they work (that is, the combination of the prompt and the training data they use to generate a response).
- We don’t have models that can fully explain their outputs, though work is being done to offer greater transparency by enabling them to explain how they arrived at a solution.
As a result, it is difficult to debug agentic systems and to create evaluation frameworks to understand their effectiveness, efficiency, and impact.
AI agents are difficult to debug because they are prone to solving problems in unexpected ways. This is a nuance that has long been known in chess, of all things, where machines make moves that seem counterintuitive to their human opponents but can win games. The more sophisticated an agent becomes and the longer you expect it to run, the more difficult it is to debug, especially when you consider how quickly a log can grow.
AI agents are also difficult to evaluate in a repeatable way that shows progress without employing artificial constraints. This is especially challenging as the core capabilities of the underlying LLMs continue to rapidly improve, which makes it difficult to know whether your approach has improved results or if it’s simply the underlying model. Developers often encounter problems in choosing the right metrics, benchmarking overall performance against a set heuristic or rubric, and collecting end-user feedback and telemetry to evaluate agent output efficacy.
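One common way to tell whether your approach improved results, rather than the underlying model, is to hold the model and the test cases fixed and vary only the agent logic. The sketch below assumes a deliberately tiny setup: the `CASES` list, the exact-match rubric in `pass_rate`, and both toy agents are hypothetical, and neither agent calls a real model.

```python
import operator
from typing import Callable

Agent = Callable[[str], str]

# A fixed, versioned set of cases: (prompt, expected answer).
CASES: list[tuple[str, str]] = [
    ("2 + 2", "4"),
    ("3 * 5", "15"),
    ("10 - 7", "3"),
]


def pass_rate(agent: Agent, cases: list[tuple[str, str]]) -> float:
    """Rubric: an answer passes only if it matches the expected string."""
    passed = sum(1 for prompt, expected in cases if agent(prompt).strip() == expected)
    return passed / len(cases)


def baseline_agent(prompt: str) -> str:
    # Hypothetical first version: only handles addition.
    left, op, right = prompt.split()
    return str(int(left) + int(right)) if op == "+" else "unknown"


def improved_agent(prompt: str) -> str:
    # Hypothetical second version: handles three operators.
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    left, op, right = prompt.split()
    return str(ops[op](int(left), int(right)))


if __name__ == "__main__":
    for name, agent in [("baseline", baseline_agent), ("improved", improved_agent)]:
        print(f"{name}: {pass_rate(agent, CASES):.2f}")
```

Because both agents run against the same cases, any change in the score comes from the agent logic. With a real LLM in the loop, you would also pin the model version and repeat each case several times to account for nondeterministic output.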
How we think about AI agents at GitHub
Our focus at GitHub has been to rethink the developer “inner loop” as collaboration with AI. That means AI agents that can reliably build, test, and debug code. It means reducing the energy needed to get started and empowering more people to learn and contribute to code bases. We know that it requires tackling every part of the developer’s day where they run into friction, and that’s where multi-agent systems like Copilot Workspace and code scanning autofix come in.
Earlier this year, we launched a technical preview of Copilot Workspace, our Copilot-native developer environment. It’s a multi-agent system—a network of agents that interact and collaborate to achieve a larger goal. Each agent in a system typically has specialized skills or functions, and they can communicate and coordinate with one another to solve complex problems more efficiently than a single agent could.
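As a rough illustration of the multi-agent idea in general (not of how Copilot Workspace is implemented), the Python sketch below passes a shared task through three specialized agents. The `Task` structure, the `planner`, `coder`, and `reviewer` functions, and their canned outputs are all hypothetical stand-ins for what would be LLM-backed components.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Shared state that agents read from and write to."""
    goal: str
    plan: list[str] = field(default_factory=list)
    code: dict[str, str] = field(default_factory=dict)
    approved: bool = False


def planner(task: Task) -> Task:
    # Specialization 1: turn the goal into discrete steps (canned here).
    task.plan = [f"write a function to {task.goal}", "add a docstring"]
    return task


def coder(task: Task) -> Task:
    # Specialization 2: produce code for the plan (canned here).
    task.code["solution.py"] = (
        'def greet(name):\n'
        '    """Return a greeting."""\n'
        '    return f"Hello, {name}!"\n'
    )
    return task


def reviewer(task: Task) -> Task:
    # Specialization 3: approve only if code exists and every file has a docstring.
    task.approved = bool(task.code) and all('"""' in src for src in task.code.values())
    return task


def run_pipeline(goal: str) -> Task:
    task = Task(goal)
    for agent in (planner, coder, reviewer):
        task = agent(task)
    return task


if __name__ == "__main__":
    result = run_pipeline("greet a user")
    print(result.plan)
    print(result.approved)
```

Each agent has one narrow job and communicates only through the shared `Task`, which keeps the agents easy to swap or test in isolation. A fixed pipeline is the simplest form of coordination; real systems may let agents loop back or negotiate.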
For Copilot Workspace, that means a developer can ask Copilot to help create an application, and it will not only generate a software development plan, but also the code, pull requests, and more, needed to achieve that plan.
There’s more in the works to make developers more productive and make their days a little bit (or a lot) better.
Why this matters (and some final thoughts)
There’s a lot of buzz around AI agents, and for good reason. As they continue to evolve, they’ll be able to work together to handle more complex tasks, which means less upfront cost of prompt engineering for users. For developers, though, the benefit of AI agents is simple: they can allow developers to focus on higher-value activities.
When you give LLMs access to tools, memory, and plans to create agents, they become a bit like LEGO blocks that you can piece together to create more advanced systems. That’s because, at their best, AI agents are modular, adaptable, interoperable, and scalable, like LEGO blocks. Just as a child can transform a pile of colorful LEGO blocks into anything from a towering castle to a sleek spaceship, developers can use AI agents to build multi-agent systems that promise to revolutionize software development.
At GitHub, we’re excited about what AI agents, agentic AI, and multi-agent systems mean more broadly for software developers. With agentic AI coding tools like Copilot Workspace and code scanning autofix, developers will be able to build more secure software faster, and that’s just the beginning.