Prompt Engineering Techniques for AI Applications
Techniques, Patterns, and Tactics for building efficient OpenAI Applications
What is Prompt Engineering?
Large language models (LLMs) like GPT-3.5 and GPT-4 are powerful machine-learning models. They can understand a user's query and respond with text generated in natural language. They are sensitive to the user's input, which gives us (as application developers) a way to influence the response the model generates. The model input can be enhanced using a prompt. A prompt is additional text, written in natural language, that is prefixed and/or suffixed to the actual request from the user. To the LLM, a prompt is a set of instructions, rules, and guidelines with a specific objective that provides additional context alongside the user query. The diagram below shows a general model of using prompts in OpenAI applications.
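To make the prefix/suffix idea concrete, here is a minimal sketch, assuming the pre-1.0 `openai` Python SDK with `OPENAI_API_KEY` set in the environment; the prefix, suffix, and travel question are purely illustrative, not taken from the diagram.

```python
import openai  # pre-1.0 SDK; reads OPENAI_API_KEY from the environment

# Illustrative prompt text the application adds around the user's request.
PREFIX = ("You are a helpful travel assistant. Answer factually and "
          "mention the source of each fact.\n\n")
SUFFIX = "\n\nRespond as a numbered list of at most five items."

user_request = "What should I pack for a week in Iceland in winter?"

# The prompt (prefix + suffix) and the user's request travel to the model together.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": PREFIX + user_request + SUFFIX}],
)
print(response["choices"][0]["message"]["content"])
```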
Prompt engineering is the art of communicating skillfully with an LLM so that its response improves beyond simple text or code generation. The quality of the prompt directly influences the LLM response. However, identifying appropriate prompts to accomplish a task requires a good understanding of the LLM and an intuition for how it responds to particular semantics of natural language.
Why are prompts required?
Prompts may not be necessary for users who know how LLMs work; however, that is a small subset of users today. For business users, a prompt can significantly help them reach the intended goal.
As a general guideline, and in keeping with responsible AI practices, the prefixed or suffixed text should be transparent to the user. The following are some reasons why prompts make sense.
- User training: Most of the time, the user does not know how to frame an instruction to get a precise, reasonable response from the model. This could be due to limited knowledge of the domain the instruction is grounded in. Prompts enhance the response and increase the chances of the LLM answering the user's query specifically.
- Limiting the response: Open-ended questions may lead to lengthy responses from the model. This can increase the cost of the solution because LLMs are billed per token generated.
- Respond in a particular way: Sometimes, you might want the LLM to respond in a specific format, for example JSON, so that the response can be integrated into other applications (a minimal sketch follows this list).
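As a sketch of the last point, again assuming the pre-1.0 `openai` SDK (the review text and JSON keys are hypothetical), a prompt can pin the response to a machine-readable format:

```python
import openai

# Instruct the model to answer only in JSON so the response can be parsed
# and passed on to downstream applications.
prompt = (
    "Extract structured data from the review below.\n"
    "Respond ONLY with valid JSON using the keys 'sentiment' and 'summary'.\n\n"
    "REVIEW: The battery lasts all day, but the screen scratches easily."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output helps keep the format stable
)
print(response["choices"][0]["message"]["content"])
# Expected shape (not guaranteed): {"sentiment": "mixed", "summary": "..."}
```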
The snapshot below shows how prompts enhance the response from the LLM. In the first attempt, the model's response is compact, with little reasoning and few facts. In the second attempt (with a prompt), the response is more elaborate, grounded in facts and citations, and returned in a list format that is easy to interpret.
Deconstructing a prompt
LLMs like the GPT models are text-in / text-out models, so a prompt can be written in any unstructured way. However, repeated usage shows that they respond better when you follow a particular structure. The image below shows the deconstruction of a prompt (the prompt is taken from the code example above).
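As an illustration of such a structure (the section names below are an assumption, not the labels from the original figure), a prompt can be deconstructed into an objective, rules, context, the user query, and an expected response format:

```python
# Hypothetical deconstruction of a structured prompt; OBJECTIVE, RULES, CONTEXT,
# QUERY, and RESPONSE FORMAT are illustrative section labels, and the curly
# braces mark placeholders the application fills in at runtime.
prompt_template = """OBJECTIVE
You answer customer questions about our return policy.

RULES
- Answer only from the CONTEXT section.
- If the answer is not in the context, say "I don't know."

CONTEXT
{retrieved_policy_text}

QUERY
{user_question}

RESPONSE FORMAT
At most three sentences, followed by the policy clause the answer is based on.
"""
```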
Best Practices
- Use markup language: GPT models are trained on web content, so web semantics like HTML tags and separators are easily understood by the model. Use separators while constructing a prompt with multiple subsections; Markdown or XML separators will give better results. Label your sections with capitalized words so that the model recognizes the title of each section, e.g., QUERY, RESPONSE FORMAT.
- Retrieval-augmented generation: Knowledge mining use cases can be greatly enhanced with OpenAI. Using retrieval-augmented generation (RAG), you can extract semantically related portions of a large corpus using efficient full-text or vector-based search methods and share only the relevant context with the OpenAI model.
- Restrict the length of generated content: OpenAI models are billed per token, so if a user posts an open-ended question the model may generate many tokens. Using prompt engineering, you can instruct the LLM to respond in only a few sentences (see the sketch after this list).
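Putting these practices together, here is a minimal sketch, again assuming the pre-1.0 `openai` SDK; the knowledge-base search, section labels, and length limits are illustrative choices, not prescribed values:

```python
import openai

retrieved_context = "..."   # top passages from your full-text or vector search
user_question = "How do I rotate the storage account keys?"

# Capitalized section labels, markdown-style separators, retrieved context only
# (retrieval-augmented generation), and an explicit length restriction.
prompt = f"""INSTRUCTIONS
Answer the QUERY using only the CONTEXT below. Respond in at most two sentences.

### CONTEXT
{retrieved_context}

### QUERY
{user_question}
"""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=150,  # hard cap on generated tokens as a safety net
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```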
Patterns in prompt engineering
Software patterns emerge from regular usage; they map proven solutions to common problems. White et al. proposed an interesting approach to prompt engineering by identifying prompt engineering patterns. The image below shows six categories of patterns aimed at programming LLMs to respond in a particular way.
Each of these patterns is extremely powerful when applied individually or combined with others. For example, we can combine the Game Play pattern (used for generating games until a 'stop' word) with the Template pattern to generate a stream of templated responses from user interaction. An example of that is shown below.
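As a rough illustration of such a combination (the wording below is my own sketch, not the prompt from the original example), a Game Play + Template prompt could look like this:

```python
# Hypothetical prompt combining the Game Play pattern (play until the user says
# 'stop') with the Template pattern (every turn follows a fixed output template).
prompt = """We are going to play a trivia game about world capitals. Ask me one
question at a time and keep playing until I say 'stop'.

After each of my answers, reply using exactly this template, where the
capitalized words are placeholders for you to fill in:

Question: QUESTION
Your answer: USER_ANSWER
Correct answer: CORRECT_ANSWER
Running score: SCORE
"""
```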
For a complete list of patterns, including brief descriptions, examples, and the pros and cons of each pattern, you can visit my GitHub repo here. For any given use case, the right pattern is easy to identify using the checklist.
Challenges
Below are some of the key challenges in prompt engineering in the context of building OpenAI-based applications.
- Prompt relevance: Based on the nature of the use case and the interface, we can build a catalog of prompts by identifying suitable patterns and techniques mentioned above. However, selecting the prompt from that catalog that is relevant to the user's query is still a challenge. Using the wrong prompt may generate an incorrect response, leading to a bad user experience. Intelligent techniques for identifying the prompt relevant to the user's input are yet to mature. A few approaches to consider are:
- Using low-latency ML models to identify the prompt with the highest relevance (an embedding-based sketch follows this list).
- Collecting feedback from the user on the response and improving prompt selection using reinforcement learning from human feedback (RLHF).
- Cost: LLMs are billed per token, and a word in English roughly equates to 1.3 tokens. Prompt engineering adds more context, instructions, rules, and examples to the input, all of which count toward the number of tokens. Hence, it is important to strike the right balance between reducing operating cost and improving response quality. To estimate the cost of your input for the model you choose, you may use my code snippet posted here (a token-counting sketch also follows this list).
- Content security: Decoupling the internal (proprietary) knowledge base from the LLM is considered good practice. This strategy helps in applying security policies (ACLs, content classification, authorization) to the knowledge base before sharing it with OpenAI models. Appropriate security practices such as role-based content search (as in RAG), document classification, and validation of the response should be applied to protect confidential and personal information. However, the models can still learn confidential data from user interactions, and efficient techniques to make a model unlearn such data are yet to be identified.
- Prompts in conversational models: Prompts in conversational models like gpt-3.5-turbo follow a distinct, message-based syntax (a sketch follows this list). Storing, retrieving, and archiving conversations per user adds complexity when designing an AI application. Identifying scalable, secure engineering practices for data structures, data stores, session handling, and caching poses a new category of challenges that are key to building effective AI chatbots.
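For prompt relevance, one low-latency option is to embed the prompt catalog once and pick the catalog entry closest to the user's query. The sketch below assumes the pre-1.0 `openai` SDK, the `text-embedding-ada-002` model, and a hypothetical three-entry catalog:

```python
import numpy as np
import openai

# Hypothetical prompt catalog; in practice this is your curated set of prompts.
catalog = {
    "summarize": "Summarize the TEXT below in three bullet points ...",
    "extract_json": "Extract the entities from TEXT and respond only in JSON ...",
    "qa_over_docs": "Answer the QUERY using only the CONTEXT below ...",
}

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return [np.array(item["embedding"]) for item in resp["data"]]

# Embed the catalog once, up front.
catalog_vectors = dict(zip(catalog, embed(list(catalog.values()))))

def pick_prompt(user_query):
    q = embed([user_query])[0]
    # Choose the catalog prompt with the highest cosine similarity to the query.
    def cosine(key):
        v = catalog_vectors[key]
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    return catalog[max(catalog_vectors, key=cosine)]

print(pick_prompt("Give me the key points of this contract"))
```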
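On cost, here is a minimal token-counting sketch using the `tiktoken` library; the per-1K-token price below is a placeholder you should replace with the current pricing for your chosen model:

```python
import tiktoken

def estimate_input_cost(prompt, model="gpt-3.5-turbo", usd_per_1k_tokens=0.0015):
    # usd_per_1k_tokens is a placeholder; check the current price list for your model.
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = len(encoding.encode(prompt))
    return num_tokens, num_tokens / 1000 * usd_per_1k_tokens

tokens, cost = estimate_input_cost("INSTRUCTIONS\nAnswer in two sentences.\nQUERY\n...")
print(f"{tokens} input tokens, roughly ${cost:.6f}")
```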
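Finally, on conversational models, the message-based syntax looks roughly like the sketch below (pre-1.0 `openai` SDK; the conversation content is made up): the application must store the history and replay it on every call, which is exactly where the session-handling and caching challenges come from.

```python
import openai

# The system message carries the engineered prompt; earlier turns must be
# persisted by the application and replayed on every request.
messages = [
    {"role": "system", "content": "You are a support assistant. Answer in at most two sentences."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
    {"role": "user", "content": "And if I no longer have access to my email?"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
messages.append(response["choices"][0]["message"])  # store the new turn for the next call
print(messages[-1]["content"])
```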
Summary
LLMs are powerful and will respond to practically any user query. Prompt engineering acts as a programmable technique to influence the response from LLMs. Without prompt engineering, the models may hallucinate, increase operating costs, or break integration scenarios (if the response format changes). The prompt engineering techniques, tactics, and patterns described in this article help apply a systematic approach to prompt engineering, thereby building efficient LLM applications.