Creating Your Own AI Co-author Using C++
Jul 03, 2023 15 min read
While using ChatGPT through a web interface is one thing, creating your own autonomous AI tool that interfaces with ChatGPT via its API is a different story altogether – especially when you aim to maintain complete control over the interaction with the user. At the same time, as strong proponents of C++, we believe that a GPT tool in C++ will ease the pain of dealing with the daunting task of editing (endless) editorial comments.
We aim to explore the realm of MS Office automation and leverage the ChatGPT API to enhance the editing process. We envision a sophisticated tool that seamlessly integrates C++ with the ChatGPT API, providing a new way to interact with editorial comments in Word documents.
Traditional document editing involves manually reviewing content and adding comments to specific sections. In our case, as we worked on our C++ book, we encountered over 100 editorial comments each time, most of which related to the publisher’s style guide and annotations. It would have been helpful to have a way to store these comments and associated text in a database, not to mention the potential for AI-based editing. That's precisely what our software accomplishes: by automating this process, we can expedite the editing workflow. While this tool serves as proof of concept (POC) and is not recommended for writing and editing entire books, it still presents an exciting exercise in automation and is certainly worth trying.
The workflow begins with our software scanning the Word file, meticulously examining each editorial comment embedded within the document using the Office Automation API.
Once all comments are enumerated, our tool extracts them along with the associated text segments and stores them in a sqlite3 database. Based on this, it prepares targeted questions for ChatGPT about how to improve or fix a particular section of text. By leveraging the ChatGPT API, we can tap into the language model's vast knowledge and linguistic prowess to obtain expert suggestions and recommendations.
Upon receiving a response from ChatGPT, our tool dynamically incorporates the suggested edits into the associated text segments, seamlessly enhancing the content based on the model's insights.
This automated editing process significantly reduces manual effort and accelerates overall document refinement. Our tool even turns on Word's Track Changes while editing and remembers to turn it off when done.
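For the storage step mentioned above, a minimal sqlite3 sketch gives an idea of what this looks like; the table name and columns here are illustrative, not the exact schema from our source code.

```cpp
// Illustrative sqlite3 usage: store one editorial comment together with its
// associated text segment. Assumes a table created beforehand, e.g.:
//   CREATE TABLE comments(comment_text TEXT, segment_text TEXT);
#include <sqlite3.h>
#include <string>

bool StoreComment(sqlite3* db, const std::string& comment, const std::string& segment)
{
    const char* sql = "INSERT INTO comments(comment_text, segment_text) VALUES(?, ?);";

    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK)
        return false;

    // Bind the comment and its associated text (bind indexes are 1-based).
    sqlite3_bind_text(stmt, 1, comment.c_str(), -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 2, segment.c_str(), -1, SQLITE_TRANSIENT);

    const bool ok = (sqlite3_step(stmt) == SQLITE_DONE);
    sqlite3_finalize(stmt);
    return ok;
}
```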
Programming-wise, there are several building blocks in our project, and some of them can be expanded or replaced to serve different purposes. Let's call our code a proof of concept.
Here are the players involved in the process – our building blocks:
Our tool interfaces and interacts with ChatGPT by utilizing various parameters and approaches. We prepare payloads to be sent to the API and parse the responses. To use our tool, you must obtain an API key and add it to our code in place of "<Your-API-key>". Here is a code snippet demonstrating the basics of interfacing with ChatGPT.
The advantages of using the API include being able to interface and interact with ChatGPT using different parameters and approaches, to prepare payloads to be sent to the API, and to parse the responses we get back.
When using the ChatGPT API, there are several things to take into consideration.
For the purposes of this article, we created a generic, modular function that generates requests with configurable attributes and parameters in the following form:
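The exact definition is in our source code; as a hedged illustration, the modular attributes and parameters can be pictured as a structure like this (the field names and defaults are ours, for illustration only):

```cpp
// Illustrative only: one way to group the modular attributes of a request.
#include <string>

struct ChatRequestParams
{
    std::string model       = "gpt-3.5-turbo"; // which model answers the request
    std::string prompt;                         // the composed prompt (context, task, constraints)
    double      temperature = 0.7;              // randomness of the completion
    int         max_tokens  = 512;              // upper bound on the completion length (billed)
    std::string stop;                           // stop sequence; empty string = no early stop
    std::string conversation;                   // running history for continuous chats
};
```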
Let’s go over some issues and requirements along with these attributes:
As a user of the ChatGPT API, you will be charged for the tokens you consume.
(Pricing table: each model's price per 1,000 prompt tokens and per 1,000 completion tokens.)
We always like to say that the significance of well-structured prompts cannot be overstated. A carefully constructed prompt acts as a guiding blueprint, influencing the quality of the generated output. In this section, we delve into the components of an effective prompt and offer practical examples and guidelines to help C++ developers maximize the potential of the ChatGPT API in their projects.
When you compose a prompt, it is best to create a template containing the constant parts of the requests you will use throughout the program and then change the variable parts based on the immediate need; a sketch of such a template follows the building blocks below. Here are some key building blocks for a good prompt:
Context:
Context serves as the groundwork for the prompt, offering crucial background information. It enables the Language Model to grasp the task's essence. Whether it entails a concise problem description or a summary of pertinent details, providing context is pivotal.
Example:
"You are a software developer working on a mobile app for a food delivery service. The app aims to provide a seamless experience for users to order food from local restaurants. As part of the development process, you need assistance generating engaging and informative content about the app's features."
Task:
The task defines the precise goal or objective of the prompt. It should be clear, concise, and focus on the specific information or action expected from the ChatGPT model.
Example:
"Compose a short paragraph that highlights the app's key features and showcases how they enhance the food delivery experience for customers."
Constraints:
Constraints set boundaries or limitations for the prompt. They may encompass specific requirements, restrictions on response length or complexity, or any other pertinent constraints. By defining constraints, you can guide the generated output toward the desired outcome.
Example:
"The response should be concise, with a maximum word count of 150 words. Focus on the most prominent features that differentiate the app from competitors and make it user-friendly."
Additional Instructions:
In this section, you have the opportunity to provide supplementary context or specify the desired output format. This can include details regarding the expected input format or requesting the output in a specific format, such as Markdown or JSON.
Example:
"Please format the response as a JSON object, containing key-value pairs for each feature description. Each key should represent a feature, and its corresponding value should provide a brief description highlighting its benefits."
By understanding and implementing these fundamental components, C++ developers can master the art of constructing effective prompts for optimal utilization of the ChatGPT API in their projects. Thoughtfully incorporating context, defining clear tasks, setting constraints, and providing additional instructions will enable developers to achieve precise and high-quality results.
In most cases, we would like to be able to continue a conversation from where we left off last time. There is a special flag used by the ChatGPT API to allow that. If it isn't set, here is what will happen:
➢ What is the capital of France?
Your AI friend responds:
➢ The capital of France is Paris.
Then comes a follow-up question:
➢ How big is it?
➢ I apologize, but I need more context to accurately answer your question. What are you referring to?
To fix that, we need to maintain a continuous chat, but how do we do that? In practice, the only way is to pass a string containing the entire conversation back and forth with each request.
We also define a Conversation object that holds the running history of the chat. In our source code, you can see how we maintain our Conversation object up to a fixed length (as, clearly, we can't store endless conversations).
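As a hedged sketch of that idea (the member names and the length limit here are illustrative, not the values from our source), a bounded Conversation could look like this:

```cpp
// Illustrative sketch: keeps the running chat history under a fixed length by
// trimming the oldest text when new exchanges are appended.
#include <string>

class Conversation
{
public:
    explicit Conversation(size_t maxLength = 4096) : m_maxLength(maxLength) {}

    void Append(const std::string& question, const std::string& answer)
    {
        m_history += "User: " + question + "\nAssistant: " + answer + "\n";

        // Drop the oldest part of the history once the fixed length is exceeded.
        if (m_history.size() > m_maxLength)
            m_history.erase(0, m_history.size() - m_maxLength);
    }

    const std::string& Text() const { return m_history; }

private:
    std::string m_history;
    size_t      m_maxLength;  // the fixed length mentioned above
};
```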
As already mentioned, our prompt plays a key role in the efficiency of the request, and when it comes to continuous chats, we may want to use a different prompt.
When you ask your AI friend:
➢ Write me a C++ code that counts from 1 to 10
You may get just that:
➢ Sure, here's the C++ code to count from 1 to 10:
Without any source code.
Here is why: the stop parameter sent to the API tells the model at what point it should stop generating output. A newline is the default when nothing is specified, which means the model stops generating more output after the first newline it produces.
But if you set the "stop" parameter to an empty string, you will get the full response, including the source code.
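One way to express that when building the request payload (a sketch in which an empty stop string simply means the field is not sent at all):

```cpp
// Illustrative: append the "stop" field only when a stop sequence is requested.
// Treating an empty string as "no stop sequence" lets the model produce
// multi-line output such as source code.
#include <string>

std::string AppendStopField(std::string payload, const std::string& stop)
{
    // Assumes payload is a JSON object string ending with '}' and that the
    // stop string needs no JSON escaping (a real implementation should escape it).
    if (!stop.empty())
        payload.insert(payload.size() - 1, ",\"stop\":\"" + stop + "\"");
    return payload;
}
```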
OLE Automation is a technology introduced by Microsoft that has evolved considerably since. In our implementation, we use Microsoft automation directly, bypassing MFC (Microsoft Foundation Classes). To access the various elements of MS Word, such as documents, the active document, comments, etc., we obtain an IDispatch COM interface pointer for each object we need to interact with.
Office Automation
Our tool automates various tasks and features within MS Word. It can read comments, find associated text, turn on/off "Track Changes," work in the background, replace text, add comments, save the result, and close the document. Here is a description of the functions we use:

OLEMethod(): A helper function that invokes a method on an IDispatch interface, handling method invocations and returning HRESULT values indicating errors.

Initialize(): A function that initializes the OfficeAutomation class by creating an instance of the Word application and setting its visibility. It initializes the COM library, retrieves the CLSID for the Word application, creates an instance of the application, and sets its visibility.

OfficeAutomation(): The constructor of the OfficeAutomation class. It initializes member variables and calls the Initialize function with false to create a non-visible Word application instance.

~OfficeAutomation(): The destructor of the OfficeAutomation class. It does nothing in this implementation.

SetVisible(): A function that sets the visibility of the active document. It takes a boolean parameter to determine whether the document should be visible or not. It uses the OLEMethod function to set the visibility property of the Word application.

OpenDocument(): A function that opens a Word document and sets its visibility. It takes a path to the document and a boolean parameter for visibility. It initializes the class if necessary, retrieves the Documents interface, opens the specified document, and sets its visibility.

CloseActiveDocument(): A function that closes the active document. It saves the document and then closes it. It uses the OLEMethod function to call the appropriate methods.

ToggleTrackChanges(): A function that toggles the "Track Revisions" feature of the active document. It gets the current status of the feature and toggles it if necessary. It uses the OLEMethod function to access and modify the "TrackRevisions" property.

FindCommentsAndReply(): A function that finds all comments in the active document, sends a request to the ChatGPT API for suggestions, and updates the associated text of each comment based on the API response. It iterates through each comment, retrieves the associated text range, sends a prompt to the ChatGPT API with the text and comment as context, receives the API response, and updates the text range with the suggested changes.

CountDocuments(): A function that returns the number of open documents in the Word application associated with the OfficeAutomation class. It retrieves the Documents interface and returns the count.
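Put together, a hedged reconstruction of the class interface looks roughly like this; only the listed methods come from the descriptions above, while member names and exact signatures are assumptions.

```cpp
// Hedged reconstruction of the OfficeAutomation interface described above.
// Members such as m_pWordApp and m_pActiveDocument are assumed for illustration.
#include <windows.h>
#include <string>

class OfficeAutomation
{
public:
    OfficeAutomation();                  // creates a non-visible Word instance
    ~OfficeAutomation();

    HRESULT Initialize(bool visible);    // CoInitialize, CLSID lookup, create Word.Application
    HRESULT SetVisible(bool visible);    // show or hide the active document
    HRESULT OpenDocument(const std::wstring& path, bool visible);
    HRESULT CloseActiveDocument();       // save and close
    HRESULT ToggleTrackChanges(bool enable); // flip the TrackRevisions property
    HRESULT FindCommentsAndReply();      // enumerate comments, query ChatGPT, apply edits
    long    CountDocuments();            // number of open documents

private:
    // AutoWrap-style helper: invokes a property or method on an IDispatch interface.
    HRESULT OLEMethod(int autoType, VARIANT* pvResult,
                      IDispatch* pDisp, LPOLESTR name, int cArgs, ...);

    IDispatch* m_pWordApp        = nullptr;  // Word.Application
    IDispatch* m_pActiveDocument = nullptr;  // currently open document
};
```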
When developing a mechanism that will go over comments, we need to be able to enumerate all comments and distinguish between resolved ones and non-resolved ones.
That is done the following way:
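A minimal sketch of that check (assuming an OLEMethod() helper with the classic AutoWrap-style signature; pComment is the IDispatch pointer of a single comment):

```cpp
#include <windows.h>

// Assumed helper (AutoWrap-style): invokes a property/method on an IDispatch interface.
HRESULT OLEMethod(int autoType, VARIANT* pvResult, IDispatch* pDisp,
                  LPOLESTR name, int cArgs, ...);

bool IsCommentResolved(IDispatch* pComment)
{
    VARIANT result;
    VariantInit(&result);

    // Comment.Done is VARIANT_TRUE when the comment has been marked as resolved.
    HRESULT hr = OLEMethod(DISPATCH_PROPERTYGET, &result, pComment,
                           (LPOLESTR)L"Done", 0);

    return SUCCEEDED(hr) && result.vt == VT_BOOL && result.boolVal == VARIANT_TRUE;
}
```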
As you can see, using OLEMethod() along with DISPATCH_PROPERTYGET allows us to check the "Done" property, which indicates whether a comment has been resolved.
Next, we can enumerate all comments in the document and, for example, print the "Resolved" status for each of them.
Before we start, we want to enumerate not just the comments, but also the text associated with them. The reason lies in the original purpose of commenting: the author of a document composes and edits the document; the editor marks a segment, which can be a paragraph, a sentence, or even a word, and adds a comment. When we read a comment, we need its context, and that context is the marked segment.
So when we enumerate all comments, we do not just print the comment’s text but also the text associated with it (our segment).
When we start going over all comments, we need to declare and initialize two pointers:

pComments – points to the document's comments.

pRange – points to the document's content (the segment that holds the text associated with the comment).

Each of them is initialized as follows:
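As a hedged sketch (m_pActiveDocument is assumed to be the IDispatch pointer of the open document, and OLEMethod() is the helper described above), this happens inside a member function of OfficeAutomation:

```cpp
// Illustrative initialization of the two pointers via DISPATCH_PROPERTYGET.
IDispatch* pComments = nullptr;  // the document's Comments collection
IDispatch* pRange    = nullptr;  // the document's Content range

VARIANT result;
VariantInit(&result);

// ActiveDocument.Comments
if (SUCCEEDED(OLEMethod(DISPATCH_PROPERTYGET, &result, m_pActiveDocument,
                        (LPOLESTR)L"Comments", 0)))
    pComments = result.pdispVal;

// ActiveDocument.Content
VariantInit(&result);
if (SUCCEEDED(OLEMethod(DISPATCH_PROPERTYGET, &result, m_pActiveDocument,
                        (LPOLESTR)L"Content", 0)))
    pRange = result.pdispVal;
```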
Then we can start our loop to iterate through all comments in the document.
You can see how that's done in our source code, but generally speaking, we start with the comment, go to the associated text, and check whether the comment is resolved. Then we can either print it to a report, add it to a database, or send it to the ChatGPT API.
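A hedged sketch of that loop (again assuming the OLEMethod() helper and the pComments pointer from above; Word collections are 1-based):

```cpp
// Illustrative enumeration of all comments in the active document.
VARIANT result;
VariantInit(&result);

// Number of comments in the document.
OLEMethod(DISPATCH_PROPERTYGET, &result, pComments, (LPOLESTR)L"Count", 0);
long count = result.lVal;

for (long i = 1; i <= count; ++i)
{
    VARIANT index;
    index.vt = VT_I4;
    index.lVal = i;

    // Comments.Item(i) – the i-th comment.
    VariantInit(&result);
    OLEMethod(DISPATCH_METHOD, &result, pComments, (LPOLESTR)L"Item", 1, index);
    IDispatch* pComment = result.pdispVal;

    // Comment.Scope – the text range the comment is attached to.
    VariantInit(&result);
    OLEMethod(DISPATCH_PROPERTYGET, &result, pComment, (LPOLESTR)L"Scope", 0);
    IDispatch* pScope = result.pdispVal;

    // From here we can read the comment text and the scope text, check the
    // "Done" (resolved) flag, and store or forward everything to the ChatGPT API.
    pScope->Release();
    pComment->Release();
}
```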
To interface with any API over the web, we employ general code that facilitates sending requests and parsing responses in the JSON data format. For the transfer itself, we use libcurl, the library behind the widely used curl tool for transferring data over the network from the command line or scripts. It has extensive applications across different domains, including automobiles, televisions, routers, printers, audio equipment, mobile devices, set-top boxes, and media players, and it serves as the internet transfer engine for numerous software applications, with billions of installations.
If you check our source code, you can see how libCurl is used.
By utilizing the power of MS Office automation and integrating it with the ChatGPT API, we empower editors and writers to streamline their workflow, saving valuable time and improving the quality of their work. The synergy between C++ and the ChatGPT API facilitates smooth and efficient interaction, enabling our tool to provide intelligent and context-aware recommendations for each editorial comment.
As a result, our small MS Office automation POC tool, powered by the ChatGPT API and C++, revolutionizes the editing process. By automating the extraction of editorial comments, interacting with ChatGPT to seek expert guidance, and seamlessly integrating the suggested edits, we empower users to enhance the quality and efficiency of their work in Word documents. This powerful combination of technologies opens new possibilities for efficient document editing and represents a significant leap forward in the field of MS Office automation.