[Case Study] A More Intelligent and Systematic Document Summarization Method by AssistAce for Summary

TecAce Software
Apr 21, 2024
5 min read

The following text has been translated from Korean to English using AssistAce.

One of the easiest AI solutions to help with LLM (Language Learning Models) these days is the Summary feature. It is a feature useful for its ability to convert long or difficult to understand documents into short and easy to understand content.

The feature is already widely utilized and easily accessed to summarize various types of documents such as news, PDFs, webpages, etc.

Challenge

As seen through the feedback from customers, there are a lot of different expectations for summaries: from news summaries to technical documents or even a large collection of texts such as product reviews, the necessity for summarization ranges widely. It's also worth noting that the satisfaction level of a summary depends on who is reading it and what it's being used for.

To summarize, summaries should be created differently depending on the following factors

Type of document e.g. technical documentation, news, product reviews, interviews (meeting minutes), stories, etc.
Type of reader e.g. reader's level, reader's language, reader's preferred summary format
Intended use e.g. to turn into a report, to share as a short document, to rephrase in easy or specialized language, etc.

In addition to this, there were requests such as adjusting the length of summaries and explaining the concepts of technical or uncommon terms. Although these are simple features, it was found that there are many requests for customization, so the generated summaries meet the specific needs of the reader.

Solution

We began our research to implement a more intelligent and universal summary feature. We conducted numerous experiments using OpenAI, Gemini, Claude, and other various LLMs to generate summaries that meet diverse conditions. Each LLM has its own strengths and weaknesses, but they all show good performance so there is no significant difference between the LLMs. (The comparison of the strengths and weaknesses of the three LLMs will be covered in the next session.)

To fulfill diverse requirements, we implemented Prompt Engineering which is integrated with the Prompt Management Engine. We then leveraged the LLM auto-fine-tuning feature to tailor the language in the executive summary to the specific needs of the corporate domain.

[Figure 1] TecAce AssistAce for Summarization Flow (Ver.1.2)

For the convenience of the customer, the goal is to be able to run summarization of any document type in one place and produce a summary in the desired format.

To achieve this, we implemented the above flow.

In the first part, Prompt Generation, the system identifies the document type and either specifies an existing or generates a new prompt that is best suited for summarization of that specific document type. In other words, the LLM reads the original document and identifies the type of document, such as a technical documentations, interview or meeting transcripts, news articles, novels, etc. It then designates a prompt for generating the summary in a format best suited to summarize that document type.

Summarization by Document Type

The prompt designation are as follows:

In case of technical documentation, the system determines the purpose of the document and generates a summary with a detailed outline that includes the main topics and keywords to be covered in each section. The main topics are gathered and organized in a hierarchical outline in the order of importance. Key terminology should be highlighted and defined clearly as well. In the case of a meeting or interview transcript, the system classifies the text by speaker, identifying the points made by each participant and summarizing without mixing them. For content such as product reviews, the system classifies the pros and cons of the product and summarizes them accordingly. For news articles, it summarizes the content by listing the main topics.

The following is a real-world example of the TecAce AssistAce for Summary. We tried to summarize a page from the Samsung Developer portal.

[Figure 2] Example screen of TecAce AssistAce for Summary

Using the example above, the process can be explained as follows. The system recognizes the document as a technical document, compiles a brief summary including key points, and provides translations in the desired language.

Additionally, since the summaries are created for the purpose of quickly understanding the documents, the system also offers a glossary feature to aid understanding. This feature allows users to easily comprehend difficult or professional terms by providing additional explanations from sources other than the document’s main text. By doing so, it offers a great deal of help in quickly understanding the compressed summaries.

Personalization of Document Summarization

By providing two types of feedback, users can contribute to the output and develop a personalized summary model. First, if the user is satisfied with the generated summary, they can press the ‘Like’ button. If unsatisfied they may press the ‘Dislike’ button which generates a new summary. If the ‘Like’ button is pressed, the user’s prompt management engine records the prompt and generates a similar format for summaries of similar document types in future uses. The ‘Dislike’ button triggers the prompt management engine to generate a different prompt and make an effort to generate a more suitable summary that meets the user’s preference. It is recommended to try again until an ideal summary format is generated. As additional prompts are generated, the system will be able to generate summaries that are more suited to the user’s preference, and the user will have their own personalized summary prompt.

In editor mode, where users can edit the generated summaries, they can modify the format of the summary or change words and sentences to meet their own specific needs. The modified format and words will be learned and applied to similar document types in the same domain. The modified details are automatically saved in order to fine-tune the model. Through user feedback the model is directly improved.

By adding a flow to improve prompts and models systematically based on the type of document, users can utilize a highly personalized summary service.

Result

TecAce's AssistAce for Summary is continuously evolving and establishing itself as a truly personalized summarization solution. It goes beyond the simple summary function to customize summary results according to various document types, domains, and purposes.

This enables every department and individual across the enterprise to have a summary service optimized for their needs. Marketing teams can summarize briefing materials, R&D teams can condense technical documents, and executives can distill meeting minutes into key points. Individuals can use it for reading notes, organizing mail, and much more.

The key is continuous user feedback and model improvement. Users repeatedly modify the summary format and content until they are satisfied, and these changes are reflected in the prompts and the model, building a personalized summary model. As a result, every user effectively gets their own professional summary assistant.

Through this innovative approach, TecAce is providing a new solution for efficient knowledge utilization and information sharing.

Q&A

Q: What are the best features of the AssistAce for Summary solution?

A: The biggest feature is that it provides a personalized, customized summary model that reflects ongoing feedback from users, rather than a fixed, one-size-fits-all summary.

Q: How is user feedback used to improve the summary model?

A: As users select "thumbs up" or "thumbs down" or make edits directly in the editor, this feedback is fed into the Prompt generation engine and LLM model to evolve into a personalized, tailored model.

Q: How are the best summaries provided for different document types?

A: By identifying the document type and using Prompt Engineering technology to automatically generate prompts that match the characteristics of the document, whether it's a technical document, meeting minutes, or product review, and feeding them into LLM.

Q: Is visualization of summary results or multimodal data summaries supported?

A: Not currently implemented, but visualization of summaries and multimodal data summaries will be added as we move forward.

Q: What if I want to test it out in practice? A: Please contact us via Request a Demo.

Challenge

Solution

Result

Q&A

Comments