Chain of Density (CoD)
Text summarization has long been a challenging task in natural language processing, requiring a delicate balance between brevity and informativeness. Large language models (LLMs) have revolutionized this field by generating coherent summaries with minimal task-specific fine-tuning. However, creating summaries that are both concise and information-rich remains a significant challenge.
Chain of Density (CoD) prompting addresses this challenge through an innovative approach that systematically enriches summaries while maintaining a fixed length. This technique represents a significant advancement in how we can leverage LLMs for more effective summarization.
The key aspects of CoD include:
- Iterative refinement: Instead of producing a summary in one pass, CoD prompting refines it over multiple iterations. In each step, the model identifies and integrates 1-3 missing salient entities, ensuring that previously omitted details are incorporated gradually.
- Controlled compression: To maintain a constant token count, the model compresses or rephrases existing content to make space for new information. This strategy allows the summary to become more detailed without growing longer.
- Enhanced abstraction: As the summary is repeatedly reworked, the language shifts toward a more abstract and synthesized narrative, effectively capturing the source material's essence.
The effectiveness of CoD comes from its ability to:
- Preserve a fixed length, preventing verbosity
- Fuse and reorganize content efficiently
- Balance entity density (ideally around 0.15 entities per token, as measured in the sketch below) to align with human preferences
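To make the density target concrete, here is a minimal sketch of measuring entity density as named entities per token. The paper identifies salient entities with GPT-4, so spaCy's NER is only a rough proxy here, and the `en_core_web_sm` model is an assumed, separately installed dependency.

```python
# Rough proxy for entity density: spaCy named entities per token.
# Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_density(summary: str) -> float:
    """Named entities per token; human-preferred CoD summaries sit near ~0.15."""
    doc = nlp(summary)
    return len(doc.ents) / len(doc) if len(doc) else 0.0

sparse = ("This article discusses a Formula 1 race in which two drivers "
          "collided, leading to penalties and changes to the standings.")
print(f"entity density: {entity_density(sparse):.3f}")
```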
The Mechanics of CoD
Chain of Density (CoD) prompting transforms summarization into an iterative refinement procedure (a minimal code sketch follows the list below). Here's how it works:
- Generate an initial "entity-sparse" summary containing only 1-3 salient entities from the source text. The fixed token count is crucial for consistent evaluation of informativeness.
- Through five fixed iterations, identify unique, missing entities that are:
  - Relevant to the primary narrative
  - Specific yet concise (five words or fewer)
  - Novel to the existing summary
  - Faithful to the original article
  - Unrestricted in location within the article
- "Fuse" these missing entities into the existing summary through:
  - Compression of redundant language
  - Abstraction of content
  - Integration of related details
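Here is the promised sketch of that loop, written as explicit per-step LLM calls. The original paper packs all five rounds into a single prompt (shown in the next section), so this looped variant is only an illustrative alternative, and `complete` is a hypothetical stand-in for whatever chat-completion call you use.

```python
# Illustrative CoD loop with one LLM call per densification step.
# `complete` is a placeholder: any function that maps a prompt string to a reply.
from typing import Callable

def chain_of_density(article: str,
                     complete: Callable[[str], str],
                     steps: int = 5) -> list[str]:
    """Return the sequence of increasingly dense, fixed-length summaries."""
    # Step 0: an entity-sparse, ~80-word starting summary.
    summary = complete(
        f"Article: {article}\n"
        "Write a ~80-word summary covering only 1-3 salient entities. "
        "Use verbose, non-specific language."
    )
    summaries = [summary]
    for _ in range(steps - 1):
        # Each round: surface 1-3 missing entities and fuse them in at the same length.
        summary = complete(
            f"Article: {article}\n"
            f"Previous summary: {summary}\n"
            "Identify 1-3 relevant, specific, novel, faithful entities missing from "
            "the previous summary, then rewrite the summary at the exact same length, "
            "keeping every earlier entity and fusing in the new ones."
        )
        summaries.append(summary)
    return summaries
```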
Implementing CoD in Practice
The following template guides an LLM through the CoD process (a sketch of calling it from Python appears after the template):

Template
Article: [ARTICLE]
You will generate increasingly concise, entity-dense summaries of the above Article. Repeat the following 2 steps 5 times.
Step 1. Identify 1-3 informative Entities (";" delimited) from the Article which are missing from the previously generated summary.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.
A Missing Entity is:
- Relevant: to the main story.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not in the previous summary.
- Faithful: present in the Article.
- Anywhere: located anywhere in the Article.
Guidelines:
- The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., "this article discusses") to reach ~80 words.
- Make every word count: re-write the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like "the article discusses".
- The summaries should become highly dense and concise yet self-contained, e.g., easily understood without the Article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.
Remember, use the exact same number of words for each summary.
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are "Missing_Entities" and "Denser_Summary".
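As a minimal sketch, the template can be sent to a chat model and the JSON reply parsed directly. This assumes the OpenAI Python SDK (v1+) with an `OPENAI_API_KEY` set; the model name is just an example, and real outputs may need more robust JSON handling (e.g., stripping code fences) than shown here.

```python
import json
from openai import OpenAI

# Abbreviated for space: paste the full CoD template from above, keeping the
# {article} placeholder where the article text should go.
COD_TEMPLATE = """Article: {article}
You will generate increasingly concise, entity-dense summaries of the above Article. Repeat the following 2 steps 5 times.
[... full instructions and guidelines from the template above ...]
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are "Missing_Entities" and "Denser_Summary"."""

def run_cod(article: str, model: str = "gpt-4o") -> list[dict]:
    """Run the single-prompt CoD template and return the five-step JSON list."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": COD_TEMPLATE.format(article=article)}],
    )
    return json.loads(response.choices[0].message.content)
```

Each element of the returned list pairs the newly added "Missing_Entities" with the corresponding "Denser_Summary"; index 0 is the sparse first pass and index 4 the densest rewrite.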
Real-World Application: The Chinese Grand Prix Incident
To better understand how CoD works in practice, consider a concrete example from Formula 1 racing: summarizing a news report on a collision at the Chinese Grand Prix. The summary evolves through five steps:
- Initial broad overview of the collision and consequences
- Addition of team and position details
- Integration of specific penalties and driver movements
- Incorporation of technical details (lap numbers, damage specifics)
- Final refinement while maintaining length
Human evaluations show that steps 3-4 typically achieve the optimal balance of informativeness and readability.
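One illustrative way to operationalize that finding (not part of the original method) is to pick the iteration whose density lands closest to the ~0.15 entities-per-token sweet spot, reusing the `entity_density` helper sketched earlier on the parsed output of `run_cod`:

```python
def pick_preferred(steps: list[dict], target: float = 0.15) -> str:
    """Choose the step whose summary's entity density is closest to `target`."""
    summaries = [step["Denser_Summary"] for step in steps]
    return min(summaries, key=lambda s: abs(entity_density(s) - target))
```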
Comparative Analysis: CoD vs Traditional Methods
| Method | Summary Characteristics | Strengths | Limitations |
|---|---|---|---|
| Vanilla Summarization | Single-step, general | Fast and straightforward | Misses crucial details |
| Human-Written | Thoughtfully balanced | High readability, context-rich | Subjective and time-intensive |
| CoD | Iteratively refined, dense | Incremental detail integration | Multiple processing steps |
Current Limitations and Future Potential
While CoD advances summarization technology, it faces several challenges:
- Domain specificity: Currently focused on news summarization
- Evaluation challenges: Subjective density standards
- Computational overhead: Multiple processing steps impact real-time performance
- Quality consistency: Varies with source material and LLM selection
Explore CoD further with the creators' dataset on Hugging Face, featuring 500 annotated and 5,000 unannotated examples.
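If you want to browse those examples programmatically, a quick sketch with the `datasets` library looks like the following; the dataset id is an assumption, so confirm the exact name and available splits on the authors' Hugging Face page before relying on it.

```python
from datasets import load_dataset

# Dataset id is an assumption; verify on the authors' Hugging Face page.
cod_data = load_dataset("griffin/chain_of_density")
print(cod_data)  # inspect available splits and fields before indexing in
```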
Conclusion
Chain of Density Prompting advances automated summarization by combining iterative refinement with controlled information density. Its structured methodology provides a foundation for future developments in text summarization, offering a valuable tool for applications requiring high-quality, balanced summaries.
Valeriia Kuka
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
Footnotes
- Adams, G., Fabbri, A., Ladhak, F., Lehman, E., & Elhadad, N. (2023). From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting. https://arxiv.org/abs/2309.04269