KG-enhanced LLM: Large Language Model (LLM) and Knowledge Graph Patterns (Part 1/3)
Knowledge Graph-enhanced LLM
In this series of articles, we explain Large Language Models, Knowledge Graphs, and their combination, examine the popular pattern of combining them, and finally discuss to what extent this pattern will persist or perish in the future.
These articles are co-written. The first dives into LLMs, the second into KGs, and the third into their combination and the viability of their future coexistence. All three articles contain tangible takeaways useful for enterprise data management and science. In this part (1/3), we talk about Large Language Models (LLMs) and how they can be enhanced using knowledge graphs.
1. What is a Large Language Model (LLM)?
Before diving into LLMs, let's start by defining what a Language Model (LM) is.
A Language Model is a mathematical model designed to represent the language domain. The model is trained on a corpus of textual documents for a specific task. During this training phase, the model learns language semantics to have a better understanding of the language domain.
There are three popular types of transformer-based language models: encoder-style, decoder-style, and encoder-decoder-style. All are based on the original transformer architecture [1], as shown in Figure 1.
Encoder-style language models. This is the most popular LM architecture. Google's BERT (2018) and ALBERT (2020) are well-known examples. Models of this type are composed of encoder blocks only, pre-trained on large amounts of textual data to learn language semantics.
Decoder-style language models. These are generative in nature. The famous OpenAI GPT models and Meta's LLaMA (2023) are based on a decoder-only architecture. Unlike encoder-style models, they generalize well to new downstream tasks without task-specific fine-tuning [2].
Encoder-decoder-style language models. This architecture is usually used for sequence-to-sequence tasks such as machine translation and text summarization. It encodes the input sequence into a latent representation, which is then decoded into an output sequence. Models like Meta's BART (2020) and Google's Flan-T5 (2022) are based on an encoder-decoder architecture.
Now that we have defined what a Language Model is, we can move on to Large Language Models.
A Large Language Model [6] is a large-scale language model containing a very large number of parameters, up to hundreds of billions. LLMs are pre-trained on huge amounts of data, which allows them to achieve general-purpose language understanding and generation.
Recent LLM architectures support multimodal inputs, allowing users to interact with models through the prompt using text, images, audio, and video. Such multimodal architectures process these different types of information in a unified manner.
One fascinating aspect of large language models (LLMs) is the phenomenon of emergent abilities—capabilities that arise unexpectedly as the model's size increases. These include complex behaviors such as reasoning, multi-step problem solving, and in-context learning, which are not explicitly programmed or present initially in smaller models.
For a deeper exploration of emergent abilities in LLMs, refer to [3].
2. How Do LLMs Work?
Many large language models (LLMs), such as the well-known OpenAI GPT models, are built on a decoder-style architecture. These models are initially pre-trained in an Auto-Regressive fashion using massive amounts of textual data, with the objective of predicting the next token in a sequence. This process enables the model to learn complex semantic relationships and effectively generate coherent text. It's important to note that the model predicts tokens, not just words—tokens can be whole words, subwords, or even characters, depending on the tokenizer used. After pre-training, LLMs are typically fine-tuned on specific tasks to adapt them for targeted applications.
Auto-Regressive models are a class of machine learning models that generate predictions by relying on previous elements in a sequence. They operate under the assumption that future values are strongly influenced by past ones. In natural language processing (NLP), this concept is illustrated by the next token prediction task. Here, the model predicts the next token in a sequence based on the preceding tokens. Once the token is predicted, it is appended to the sequence, forming a new input for the next prediction. This iterative process continues, allowing the model to generate coherent text. Through such pre-training on large text corpora, the model learns both contextual and semantic relationships within language.
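The auto-regressive loop described above can be sketched in a few lines of Python. Here, `toy_model` is a hypothetical stand-in for an LLM: it predicts the next token from the previous one using a hard-coded bigram table, whereas a real model uses a neural network over the full context window. The loop structure, however, is the same: predict, append, repeat.

```python
def toy_model(context):
    """Hypothetical stand-in for an LLM: predicts the next token
    from the last token only, via a hard-coded bigram table."""
    bigrams = {"knowledge": "graphs", "graphs": "store",
               "store": "facts", "facts": "<eos>"}
    return bigrams.get(context[-1], "<eos>")

def generate(prompt_tokens, max_new_tokens=10):
    """Auto-regressive generation: each predicted token is appended
    to the sequence and becomes part of the next prediction's input."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = toy_model(tokens)   # predict the next token
        if next_token == "<eos>":        # stop at end-of-sequence
            break
        tokens.append(next_token)        # extend the context and repeat
    return tokens

print(generate(["knowledge"]))
# → ['knowledge', 'graphs', 'store', 'facts']
```

Swapping `toy_model` for a trained transformer (and word strings for subword token IDs) gives the decoding loop used by GPT-style models.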
3. LLM General Limitations
LLMs are powerful generative AI tools, though they come with certain limitations.
Misinformation Generation. LLMs can sometimes generate content that is factually incorrect, reflecting misinformation present in the training data.
Outdated Information. LLMs are trained on data available up to a certain point in time and do not have real-time awareness, which can lead them to provide information that is no longer current or accurate.
Overgeneralization. LLMs sometimes produce content that lacks specificity or nuance, resulting in overly broad or vague responses.
Limited Creativity. While LLMs may appear creative, their outputs are fundamentally based on pattern recognition. They do not generate truly original ideas but instead mimic patterns found in their training data.
Ethical Risks. The ethical implications of LLMs largely depend on how they are used. They can generate misleading information that endangers individuals or supports harmful narratives and propaganda.
4. KG-enhanced LLMs Explained
Knowledge Graph-enhanced LLMs [5] combine the strengths of large language models with structured knowledge from knowledge graphs (KGs) to improve reasoning, factual accuracy, and domain-specific understanding.
A knowledge graph [4] is a structured representation of information, where entities (such as people, places, or concepts) are represented as nodes, and relationships between them are represented as edges. This structured format allows for explicit encoding of factual and relational information, which is often difficult for LLMs to learn purely from unstructured text.
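A minimal KG can be represented as a set of (subject, predicate, object) triples: entities are the nodes, predicates are labeled edges. The sketch below (with illustrative facts) shows how such explicit structure supports precise lookups that are hard to guarantee from unstructured text alone.

```python
# A toy knowledge graph as (subject, predicate, object) triples.
# Entities ("Marie Curie", "Warsaw", ...) are nodes; predicates
# ("born_in", ...) are labeled edges between them.
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
]

def facts_about(entity):
    """Return every triple in which the entity appears as subject."""
    return [(s, p, o) for (s, p, o) in triples if s == entity]

print(facts_about("Marie Curie"))
# → [('Marie Curie', 'born_in', 'Warsaw'), ('Marie Curie', 'field', 'Physics')]
```

Production systems store such triples in dedicated graph databases and query them with languages like SPARQL or Cypher, but the underlying data model is the same.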
Traditional LLMs are trained on large volumes of text in an auto-regressive manner (as explained before) and excel at generating fluent and coherent language. However, as we established, they may struggle with:
Factual accuracy
Complex reasoning
Up-to-date information
Domain-specific knowledge
By integrating knowledge graphs, KG-enhanced LLMs can:
Ground their outputs in structured knowledge, improving factual consistency.
Perform better at knowledge-intensive tasks, such as question answering, entity linking, or reasoning over relationships.
Access up-to-date facts without needing to retrain.
Incorporate external domain-specific KG sources, enabling LLMs to access precise, domain-relevant insights.
5. KG-enhanced LLMs Approaches
Knowledge Graphs can be incorporated at various stages of the LLM lifecycle to improve performance, reliability, or transparency. In [5], the authors categorize KG-enhanced LLM methods into three main categories: 1) KG-enhanced LLM pre-training, 2) KG-enhanced LLM inference, and 3) KG-enhanced LLM interpretability. The details of each category are presented below:
KG-enhanced LLM pre-training. This approach injects KG information into LLMs during the pre-training stage. It can be implemented in several ways:
Integrating KGs into LLMs Input. This approach injects subgraph knowledge directly into the LLM’s input. For instance, ERNIE 3.0 [7] concatenates a sentence with a KG triple and represents the combination as a sequence of tokens. During training, either the relation token in the triple or tokens in the sentences are randomly masked, encouraging the model to jointly learn from both graph-based and textual information.
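The input construction described above can be sketched as follows. This is a simplified illustration of the idea, not ERNIE 3.0's actual implementation: tokenization is plain whitespace splitting, and the sentence and triple are simply concatenated with a separator before one side is masked.

```python
import random

def build_masked_input(sentence, triple, mask_relation, seed=0):
    """Concatenate a sentence with a KG triple into one token
    sequence, masking either the relation token (graph side) or a
    random sentence token (text side), so a model trained on such
    inputs must learn from both sources jointly."""
    rng = random.Random(seed)
    subj, rel, obj = triple
    sent_tokens = sentence.split()
    if mask_relation:
        # Mask the relation token of the triple.
        return sent_tokens + ["[SEP]", subj, "[MASK]", obj]
    # Otherwise mask one random token in the sentence.
    i = rng.randrange(len(sent_tokens))
    sent_tokens[i] = "[MASK]"
    return sent_tokens + ["[SEP]", subj, rel, obj]

print(build_masked_input("Curie studied physics",
                         ("Curie", "field", "Physics"),
                         mask_relation=True))
# → ['Curie', 'studied', 'physics', '[SEP]', 'Curie', '[MASK]', 'Physics']
```

During pre-training, the model's objective is to recover the masked token, which forces it to align textual context with the structured fact.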
Integrating KGs into Training Objective. Beyond modifying the input, KG information can be incorporated by designing specific pre-training objectives that jointly handle both structured and unstructured data. For example, ERNIE [8] introduces a word-entity alignment training objective, where both sentences and the corresponding KG entities mentioned in the text are fed into the LLM. The LLM is then trained to predict alignment links between textual tokens and KG entities, which strengthens the connection between natural language and structured knowledge.
KGs Instruction-tuning. This method fine-tunes LLMs to better understand KG structure and to follow instructions that require reasoning over complex tasks. OntoPrompt [9], for instance, uses ontology-enhanced prompt-tuning that integrates entity information from the KG into the LLM's context before fine-tuning on downstream tasks, thereby enhancing reasoning capabilities.
KG-enhanced LLM inference. This category includes research that utilizes KGs during the inference stage, enabling LLMs to access the latest knowledge without retraining.
Retrieval-Augmented Knowledge Fusion. This strategy integrates relevant external knowledge into an LLM’s reasoning pipeline by retrieving it from a large corpus or knowledge source. A representative example is the Graph Retrieval-Augmented Generation (Graph-RAG) framework, which is highly effective for question-answering tasks. In this approach, the query is first encoded, and pertinent documents or subgraph fragments are retrieved from the KG. The retrieved information is then incorporated into the LLM through a structured prompt, enabling the model to utilize domain-specific context without requiring direct fine-tuning on the KG.
In contrast, RAG [10] combines a non-parametric retrieval module with a parametric language model to incorporate external knowledge. Unlike Graph-RAG, it treats retrieved documents as latent variables, injecting them into a sequence-to-sequence LLM. This design allows the model to be fine-tuned for question-answering tasks, further improving its capacity to generate accurate and contextually grounded responses.
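A Graph-RAG-style flow can be sketched as below. For illustration, triples are scored against the query by naive word overlap instead of a learned encoder, and the top matches are injected into a structured prompt; the triples and templates are assumptions, not from any specific system.

```python
# Toy KG for retrieval (illustrative facts).
triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "currency", "Euro"),
]

def retrieve(query, k=2):
    """Rank triples by word overlap with the query; a real system
    would use a trained encoder and vector similarity instead."""
    words = set(query.lower().replace("?", "").split())
    scored = sorted(
        triples,
        key=lambda t: len(words & {t[0].lower(), t[1].lower(), t[2].lower()}),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query):
    """Inject the retrieved subgraph into the prompt as context."""
    context = "\n".join(f"{s} {p.replace('_', ' ')} {o}"
                        for s, p, o in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is the currency of France?"))
```

The assembled prompt is then passed to the LLM as-is, which is what allows knowledge updates by simply editing the KG rather than retraining the model.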
KGs Prompting. Prompting refers to the practice of crafting structured inputs that guide LLMs to address specific tasks or use cases. KG-based prompting involves integrating structured knowledge from a KG into these prompts in the form of natural language sequences. By converting KG triples into concise sentences and incorporating them into a predefined prompt, the LLM can use this sequential input as contextual information to support reasoning. An example of this approach is described in [11], where triples are transformed into short textual statements and provided to the LLM as part of the prompt. Another variant of KG-based prompting integrates structured knowledge from a KG directly into the prompt in the form of triples. CoK [12] introduces a chain-of-knowledge prompting method, where triples are arranged in a logical sequence to engage and enhance the LLM's reasoning capabilities.
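The triple-to-sentence conversion at the heart of KG prompting can be sketched with a simple verbalizer. The single generic template below is an assumption for illustration; real systems typically use relation-specific templates or let the LLM itself verbalize the triples.

```python
def verbalize(triple):
    """Turn a (subject, predicate, object) triple into a short
    sentence using a single generic template."""
    s, p, o = triple
    return f"{s} {p.replace('_', ' ')} {o}."

# A chain of triples arranged in logical order (CoK-style),
# joined into one context passage for the prompt.
chain = [
    ("Einstein", "born_in", "Ulm"),
    ("Ulm", "located_in", "Germany"),
]
prompt_context = " ".join(verbalize(t) for t in chain)
print(prompt_context)
# → Einstein born in Ulm. Ulm located in Germany.
```

Because each triple becomes a self-contained statement, the chain reads as a step-by-step factual argument the LLM can follow.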
As described in [5], KG-enhanced LLM pre-training methods integrate knowledge during the training phase, making updates possible only through retraining; as a result, they may generalize poorly to recent or unseen information. In contrast, KG-enhanced LLM inference methods allow knowledge updates by modifying the input at inference time, making them more suitable for incorporating new or rapidly changing information. Consequently, pre-training approaches are best suited for time-insensitive knowledge in well-defined domains, whereas inference-based approaches are preferable for open-domain applications where knowledge evolves frequently.
KG-enhanced LLM interpretability. This category includes works that use KGs to understand the knowledge learned by LLMs and to interpret their reasoning process.
KGs for LLM Probing. This approach involves methods using KGs to evaluate and verify the knowledge stored within an LLM. Typically, KG triples are transformed into cloze-style statements, in which part of the statement is masked and the model is asked to predict the missing entity. A representative example is LAMA Probe [13], the first work to probe the factual knowledge stored in LLMs using KG-derived queries.
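The probing setup can be sketched as follows: turn a KG triple into a cloze statement with the object masked, then check whether the model's fill-in matches the gold entity. Here `dummy_fill` is a hypothetical, hard-coded stand-in for a real masked-LM prediction, and the template table is an illustrative assumption.

```python
# Relation-specific cloze templates (illustrative; LAMA uses a
# curated set of such templates per relation).
TEMPLATES = {"capital_of": "{subj} is the capital of [MASK]."}

def to_cloze(triple):
    """Convert a KG triple into (cloze statement, gold answer)."""
    subj, rel, obj = triple
    return TEMPLATES[rel].format(subj=subj), obj

def dummy_fill(cloze):
    """Hypothetical stand-in for a masked-LM prediction,
    hard-coded purely for illustration."""
    return "France" if "Paris" in cloze else "unknown"

cloze, gold = to_cloze(("Paris", "capital_of", "France"))
print(cloze)                      # → Paris is the capital of [MASK].
print(dummy_fill(cloze) == gold)  # → True
```

Running many such cloze queries and measuring the model's accuracy gives an estimate of how much KG knowledge the LLM has memorized.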
KGs for LLM Analysis. This approach leverages KGs to explain the semantic and structural aspects behind the reasoning capability of LLMs. For example, QA-GNN [14] grounds the LLM outputs produced at each reasoning step in a KG, enabling the extraction of graph structures that help explain how the model reasons. To investigate the implicit knowledge stored within LLMs, [15] introduces the concept of knowledge neurons, where specific neuron activations are found to correlate with particular knowledge associations.
6. Challenges of KG-enhanced LLM
Integration Complexity. Bridging the gap between structured KG data and unstructured text-based models is non-trivial. LLMs work with sequences of tokens, while KGs operate on graph structures.
Scalability and Performance. Querying large-scale KGs during inference can be computationally expensive and slow.
Incomplete or Noisy Knowledge Graphs. KGs are often incomplete, domain-limited, or contain inaccuracies, which can lead to misleading outputs when used blindly.
7. Conclusion
KG-enhanced LLMs merge the strengths of structured knowledge graphs and unstructured language models, enabling AI systems to achieve higher factual accuracy, improved reasoning, and greater interpretability. By integrating KGs at different stages of the LLM lifecycle—whether during pre-training, inference, or interpretability analysis—these methods provide flexible pathways for enhancing domain-specific performance and keeping models aligned with up-to-date information. Pre-training approaches excel in stable, domain-focused settings, while inference-based techniques are better suited to rapidly evolving, open-domain scenarios. Interpretability-focused methods further enhance trustworthiness by grounding model outputs in explicit knowledge structures.
8. Key Takeaways
KG-enhanced LLMs important advantages:
Synergy of Structured and Unstructured Knowledge
Improved Factuality and Reasoning
Better Domain-Specific Performance
Dynamic and Updatable Knowledge
9. References
[1] VASWANI, Ashish, SHAZEER, Noam, PARMAR, Niki, et al. Attention is all you need. Advances in neural information processing systems, 2017, vol. 30.
[2] BROWN, Tom, MANN, Benjamin, RYDER, Nick, et al. Language models are few-shot learners. Advances in neural information processing systems, 2020, vol. 33, p. 1877–1901.
[3] WEI, Jason, TAY, Yi, BOMMASANI, Rishi, et al. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682, 2022.
[4] HOGAN, Aidan, BLOMQVIST, Eva, COCHEZ, Michael, et al. Knowledge graphs. ACM Computing Surveys (Csur), 2021, vol. 54, no 4, p. 1-37.
[5] PAN, Shirui, LUO, Linhao, WANG, Yufei, et al. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 2024, vol. 36, no 7, p. 3580-3599.
[6] MINAEE, Shervin, MIKOLOV, Tomas, NIKZAD, Narjes, et al. Large language models: A survey. arXiv preprint arXiv:2402.06196, 2024.
[7] SUN, Yu, WANG, Shuohuan, FENG, Shikun, et al. Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. arXiv preprint arXiv:2107.02137, 2021.
[8] ZHANG, Zhengyan, HAN, Xu, LIU, Zhiyuan, et al. ERNIE: Enhanced language representation with informative entities. arXiv preprint arXiv:1905.07129, 2019.
[9] YE, Hongbin, ZHANG, Ningyu, DENG, Shumin, et al. Ontology-enhanced Prompt-tuning for Few-shot Learning. In : Proceedings of the ACM web conference 2022. 2022. p. 778-787.
[10] LEWIS, Patrick, PEREZ, Ethan, PIKTUS, Aleksandra, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 2020, vol. 33, p. 9459-9474.
[11] LI, Shiyang, GAO, Yifan, JIANG, Haoming, et al. Graph reasoning for question answering with triplet retrieval. arXiv preprint arXiv:2305.18742, 2023.
[12] WANG, Jianing, SUN, Qiushi, LI, Xiang, et al. Boosting language models reasoning with chain-of-knowledge prompting. arXiv preprint arXiv:2306.06427, 2023.
[13] PETRONI, Fabio, ROCKTÄSCHEL, Tim, LEWIS, Patrick, et al. Language models as knowledge bases?. arXiv preprint arXiv:1909.01066, 2019.
[14] YASUNAGA, Michihiro, REN, Hongyu, BOSSELUT, Antoine, et al. QA-GNN: Reasoning with language models and knowledge graphs for question answering. arXiv preprint arXiv:2104.06378, 2021.
[15] DAI, Damai, DONG, Li, HAO, Yaru, et al. Knowledge neurons in pretrained transformers. arXiv preprint arXiv:2104.08696, 2021.
[16] AKNOUCHE, Anis. Connecting Data Assets to Business Glossary Concepts using Metadata. 2024.