A New Way To Increase The Capabilities Of Large Language Models

Bonisiwe Shabane

Most languages use word position and sentence structure to extract meaning. For example, “The cat sat on the box” is not the same as “The box was on the cat.” Over a long text, like a financial document or a novel, the syntax of these... Similarly, a person might be tracking variables in a piece of code or following instructions that have conditional actions. These are examples of state changes and sequential reasoning that we expect state-of-the-art artificial intelligence systems to excel at; however, the existing, cutting-edge attention mechanism within transformers — the primary architecture used in large language models — struggles with them.

An attention mechanism allows an LLM to look back at earlier parts of a query or document and, based on its training, determine which details and words matter most; however, this mechanism alone does not encode the order of those words. It “sees” all of the input words, a.k.a. tokens, at the same time and handles them in the order that they’re presented, so researchers have developed techniques to encode position information. This is key for domains that are highly structured, like language. But the predominant position-encoding method, called rotary position encoding (RoPE), only takes into account the relative distance between tokens in a sequence and is independent of the input data. This means that, for example, words that are four positions apart, like “cat” and “box” in the example above, will all receive the same fixed mathematical rotation specific to that relative distance. Now research led by MIT and the MIT-IBM Watson AI Lab has produced an encoding technique known as “PaTH Attention” that makes positional information adaptive and context-aware rather than static, as with RoPE.
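To make the contrast concrete, here is a minimal NumPy sketch (not the paper's code) of the RoPE behavior described above: each vector is rotated by an angle fixed by its absolute position, so the attention score between two tokens depends only on how far apart they are, never on what the tokens are. A data-dependent scheme like PaTH Attention would instead let the transformation vary with the token content; the function name and the 2-D toy vectors below are purely illustrative.

import numpy as np

def rope_rotate(vec, position, theta=0.1):
    # Rotate a 2-D vector by an angle determined only by its position in the sequence.
    angle = theta * position
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ vec

q = np.array([1.0, 0.0])  # toy query vector
k = np.array([0.5, 0.5])  # toy key vector

# Two pairs of positions with the same relative distance (4) get identical scores:
score_a = rope_rotate(q, 10) @ rope_rotate(k, 6)
score_b = rope_rotate(q, 20) @ rope_rotate(k, 16)
print(np.isclose(score_a, score_b))  # True: the positional effect depends only on the distance between tokens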

“Transformers enable accurate and scalable modeling of many domains, but they have these limitations vis-a-vis state tracking, a class of phenomena that is thought to underlie important capabilities that we want in our AI... So, the important question is: How can we maintain the scalability and efficiency of transformers, while enabling state tracking?” says the paper’s senior author Yoon Kim, an associate professor in the Department of Electrical Engineering and Computer Science.

This issue of Nature Computational Science (volume 5, pages 689–690, 2025) features a Focus that highlights both the promises and perils of large language models, their emerging applications across diverse scientific domains, and the opportunities to overcome the challenges... Large language models (LLMs) are increasingly shaping the way we live and work. In everyday life, they assist with writing, translation, learning, communication, and so on — by making information more accessible and tools more efficient. LLMs also profoundly influence how knowledge is created and shared.

In scientific research, for example, LLMs are transforming how research is conducted — from literature synthesis and hypothesis generation to experimental design and scientific code development. Their impact spans a wide range of disciplines, including life sciences and medicine, chemistry and materials science, physics, engineering, urban and Earth sciences, psychology, linguistics, and the humanities. As these models continue to evolve, they are not only enhancing existing methods but also unlocking new possibilities for scientific exploration. In this issue, we present a Focus that brings together expert perspectives from various fields to explore the opportunities, risks, and challenges of advancing and applying LLMs in scientific research. Without a doubt, the transformer architecture has been central to the success of modern LLMs, powering models such as ChatGPT — whose website ranks among the most visited globally, as highlighted by Pedro Burgos... The transformer’s self-attention mechanism enables the capture of long-range dependencies and contextual relationships far more effectively than earlier architectures such as recurrent neural networks.

In a Perspective, Eva Portelance and Masoud Jasbi examine how non-symbolic generative artificial intelligence (AI) — particularly transformer-based LLMs — aligns with Chomsky’s generative linguistic principles, demonstrating noteworthy linguistic capabilities that are reshaping language... Similarly, in a Comment, Gabrielle O’Brien emphasizes how LLMs can assist with computer programming — the language humans use to interact with digital systems — and accelerate scientific workflows. The success of LLMs extends far beyond language tasks, as highlighted by several examples in this Focus issue. In cognitive science, Ilia Sucholutsky and colleagues explore, in a Comment, the potential of LLMs in advancing the study of collective cognition — cognitive phenomena that emerge from social interaction between multiple individuals. They identify five roles for LLMs in cognitive research: participant, analyst, environment, interviewer, and facilitator, enabling the study of complexities that challenge conventional methodologies. Yong Li and colleagues envision, in their Perspective, LLMs as intelligent assistants in urban planning, capable of synthesizing ideas, generating designs, and evaluating planning outcomes to address growing urban complexities.

In the humanities, Ted Underwood notes in a Comment that scholars are using LLMs to frame new research questions and even rethink how these models are trained, signaling a transformative dialogue between AI and the humanities. In chemistry and materials science, Gabe Gomes and collaborators, in their Perspective, emphasize opportunities for LLMs to support planning, optimization, data analysis, and automation, positioning them as active partners in chemical research. Collectively, these examples illustrate how LLMs are enhancing existing practices and opening new frontiers across various scientific domains.

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation.[1][2] The largest and most capable LLMs are generative pre-trained transformers (GPTs). LLMs can be fine-tuned for specific tasks or guided by prompt engineering.[3] These models acquire predictive power regarding syntax, semantics, and ontologies[4] inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on. They consist of billions to trillions of parameters and operate as general-purpose sequence models, generating, summarizing, translating, and reasoning over text.

LLMs represent a significant new technology in their ability to generalize across tasks with minimal task-specific supervision, enabling capabilities like conversational agents, code generation, knowledge retrieval, and automated reasoning that previously required bespoke systems.[6]

LLMs evolved from earlier statistical and recurrent neural network approaches to language modeling. The transformer architecture, introduced in 2017, replaced recurrence with self-attention, allowing efficient parallelization, longer context handling, and scalable training on unprecedented data volumes.[7] This innovation enabled models like GPT, BERT, and their successors, which... Reinforcement learning, particularly policy gradient algorithms, has been adapted to fine-tune LLMs for desired behaviors beyond raw next-token prediction.[9] Reinforcement learning from human feedback (RLHF) applies these methods to optimize a policy, the LLM's... Benchmark evaluations for LLMs have evolved from narrow linguistic assessments toward comprehensive, multi-task evaluations measuring reasoning, factual accuracy, alignment, and safety.[11][12] Hill climbing, iteratively optimizing models against benchmarks, has emerged as a dominant strategy,...
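Because self-attention is the mechanism doing the heavy lifting in the passage above, a short NumPy sketch of scaled dot-product attention may be useful; the shapes, weight matrices, and variable names are toy choices for illustration, not any particular model's implementation.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8): one updated vector per token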

Large language models (LLMs) based on the Transformer architecture are designed to understand and generate human-like text by learning patterns and relationships from vast amounts of textual data. These models have demonstrated strong generalization and performance on unseen data, leading some to report “sparks” of artificial general intelligence. However, true AGI remains an unachieved goal requiring breakthroughs beyond current techniques. LLMs have poor mathematical reasoning capabilities, inherit biases present in training data, and can hallucinate and deliver false information.

In this chapter, we discuss several methods that improve current state-of-the-art models: the use of external tools such as web browsing, better prompting techniques, scaling, reinforcement learning (RL), and tree-based search...
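As a loose illustration of the “better prompting techniques” mentioned in the chapter summary, here is a minimal chain-of-thought prompt builder; ask_llm is a hypothetical placeholder rather than an API from the chapter, and the worked example inside the prompt is made up.

def build_cot_prompt(question):
    # Prepend one worked example so the model is nudged to reason step by step.
    return (
        "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
        "A: 12 pens is 4 groups of 3, and 4 * $2 = $8. The answer is $8.\n"
        f"Q: {question}\n"
        "A: Let's think step by step."
    )

def ask_llm(prompt):
    # Stand-in for a real model call; any chat or completions API would go here.
    raise NotImplementedError

if __name__ == "__main__":
    print(build_cot_prompt("A train travels 60 km in 45 minutes. What is its speed in km/h?"))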

The OpenAI models released last week are a game changer for large language models (LLMs), which have for the last year and a half been rather stable in performance. Some had caught up with OpenAI’s ChatGPT-4, but nobody had significantly surpassed it. The frontier had not advanced far. This had raised speculation, including my own, over whether the scaling of ever larger training datasets, ever larger numbers of parameters in the models, and ever longer training runs had reached some kind of limit. That limit might be related to operational issues surrounding compute cost and capacity that would be relaxed over time. Or maybe there was a more substantive limit of the underlying paradigm of the transformer model architecture. The conclusion was that something else, a novel innovation, could be needed for further gains. One such innovation appears now to have emerged.

The GPT-4 o1 model is a change of paradigm in the sense that it is not simply the same thing scaled bigger, but rather that it thinks (or computes) more at inference time, i.e., at the time of use. It is shifting the distribution of computation from being almost entirely at training time, when the model is developed, towards increased computation at inference time, when the model is used. This model is not scaling more data, parameters, or training compute to increase its knowledge, but it is scaling more compute and data to train specifically for problem solving and in particular it is... It has in itself embedded steps to evaluate and improve its solutions to problems and shows much better performance than GPT-4 in solving problems that involve complex, multistep reasoning. It’s much more capable at generating detailed code from ambiguous descriptions and at generating coherent, structured content over much longer texts, for instance.
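The shift toward inference-time computation that the author describes can be pictured with a generic best-of-N loop: sample several candidate answers, have the model (or a separate checker) score them, and keep the best one. This is only a schematic sketch of that general idea, not a description of how the o1 model actually works, and generate and score are hypothetical placeholders.

import random

def generate(prompt, temperature=0.8):
    # Placeholder: sample one candidate solution from a model.
    return f"candidate answer #{random.randint(0, 999)}"

def score(prompt, candidate):
    # Placeholder: a self-evaluation step, e.g. the model grading its own answer.
    return random.random()

def best_of_n(prompt, n=8):
    # More samples and more evaluation means more compute spent at inference time.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("Show that the sum of two even numbers is even."))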

Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks since the release of ChatGPT in November 2022. LLMs acquire their ability for general-purpose language understanding and generation by training billions of model parameters on massive amounts of text data, as predicted by scaling laws [1, 2]. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.
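As a rough illustration of what the scaling laws cited above [1, 2] predict, the sketch below uses the commonly quoted approximation that training compute is about 6 x parameters x tokens, together with the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter; the exact constants vary across studies, and this is not the survey's own formulation.

def training_flops(params, tokens):
    # Widely used approximation: C ~= 6 * N * D floating-point operations.
    return 6 * params * tokens

def compute_optimal_tokens(params, tokens_per_param=20):
    # Rule-of-thumb compute-optimal token count (roughly 20 tokens per parameter).
    return tokens_per_param * params

n_params = 70e9                               # a hypothetical 70B-parameter model
d_tokens = compute_optimal_tokens(n_params)   # ~1.4 trillion training tokens
print(f"tokens: {d_tokens:.2e}, compute: {training_flops(n_params, d_tokens):.2e} FLOPs")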

Language modeling is a long-standing research topic, dating back to the 1950s with Shannon’s application of information theory to human language, where he measured how well simple n-gram language models predict or compress natural language. Since then, statistical language modeling became fundamental to many natural language understanding and generation tasks, ranging from speech recognition and machine translation to information retrieval [4, 5, 6]. The recent advances on transformer-based large language models (LLMs), pretrained on Web-scale text corpora, significantly extended the capabilities of language models. For example, OpenAI’s ChatGPT and GPT-4 can be used not only for natural language processing but also as general task solvers (powering Microsoft’s Co-Pilot systems, for instance) that can follow human instructions for complex... LLMs are thus becoming the basic building block for the development of general-purpose AI agents or artificial general intelligence (AGI).
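Since the history above starts from Shannon-era n-gram models, a tiny bigram language model makes the basic idea concrete: estimate the probability of each word from the word before it. The toy corpus below reuses the cat-and-box example and is purely illustrative.

from collections import Counter, defaultdict

corpus = "the cat sat on the box and the cat slept".split()

# Count how often each word follows each preceding word (bigram counts).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def p_next(prev, nxt):
    # Maximum-likelihood estimate of P(next word | previous word).
    counts = bigrams[prev]
    return counts[nxt] / sum(counts.values()) if counts else 0.0

print(p_next("the", "cat"))  # 2/3: "the" is followed by "cat" twice and "box" once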

As the field of LLMs is moving fast, with new findings, models and techniques being published in a matter of months or weeks [7, 8, 9, 10, 11], AI researchers and practitioners often find it challenging to keep up with the latest advances. This paper gives a timely survey of the recent advances on LLMs. We hope this survey will prove a valuable and accessible resource for students, researchers and developers. LLMs are large-scale, pre-trained, statistical language models based on neural networks. The recent success of LLMs is an accumulation of decades of research and development of language models, which can be categorized into four waves that have different starting points and velocities: statistical language models, neural language models, pre-trained language models, and LLMs.

