Latest Publications & Patents on Small Language Models (SLMs)

This week: holography, optical systems, spatial light modulator, computer-generated hologram, self-speculative decoding, generative AI, forecast embedding, bias parameter, aluminum alloy, selective laser melting, intermetallic lamellae, high-strength, Quantization, LoRA, fine-tuning, LLM, marine machine equipment, operational advice, Small Language Model, real-time data, speculative decoding, LLM inference, token sequence selection, text data statistics, mechanism synthesis, deep learning, contrastive graph learning, optimization stability, Fine-Tuning, Discrete Wavelet Transform, Low-Rank Adaptation, Automatic Speech Recognition

July 12, 2026

Artificial Intelligence (AI), Deep Learning, Edge AI Inference, Embedded Systems, Generative Pre-trained Transformer (GPT), Machine Learning, Natural Language Processing (NLP), Neural Network, System on a Chip (SoC)

Tip: in addition to the selection below, you can search and filter our two full databases:

Free publications search tool: search by author, topic, keywords, date or journal.

Free patents search tool: search patents in English from the European Patent Office.

Small language models are transformer-based natural language processing systems with fewer than approximately 7 billion parameters. The threshold is less a formal boundary than a practical one: the model must be deployable on consumer hardware, mobile devices, and embedded systems without cloud inference infrastructure.

The domain emerged as a direct response to the computational and economic costs of frontier-scale models: while the largest architectures demonstrate broad general capability, their memory footprint, inference latency, and energy consumption make them structurally incompatible with on-device deployment, privacy-sensitive applications, and low-bandwidth or offline operational contexts.

The central research program is to close the capability gap between compact and frontier models by combining four families of techniques: knowledge distillation, in which a smaller student model is trained against the output distributions of a larger teacher; structured and unstructured pruning; aggressive weight quantization down to INT8 and INT4 representations; and parameter-efficient fine-tuning methods such as LoRA and QLoRA, which adapt a compressed base model to domain-specific tasks at minimal additional compute cost.
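As a rough illustration of the weight quantization mentioned above, the following sketch applies symmetric per-tensor INT8 quantization to a small list of weights and measures the round-trip error. The function names and the example weights are hypothetical, and production toolchains typically use per-channel or per-group scales rather than a single scale.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor (an assumption made for brevity).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate float values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.02, 1.0]   # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Storing `q` as 8-bit integers plus one float scale is what reduces the memory footprint; INT4 follows the same idea with the range [-7, 7] and a correspondingly larger error.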

The publications and patents indexed below address model compression techniques, quantization algorithms, distillation protocols, efficient transformer architectures, on-device inference optimization, and domain-specific fine-tuning pipelines.

This is our latest selection of worldwide publications and patents in English on Small Language Models (SLMs), drawn from many online scientific journals and classified under the following topics: small language model, SLM, on-device language model, edge language model, compact transformer, sub-7B parameter model, language model compression, knowledge distillation NLP, structured pruning language model, unstructured pruning language model, weight quantization language model, INT4 quantization NLP, INT8 quantization NLP, parameter-efficient fine-tuning, LoRA fine-tuning, QLoRA fine-tuning, adapter tuning language model, on-device inference, edge inference NLP, speculative decoding, model distillation transformer, GGUF quantization format and mixture-of-experts compact model.

Optical device for generating holographic images

Patent published on 2026-06-18 in WO under Ref WO2026127190 by EPIC OPTIX CO LTD [KR] (Kim Dong Ha [kr], Son Byoung Soo [kr], Kwon Jae Young [kr], Seo Gye Won [kr])

Abstract: An optical device for generating holographic images according to the present invention comprises: a first optical system including a laser light source for emitting parallel light, and a reflective spatial light modulator (SLM) for reflecting light generated by the laser light source and modulating same by means of a computer-generated hologram (CGH); a second optical system onto which light reflected by the spatial light modulator is incident, which has positive power, and which includes a non-[...]

Our summary: The device generates holographic images using a laser light source and a spatial light modulator. It includes multiple optical systems to filter and manipulate light. A variable-position virtual image is created by the SLM, with an intermediate holographic image formed between specific optical elements.

holography, optical systems, spatial light modulator, computer-generated hologram

Patent

Injected self-speculative decoding in generative artificial intelligence models

Patent published on 2026-06-18 in WO under Ref WO2026128124 by QUALCOMM INCORPORATED [US] (Goel Raghavv [us], Lee Mingu [us], Gagrani Mukul [us], Jeon Wonseok [us], Lott Christopher [us], Park Junyoung [us])

Abstract: Techniques and apparatus for generating a response to an input prompt using efficient self-speculative decoding in a generative artificial intelligence model. An example method generally includes receiving an input prompt for processing. A forecast embedding representing one or more forecasted tokens responsive to the input prompt is generated. Generally, the one or more forecasted tokens include tokens speculatively decoded by a generative artificial intelligence model based on generation of an[...]

Our summary: Injected self-speculative decoding enhances generative AI models. The method generates forecast embeddings based on input prompts. Responses are produced using forecasted tokens and bias parameters to improve accuracy.

self-speculative decoding, generative AI, forecast embedding, bias parameter

Patent

Deformable high-strength aluminum alloy compositions and methods of making the same

Patent published on 2026-06-04 in US under Ref US20260152827 by PURDUE RES FOUNDATION [US] (Zhang Xinghang [us], Wang Haiyan [us], Stegman Benjamin Thomas [us], Shang Anyu [us])

Abstract: An alloy comprising 92 at % aluminum, 2 at % titanium, 2 at % iron, 2 at % cobalt, and 2 at % nickel. A method of making an alloy is disclosed. The method contains the steps of providing particles of desired composition, utilizing a selective laser melting (SLM) apparatus producing a first layer of the particles on a substrate and melting and solidifying a first group selected areas of the layer of particles, wherein the melting and the solidification results in an alloy of desired compo[...]

Our summary: The content describes a high-strength aluminum alloy with specific composition percentages. It outlines a method for creating the alloy using selective laser melting to achieve desired thickness and shape. The process involves layering particles, melting, and solidifying selected areas to form intermetallic structures.

aluminum alloy, selective laser melting, intermetallic lamellae, high-strength

Patent

Quantization-aware LoRA fine-tuning for LLM

Patent published on 2026-06-04 in US under Ref US20260154540 by MEDIATEK SINGAPORE PTE LTD [SG] (Lim Jia Yao Christopher [sg], Huang Ya-lin [tw], Li Huai-ting [tw], Wong Wai Mun [sg], Liang Jen-wei [tw], Lee Timothy Jun Jie [sg])

Abstract: In an aspect of the disclosure, a method of using a LoRA for inference with a FC layer of a LLM is provided. The method includes: dequantizing an INT input to an FP output; processing the FP output from the DQ and a first FP input from first weights of a down projection module of the LoRA, to output a first FP output; processing the first FP output from the first BMM and a second FP input from second weights of an up projection module of the LoRA, to output a second FP output; quantizing [...]

Our summary: The method describes using LoRA for inference in a fully connected layer of a large language model. It involves dequantizing inputs, processing them through down and up projection modules, and quantizing outputs. The final output is an INT inference result derived from the LoRA adjustments.

Quantization, LoRA, fine-tuning, LLM

Patent
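The dequantize, down-project, up-project, requantize flow described in the abstract above can be sketched in a few lines. This is only an illustration of the general data flow under stated assumptions, not the patented method: the function names, the single per-tensor scales, the rank-1 matrices, and the example values are all hypothetical, and the truncated abstract does not show how the result is combined with the base layer, so that step is left out.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dequantize(values, scale):
    """INT codes -> floating point."""
    return [v * scale for v in values]


def quantize(values, scale):
    """Floating point -> INT codes clipped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def lora_branch_int(x_int, in_scale, down, up, out_scale):
    """Run the LoRA branch on an integer input and return an integer output."""
    x_fp = dequantize(x_int, in_scale)   # INT input -> FP
    hidden = matvec(down, x_fp)          # down projection to rank r
    delta = matvec(up, hidden)           # up projection to the output width
    return quantize(delta, out_scale)    # FP -> INT


x_int = [10, -20, 30, 40]                # illustrative INT activations
down = [[0.5, 0.0, 0.5, 0.0]]            # shape (r=1, d_in=4), hypothetical
up = [[1.0], [-2.0]]                     # shape (d_out=2, r=1), hypothetical
y_int = lora_branch_int(x_int, 0.1, down, up, 0.05)
print(y_int)
```

Keeping the low-rank matrices in floating point while the surrounding layer runs on integers is one plausible reason for the dequantize and requantize steps at the branch boundaries.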

Systems and methods for assisting operation and maintenance of marine machine equipment

Patent published on 2026-06-03 in EP under Ref EP4752805 by ALFA LAVAL CORP AB [SE] (Karlsson Jimmie [se], Boman Jesper [se])

Abstract: The present invention relates to a method of operating and maintaining a piece of marine machine equipment. The piece of marine machine equipment is connected to a local processor. The method comprising the steps of obtaining a set of training data specific to the piece of marine machine equipment and training a Small Language Model (SLM) with the set of training data specific to the piece of marine machine equipment. The method further comprising the step of executing the trained SLM on [...]

Our summary: The invention describes a method for operating and maintaining marine machine equipment using a local processor. It involves training a Small Language Model (SLM) with specific training data for the equipment. The trained SLM provides offline operational advice utilizing real-time data from the equipment.

marine machine equipment, operational advice, Small Language Model, real-time data

Patent

Parameter-free method for efficient and accurate LLM inference acceleration via speculative decoding

Patent published on 2026-05-07 in WO under Ref WO2026092843 by MARZOLLO MICHELE [DE] (Marzollo Michele [de], Mueller Lorenz [de], Zhuang Jiawei [de], Roemer Niklas [de], Cavigelli Lukas [de])

Abstract: In some examples, apparatus and methods are provided for selecting a draft token sequence for verification by using a large language model, LLM. Different sources of statistics on text data (prompt, generated output, large dataset of text data) can be utilized in order to choose candidates to use for speculative decoding via look-ups.[...]

Our summary: This parameter-free method accelerates LLM inference by using speculative decoding. It selects draft token sequences for verification through statistical analysis of text data. The approach draws on several sources of statistics (the prompt, the generated output, and a large text dataset) to choose draft candidates via look-ups.

speculative decoding, LLM inference, token sequence selection, text data statistics

Patent
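To make the idea of look-up-based drafting concrete, here is a minimal sketch that proposes a draft by finding an earlier occurrence of the trailing n-gram in the context and then verifies it greedily. It is not the method claimed in the patent above: `propose_draft`, `verify`, and the toy `target_next` stand-in for the LLM are hypothetical, and only the prompt is used as a source of statistics. A real system would score all draft positions in one forward pass of the target model rather than calling it token by token.

```python
def propose_draft(tokens, ngram=2, max_draft=4):
    """Return the tokens that followed the most recent earlier occurrence
    of the trailing n-gram, as a candidate continuation."""
    if len(tokens) < ngram:
        return []
    key = tokens[-ngram:]
    # Scan backwards over earlier positions, skipping the trailing n-gram itself.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[start:start + ngram] == key:
            return tokens[start + ngram:start + ngram + max_draft]
    return []


def verify(tokens, draft, target_next):
    """Greedy verification: keep draft tokens while they match the target
    model's own next-token choice, then append one token from the target."""
    out = list(tokens)
    accepted = 0
    for tok in draft:
        if target_next(out) != tok:
            break
        out.append(tok)
        accepted += 1
    out.append(target_next(out))
    return out, accepted


PATTERN = ["the", "cat", "sat", "on"]


def target_next(seq):
    """Toy stand-in for the target LLM: cycles through a fixed pattern."""
    return PATTERN[len(seq) % len(PATTERN)]


context = ["the", "cat", "sat", "on", "the", "cat"]
draft = propose_draft(context)
output, accepted = verify(context, draft, target_next)
print(draft, accepted, output[-1])
```

Because every accepted token is one the target model would have produced anyway, the output matches plain greedy decoding; the saving comes from checking several tokens per target-model step.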

Automated synthesis of planar linkage mechanisms with diverse joint types via spring-connected link models and contrastive graph learning

Published on 2026-03-28 by @OXFORD

Abstract: The automated synthesis of planar linkage mechanisms has long been a challenge in mechanism design, requiring both geometric feasibility and motion accuracy. Recent advances in data-driven and neural network–based methods have shown promise in automating linkage synthesis, improving efficiency and scalability compared to traditional analytical or optimization-based techniques. Nevertheless, existing data-driven approaches remain limited in handling diverse joint configurations and ofte[...]

Our summary: This study presents a framework for automating the synthesis of planar linkage mechanisms using deep learning and physics-based modeling. It employs a spring-connected link model for diverse joint configurations and utilizes contrastive graph learning for efficient linkage retrieval. The method demonstrates improved accuracy and optimization stability compared to traditional approaches.

mechanism synthesis, deep learning, contrastive graph learning, optimization stability

Publication

Enhancing Whisper Fine-Tuning with Discrete Wavelet Transform-Based LoRA Initialization

Published on 2026-01-29 by Liang Lan, Molin Fang, Yuxuan Chen, Daliang Wang, Wenyong Wang @MDPI

Abstract: In low-resource automatic speech recognition (ASR) scenarios, parameter-efficient fine-tuning (PEFT) has become a crucial approach for adapting large pre-trained speech models. Although low-rank adaptation (LoRA) offers clear advantages in efficiency, stability, and deployment friendliness, its performance remains constrained because random initialization fails to capture the time–frequency structural characteristics of speech signals. To address this limitation, this work proposes[...]

Our summary: This work introduces a structured initialization mechanism combining LoRA with discrete wavelet transform for fine-tuning in low-resource ASR. The proposed DWTLoRA method enhances convergence speed, stability, and accuracy by aligning with speech signal characteristics. Experimental results show DWTLoRA outperforms standard LoRA and other PEFT methods in character error rate and training efficiency.

Fine-Tuning, Discrete Wavelet Transform, Low-Rank Adaptation, Automatic Speech Recognition

Publication

Topics covered: Small Language Models, Natural Language Processing, Transformer-based Systems, Parameter Efficiency, Knowledge Distillation, Model Compression, Structured Pruning, Unstructured Pruning, Weight Quantization, INT4, INT8, Fine-tuning Methods, On-device Deployment, Inference Latency, Energy Consumption, Privacy-sensitive Applications, Low-bandwidth Operations, Offline Operational Contexts, IEEE 802.11, ISO/IEC 30170, ISO/IEC 27001, ISO/IEC 25010, and NIST SP 800-53.

Glossary of Terms Used

Natural Language Processing (NLP): a field of artificial intelligence focused on the interaction between computers and human language, enabling machines to understand, interpret, and generate natural language text or speech. It encompasses tasks such as language translation, sentiment analysis, and speech recognition.

Small Language Models (SLM): compact neural networks designed for natural language processing tasks, typically characterized by fewer parameters and reduced computational requirements compared to larger models, while still capable of generating coherent text and understanding context within limited scopes.

Historical Context

Mode-locking (lasers)

Mode-locking is a technique for producing extremely short laser pulses, on the order of picoseconds (\(10^{-12}\) s) to femtoseconds (\(10^{-15}\) s). It works by forcing the many longitudinal modes of the laser cavity to oscillate with a fixed phase relationship. This causes the modes to interfere constructively, creating a single, intense, ultrashort pulse circulating in the cavity.

Top-Down Nanomaterial Synthesis

Top-down synthesis involves creating nanomaterials by starting with a larger, bulk material and breaking it down or patterning it to the nanoscale. Key techniques include mechanical methods like ball milling and lithographic methods like photolithography, electron-beam lithography, and nanoimprint lithography. These methods are often used for creating structured surfaces and integrated circuits, but can suffer from surface imperfections.

Flywheel Energy Storage (FES)

Flywheel energy storage (FES) works by accelerating a rotor (flywheel) to a very high speed and maintaining the energy in the system as rotational kinetic energy. The energy stored is proportional to the square of the rotational speed. When energy is extracted, the flywheel's rotation slows down. The formula for stored energy is \(E = \frac{1}{2} I \omega^2\), where I is the moment of inertia and ω is the angular velocity.
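A short worked example of the formula above, assuming the rotor is a uniform solid disk (so that \(I = \frac{1}{2} m r^2\), which is not stated in the text) and using made-up values for mass, radius and speed:

```python
import math


def solid_disk_inertia(mass_kg, radius_m):
    """Moment of inertia of a uniform solid disk about its axis (assumed rotor shape)."""
    return 0.5 * mass_kg * radius_m ** 2


def flywheel_energy_joules(moment_of_inertia, rpm):
    """Stored kinetic energy E = 1/2 * I * omega^2, with omega in rad/s."""
    omega = rpm * 2.0 * math.pi / 60.0
    return 0.5 * moment_of_inertia * omega ** 2


inertia = solid_disk_inertia(100.0, 0.5)            # illustrative 100 kg, 0.5 m rotor
energy_j = flywheel_energy_joules(inertia, 10_000)  # illustrative 10,000 rpm
energy_kwh = energy_j / 3.6e6

# Doubling the speed quadruples the stored energy.
ratio = flywheel_energy_joules(inertia, 20_000) / energy_j
print(round(energy_kwh, 2), round(ratio, 6))
```

The quadratic dependence on speed is why practical designs favor very high rotational speeds over heavier rotors.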

Molecular Electronics

Molecular electronics explores using individual molecules or nanoscale molecular collections as fundamental electronic components. This approach aims to build circuits at the ultimate limit of miniaturization, far beyond traditional silicon-based technology. Key components include molecular wires, switches, and rectifiers, leveraging quantum mechanical properties like electron tunneling through molecular orbitals for their function.

Physics of Failure (PoF)

Physics of Failure (PoF) is a reliability engineering approach that uses knowledge of materials science and physics to understand and model the root-cause mechanisms of failure. Instead of relying purely on statistical data from past failures, it focuses on predicting failure by analyzing the physical processes (e.g., fatigue, corrosion, creep) that lead to degradation and breakdown.

Quantum Size Effect in Nanomaterials

The Quantum Size Effect describes the phenomenon where the electronic and optical properties of a material change as its size approaches the nanoscale. When the dimensions of a material become comparable to the electron's de Broglie wavelength, quantum confinement occurs. This quantizes the electron energy levels, leading to a size-dependent band gap, \(E_g(R) \approx E_{g,\text{bulk}} + \frac{\hbar^2\pi^2}{2R^2}\left(\frac{1}{m_e^*} + \frac{1}{m_h^*}\right)\).

Vapor Pressure Enhancement Factor

The equilibrium vapor pressure of water over a liquid surface in moist air (\(p^*_{H_2O,a}\)) is slightly greater than the equilibrium vapor pressure over a pure water surface (\(p^*_{H_2O}\)). This difference is quantified by the water vapor enhancement factor, \(f_w\), which depends on temperature and the pressure of the moist air. The relationship is \(p^*_{H_2O,a} = f_w(T, p_{ms}) \cdot p^*_{H_2O}\).

Timeline dates: 1965, 1970, 1974-11-15, 1980, 1964, 1968, 1970, 1975, 1980

Europium Phosphors for Color Television

The discovery that europium-doped yttrium vanadate (\(YVO_4:Eu^{3+}\)) could act as a brilliant red phosphor was a critical breakthrough for color television. Before this, red phosphors were weak, resulting in dull colors. The intense, narrow-band red emission from the \(Eu^{3+}\) ion allowed for bright, vibrant color displays, dramatically improving the quality of color TV and setting the standard for display technology.

Bézier Curves

Developed by French engineer Pierre Bézier for Renault in the 1960s, UNISURF was one of the first true 3D CAD/CAM systems. Its core innovation was the use of what are now known as Bézier curves and surfaces. These are parametric curves defined by a set of control points, allowing for the intuitive and mathematical creation of complex freeform shapes for car bodies.

GPS Trilateration Principle

The GPS determines a receiver's position using trilateration. By measuring the distance to at least three satellites, the receiver can pinpoint its location on Earth's surface. The distance is calculated by multiplying the signal's travel time by the speed of light. A fourth satellite is required to synchronize the receiver's clock, resolving for the four unknowns: latitude, longitude, altitude, and time.

Superconducting Magnetic Energy Storage (SMES)

Superconducting Magnetic Energy Storage (SMES) systems store energy in the magnetic field created by the flow of direct current in a superconducting coil. The energy can be stored indefinitely as long as the coil is kept at superconducting temperatures, as there is virtually no energy loss due to electrical resistance. The stored energy is given by \(E = \frac{1}{2} L I^2\).

Ganz-Griesser Whiteness Index

The Ganz-Griesser whiteness index is a linear formula widely used, particularly in the textile industry. It is derived from CIE tristimulus values and is defined as \(W_{GG} = Y - Px - Qy + C\), where P, Q, and C are constants specific to the illuminant and observer. For the D65/10° condition, the formula is \(W_{GG} = Y - 1868.322x - 3695.690y + 1809.441\).
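The D65/10° formula above is easy to evaluate directly. The sketch below uses the constants quoted in the text; the function name is hypothetical, and the test chromaticity (x ≈ 0.3138, y ≈ 0.3310, the D65 white point for the 10° observer) and the yellowish sample are values supplied here for illustration, not taken from the source.

```python
def ganz_griesser_whiteness(Y, x, y):
    """Ganz-Griesser whiteness for D65/10 degrees, using the constants quoted above."""
    return Y - 1868.322 * x - 3695.690 * y + 1809.441


# A perfectly reflecting diffuser (Y = 100) at the D65/10 degree white point
# should come out close to 100.
w_reference = ganz_griesser_whiteness(100.0, 0.3138, 0.3310)

# A slightly yellowish sample (higher x and y) scores lower; values are illustrative.
w_yellowish = ganz_griesser_whiteness(90.0, 0.3200, 0.3380)

print(round(w_reference, 2), round(w_yellowish, 2))
```

The large negative coefficients on x and y mean that small chromaticity shifts toward yellow reduce the index sharply, while shifts toward blue raise it.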

Lithium-ion Intercalation Mechanism

Lithium-ion batteries function via an intercalation mechanism, a reversible insertion of ions into a layered host material. During discharge, lithium ions (\(Li^+\)) de-intercalate from a negative electrode (anode), typically graphite, and move through a non-aqueous electrolyte to intercalate into a positive electrode (cathode), typically a metal oxide. Electrons travel through the external circuit, creating current.

Depth of Discharge (DoD)

Depth of Discharge (DoD) indicates the percentage of a battery's capacity that has been discharged. It is the complement of State of Charge (SoC), that is, DoD = 100% - SoC, so 100% DoD means the battery is empty. A battery's cycle life is highly dependent on its average DoD; lower DoD cycles (e.g., discharging only down to 80% of capacity remaining, i.e. 20% DoD) significantly increase the number of cycles a battery can endure.

MEMS Scaling Laws

MEMS scaling laws describe how physical forces and properties change as device dimensions shrink to the microscale. Unlike the macroscopic world dominated by gravity and inertia, micro-domains are governed by surface forces like surface tension, viscosity, and electrostatic forces. For example, force due to gravity scales with volume (\(L^3\)), while electrostatic force scales with area (\(L^2\)), becoming relatively stronger at smaller sizes.

(if date is unknown or not relevant, e.g. "fluid mechanics", a rounded estimation of its notable emergence is provided)