Generative language models, colloquially known as “AI”, have been making waves across many sectors, in roles ranging from customer support chatbots to programming assistants, and even a new wave of web search tools.
The flood of large language model (LLM)-based products has also sparked furious debate about their reliability, particularly in high-risk applications. In the medical field, a potentially enormous opportunity is balanced against life-and-death stakes. This new technology must be used responsibly, in a way that enhances clinicians’ skills or improves the patient experience without exposing either to increased risk.
For high-risk systems, a good rule of thumb is that no generated text should be user-facing. This is important for several reasons. First, LLMs are statistical text generators with no sense of factual accuracy or nuance. You cannot depend on an LLM to produce factual output, and while you can nudge the statistics in the right direction, that is simply not enough for critical tasks that demand control and accuracy.
Second, LLMs are susceptible to prompt injection, a technique in which an attacker crafts the input provided to a model in order to modify its behaviour. This exploit can bypass safety training, or even constraints placed on the model’s actions.
Finally, these issues could expose the product owner to legal liability, as courts have held companies accountable for promises made by language-model-based agents ostensibly acting on their behalf.
This raises the question: how does one use this new technology safely?
In high-risk applications, determinism is key. Basic determinism, where identical inputs always produce identical outputs, is not particularly difficult to achieve with LLMs; semantic determinism, where equivalent inputs produce equivalent outputs, is substantially harder to guarantee.
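As a concrete illustration, here is a minimal sketch of basic determinism using the Hugging Face transformers library: greedy decoding makes the output a pure function of the input tokens. The model and prompt are placeholders chosen for illustration, not part of the original article.

```python
# Basic determinism: greedy decoding (do_sample=False) removes sampling
# randomness, so the same prompt always yields the same completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Patient presents with", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,                        # greedy decoding, no sampling
    pad_token_id=tokenizer.eos_token_id,    # silence the missing-pad warning
)
print(tokenizer.decode(output[0]))
# Rerunning this gives the same text every time. But change one character
# in the prompt and the output may differ: basic, not semantic, determinism.
```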
DiagnosisBot9000 should respond the same way to similar inputs regardless of formatting, punctuation, or even wording, but if a spelling mistake, misplaced comma, or minor change to the input text can lead to a misdiagnosis, it will be hard to sell the advantages of an automated system.
Instead, one can use tabular data as input. The consistency will not only boost reliability, but also potentially allow simpler (or more heavily quantized) models to achieve results comparable to a much larger LLM operating on vaguely worded inputs. This could yield significant savings in the long run, given the upward trend in GPU rental costs on cloud services.
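One way to picture this is to normalize every record into a canonical text form before it ever reaches the model, so that equivalent records produce byte-identical input. A rough sketch follows; the field names and values are hypothetical, purely for illustration.

```python
# Canonicalize tabular records so free-text variation never reaches the model.
def canonical_prompt(record: dict) -> str:
    # Fixed field order and formatting: equivalent records map to one string.
    fields = ["age", "sex", "systolic_bp", "diastolic_bp", "temperature_c"]
    return "\n".join(f"{name}: {record[name]}" for name in fields)

record = {
    "age": 54,
    "sex": "F",
    "systolic_bp": 142,
    "diastolic_bp": 91,
    "temperature_c": 37.8,
}
print(canonical_prompt(record))
# "BP of 142 over 91" and "blood pressure 142/91" both normalize to the same
# two lines here, sidestepping the spelling-and-punctuation fragility above.
```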
One should also consider other capabilities of language-model technology, such as semantic classification and embeddings, which offer unique advantages as components in a toolchain.
Classifying entries in a large text corpus is a laborious task that LLMs can massively simplify. Instead of reviewing every single entry, one can focus on the edge cases that the automated classifier could not confidently label.
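A minimal sketch of this triage pattern, assuming a zero-shot classifier from the Hugging Face transformers library; the labels, threshold, and example text are illustrative assumptions, not from the original article.

```python
# Auto-label confident cases; route uncertain ones to a human reviewer.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = ["adverse event", "device complaint", "general inquiry"]

def triage(text: str, threshold: float = 0.8):
    result = classifier(text, candidate_labels=labels)
    top_label, top_score = result["labels"][0], result["scores"][0]
    if top_score >= threshold:
        return top_label  # confident: accept the automated label
    return None           # uncertain edge case: flag for human review

print(triage("The pump alarm kept going off during infusion."))
```

The threshold is the control knob: lowering it automates more entries, raising it sends more to human review.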
Embeddings provide similar functionality, effectively distilling a sentence or paragraph into a vector that can be compared against vectors from other pieces of text to determine semantic similarity. Classification and embedding models are also generally much more lightweight than LLMs aimed at mimicking human writing.
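For instance, here is a short sketch of embedding-based similarity using the sentence-transformers library; the model checkpoint is a common public one chosen for illustration, and the sentences are invented examples.

```python
# Distill sentences into vectors, then compare them with cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "Patient reports sharp chest pain after exertion.",
    "Chest discomfort when exercising, described as stabbing.",
    "The device battery drains within two hours.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarity: the first two sentences score high despite
# sharing few words, while the third scores low against both.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```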
Avoiding user-facing generative language models in your high-risk application will not only spare potential branding and legal headaches, but also save significant money that can instead go towards more efficient equivalents that still operate well in the smaller problem space.
There are many well-controlled niches that can benefit from the technologies developed over the past couple of years. Continued advances in the development of LLMs will further solidify their utility as effective tools for those able to wield them responsibly.
Thor Tronrud is a research and data analysis-focused software engineer at StarFish Medical who specializes in the development and application of machine learning tools. Previously an astrophysicist working with magnetohydrodynamic simulations, Thor joined StarFish in 2021 and has applied machine learning techniques to problems including image segmentation, signal analysis, and language processing.