Different concepts of AI explained using AI

Drafted with AI; researched, edited, and fact-checked by me — how I write.

If you're curious about artificial intelligence and have experimented with tools like ChatGPT, this article is for you. It offers a concise introduction to key AI concepts.

Figure: A map of modern AI concepts grouped into three columns: Generative (Large Language Models, Stable Diffusion, RETRO), Understanding (Computer Vision, Semantic Search, Document Understanding), and Deployment & Interaction (Edge AI, Binarised Neural Networks, Conversational AI).

LLMs

Large language models (LLMs) are a type of artificial intelligence (AI) that can generate and understand text. They are trained on massive datasets of text and code, which allows them to learn the nuances of human language and perform a variety of tasks, including:

Generating practical text, such as emails, letters, scripts, and code
Translating languages
Answering questions in a comprehensive and informative way
Summarising text
Writing creative content, such as poems, lyrics, chord progressions, and other musical ideas

LLMs are continually evolving, and they have the potential to revolutionise the way we interact with computers. For example, LLMs already power chatbots such as ChatGPT, which can hold more natural and engaging conversations with humans than earlier rule-based systems could.
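
To make "learning from text" a little more concrete, here is a toy sketch, not how a real LLM is built. It counts which word tends to follow which in a tiny made-up corpus and then generates text one word at a time. Real LLMs use neural networks trained on vastly more data; the corpus and the function names `train_bigram_model` and `generate` are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny hypothetical corpus; real LLMs train on vastly more text.
CORPUS = "the cat sat on the mat . the cat sat on the rug . the dog ate the fish ."

def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word][next_word] += 1
    return follows

def generate(model, start, max_words=6):
    """Greedily append the most frequent next word, one word at a time."""
    output = [start]
    for _ in range(max_words - 1):
        candidates = model.get(output[-1])
        if not candidates:
            break  # the last word never had a successor in the corpus
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)

model = train_bigram_model(CORPUS)
print(generate(model, "the"))  # prints: the cat sat on the cat
```

The output is fluent-looking but shallow, which hints at why scale and better architectures matter so much for real models.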

Here is a concise description of LLMs in one sentence:

LLMs are AI models that can generate and understand human language by learning from massive datasets.

Stable Diffusion

Stable Diffusion is a latent diffusion model that generates high-quality, sometimes photorealistic, images, most often from text descriptions. It is a relatively new technology, but it has quickly become one of the most popular text-to-image models because it can produce convincing images in a wide variety of styles.

During training, diffusion models learn to reverse a process that adds noise to images. When generating, Stable Diffusion starts from random noise in a compressed latent space and gradually denoises it, guided by the text prompt, until an image emerges.
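
The noising and denoising idea can be sketched in a few lines. This is a heavily simplified illustration under stated assumptions: it works on a made-up four-value "image" rather than a latent space, performs a single step instead of many gradual ones, and cheats by handing the true noise to the reverse step, where a real model would use a trained network's prediction guided by the prompt. The names `add_noise`, `remove_noise`, and `signal_fraction` are hypothetical.

```python
import math
import random

def add_noise(clean, noise, signal_fraction):
    """Forward process: blend the clean values with Gaussian noise."""
    keep = math.sqrt(signal_fraction)
    mix = math.sqrt(1.0 - signal_fraction)
    return [keep * c + mix * n for c, n in zip(clean, noise)]

def remove_noise(noisy, predicted_noise, signal_fraction):
    """Reverse step: subtract the predicted noise and rescale."""
    keep = math.sqrt(signal_fraction)
    mix = math.sqrt(1.0 - signal_fraction)
    return [(x - mix * n) / keep for x, n in zip(noisy, predicted_noise)]

random.seed(0)
image = [0.1, 0.5, 0.9, 0.3]  # hypothetical 4-pixel "image"
noise = [random.gauss(0.0, 1.0) for _ in image]
noisy = add_noise(image, noise, signal_fraction=0.2)

# A trained network would *predict* the noise from `noisy` and the text prompt.
# Here we pass the true noise to show what a perfect prediction would achieve.
recovered = remove_noise(noisy, noise, signal_fraction=0.2)
print([round(v, 3) for v in recovered])
```

With a perfect noise prediction the original values come back exactly; the hard part, which training addresses, is predicting that noise well.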

Stable Diffusion can be used to generate images for a variety of purposes, including:

Creating concept art
Designing products
Generating illustrations
Creating realistic images of people, places, and things that do not exist in the real world

Stable Diffusion is still under development, but it has the potential to revolutionise the way we create and consume visual content.

Here is a concise description of Stable Diffusion in one sentence:

Stable Diffusion is a latent diffusion model that can generate high-quality, sometimes photorealistic, images from text descriptions.

Edge AI

Edge AI is the deployment of AI applications on devices throughout the physical world. It is called "edge AI" because the AI computation is done near the user at the edge of the network, close to where the data is located, rather than centrally in a cloud computing facility or private data center.

Edge AI offers a number of benefits, including:

Reduced latency: Edge AI applications can process data in real time, without the need to send it to the cloud and back. This is important for applications where fast response times are critical, such as self-driving cars and industrial automation systems.
Improved privacy and security: Edge AI applications can process data on the device, without the need to send it to a central server. This can help to protect users' privacy and sensitive data.
Reduced bandwidth usage: Edge AI applications can reduce the amount of data that needs to be sent to the cloud. This can save money on bandwidth costs and improve network performance.
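
As a rough sketch of the bandwidth point, the snippet below processes a stream of sensor readings on the device and uploads only the unusual ones. The simple threshold check stands in for a real on-device model, and the readings, the threshold, and the function name `detect_events` are all made up for illustration.

```python
def detect_events(readings, threshold):
    """Run on the device: return only the readings that exceed the threshold."""
    return [(index, value) for index, value in enumerate(readings) if value > threshold]

# Hypothetical temperature samples captured by an on-device sensor.
readings = [21.0, 21.2, 20.9, 35.4, 21.1, 21.0, 36.0, 21.3]
events = detect_events(readings, threshold=30.0)

# Only the events leave the device; the raw stream stays local.
print(f"captured {len(readings)} readings, uploading {len(events)}: {events}")
```

Here two of the eight readings would be sent onwards, and the rest never leave the device.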

Edge AI is still a relatively new technology, but it is rapidly gaining adoption in a wide range of industries, including manufacturing, healthcare, retail, and transportation.

Here is a concise description of edge AI in one sentence:

Edge AI refers to the processing of AI algorithms on local devices near the data source, rather than in a centralised cloud-based system.

BNNs

Binarised Neural Networks (BNNs) are a type of neural network in which the weights and activations are constrained to two values, +1 or -1. This makes BNNs much more efficient than traditional neural networks: each value fits in a single bit, and multiplications can be replaced with simple bitwise operations that are faster and cheaper to implement.
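
To show why binary arithmetic is cheap, here is a minimal sketch of a dot product between two +1/-1 vectors computed with bit operations instead of multiplications. It assumes a sign-based binarisation and packs each vector into a Python integer; the function names `binarise`, `pack_bits`, and `binary_dot` are hypothetical, and real BNN libraries do this in optimised low-level code.

```python
def binarise(values):
    """Map each real value to +1 or -1 using its sign (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Store a +1/-1 vector as an integer: bit i is 1 where signs[i] == +1."""
    packed = 0
    for i, s in enumerate(signs):
        if s == 1:
            packed |= 1 << i
    return packed

def binary_dot(packed_a, packed_b, length):
    """Dot product of two packed +1/-1 vectors using XOR and a bit count."""
    differing = bin(packed_a ^ packed_b).count("1")  # positions where signs differ
    return length - 2 * differing  # matches contribute +1, mismatches -1

weights = binarise([0.7, -1.2, 0.1, -0.4, 2.3, -0.9])
inputs = binarise([-0.3, -0.8, 0.5, 0.6, 1.1, -0.2])

fast = binary_dot(pack_bits(weights), pack_bits(inputs), len(weights))
slow = sum(w * x for w, x in zip(weights, inputs))
print(fast, slow)  # prints: 2 2
```

Both routes give the same answer, but the bitwise version needs no multiplications at all.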

BNNs have been shown to achieve comparable accuracy to traditional neural networks in some settings, especially on simpler or carefully tuned tasks. However, BNNs are typically less accurate than traditional neural networks on complex tasks, which is the trade-off for the efficiency gains.

Despite their limitations, BNNs have a number of advantages over traditional neural networks, including:

Efficiency: Because each weight and activation needs only one bit, BNNs use far less memory than traditional neural networks, and their binary arithmetic is much faster and simpler to implement.
Resilience to noise: In some architectures and applications, BNNs have proved more resilient to noise than traditional neural networks, because binary weights and activations are less sensitive to small changes in the input data.

BNNs are still under development, but they have the potential to revolutionise the way we deploy AI applications. For example, BNNs could be used to develop new types of mobile and embedded AI applications that are more efficient and resilient to noise.

Here is a concise description of BNNs in one sentence:

BNNs are a type of neural network where the weights and activations are constrained to be binary, making them more efficient in general and resilient to noise in some contexts.

Computer Vision

Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs — and take actions or make recommendations based on that information.

Computer vision tasks include object detection, tracking, classification, and segmentation. Computer vision can also be used to estimate 3D structure, recognise gestures, and interpret facial expressions.
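
Two of these tasks can be illustrated on a tiny made-up image. The sketch below uses plain brightness thresholding for segmentation and then draws the smallest box around the bright pixels as a crude form of detection. Modern systems use trained neural networks instead; the image, the threshold, and the function names `segment` and `bounding_box` are assumptions for illustration only.

```python
def segment(image, threshold):
    """Segmentation: mark each pixel as object (1) or background (0)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

def bounding_box(mask):
    """Detection: smallest (top, left, bottom, right) box around object pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None  # nothing in the image exceeded the threshold
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

# Hypothetical 5x5 greyscale image (0 = black, 255 = white) with one bright blob.
image = [
    [10, 12, 11, 10, 9],
    [11, 200, 210, 12, 10],
    [10, 220, 230, 215, 11],
    [12, 11, 205, 10, 10],
    [10, 10, 11, 12, 10],
]
mask = segment(image, threshold=128)
print(bounding_box(mask))  # prints: (1, 1, 3, 3)
```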

Computer vision is used in a wide range of applications, including:

Self-driving cars: Computer vision is used to detect and track other vehicles, pedestrians, and traffic signs.
Security and surveillance: Computer vision is used to detect and identify intruders, and to monitor crowds for suspicious activity.
Medical imaging: Computer vision is used to help with diagnosis, surgical planning, and surgical guidance.
Robotics: Computer vision is used to help robots navigate their environment and interact with objects.
Consumer electronics: Computer vision is used in features such as facial recognition, augmented reality, and image search.

Computer vision is a rapidly evolving field, and new applications are being developed all the time. It is a powerful technology that has the potential to revolutionise the way we interact with the world around us.

Here is a concise description of computer vision in one sentence:

Computer vision is an AI field that enables computers to understand the visual world.

Semantic Search

Semantic search is a type of search engine technology that attempts to understand the meaning of search queries and the relationships between words and concepts. This allows semantic search engines to return more relevant results, even for ambiguous or complex queries.

Semantic search engines use a variety of techniques to understand the meaning of queries, including:

Natural language processing (NLP): NLP is a field of AI that deals with the interaction between computers and human language. In the context of semantic search, NLP can discern the intent behind search queries.
Knowledge graphs: Knowledge graphs are databases of entities and the relationships between them. Knowledge graphs can be used to understand the context of a query and the relationships between the entities mentioned in the query to help disambiguate terms with multiple meanings.
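
The disambiguation role of a knowledge graph can be sketched with a toy example. Below, a made-up two-entity graph links each meaning of "jaguar" to related concepts, and the query is matched to whichever entity shares the most words with it. Real knowledge graphs are far larger and store typed relationships; the data and the function name `disambiguate` are hypothetical.

```python
# Hypothetical mini knowledge graph: each entity lists the concepts it relates to.
KNOWLEDGE_GRAPH = {
    "Jaguar (animal)": {"cat", "predator", "rainforest", "habitat", "prey"},
    "Jaguar (car maker)": {"car", "engine", "dealer", "price", "model"},
}

def disambiguate(query, graph):
    """Pick the entity whose related concepts overlap most with the query words."""
    query_words = set(query.lower().split())
    scores = {entity: len(query_words & related) for entity, related in graph.items()}
    return max(scores, key=scores.get)

print(disambiguate("jaguar habitat and prey", KNOWLEDGE_GRAPH))  # Jaguar (animal)
print(disambiguate("jaguar engine price", KNOWLEDGE_GRAPH))      # Jaguar (car maker)
```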

Semantic search continues to evolve and versions of it are already in use. It is becoming increasingly important as search engines strive to return more relevant results to users.

Here is a concise description of semantic search in one sentence:

Semantic search is a type of search engine technology that attempts to understand the meaning of search queries and the relationships between words and concepts.

Here is an example of how semantic search can be used to improve the accuracy of search results:

If you search for "what is the largest mammal?", a traditional keyword-based search engine might return any page that matches those words, including results about the blue whale, the elephant, and the rhinoceros. A semantic search engine would be more likely to understand that you are asking which mammal is the largest in the world, and would therefore prioritise results about the blue whale.

RETRO

Retrieval-Enhanced Transformers (RETRO) are a type of large language model (LLM) that combines the power of transformers with the ability to retrieve information from large external databases. This allows RETRO models to generate more informative and comprehensive responses to a wider range of prompts and questions.

RETRO models work by first retrieving the most relevant passages of text from the database, given the input prompt or question. These passages are then used to condition a transformer decoder, which generates the output response.
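
The retrieve-then-condition flow can be sketched as follows. This is a deliberately simplified stand-in, not RETRO's actual mechanism: RETRO retrieves passages by nearest-neighbour search over learned embeddings and feeds them to the decoder through cross-attention, whereas this sketch ranks passages by word overlap and simply places the winner next to the prompt. The database contents and the function names `tokenize`, `retrieve`, and `build_conditioned_input` are made up for illustration.

```python
# Hypothetical external database of text passages.
DATABASE = [
    "The blue whale is the largest mammal on Earth.",
    "Stable Diffusion generates images from text prompts.",
    "Edge AI runs models on devices close to the data source.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of words without punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}

def retrieve(prompt, passages, top_k=1):
    """Rank passages by Jaccard word overlap with the prompt; return the best ones."""
    prompt_words = tokenize(prompt)

    def overlap(passage):
        words = tokenize(passage)
        return len(prompt_words & words) / len(prompt_words | words)

    return sorted(passages, key=overlap, reverse=True)[:top_k]

def build_conditioned_input(prompt, passages):
    """Stand-in for conditioning: place the retrieved text alongside the prompt."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {prompt}"

prompt = "What is the largest mammal?"
retrieved = retrieve(prompt, DATABASE)
print(build_conditioned_input(prompt, retrieved))
```

The key design idea survives the simplification: the knowledge lives in the database, so it can be updated without retraining the model.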

RETRO models have been shown to achieve more accurate and factual results than models without retrieval capabilities, especially when the required knowledge is not encoded in the model's weights but is available in the retrieval database.

Here is a concise description of RETRO in one sentence:

RETRO models are large language models that combine the power of transformers with the ability to retrieve information from large external databases.

Conversational AI

Conversational AI is a type of artificial intelligence (AI) that enables computers to understand and respond to human language in a natural way. It is used in a variety of applications, including chatbots, virtual assistants, and voice assistants.

Conversational AI systems use a variety of techniques to understand human language, including:

Natural language processing (NLP): NLP is a field of AI that deals with the interaction between computers and human language. NLP techniques can be used to identify the parts of speech in a sentence, the relationships between words, and the overall meaning of the sentence.
Machine learning (ML): ML is a type of AI that allows computers to learn from data without being explicitly programmed. ML techniques can be used to train conversational AI systems to understand and respond to a wide range of different prompts and questions.
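
As a small illustration of learning from examples rather than hand-written rules, the sketch below trains a very basic intent classifier from four labelled sentences and uses it to choose a canned reply. Real conversational systems use far richer models and data; the training examples, the responses, and the function names `train` and `predict_intent` are hypothetical.

```python
from collections import Counter

# Hypothetical labelled examples; a real system would learn from far more data.
TRAINING_DATA = [
    ("what is the weather like today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("set an alarm for seven", "alarm"),
    ("wake me up at six tomorrow morning", "alarm"),
]

def train(examples):
    """Count how often each word appears under each intent."""
    word_counts = {}
    for text, intent in examples:
        word_counts.setdefault(intent, Counter()).update(text.split())
    return word_counts

def predict_intent(word_counts, text):
    """Score each intent by how often its training data used the text's words."""
    words = text.lower().split()
    scores = {intent: sum(counts[w] for w in words) for intent, counts in word_counts.items()}
    return max(scores, key=scores.get)

RESPONSES = {"weather": "Let me check the forecast.", "alarm": "Okay, setting an alarm."}

model = train(TRAINING_DATA)
intent = predict_intent(model, "is it going to rain today")
print(intent, "->", RESPONSES[intent])  # prints: weather -> Let me check the forecast.
```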

Conversational AI is a rapidly evolving field, and new applications are being developed all the time. It has the potential to revolutionise the way we interact with computers, making them more accessible and user-friendly.

Here is a concise description of conversational AI in one sentence:

Conversational AI is a type of AI that enables computers to understand and respond to human language in a natural way.

Document Understanding

Document understanding in the context of AI is the ability of a computer to extract and understand information from documents, such as invoices, contracts, and medical records. This is a challenging task because documents can be in a variety of formats, with different structures and layouts. Additionally, the information contained in documents can be complex and may require knowledge of specific domains, such as law or medicine.

AI-powered document understanding systems use a variety of techniques to extract and understand information from documents, including:

Computer vision: Computer vision techniques can be used to identify and extract objects, such as text, tables, and images, from documents.
Natural language processing (NLP): NLP techniques can be used to understand the meaning of the text in a document and extract key information, such as dates, names, and amounts.
Machine learning (ML): ML techniques can be used to train document understanding systems to identify and extract information from documents, often with good accuracy.
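
The extraction step can be sketched with a few regular expressions. This is a simple stand-in for the NLP and ML techniques described above, and it assumes the text has already been read off the page; the sample invoice, the field names, and the function name `extract_fields` are made up for illustration.

```python
import re

# Hypothetical invoice text, as it might look after OCR.
INVOICE_TEXT = """Invoice No: INV-1042
Date: 2024-03-15
Bill to: Acme Ltd
Total due: $1,250.00"""

def extract_fields(text):
    """Pull an invoice number, an ISO date, and a dollar total out of raw text."""
    patterns = {
        "invoice_number": r"Invoice No:\s*(\S+)",
        "date": r"(\d{4}-\d{2}-\d{2})",
        "total": r"Total due:\s*\$([\d,]+\.\d{2})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    # Convert the amount to a number so downstream code can do arithmetic on it.
    if fields["total"] is not None:
        fields["total"] = float(fields["total"].replace(",", ""))
    return fields

print(extract_fields(INVOICE_TEXT))
```

Fixed patterns like these break as soon as the layout changes, which is exactly why real systems lean on learned models instead.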

AI-powered document understanding systems are being used in a variety of industries to automate manual document processing tasks. This can save time and money, and improve the accuracy and efficiency of business processes.

Here is a concise description of document understanding in the context of AI in one sentence:

Document understanding systems use computer vision, NLP, and ML to extract and interpret information from documents, often with good accuracy.

¹ Answers generated using Google Bard (now Gemini) and cross-checked with ChatGPT-4.0, with a final read-through and factual accuracy pass by a human (me) to verify correctness.