Multimodal learning is the ability of artificial intelligence (AI) models to process and combine multiple types of input, such as images, audio, and text, in order to generate and retrieve information.
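As a rough illustration of the retrieval side of this idea, the sketch below assumes that each modality has already been mapped into one shared vector space, which is one common approach and not the only one. The embeddings, file names, and the `retrieve` helper are all hypothetical and hand-written for the example; a real system would obtain the vectors from trained encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written embeddings standing in for encoder outputs.
# The assumption is that image, audio, and text share one vector space.
ITEMS = {
    ("image", "dog_photo.jpg"): [0.9, 0.1, 0.0],
    ("audio", "dog_bark.wav"): [0.8, 0.2, 0.1],
    ("text", "a recipe for soup"): [0.0, 0.1, 0.9],
}

def retrieve(query_vec, items=ITEMS, k=2):
    """Return the k item keys most similar to the query, regardless of modality."""
    ranked = sorted(items, key=lambda key: cosine(query_vec, items[key]), reverse=True)
    return ranked[:k]

# Pretend this vector is the embedding of the text query "a dog".
text_query = [1.0, 0.0, 0.0]
top = retrieve(text_query)
print(top)
```

In this toy setup a text query ranks an image and an audio clip above an unrelated text item, because similarity is measured in the shared space and not within a single input type.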


