Brave introduces Leo AI, partners with NVIDIA for GPU boost
Artificial intelligence (AI) continues to make significant inroads into diverse applications, enhancing user experiences across fields such as gaming, productivity, and software development. This technological advancement extends to everyday tasks, including web browsing. The privacy-focused web browser Brave has recently introduced an AI assistant named Leo AI, which can summarise content, answer queries, and provide search results.
The development and efficiency of AI-powered tools such as Leo AI rely on sophisticated software in tandem with powerful hardware. NVIDIA's Graphics Processing Units (GPUs) are crucial to this process. They feature Tensor Cores, which accelerate AI workloads through parallel processing, allowing applications like Leo AI to perform the necessary calculations far more rapidly than general-purpose processors can.
AI applications operate best when software and hardware are optimally aligned. This involves several layers, beginning with the AI inference libraries that translate application requests into instructions the hardware can execute. Brave's Leo AI utilises llama.cpp, a popular open-source inference library, alongside other backends such as NVIDIA TensorRT and Microsoft's DirectML.
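To make the inference-library layer concrete, here is a minimal sketch of what calling llama.cpp directly looks like, using the community llama-cpp-python bindings rather than any code Brave itself ships; the model file name is a placeholder for any compatible GGUF model.

```python
# Minimal sketch: running a prompt through llama.cpp via the community
# llama-cpp-python bindings (pip install llama-cpp-python).
# The GGUF file path below is a placeholder, not a file Brave provides.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder model file
    n_gpu_layers=-1,  # offload all layers to the GPU where the build supports it
)

output = llm(
    "Summarise this page in one sentence: ...",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```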
On the software side, local inference servers also play a significant role: they manage the download and installation of specific AI models, simplifying the integration process for applications. One example is Ollama, an open-source project built atop llama.cpp that offers a seamless ecosystem for applications to tap into local AI capabilities. NVIDIA optimises software like Ollama to perform efficiently on NVIDIA hardware, particularly for RTX-powered AI solutions.
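As a sketch of what "tapping into local AI capabilities" means in practice, an application can send a request to a running Ollama server over its local HTTP API. This assumes Ollama's default endpoint at localhost:11434 and that a model such as llama3 has already been pulled; neither detail is specific to Brave.

```python
# Sketch: asking a locally running Ollama server to generate text.
# Assumes Ollama is installed, listening on its default port (11434),
# and that a model such as "llama3" has already been downloaded.
import json
import urllib.request

payload = {
    "model": "llama3",
    "prompt": "Summarise the benefits of on-device AI in two sentences.",
    "stream": False,  # return one complete JSON response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])  # the generated text
```

Because the server speaks plain HTTP on the local machine, any application, from a browser to a script, can integrate local models without bundling an inference engine of its own.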
Brave and Leo AI provide flexibility in terms of operational environments, as they can run either in the cloud or locally on a user's personal computer through Ollama. Local processing presents several advantages, foremost among them privacy: requests and data never leave the user's device, and the assistant remains accessible without relying on an external server. Local processing also avoids cloud service fees.
Ollama further allows users to interact with a wide array of open-source AI models, rather than the limited selection typical of many hosted services, including models with specialised features such as bilingual support or code generation. On NVIDIA RTX GPUs, local inference is notably fast: the Llama 3 8B model running in llama.cpp delivers roughly 149 tokens per second, or about 110 words per second.
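The two figures imply a ratio of roughly 1.35 tokens per English word, which is typical for LLM tokenisers. Readers who want to check the throughput on their own hardware can use the eval_count and eval_duration statistics that Ollama reports with each non-streamed response; the sketch below assumes the same local server and model as above.

```python
# Sketch: measuring local generation speed from the eval_count and
# eval_duration fields Ollama includes in a non-streamed response.
import json
import urllib.request

payload = {"model": "llama3", "prompt": "Explain Tensor Cores briefly.", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

tokens = body["eval_count"]            # tokens generated in the response
seconds = body["eval_duration"] / 1e9  # eval_duration is reported in nanoseconds
print(f"{tokens / seconds:.0f} tokens/s")
```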
To use local models with Brave and Leo AI, users install Ollama by downloading and running its installer; Ollama then runs in the background on their PC. From the command line, users can download and configure models from a wide catalogue. Brave users can toggle between cloud-based and locally hosted models at any time, depending on preference and need.
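After pulling a model from the command line (for example, with the ollama pull command), it can be useful to confirm which models the local installation actually has available. A small sketch, assuming the default local endpoint, queries Ollama's /api/tags endpoint to list them:

```python
# Sketch: listing the models a local Ollama installation has available,
# via its /api/tags endpoint. Handy for checking that a model pulled from
# the command line (e.g. "ollama pull llama3") is ready to use.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    data = json.loads(resp.read())

for model in data["models"]:
    print(model["name"])  # e.g. "llama3:latest"
```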
In addition, developers have access to resources through the NVIDIA developer blog, which provides further guidance on using Ollama and llama.cpp effectively. This initiative appears to be part of a broader trend of integrating faster and more responsive AI tools into everyday technological interactions.