Mastering LLM Techniques: Customization
Large language models (LLMs) are becoming an integral tool for businesses to improve their operations, customer interactions, and decision-making processes. However, off-the-shelf LLMs often fall short...
Mastering LLM Techniques: Training
Large language models (LLMs) are a class of generative AI models built using transformer networks that can recognize, summarize, translate, predict, and generate language using very large datasets....
NVIDIA TensorRT-LLM Revs Up Inference for Google Gemma
NVIDIA is collaborating with Google as a launch partner to deliver Gemma, a newly optimized family of open models built from the same research and technology used to create the Gemini models. An...
Generate Stunning Images with Stable Diffusion XL on the NVIDIA AI Inference...
Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by iteratively shaping random noise into AI-generated...
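The iterative denoising idea behind diffusion models can be illustrated with a toy sketch. This is a conceptual illustration only, not the SDXL sampler: the "image" is three numbers, and a hypothetical oracle noise predictor stands in for the trained network that would estimate the noise in practice.

```python
import random

random.seed(0)
target = [0.2, 0.8, 0.5]                       # the "clean image" (three toy pixels)
steps = 50
noise = [random.gauss(0, 1) for _ in target]
x = [t + n for t, n in zip(target, noise)]     # fully noised starting sample

def predict_noise(x_t, t):
    # Hypothetical oracle predictor; a trained network plays this role in practice.
    return noise

for t in range(steps, 0, -1):
    # Remove a 1/steps fraction of the predicted noise each iteration,
    # gradually shaping random noise back into the clean sample.
    eps = predict_noise(x, t)
    x = [xi - ei / steps for xi, ei in zip(x, eps)]

# After all steps, x has converged back to the clean target.
err = max(abs(xi - ti) for xi, ti in zip(x, target))
```

With a perfect predictor the loop removes exactly the injected noise; real samplers differ in how much noise they remove per step and re-inject stochasticity, but the shape of the loop is the same.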
Turbocharging Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA...
We’re excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You can immediately try Llama 3 8B and Llama...
Supercharging Llama 3.1 across NVIDIA Platforms
Meta’s Llama collection of large language models is among the most popular foundation models in the open-source community today, supporting a variety of use cases. Millions of developers worldwide are...
Revolutionizing Code Completion with Codestral Mamba, the Next-Gen Coding LLM
In the rapidly evolving field of generative AI, coding models have become indispensable tools for developers, enhancing productivity and precision in software development. They provide significant...
Power Text-Generation Applications with Mistral NeMo 12B Running on a Single GPU
NVIDIA collaborated with Mistral to co-build the next-generation language model that achieves leading performance across benchmarks in its class. With a growing number of language models purpose-built...
Jamba 1.5 LLMs Leverage Hybrid Architecture to Deliver Superior Reasoning and...
AI21 Labs has unveiled its latest and most advanced Jamba 1.5 model family, a cutting-edge collection of large language models (LLMs) designed to excel in a wide array of generative AI tasks. These...
Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model...
The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a variety of use cases. With 405 billion...
Deploying Accelerated Llama 3.2 from the Edge to the Cloud
Expanding the open-source Meta Llama collection of models, the Llama 3.2 collection includes vision language models (VLMs), small language models (SLMs), and an updated Llama Guard model with support...
Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs
Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are multimodal, supporting both text and image inputs....
TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x
NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that provides blazing-fast inference support for...
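The core idea of speculative decoding can be sketched in a few lines. This is a toy illustration with hypothetical deterministic "models" (simple arithmetic functions), not the TensorRT-LLM API: a cheap draft model proposes several tokens, the target model verifies them in one pass, and the longest agreeing prefix is accepted, so the output is identical to ordinary greedy decoding but fewer target-model steps are paid per emitted token.

```python
def target_next(ctx):
    # Hypothetical "target model": deterministic greedy next-token rule.
    return (sum(ctx) * 3 + 1) % 10

def draft_next(ctx):
    # Hypothetical cheaper "draft model": off by one whenever the
    # last token is even, so proposals are sometimes rejected.
    guess = target_next(ctx)
    return guess if ctx[-1] % 2 else (guess + 1) % 10

def speculative_step(ctx, k=4):
    # 1. Draft proposes k tokens autoregressively.
    proposal, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_next(tmp)
        proposal.append(t)
        tmp.append(t)
    # 2. Target verifies the proposals; keep the agreeing prefix and
    #    emit the target's own token at the first mismatch. If every
    #    proposal matches, the target contributes one bonus token.
    out, tmp = [], list(ctx)
    for t in proposal:
        want = target_next(tmp)
        out.append(want)
        tmp.append(want)
        if t != want:
            break
    else:
        out.append(target_next(tmp))
    return out

def generate(ctx, n, speculative=False):
    out = list(ctx)
    while len(out) < len(ctx) + n:
        if speculative:
            out.extend(speculative_step(out))
        else:
            out.append(target_next(out))
    return out[:len(ctx) + n]
```

The key property is that the speculative path produces exactly the same sequence as plain greedy decoding; the speedup comes from verifying several draft tokens with one target-model pass instead of one pass per token.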
View ArticleNVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight...
NVIDIA recently announced that NVIDIA TensorRT-LLM now accelerates encoder-decoder model architectures. TensorRT-LLM is an open-source library that optimizes inference for diverse model architectures,...
Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM...
Meta’s Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 provides enhanced performance...
Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM
Language models generate text by predicting the next token, given all the previous tokens including the input text tokens. Key and value elements of the previous tokens are used as historical context...
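The role of the KV cache described above can be sketched with a toy single-head attention, assuming identity projections (query, key, and value all equal the token embedding); this is a conceptual illustration, not the TensorRT-LLM implementation. Each token's key and value are stored once and reused at every later step, so generation avoids recomputing them for the whole history.

```python
import math

def attend(q, keys, values):
    # Softmax-weighted sum over all cached positions (dot-product attention).
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(wi * v[d] for wi, v in zip(w, values)) / z for d in range(dim)]

class KVCache:
    """Stores each token's key/value once; later steps reuse them as context."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Append only the new token's key/value, then attend over the
        # full cached history instead of recomputing it from scratch.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

# Usage: feeding tokens one at a time through the cache gives the same
# result as recomputing attention over the full sequence at the last step.
cache = KVCache()
toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
last = None
for t in toks:
    last = cache.step(t, t, t)

full = attend(toks[-1], toks, toks)
diff = max(abs(a - b) for a, b in zip(last, full))
```

Reuse optimizations take this one step further: when two requests share a prefix (a system prompt, for example), the cached keys and values for that prefix can be shared rather than recomputed per request.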
Optimizing Qwen2.5-Coder Throughput with NVIDIA TensorRT-LLM Lookahead Decoding
Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents, these models assist developers with...
NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance
NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a...