Channel: Anjali Shah – NVIDIA Technical Blog
Browsing all 18 articles

Mastering LLM Techniques: Customization

Large language models (LLMs) are becoming an integral tool for businesses to improve their operations, customer interactions, and decision-making processes. However, off-the-shelf LLMs often fall short...

Mastering LLM Techniques: Training 

Large language models (LLMs) are a class of generative AI models built using transformer networks that can recognize, summarize, translate, predict, and generate language using very large datasets....

NVIDIA TensorRT-LLM Revs Up Inference for Google Gemma 

NVIDIA is collaborating with Google as a launch partner to deliver Gemma, a newly optimized family of open models built from the same research and technology used to create the Gemini models. An...

Generate Stunning Images with Stable Diffusion XL on the NVIDIA AI Inference...

Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by iteratively shaping random noise into AI-generated...
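
The iterative shaping the teaser describes can be sketched in a toy form: start from pure noise and repeatedly nudge the sample toward what a denoiser predicts. This is only an illustration of the idea; a real Stable Diffusion XL pipeline uses a learned U-Net denoiser and a proper noise schedule, whereas here a hand-written "denoiser" simply pulls values toward a fixed target vector.

```python
# Toy illustration of iterative denoising, the core loop behind diffusion
# sampling. NOT the SDXL pipeline: the "predicted_denoised" below stands in
# for what a trained model would predict at each step.
import numpy as np

rng = np.random.default_rng(42)
target = np.array([1.0, -0.5, 0.25, 0.0])    # stand-in for "the image"

x = rng.standard_normal(4)                   # step 0: pure random noise
for t in range(50):                          # reverse "diffusion" steps
    predicted_denoised = target              # a real model predicts this
    x = x + 0.1 * (predicted_denoised - x)   # move a little toward it
    x += 0.01 * rng.standard_normal(4)       # small stochastic perturbation

print(np.round(x, 2))
```

After enough steps the sample lands close to the target, which is the essence of how random noise is gradually shaped into a coherent output.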

Turbocharging Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA...

We’re excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You can immediately try Llama 3 8B and Llama...

Supercharging Llama 3.1 across NVIDIA Platforms

Meta’s Llama large language models are the most popular foundation models in the open-source community today, supporting a variety of use cases. Millions of developers worldwide are...

Revolutionizing Code Completion with Codestral Mamba, the Next-Gen Coding LLM

In the rapidly evolving field of generative AI, coding models have become indispensable tools for developers, enhancing productivity and precision in software development. They provide significant...

Power Text-Generation Applications with Mistral NeMo 12B Running on a Single GPU

NVIDIA collaborated with Mistral to co-build the next-generation language model that achieves leading performance across benchmarks in its class. With a growing number of language models purpose-built...

Jamba 1.5 LLMs Leverage Hybrid Architecture to Deliver Superior Reasoning and...

AI21 Labs has unveiled their latest and most advanced Jamba 1.5 model family, a cutting-edge collection of large language models (LLMs) designed to excel in a wide array of generative AI tasks. These...

Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model...

The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a variety of use cases. With 405 billion...

Deploying Accelerated Llama 3.2 from the Edge to the Cloud

Expanding the open-source Meta Llama collection of models, the Llama 3.2 collection includes vision language models (VLMs), small language models (SLMs), and an updated Llama Guard model with support...

Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs

Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are multimodal, supporting both text and image inputs....

TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x

NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that provides blazing-fast inference support for...

NVIDIA TensorRT-LLM Now Accelerates Encoder-Decoder Models with In-Flight...

NVIDIA recently announced that NVIDIA TensorRT-LLM now accelerates encoder-decoder model architectures. TensorRT-LLM is an open-source library that optimizes inference for diverse model architectures,...

Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM...

Meta’s Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 provides enhanced performance...

Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM

Language models generate text by predicting the next token, given all the previous tokens including the input text tokens. Key and value elements of the previous tokens are used as historical context...
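
The caching the teaser describes can be sketched with single-head attention: at each decoding step, only the newest token's key and value are computed and appended to the cache, while all earlier keys and values are reused. This is a minimal NumPy illustration of the mechanism, assuming random weights; real TensorRT-LLM kernels (and its KV cache reuse across requests) are far more involved.

```python
# Minimal sketch of a KV cache in autoregressive attention (NumPy).
# Each step computes Q/K/V only for the new token and reuses the
# cached keys/values of all previous tokens as historical context.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # head dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    """Single-query scaled dot-product attention over cached keys/values."""
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

K_cache = np.empty((0, d))              # keys of all previous tokens
V_cache = np.empty((0, d))              # values of all previous tokens
for step in range(5):
    x = rng.standard_normal(d)          # embedding of the newest token
    # Only the new token's key/value are computed; old ones are reused.
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    out = attend(x @ Wq, K_cache, V_cache)

print(K_cache.shape)
```

Because the per-token keys and values never change, caching (and reusing) them avoids recomputing attention inputs for the whole prefix at every step.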

Optimizing Qwen2.5-Coder Throughput with NVIDIA TensorRT-LLM Lookahead Decoding

Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents, these models assist developers with...

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a...
