Embracing the Hybrid AI Future: Building Today, Optimizing for Tomorrow

Genuix Dimensions implements the Genuix hybrid-first architecture, which combines cloud and on-device AI. We're proud of this multimodal setup, and we're optimistic about the future of hybrid AI given today's technical capabilities and limitations.


The Hybrid Architecture & Our Multimodal Approach

Genuix Dimensions supports three modes:

  • Hybrid (cloud for batch, local for interactive)

  • Cloud-Only

  • Local-Only

The hybrid split optimizes for different workloads: cloud handles initial folder summarization (batch processing of many documents), while local handles interactive drilldowns and related content discovery (low-latency, privacy-sensitive queries). This gives users cost efficiency (batch work uses cloud tokens efficiently; interactive queries stay local), privacy (sensitive queries run entirely on-device), and reliability (graceful fallbacks if one provider encounters issues). We're proud to offer this flexible, multimodal configuration that adapts to different use cases while making the most of the resources at hand.
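To make the routing concrete, here's a minimal sketch of how a mode-aware dispatcher could behave. The type names and routing rules are illustrative assumptions, not the actual Genuix Dimensions API:

```typescript
// Hypothetical sketch of mode-based routing; names are illustrative,
// not the actual Genuix Dimensions API.
type Mode = "hybrid" | "cloud-only" | "local-only";
type Workload = "batch-summarization" | "interactive-query";
type Provider = "cloud" | "local";

function pickProvider(mode: Mode, workload: Workload): Provider {
  switch (mode) {
    case "cloud-only":
      return "cloud";
    case "local-only":
      return "local";
    case "hybrid":
      // Batch work (e.g. initial folder summarization) goes to the cloud;
      // interactive, privacy-sensitive queries stay on-device.
      return workload === "batch-summarization" ? "cloud" : "local";
  }
}
```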

Local AI Today

To balance workloads in Genuix Dimensions, hybrid mode routes large batch operations to a cloud model. For local processing, we pair microprompts with strategic caching, using smaller chunk sizes and reduced parallelism to keep the load on the NPU manageable.
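As a rough sketch of the idea (values and helper names are illustrative, not our production settings), small chunks plus a concurrency cap might look like this:

```typescript
// Illustrative only: small chunks and capped parallelism to avoid
// saturating the NPU. Constants are assumptions, not shipped values.
const CHUNK_CHARS = 2_000;  // smaller chunks suit small local context windows
const MAX_IN_FLIGHT = 2;    // reduced parallelism keeps NPU load steady

function chunkText(text: string, size: number = CHUNK_CHARS): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Run an async task over items with at most `limit` in flight at once.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

A caller would split a document with `chunkText` and summarize the pieces via `mapWithLimit(chunks, MAX_IN_FLIGHT, summarizeChunk)`, where `summarizeChunk` (hypothetical) issues one microprompt per chunk.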

Small local models have constraints: smaller effective context windows, output quality that may need post-processing, limited token-counting and metadata APIs, and complex initialization and setup processes. We've built abstractions and workarounds, but these limitations are real. The upside is that we've put in the work so you don't have to: developers using Genuix get model-aware settings and workflows, without resorting to trial and error to get the best results.
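As one concrete example, where a local runtime exposes no token-counting API, a character-based heuristic can stand in. The profile shape and the roughly-four-characters-per-token ratio below are assumptions for illustration, not exact figures:

```typescript
// Hypothetical model profile; real limits vary by model and runtime.
interface ModelProfile {
  contextTokens: number;        // effective window, often smaller than advertised
  needsPostProcessing: boolean; // small-model output may need cleanup
}

// Many local runtimes expose no tokenizer API, so we fall back to a
// rough heuristic (~4 characters per token for English text).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitsInContext(profile: ModelProfile, prompt: string): boolean {
  // Leave headroom for the model's own output.
  return estimateTokens(prompt) < profile.contextTokens * 0.75;
}
```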

We use a unified programming interface so services can switch providers without code changes. This lets us experiment with different providers and adapt as the ecosystem evolves. Genuix combines local AI and cloud for a hybrid one-two punch: breaking through constraints and delivering faster, smarter processing at scale.
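A minimal sketch of what such a provider-agnostic contract could look like, borrowing the ILlmClient name from the Technical Highlights below (the method signatures are assumptions, not the shipped interface):

```typescript
// Sketch only: the ILlmClient name comes from the post; the rest is
// assumed for illustration.
interface ILlmClient {
  readonly providerId: string;
  complete(prompt: string): Promise<string>;
}

class CloudLlmClient implements ILlmClient {
  readonly providerId = "cloud";
  async complete(prompt: string): Promise<string> {
    // ...call the cloud provider's API here...
    throw new Error("not implemented in this sketch");
  }
}

class LocalLlmClient implements ILlmClient {
  readonly providerId = "local";
  async complete(prompt: string): Promise<string> {
    // ...invoke the on-device runtime here...
    throw new Error("not implemented in this sketch");
  }
}

// Services depend only on ILlmClient, so switching providers requires
// no changes to service code.
function makeClient(provider: "cloud" | "local"): ILlmClient {
  return provider === "cloud" ? new CloudLlmClient() : new LocalLlmClient();
}
```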

Optimism for the Road Ahead

We’re excited about upcoming platforms like Foundry Local that promise enhanced performance and efficiency, improved model quality and capabilities, more robust APIs and developer experience, and better integration with Windows and NPU (and GPU!) hardware. These advances will help address current limitations.

This is what Genuix is here for: to keep pushing the technology forward so these limitations become invisible to users. We're building adaptive systems that adjust to model capabilities, intelligent caching to minimize reprocessing, graceful degradation when models hit limits, and future-proof architectures that improve with the hardware and models. We expect similar challenges with other local models, and we're laying foundations that will scale to handle them.

We’re excited that our platform and product can grow with the technology. As local AI improves, our hybrid architecture will automatically take advantage of better performance, leverage improved model capabilities, benefit from more robust APIs, and provide a better experience as the ecosystem matures. The abstraction layers and adaptive systems we’ve built today will pay dividends as new models and platforms emerge.

Technical Highlights

  • Provider abstraction: our ILlmClient interface lets us swap providers without changing service code. This flexibility is essential as the ecosystem evolves.

  • Call telemetry: we track all LLM calls with provider identification, performance metrics, usage patterns, and error rates. This helps us compare providers and optimize the hybrid split.

  • Local embeddings: we use ONNX Runtime with sentence transformers for embeddings, enabling fast semantic search before LLM summarization. This reduces LLM calls and improves responsiveness regardless of the provider.

  • Fingerprinted caching: we cache folder summaries with fingerprinting to avoid reprocessing unchanged documents (a sketch follows). This is especially valuable for local models, where reprocessing is slower.
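Here's a minimal sketch of the fingerprinting idea, assuming a SHA-256 hash over folder contents and an in-memory cache; the real implementation likely differs in cache shape and persistence:

```typescript
import { createHash } from "node:crypto";

// Illustrative fingerprint cache: re-summarize a folder only when its
// contents change. Names and cache shape are hypothetical.
const summaryCache = new Map<string, { fingerprint: string; summary: string }>();

function fingerprintDocs(docs: string[]): string {
  const hash = createHash("sha256");
  for (const doc of docs) hash.update(doc);
  return hash.digest("hex");
}

async function summarizeFolder(
  folder: string,
  docs: string[],
  summarize: (docs: string[]) => Promise<string>,
): Promise<string> {
  const fp = fingerprintDocs(docs);
  const cached = summaryCache.get(folder);
  if (cached && cached.fingerprint === fp) {
    return cached.summary; // unchanged documents: skip the LLM call entirely
  }
  const summary = await summarize(docs);
  summaryCache.set(folder, { fingerprint: fp, summary });
  return summary;
}
```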

Key Takeaways:


  • Hybrid architectures work: they balance cost, performance, and privacy effectively.

  • Abstraction is essential: unified interfaces let us adapt as the ecosystem evolves.

  • Limitations are expected: current local models have constraints; we build around them.

  • Future-proofing matters: the systems we build today will benefit from tomorrow's improvements.

  • Optimism is warranted: the trajectory is clear. Hybrid AI is future-ready, today.

Conclusion

We’re proud of our hybrid, multimodal architecture. We’ve encountered limitations in today’s technology, but we’ve also built our systems to make them invisible to users. As platforms like Foundry Local and other advances emerge in this rapidly evolving ecosystem, our flexible architecture will automatically benefit.

We’re here to keep pushing forward, to keep these limitations invisible, and to ensure our platform grows with the technology. The future of local AI is bright, and we’re excited to be part of it.
