Hacker News: rishramanathan

Constitutional AI for All

rishramanathan — Wed, 16 Oct 2024 16:21:24 +0000

Article URL: https://www.openlayer.com/blog/post/constitutional-ai-for-all

Comments URL: https://news.ycombinator.com/item?id=41860935

Points: 9

# Comments: 0

New comment by rishramanathan in "Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI"

rishramanathan — Thu, 07 Dec 2023 04:23:16 +0000

Not sure I follow — could you elaborate on what you mean by “directly interacting”?

New comment by rishramanathan in "Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI"

rishramanathan — Wed, 06 Dec 2023 00:43:05 +0000

Thanks! We’ve broken our evals down into three primary categories — integrity, consistency and performance.

Integrity tests tackle data quality issues (e.g. no PII in input data, no duplicate rows, schema checks on specific fields).
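
To make the three example checks concrete, here is a minimal Python sketch using only the standard library. It is not Openlayer's implementation or API: the row fields, the `SCHEMA` mapping, and the crude email regex standing in for a real PII detector are all assumptions made for illustration.

```python
import re

# Hypothetical rows; the field names are illustrative only.
rows = [
    {"id": 1, "text": "reset my password", "label": "account"},
    {"id": 2, "text": "email me at jane@example.com", "label": "account"},
    {"id": 1, "text": "reset my password", "label": "account"},
]

# Assumed schema: field name -> expected Python type.
SCHEMA = {"id": int, "text": str, "label": str}

# Deliberately crude email pattern, standing in for a real PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def duplicate_rows(rows):
    """Return indices of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes


def schema_violations(rows, schema):
    """Return (row index, field) pairs with a missing or mistyped value."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field, expected in schema.items()
        if not isinstance(row.get(field), expected)
    ]


def rows_with_email(rows, field="text"):
    """Return indices of rows whose text field appears to contain an email."""
    return [i for i, row in enumerate(rows) if EMAIL_RE.search(row[field])]
```

Each function returns the offending rows rather than a bare pass/fail, so a failing check points straight at the data that needs fixing.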

Consistency tests help ensure your fine-tuning & validation datasets are well constructed in relation to one another (e.g. don’t have overlap, are sized correctly), and your production data doesn’t drift from your reference data.
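
As a rough illustration of the overlap and sizing checks, here is a small Python sketch. The function names and the sample data are hypothetical and are not taken from Openlayer; the drift part is left out here.

```python
def overlap(train, validation):
    """Return the examples that appear in both datasets (leakage candidates)."""
    return sorted(set(train) & set(validation))


def validation_fraction(train, validation):
    """Return the share of all examples that sit in the validation set."""
    return len(validation) / (len(train) + len(validation))


# Hypothetical datasets, represented as lists of example texts.
train = ["a", "b", "c", "d", "e", "f", "g", "h"]
validation = ["h", "i"]

leaked = overlap(train, validation)
fraction = validation_fraction(train, validation)
```

A test would then assert that `leaked` is empty and that `fraction` falls inside whatever range you consider correctly sized.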

Performance tests are focused on your model outputs, and measure common metrics for each task (e.g. accuracy, F1, PR for classification) as well as custom metrics designed to be evaluated by an LLM (e.g. “make sure these outputs don’t contain profanity”). You can apply these metrics to specific subpopulations of your data by setting filters on your input fields.
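
Here is a minimal sketch of applying a metric to a subpopulation, written in plain Python. The record layout (`country`, `label`, `pred`) and the `evaluate_slice` helper are assumptions for illustration, not Openlayer's API.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from paired labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate_slice(records, predicate):
    """Compute accuracy and F1 on the records selected by the filter."""
    subset = [r for r in records if predicate(r)]
    if not subset:
        raise ValueError("filter matched no records")
    y_true = [r["label"] for r in subset]
    y_pred = [r["pred"] for r in subset]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(subset)
    return {"n": len(subset), "accuracy": accuracy, "f1": f1_score(y_true, y_pred)}


# Hypothetical labelled predictions with one input field to filter on.
records = [
    {"country": "US", "label": 1, "pred": 1},
    {"country": "US", "label": 1, "pred": 0},
    {"country": "US", "label": 0, "pred": 0},
    {"country": "US", "label": 0, "pred": 1},
    {"country": "CA", "label": 1, "pred": 1},
    {"country": "CA", "label": 0, "pred": 0},
]

us_metrics = evaluate_slice(records, lambda r: r["country"] == "US")
```

Passing the filter as a predicate keeps the metric code unaware of which input fields exist.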

Re: adding your own evals — yes, you can! The evals are not statically defined — they are flexible structures that allow you to customize them to your needs.

Re: importing evaluations from other libraries — this is something we’re adding more support for. We’ve just added an integration with Great Expectations, and can add an integration with OpenAI’s evals if that is something the community is interested in.

New comment by rishramanathan in "Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI"

rishramanathan — Tue, 05 Dec 2023 17:42:25 +0000

Thanks! Glad Openlayer is working well for you :)

New comment by rishramanathan in "Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI"

rishramanathan — Tue, 05 Dec 2023 16:58:29 +0000

We’ve actually been building a testing and evaluation platform from the start, but started with discriminative ML tasks like classification and regression. We waited to do a Launch HN because we were mostly focused on enterprise / mid-market.

These past few months, however, we’ve prioritized building out features for testing and monitoring LLMs.

LLMs certainly have their unique challenges, but the evaluation problem in general is not new, and much of what we’ve built historically is very much applicable to this new crop of ML use cases!

New comment by rishramanathan in "Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI"

rishramanathan — Tue, 05 Dec 2023 16:48:35 +0000

We realize the lack of information about pricing isn’t ideal, and that people will be turned away by this. In the meantime, we do have a free plan with generous limits that allows you to get started self-serve. This plan isn’t time-limited, so there won’t be pressure to upgrade unless you need increased data limits.

On open-core — we’ve been considering open-sourcing the engine that evaluates your models. Will have more on this soon!

We’re definitely prioritizing increasing transparency, and we appreciate your feedback about it!

New comment by rishramanathan in "Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI"

rishramanathan — Tue, 05 Dec 2023 16:33:09 +0000

Broadly, on the monitoring side, we’re more focused on evaluating the quality of the model’s outputs (is it violating your rules, is it handling specific subpopulations / edge cases correctly, etc.). OpenLLMetry is more focused on telemetry and tracing, whereas for us ‘monitoring’ is a means of running your tests on production data.

Openlayer’s also intended to be used on non-LLM use cases. Here are a few other ways we’re different:

1. Support for other ML task types

2. Includes a development mode for versioning and experimentation

3. Native Slack and email alerts (OpenLLMetry might integrate with other platforms that do that, but not sure)

4. Collaboration is deeply embedded into the product

New comment by rishramanathan in "Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI"

rishramanathan — Tue, 05 Dec 2023 16:10:07 +0000

Oops, thanks for the heads up! Fixed.

Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI

rishramanathan — Tue, 05 Dec 2023 16:01:38 +0000

Hey HN, Rish, Vikas and Gabe here. We're building Openlayer (https://www.openlayer.com/), an observability platform for AI. We've developed comprehensive testing tools to check both the quality of your input data and the performance of your model outputs.

The complexity and black-box nature of AI/ML have made rigorous testing a lot harder than it is in most software development. Consequently, AI development involves a lot of head-scratching and often feels like walking in the dark. Developers need reliable insights into how and why their models fail. We're here to simplify this for both common and long-tail failure scenarios.

Consider a scenario in which your model is working smoothly. What happens when there's a sudden shift in user behavior? This unexpected change can disrupt the model's performance, leading to unreliable outputs. Our platform offers a solution: by continuously monitoring for sudden data variations, we can detect these shifts promptly. That's not all though – we’ve created a broad set of rigorous tests that your model, or agent, must pass. These tests are designed to challenge and verify the model's resilience against such unforeseen changes, ensuring its reliability under diverse conditions.
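
One simple way to picture the shift detection described above is a check on how far the production mean of a feature has moved from a reference window. The sketch below is an assumption-laden illustration, not Openlayer's method: the standardized mean shift, the threshold of 3.0, and the sample values are all chosen for the example.

```python
import statistics


def mean_shift(reference, production):
    """Distance between the two means, in reference standard deviations."""
    ref_std = statistics.stdev(reference)
    if ref_std == 0:
        raise ValueError("reference data has no variance")
    gap = statistics.fmean(production) - statistics.fmean(reference)
    return abs(gap) / ref_std


def has_drifted(reference, production, threshold=3.0):
    """Flag drift when the shift exceeds an (arbitrary) threshold."""
    return mean_shift(reference, production) > threshold


# Hypothetical values of one numeric input feature.
reference = [10, 12, 11, 13, 9, 10, 12, 11]
steady = [11, 10, 12, 11]
shifted = [25, 27, 26, 24]

steady_flag = has_drifted(reference, steady)
shifted_flag = has_drifted(reference, shifted)
```

A mean comparison only catches shifts in location; changes in spread or shape would need a distribution-level statistic instead.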

We support seamlessly switching between (1) development mode, which lets you test, version, and compare your models before you deploy them to production, and (2) monitoring mode, which lets you run tests live in production and receive alerts when things go sideways.

Say you're using an LLM for RAG and want to make sure the output is always relevant to the question. You can set up hallucination tests, and we'll buzz you when the average score dips below your comfort zone.
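
The alerting rule in that example can be sketched as a rolling average compared against a threshold. This is a hypothetical illustration: the `ScoreMonitor` class, the window size, and the threshold are assumptions, and how each response gets its score is out of scope here.

```python
from collections import deque


class ScoreMonitor:
    """Track recent scores and signal when their average falls too low."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the latest scores

    def record(self, score):
        """Add a score; return True if an alert should fire."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        average = sum(self.scores) / len(self.scores)
        # Wait for a full window so a single early low score does not alert.
        return window_full and average < self.threshold


# Hypothetical per-response relevance scores arriving over time.
monitor = ScoreMonitor(threshold=0.7, window=3)
alerts = [monitor.record(s) for s in [0.9, 0.8, 0.9, 0.5, 0.4]]
```

Averaging over a window trades a little latency for fewer false alarms from one-off bad responses.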

Or imagine you're managing a fraud prediction model and are losing sleep over false negatives. Openlayer offers a two-step solution. First, it helps pinpoint why the model misses certain fraudulent data points using debugging tools such as explainability. Second, it enables converting these identified cases into targeted tests. This allows you to deep dive into tackling specific incidents, like fraud within a segment of US merchants. By following this process, you can understand your model's behavior and refine it to capture future fraudulent cases more effectively.
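
To illustrate the second step, a set of missed fraud cases can be turned into a pass/fail test on recall within one segment. The field names (`country`, `is_fraud`, `flagged`) and the recall thresholds below are hypothetical and only serve the example.

```python
def segment_recall(transactions, segment):
    """Share of actual fraud in the segment that the model flagged."""
    frauds = [t for t in transactions if segment(t) and t["is_fraud"]]
    if not frauds:
        raise ValueError("segment contains no fraud cases")
    return sum(t["flagged"] for t in frauds) / len(frauds)


def recall_test(transactions, segment, min_recall):
    """Pass only if recall inside the segment meets the minimum."""
    return segment_recall(transactions, segment) >= min_recall


# Hypothetical transactions: true label plus the model's decision.
transactions = [
    {"country": "US", "is_fraud": True, "flagged": True},
    {"country": "US", "is_fraud": True, "flagged": True},
    {"country": "US", "is_fraud": True, "flagged": True},
    {"country": "US", "is_fraud": True, "flagged": False},  # false negative
    {"country": "US", "is_fraud": False, "flagged": False},
    {"country": "DE", "is_fraud": True, "flagged": True},
]


def is_us(t):
    return t["country"] == "US"


us_recall = segment_recall(transactions, is_us)
strict_pass = recall_test(transactions, is_us, min_recall=0.9)
lenient_pass = recall_test(transactions, is_us, min_recall=0.7)
```

Once the test exists, each new model version can be checked against it so the same false negatives do not quietly return.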

The MLOps landscape is currently fragmented. We’ve seen countless data and ML teams glue together a ton of bespoke and third-party tools to meet basic needs: one for experiment tracking, another for monitoring, and another for CI automation and version control. With LLMOps now thrown into the mix, it can feel like you need yet another set of entirely new tools.

We don’t think you should, so we're building Openlayer to condense and simplify AI evaluation. It’s a collaborative platform that solves long-standing ML problems like the ones above, while tackling the new crop of challenges presented by Generative AI and foundation models (e.g. prompt versioning, quality control). We address these problems in a single, consistent way that doesn't require you to learn a new approach. We’ve spent a lot of time ensuring our evaluation methodology remains robust even as the boundaries of AI continue to be redrawn.

We're stoked to bring Openlayer to the HN community and are keen to hear your thoughts, experiences, and insights on building trust into AI systems.

Comments URL: https://news.ycombinator.com/item?id=38532593

Points: 94

# Comments: 31

New comment by rishramanathan in "Show HN: Openlayer – test, fix, and improve your ML models"

rishramanathan — Wed, 10 May 2023 18:27:24 +0000

Good question! Security is our highest priority. Here are a few key things we've done to help ensure the privacy of customer data:

- We offer on-prem deployments on all cloud platforms, so your data never needs to leave your infrastructure

- We're on the verge of receiving our SOC 2 Type 2 certification

- All data is encrypted in transit and at rest, and lives in an isolated private subnet

New comment by rishramanathan in "Show HN: BentoML goes 1.0 – A faster way to ship your models to production"

rishramanathan — Wed, 13 Jul 2022 21:00:19 +0000

BentoML is by far the best open source tool I've seen for model deployment. Congrats, team!