What is Lilac?

Lilac is an open-source tool that enables data and AI practitioners to improve their products by improving their data. It allows users to search, quantify, and edit data for LLMs. Lilac provides features like semantic and keyword search, editing and comparing fields, PII detection, duplicate identification, language detection, custom signal integration, and fuzzy-concept search with refinement.


How to use Lilac?

To get started with Lilac, install it using pip: `pip install lilac`. Then use Lilac's Python API or its user interface to interact with your data.


Lilac’s Core Features

  • Semantic & keyword search
  • Edit & compare fields
  • PII, duplicates, language detection, or custom signal
  • Fuzzy-concept search with refinement
  • Blazing fast dataset computations
  • Clustering and titling of large datasets
  • Embedding datasets at high token rates
  • Accelerating data transformations
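
To make two of the features above concrete, here is a minimal sketch in plain Python of what keyword search and duplicate identification mean on a small text dataset. It does not use Lilac and is not Lilac's implementation or API; the function names (`keyword_search`, `normalize`, `find_duplicates`) and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def keyword_search(rows, query):
    """Return rows whose text contains the query, case-insensitively."""
    needle = query.lower()
    return [row for row in rows if needle in row.lower()]


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_duplicates(rows):
    """Group row indices by normalized text; keep groups with more than one row."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[normalize(row)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical sample data, not taken from any real dataset.
rows = [
    "How do I reset my password?",
    "how do I  reset my password?",
    "What is the refund policy?",
]

print(keyword_search(rows, "refund"))  # ['What is the refund policy?']
print(find_duplicates(rows))           # [[0, 1]]
```

This sketch only catches duplicates that match exactly after normalization. Semantic search and fuzzy-concept search rely on embeddings instead, which is outside the scope of this illustration.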


Lilac’s Use Cases

  • Data exploration and quality control
  • Evaluating datasets
  • Democratizing data across an organization
  • Understanding concepts in datasets
  • Selecting the right data for a task
  • Determining topics covered in datasets
