Success Story: We use AI to generate SQL queries from natural language

In an increasingly data-driven environment, accessing the right information quickly and accurately is key to decision making. However, many organizations still rely on technical tools such as SQL (Structured Query Language), which represents a barrier for non-technical users and slows down internal processes.

Our client faced this challenge: to allow non-technical profiles to query data in an autonomous and efficient way. To solve it, we developed a solution capable of transforming natural language queries into SQL queries, eliminating the dependence on advanced technical knowledge and significantly speeding up access to information.

What we did

We developed an application that allows anyone in the organization to query data in a simple and autonomous way. The process begins with the selection of the table on which a query is to be made.

Next, the person must choose whether to perform the query in SQL or through an internal client tool that operates as a REST API, which uses a specific syntax to build queries and access the data. Then, the language model (LLM) in charge of transforming the natural language query into a structured query is selected. Available options include models such as GPT-3.5 Turbo, GPT-4o, Gemini Pro or Claude 2.

Then, the person writes the question in natural language (for example: “Where does Sofía Martínez Delgado work?”).

Once this configuration is completed, different processes are activated in the backend. First, all contextual information related to the selected table (table and column names and descriptions, data types, etc.) is retrieved from a vectorstore. With this information, and taking into account the selected tool and model, a call to an API (FastAPI) that integrates the available LLMs is launched.

The model receives a complete prompt including:

The question in natural language
The technical information of the table
The basic documentation of the selected tool
Instructions on the task to be performed

The model returns the query already generated, either in SQL format or in JSON, depending on the tool chosen.

Before executing the query, the person can review and, if desired, modify the generated query. Once validated, the query process is launched to the database through an API request.

Finally, the requested information is displayed on the screen in response to the query.

Project Challenges

During the development of the project, one of the main challenges was the low quality of the data and the poor documentation of the tables and views, fundamental elements for the language model (LLM) to correctly build the queries.

We encountered the absence of a clear standard in the data. For example, in some tables, string type column values could appear in uppercase, lowercase, with or without accents, and even with incomplete or inconsistent proper names. This created a major problem: even if the model correctly generated the query structure, if the values used for filtering did not exactly match those in the database, the API did not return any results.

To mitigate this situation, we implemented different strategies.

Strategy 1: example of values (data profiling)

We tried to enrich the model prompt by including, next to the table description, a sample of typical values per column (data profiling). This helped the LLM to better understand how the real data were represented. However, this approach had an important limitation: the considerable increase in the number of tokens, which impacted costs and compatibility with certain models.

Strategy 2: normalization with regular expressions

To avoid the token limit problem, we designed a solution based on the generation of more tolerant filters using regular expressions (regexp), which would allow ignoring differences in uppercase, lowercase and accents. For example:

This expression makes it possible to retrieve records with spelling or capitalization variations in the value “Sofía”, thus increasing the robustness of the query.

However, this solution also presented challenges: the complexity of the expressions meant that the prompts required very high precision from the model. We had to spend time fine-tuning the instructions and training the prompts well to get the model to consistently generate efficient queries.

Finally, after several iterations, we were able to achieve a satisfactory accuracy rate, where in most cases the model generated valid and effective queries, even in contexts with inconsistent or poorly documented data.

Beneficits

Simplified data access: allows anyone, regardless of technical knowledge, to interact with databases and APIs autonomously.
Time savings: automates the generation of queries and APIs, eliminating the need to write code manually and speeding up access to key information.
Accuracy and reliability: automated generation minimizes human error, ensuring that queries and exposed data are accurate.
Flexibility and customization: the ability to review and adjust the generated instructions ensures that the system adapts to the specific needs of each user.

Project results

Model performance

Increased accuracy in automatically generated queries.
Higher success rate in query execution.
Reduction of semantic errors in the results.
Reduction of failures due to data quality problems.

Technical performance

Improved end-to-end response time.
Optimized use of tokens per prompt.
Controlled cost per query generated.
Reduction of rejections due to excess of tokens.

User experience

Less need to modify the generated queries.
Increased satisfaction after using the tool.
Sustained growth in adoption and recurrent use of the solution.

Operational performance

Expansion of the coverage of documented tables.
Reduction of the effort required to document new tables.
Balanced use between SQL and internal API according to context.

Testing and evolution

High success rate in regression testing for changes.
Reduced time spent on prompts adjustments.

The ability to convert natural language queries into SQL statements or REST API calls represents a significant change in the way organizations access and manage their data. This solution removes technical barriers, allowing non-technical profiles to interact autonomously with information.

Thanks to this tool, companies can streamline their processes, reduce dependence on specialized teams for repetitive tasks and foster a more collaborative and data-driven environment. In an increasingly competitive business environment, having agile and reliable access to information translates into better decisions and greater operational efficiency.

Do you want to harness the power of LLMs to address your information democratization challenges? We have a team of experts in Generative AI and Data who have developed GenIA Ecosystem, an ecosystem of proprietary AI solutions to meet any challenge. Tell us about your project!

Want to learn more about our Artificial Intelligence services for SQL queries?

Contact with our Data & AI team

1 Comment


Veo 3 API

Jul 03, 2025 at 05:57

Really interesting approach to making data querying more accessible using LLMs! Did you notice any differences in performance or accuracy between models like GPT-4o and Gemini Pro when generating SQL from natural language? It would be fascinating to know how you evaluate which model fits best for these structured query tasks.

Find out how we can help you harness the potential of AI to boost your business.

What we did

Project Challenges

Strategy 1: example of values (data profiling)

Strategy 2: normalization with regular expressions

Beneficits

Project results

Model performance

Technical performance

User experience

Operational performance

Testing and evolution

Want to learn more about our Artificial Intelligence services for SQL queries?

Miriam Arroyo

1 Comment