How to Feed Tabular Data to ChatGPT?
As organizations continue to grapple with data, one question is increasingly common: How can we effectively utilize advanced AI systems like ChatGPT to extract valuable insights from tabular data? The advent of tools powered by sophisticated algorithms has made data extraction from tables seem easy, but the method requires a detailed understanding of the process. Here’s a step-by-step guide on how to utilize ChatGPT for this purpose.
Introduction
Picture this: You have a mountain of tabular data nesting away in an Excel sheet or a CSV file, and you want to extract some insightful nuggets without manually sorting through it. Sounds tedious, right? Well, conventional data analysis methods can indeed be very labor-intensive. Fear not! Thanks to the emergence of innovative models like OpenAI’s ChatGPT, the quest for crucial information becomes a much more manageable task.
OpenAI has introduced an API for ChatGPT, particularly powered by the advanced gpt-3.5-turbo lines. This evolution allows users to smooth out those rough, time-consuming edges of data extraction. In this blog post, I will walk you through the entire procedure of extracting meaningful data from tables using this powerful API.
Data Set
For the examples and illustrations in this article, we’ll be utilizing a collection of student performance dataset available on Kaggle. The data can be accessed at: Kaggle Students Performance Dataset. For the sake of simplicity, we are only working with a sample of 30 records from this dataset.
Imagine this dataset containing key factors like gender, race, parental education level, lunch type, test preparation course, and scores in math, reading, and writing. It’s a rich trove of information waiting to be tapped into!
Data Extraction Using ChatGPT API
At its core, ChatGPT harnesses cutting-edge natural language processing (NLP) techniques to interpret inputs and extract information from not just plain text but structured tabular data as well. However, before we dive into the nitty-gritty, let’s break down the process into manageable steps.
Step 1: Prepare Input
The first step is to ensure that your tabular data is ready for processing. Here, we’ll rely on the Pandas library, which is perfect for handling CSV files and offers useful functions to manipulate data efficiently. Install the library if you haven’t already!
pip install pandas
Now, let’s load our CSV file containing the dataset:
import pandas as pd read_csv = pd.read_csv(« Student.csv »)
Make sure your CSV file is appropriately structured. If the information is organized into cells, with rows and columns clearly defined, you are set to go!
Step 2: Use the ChatGPT API
Before you ignite your journey into AI’s realm, ensure that you have the OpenAI Python library installed in your system. Here’s how you can do it:
pip install openai
With the library installed, let’s begin by connecting to the ChatGPT API using your personal API key:
import openai openai.api_key = ‘<YOUR OPENAI API KEY>’
Next up, we’re ready to extract information! Armed with the tabular data and a question, we can get the API to illuminate the information you need. Let’s say we want to find the average math score for male students. We craft our query like this:
input_text = »’What is the average math score for male students? »’ prompt = « » »Please regard the following data:\n {}. Answer the following question and please return only value: {} » » ».format(read_csv, input_text)
Now, with the prompt prepared, we can send the request using ChatGPT:
request = openai.ChatCompletion.create( model= »gpt-3.5-turbo-0301″, messages=[{« role »: « user », « content »: prompt}], ) result = request[‘choices’][0][‘message’][‘content’] print(« ChatGPT Response=> », result)
With the API purring away, it processes your input and generates a response based on the data you provided. Easy-peasy, right? But there’s a catch. We found that the ChatGPT API may have limitations when it comes to performing aggregations like summing or averaging straightforwardly. Often, it might rely more on interpretations unless given clear instructions.
Interactive Experimentation: Ever curious? You can check out its capabilities directly in the ChatGPT playground without needing to mess with API codes. You can play around at: ChatGPT Playground.
SQL-Based Data Extraction from Database Using ChatGPT API
So, here’s where we turn the dial up. Instead of just utilizing the ChatGPT API for direct table analysis, have you considered employing it to generate SQL statements for a database? SQL can significantly assist in filtering and aggregating data more effectively.
For our setup, we’ll leverage SQLite as our database engine and use the sqlite3 Python library to interface seamlessly with it. Here’s how you can set up a database from scratch:
Step 1: Create SQLite Database and Table
Let’s whip up a quick piece of code to create our database and define a table structure:
import sqlite3 # Connect to SQLite database conn = sqlite3.connect(« chatgpt.db ») cursor = conn.cursor() # Create a table cursor.execute(« » »CREATE TABLE IF NOT EXISTS student ( gender TEXT, race TEXT, parentallevelofeducation TEXT, lunch TEXT, testpreparationcourse TEXT, mathscore INTEGER, readingscore INTEGER, writingscore INTEGER ) » » ») # Commit the transaction and close the connection conn.commit() conn.close() Step 2: Adding Data to the Database
Now that we have our table structure ready, let’s import the data from our earlier CSV file into the SQLite database:
import pandas as pd import sqlite3 df = pd.read_csv(« Student.csv ») # Connect to SQLite database conn = sqlite3.connect(‘chatgpt.db’) # Insert DataFrame into SQLite database df.to_sql(‘student’, conn, if_exists=’replace’, index=False) # Close database connection conn.close() Step 3: Use ChatGPT API for SQL Query Generation
At this stage, you can transform your natural language queries into SQL queries using the ChatGPT API. You achieve this by providing the API with the table name, the relevant columns, and your input text:
import sqlite3 import openai # Connect to SQLite database conn = sqlite3.connect(‘chatgpt.db’) cursor = conn.cursor() openai.api_key = ‘<YOUR OPENAI API KEY>’ # Function to get table columns from SQLite database def get_table_columns(table_name): cursor.execute(« PRAGMA table_info({}) ».format(table_name)) columns = cursor.fetchall() return [column[1] for column in columns] # Function to generate SQL query from input text using ChatGPT def generate_sql_query(table_name, text, columns): prompt = « » »You are a ChatGPT language model that can generate SQL queries. Please provide a natural language input text, and I will generate the corresponding SQL query for you. The table name is {} and corresponding columns are {}.\nInput: {}\nSQL Query: » » ».format(table_name, columns, text) request = openai.ChatCompletion.create( model= »gpt-3.5-turbo-0301″, messages=[{« role »: « user », « content »: prompt}], ) sql_query = request[‘choices’][0][‘message’][‘content’] return sql_query # Function to execute SQL query on SQLite database def execute_sql_query(query): cursor.execute(query) result = cursor.fetchall() return result text = « What is the average math score for male students? » table_name = ‘student’ columns = get_table_columns(table_name) sql_query = generate_sql_query(table_name, text, columns) if sql_query: result = execute_sql_query(sql_query) print(« ChatGPT Response=> », result) # Close database connection cursor.close() conn.close()
Isn’t this unraveling some truly wonderful results? After sending the input text to the ChatGPT API, you receive an SQL query crafted just for you. You can subsequently execute this query to pull in the results you are most interested in.
Comparing NLP With SQL-Based Analysis
ChatGPT relies primarily on NLP techniques when dealing with direct data queries. This reliance can sometimes lead to incorrect or vague responses. However, the fusion of SQL capabilities with ChatGPT significantly amplifies its performance when you need precise data interpretation, thus providing more advanced and flexible interaction with tabular data.
Wrapping It Up: If you’re someone who frequently interacts with tabular data, utilizing an API that couples language models with data management systems can innovate how you analyze information. Just imagine the possibilities — whether for academic research, business analytics, or even machine learning pipeline creation.
If you need help building an effective application to extract data from tabular formats or want to integrate Generative AI models, feel free to reach out to us at letstalk@pragnakalp.com.
This journey through leveraging ChatGPT for extracting data from tables has opened a new world for many data users out there while preserving creativity and efficiency!