<ChatGPT, Claude, and Gemini: A Data Analysis Showdown>
Written on
Welcome to the inaugural entry of my series, ChatGPT, Claude, and Gemini: A Data Analysis Showdown. In this series, I will evaluate these AI models across various data science and analytics tasks to assist data professionals and enthusiasts in selecting the best AI companion for their requirements.
I have chosen to compare these three models due to their user-friendly chatbot interfaces, which make them ideal for everyday tasks. For this evaluation, I utilized my subscriber accounts for Claude and Gemini Advanced, which I recently subscribed to for this series.
ChatGPT 4o: Accessible to all ChatGPT users, though free users face a limited usage cap. The Plus subscription costs $20 per month. - Context window: 128k tokens - Training data up to October 2023
Claude 3.5 Sonnet: Open to all Claude.ai users, with free users also having a usage limit. A professional plan is available for $20 per month. - Context window: 200k tokens - Training data up to April 2024
Gemini Advanced: Offered to Google One AI Premium plan subscribers at $20 per month. - Context window: 1 million tokens - Training data up to November 2023
Evaluating SQL Skills
In this first article, we will focus on SQL, a fundamental skill for data analysts and scientists. Its straightforward syntax makes it an excellent starting point for assessing large language models (LLMs). I will evaluate each model based on query accuracy, efficiency, formatting, and explanations.
My evaluation consists of ten questions divided into three categories: 1. Problem Solving: The first three questions are derived from LeetCode SQL challenges, featuring clear descriptions and input/output structures akin to interview questions. 2. Business Logic: The next four questions are original queries designed to mimic real-world SQL applications, complete with datasets and brief business questions. 3. Query Optimization: The final section challenges the models to enhance three provided SQL queries.
Let’s dive into the assessment!
Round 1: Problem Solving (LeetCode SQL)
#### Question 1 — LeetCode SQL 262 Trips and Users
This question is classified as "Hard" on LeetCode, with a low acceptance rate of 36.5%. It requires joining the same table twice on different columns while interpreting multiple filters.
GPT-4o: Delivered the correct solution with thorough explanations, earning full points.
Claude 3.5 Sonnet: Also provided an accurate code with comprehensive explanations, receiving full points.
-- Claude's answer for reference
SELECT
t.request_at AS Day,
ROUND(
SUM(CASE
WHEN t.status = 'cancelled_by_driver' OR t.status = 'cancelled_by_client' THEN 1
ELSE 0
END) / COUNT(*), 2
) AS 'Cancellation Rate'
FROM
Trips tJOIN
Users uc ON t.client_id = uc.users_idJOIN
Users ud ON t.driver_id = ud.users_idWHERE
uc.banned = 'No' AND ud.banned = 'No' AND t.request_at BETWEEN '2013-10-01' AND '2013-10-03'GROUP BY
t.request_at
Gemini Advanced: Provided a correct query with explanations, also earning full points. Interestingly, it linked a GitHub repository at the end of its response, which was a random LeetCode practice repo in Java, not related to the SQL question.
#### Question 2 — LeetCode SQL 185 Department Top Three Salaries
This question is also marked "Hard" and requires a window function.
GPT-4o: Presented a correct solution utilizing a window function and clear explanations, earning full points.
-- GPT's solution for reference
WITH RankedSalaries AS (
SELECT
e.id,
e.name,
e.salary,
e.departmentId,
d.name AS department,
DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
FROM
Employee eJOIN
Department d ON e.departmentId = d.id)
SELECT
department,
name AS Employee,
salary AS Salary
FROM
RankedSalariesWHERE
salary_rank <= 3ORDER BY
department, salary DESC, name;
Claude 3.5 Sonnet: Provided a similar correct solution with detailed explanations, earning full points.
Gemini Advanced: Delivered the correct solution, earning full points. This time, it linked a relevant GitHub repository containing LeetCode SQL solutions and the specific LeetCode question, which was a positive improvement.
#### Question 3 — LeetCode SQL 1341 Movie Rating
This question presents a "Medium" difficulty yet has a low acceptance rate of 40.1%. It requires combining the top results from two aggregations.
GPT-4o: Provided a correct solution, although it was somewhat lengthy and inefficient, so it received half a point.
-- GPT's solution
WITH UserRatings AS (
SELECT
u.name AS user_name,
COUNT(*) AS rating_count
FROM
MovieRating mrJOIN
Users u ON mr.user_id = u.user_idGROUP BY
u.user_id),
MovieRatingsInFeb2020 AS (
SELECT
m.title,
AVG(mr.rating) AS avg_rating
FROM
MovieRating mrJOIN
Movies m ON mr.movie_id = m.movie_idWHERE
mr.created_at BETWEEN '2020-02-01' AND '2020-02-29'GROUP BY
m.movie_id),
MaxUser AS (
SELECT
user_nameFROM
UserRatingsWHERE
rating_count = (SELECT MAX(rating_count) FROM UserRatings)ORDER BY
user_nameLIMIT 1
),
MaxMovie AS (
SELECT
titleFROM
MovieRatingsInFeb2020WHERE
avg_rating = (SELECT MAX(avg_rating) FROM MovieRatingsInFeb2020)ORDER BY
titleLIMIT 1
)
SELECT user_name AS results FROM MaxUser
UNION ALL
SELECT title AS results FROM MaxMovie;
Claude 3.5 Sonnet: Provided a clean and efficient solution, receiving full points.
-- Claude's solution
(SELECT name AS results
FROM Users u
JOIN MovieRating mr ON u.user_id = mr.user_id
GROUP BY u.user_id
ORDER BY COUNT(DISTINCT mr.movie_id) DESC, name
LIMIT 1)
UNION ALL
(SELECT title AS results
FROM Movies m
JOIN MovieRating mr ON m.movie_id = mr.movie_id
WHERE EXTRACT(YEAR_MONTH FROM mr.created_at) = 202002
GROUP BY m.movie_id
ORDER BY AVG(mr.rating) DESC, title
LIMIT 1);
Gemini Advanced: Unfortunately, it produced an incorrect solution, resulting in a syntax error. However, it did link to a relevant GitHub repository.
-- Gemini's solution
WITH MostProlificUser AS (
SELECT
u.name AS results,
RANK() OVER (ORDER BY COUNT(*) DESC, u.name ASC) AS user_rank
FROM
MovieRating mrJOIN
Users u ON mr.user_id = u.user_idGROUP BY
u.user_id),
MovieAvgRatings AS (
SELECT
m.title AS results,
AVG(mr.rating) AS avg_rating,
RANK() OVER (ORDER BY AVG(mr.rating) DESC, m.title ASC) AS movie_rank
FROM
MovieRating mrJOIN
Movies m ON mr.movie_id = m.movie_idWHERE
mr.created_at BETWEEN '2020-02-01' AND '2020-02-29'GROUP BY
m.title)
SELECT results
FROM (
SELECT results, user_rank FROM MostProlificUser
UNION ALL
SELECT results, movie_rank FROM MovieAvgRatings
) combined_results
WHERE user_rank = 1 OR movie_rank = 1;
Round 1 Recap
In this round, the standout performer was Claude 3.5 Sonnet, which accurately answered all three questions with clear explanations. GPT-4o scored 2.5 points, losing half a point due to an inefficient query. Gemini Advanced earned 2 points due to a syntax error in the final question.
Round 2: Business Logic
#### Bonus: Dataset Preparation
In this round, I will upload synthetic datasets to the three AI tools, providing limited descriptions, and allow them to interpret my business questions and write SQL queries. First, I will generate the synthetic data.
I also sought assistance from LLMs in this task.
ChatGPT-4o: It successfully generated four CSV files with download links, impressively creating both specified and additional context-relevant columns using the faker package.
Claude 3.5 Sonnet: Provided Python code for generating synthetic datasets using np.random methods, but it could not execute the code to provide CSV download links. The data produced felt less authentic without the use of the faker package.
Gemini Advanced: Performed poorly in this task, generating only tables of table descriptions. After clarification, it created fake tables with merely 10 rows each, refusing to provide larger datasets.
#### Load Datasets
With four synthetic datasets prepared, I attempted to upload them to the three AI tools, totaling 920 KB, with datasets ranging from 500 to 5,000 rows each.
ChatGPT-4o: Successfully loaded all four datasets with previews for each table, allowing for closer examination. Its file upload limit is 512MB per file and up to 10 files in one conversation.
Claude 3.5 Sonnet: Encountered an error when I tried to upload the four datasets, stating, “Conversation is 119% over the length limit.” This was surprising given its higher context window than ChatGPT-4o. Although the stated upload limit is “30MB per file (maximum 5 files)”, the effective limit appeared lower, possibly due to demand. I had to reduce the total file size to 320 KB by trimming over half the rows, after which it worked.
Gemini Advanced: Responded with, “I can’t assist you with that, as I’m only a language model and don’t have the capacity to understand and respond.” Consequently, I had to provide table descriptions instead of real datasets for the follow-up questions.
#### Question 4: Monthly Sales Calculation
Prompt: Please help me write a SQL query to calculate the total order amount by month from US users.
GPT-4o: Provided the correct answer with a filter country = 'United States'. It also proactively added an order status filter order_status = 'Completed', earning full points and a bonus for this insight.
Claude 3.5 Sonnet: Also delivered a correct query. For the “US users” filter, it wrote country = 'United States' OR country = 'US', correctly identifying potential data inconsistencies, which earned it full points and a bonus.
-- Claude's SQL code for reference
SELECT
EXTRACT(YEAR FROM o.order_time) AS year,
EXTRACT(MONTH FROM o.order_time) AS month,
SUM(op.total_price) AS total_amount
FROM
orders oJOIN
users u ON o.user_id = u.user_idJOIN
ordered_products op ON o.order_id = op.order_idWHERE
u.country = 'United States' OR u.country = 'US'GROUP BY
EXTRACT(YEAR FROM o.order_time), EXTRACT(MONTH FROM o.order_time)ORDER BY
year, month;
Gemini Advanced: Generated the correct query with the country = 'United States' filter, earning full points.
#### Question 5: New User Counts
Prompt: Please help me write a SQL query to calculate the number of first-time users each month. First-time users are users who placed their first order in that month.
GPT-4o, Claude 3.5 Sonnet, and Gemini Advanced: All three provided correct queries, earning full points.
-- Gemini's SQL code for reference
WITH first_orders AS (
SELECT
user_id,
MIN(order_time) AS first_order_time
FROM
ordersGROUP BY
user_id)
SELECT
DATE_TRUNC('month', first_order_time) AS first_order_month,
COUNT(DISTINCT user_id) AS num_first_time_users
FROM
first_ordersGROUP BY
first_order_monthORDER BY
first_order_month;
#### Question 6: Best-selling Product Categories
Prompt: Please help me write a SQL query to get the top 5 popular product categories by total order amount each month.
GPT-4o, Claude 3.5 Sonnet, and Gemini Advanced: All three provided correct queries using window functions, earning full points.
-- Gemini's code for reference
WITH category_orders AS (
SELECT
DATE_TRUNC('month', o.order_time) AS order_month,
p.product_category,
SUM(op.total_price) AS total_category_amount
FROM
orders oJOIN
ordered_products op ON o.order_id = op.order_idJOIN
products p ON op.product_id = p.product_idGROUP BY
order_month, p.product_category),
ranked_categories AS (
SELECT
order_month,
product_category,
total_category_amount,
ROW_NUMBER() OVER (PARTITION BY order_month ORDER BY total_category_amount DESC) AS category_rank
FROM
category_orders)
SELECT
order_month,
product_category,
total_category_amount
FROM
ranked_categoriesWHERE
category_rank <= 5ORDER BY
order_month, category_rank;
#### Question 7: Retention Rate
Prompt: Please help me write a SQL query to calculate the user retention rate each month. Retention Rate in month X = Out of the users who placed an order in month X-1, the % also placed an order in month X.
GPT-4o: Initially provided an incorrect answer, failing to meet the requirement. After two follow-ups, it produced a correct response, earning 0 points.
-- GPT's initial code
WITH MonthlyOrders AS (
SELECT
o.user_id,
DATE_FORMAT(STR_TO_DATE(o.order_time, '%m/%d/%y %H:%i'), '%Y-%m') AS order_month
FROM
orders oWHERE
o.order_status = 'Completed'GROUP BY
o.user_id, order_month)
SELECT
pmu.current_month,
pmu.previous_month_users,
ru.retained_users,
CASE
WHEN pmu.previous_month_users = 0 THEN 0
ELSE ROUND(ru.retained_users * 100.0 / pmu.previous_month_users, 2)
END AS retention_rate
FROM
PreviousMonthUsers pmuLEFT JOIN
RetainedUsers ru ON pmu.current_month = ru.current_monthORDER BY
pmu.current_month;
Claude 3.5 Sonnet: Its initial solution also failed but was closer than GPT. After one follow-up, it corrected its code, earning 0.5 points.
Gemini Advanced: Initially failed to calculate the retention rate as required. After a follow-up, it adjusted its code, earning 0.5 points.
Round 2 Recap
In this round, Claude 3.5 Sonnet led with 3.5 out of 4 points, plus 0.5 bonus points. ChatGPT-4o scored 3 out of 4 points, plus 0.5 bonus points, particularly excelling in creating and managing synthetic datasets. Gemini Advanced scored 3.5 out of 4 points but struggled with direct CSV uploads.
Round 3: Query Optimization
In this final round, I focused on optimizing SQL queries using inefficient examples from my previous SQL optimization article, based on the same synthetic datasets.
#### Question 8: Only Select Necessary Columns
Expectations: It is better to select only necessary columns in a window function instead of SELECT *.
GPT-4o: Optimized the query by selecting relevant columns and suggested ensuring proper indexing for performance, earning full points.
Claude 3.5 Sonnet: Not only selected relevant columns but also included COALESCE(SUM(price), 0) AS first_order_amount to handle potential null cases, earning full points.
Gemini Advanced: Also optimized the query by selecting relevant columns and provided indexing recommendations, earning full points.
-- Claude's code for reference
WITH first_order AS (
SELECT user_id, order_id, order_time
FROM orders
QUALIFY ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY order_time) = 1
)
SELECT
fo.user_id,
fo.order_id,
COALESCE(SUM(op.price), 0) AS first_order_amount
FROM
first_order foLEFT JOIN
ordered_products op ON fo.order_id = op.order_idGROUP BY
fo.user_id, fo.order_id
#### Question 9: Pre-Aggregation
Expectations: Given users is a wide table with many columns, it is better to do the aggregation to calculate first_order_amount first, then join to the users table.
GPT-4o: Initially focused on formatting but later suggested performing pre-aggregation first, earning 0.5 points.
Claude 3.5 Sonnet: Recommended pre-aggregation and proposed moving the large table join to the final step for optimization, earning full points.
-- Claude's code
WITH first_order AS (
SELECT
u.user_id,
o.order_id,
o.order_time
FROM
users uLEFT JOIN
orders o ON u.user_id = o.user_idQUALIFY ROW_NUMBER() OVER(PARTITION BY u.user_id ORDER BY o.order_time) = 1
),
first_order_amount AS (
SELECT
fo.user_id,
COALESCE(SUM(op.price), 0) AS first_order_amount
FROM
first_order foLEFT JOIN
ordered_products op ON fo.order_id = op.order_idGROUP BY
fo.user_id)
SELECT
u.*,
COALESCE(foa.first_order_amount, 0) AS first_order_amount
FROM
users uLEFT JOIN
first_order_amount foa ON u.user_id = foa.user_id
#### Question 10: De-duplication
Expectations: This query uses DISTINCT, UNION, and ROW_NUMBER() for de-duplication. All three operations are expensive and do the same thing here, so we should eliminate the duplicate deduping efforts.
GPT-4o: Optimized the code but altered the output structure. After clarification, it provided optimized code with the correct columns, earning 0.5 points.
Claude 3.5 Sonnet: Combined two window function CTEs into one and used CASE WHEN + WHERE rn_first=1 OR rn_last=1 for output with the same structure, earning full points.
Gemini Advanced: Combined two window functions into one but got the first and last order ID using DISTINCT, requiring two table scans, and received 0.5 points.
Round 3 Recap
In this round of query optimization, Claude 3.5 Sonnet was the clear winner, earning full points on all questions with innovative solutions. ChatGPT-4o needed guidance on two questions but eventually answered all correctly, earning 2 points. Gemini Advanced faced one syntax error and produced less optimized code, resulting in 1.5 points.
Summary
Claude 3.5 Sonnet (10 points) - Demonstrated exceptional performance in both SQL generation and optimization, quickly correcting errors after clarification. I would wholeheartedly recommend Claude for SQL query assistance. - Its user interface is preferable for formatted text input. - The project functionality allows for sharing knowledge at the project level, making it convenient for team collaboration. - However, its lower file upload limit could pose a challenge for sharing larger datasets.
GPT-4o (8 points) - Capable of generating syntactically correct SQL and excels in business logic, though it lags in query optimization. - A key advantage is its ability to load datasets directly in the UI, enhancing data understanding and exploration. - Its capability to generate synthetic datasets with download links is also noteworthy.
Gemini Advanced (7 points) - Underperformed compared to Claude and ChatGPT but still achieved a 70% success rate. - Its ability to search the web and provide reference links could be beneficial, but the relevance of those links may undermine its credibility. - Integrates well with Google Suite products, such as Google Spreadsheet.
What’s Next
Having compared SQL capabilities, I plan to assess other data science skills in upcoming articles. Please share any additional areas you would like me to cover! - Exploratory Data Analysis (EDA) - Visualization - Machine Learning - Text Analytics - Data Science Business Cases
Interested in AI and data science? Explore these additional articles: - Can ChatGPT replace a data scientist? My perspective. - An evaluation of ChatGPT’s interactive tables and charts features. - Utilizing OpenAI API for text categorization and summarization—my coding example. - Discover Google’s new AI notebook. Check it out here!