SQL GROUP BY, then WHERE, then COUNT: A Detailed Guide to Counting Courses with Passed Tests

In this article, we’ll explore how to write an efficient SQL query that counts the number of courses where both evaluations (test1 and test2) have been passed on the first attempt. We’ll break down the problem into two steps: first, retrieving the first attempts for each course, and then filtering out the courses that don’t meet the condition.

Understanding the Problem

We’re given a database table tbl with columns row #, course_id, eval_type, eval_date, and Passed?. The queries below refer to the last two columns as date and passed, where passed holds 'Y' or 'N'. Our goal is to write an SQL query that returns the number of course IDs for which the first attempt at both test1 and test2 was a pass.
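To make the table shape concrete, here is a minimal sketch that builds a tiny version of tbl in an in-memory SQLite database using Python's standard sqlite3 module. The rows are hypothetical sample data, not taken from the original table, and the column names follow the queries in this article (date and passed).

```python
import sqlite3

# Hypothetical sample rows: (course_id, eval_type, date, passed)
SAMPLE_ROWS = [
    (1, "test1", "2023-01-10", "Y"),
    (1, "test2", "2023-02-10", "Y"),
    (2, "test1", "2023-01-12", "N"),  # failed first attempt
    (2, "test1", "2023-01-20", "Y"),  # passed on the retake
    (2, "test2", "2023-02-15", "Y"),
]


def build_sample_db(rows):
    """Create an in-memory database holding a small tbl table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (course_id INTEGER, eval_type TEXT, date TEXT, passed TEXT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    return conn


conn = build_sample_db(SAMPLE_ROWS)
for row in conn.execute("SELECT * FROM tbl ORDER BY course_id, date"):
    print(row)
```

In this sample, course 1 should be counted, while course 2 should not, because its first test1 attempt was a fail.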

Step 1: Retrieving First Attempts for Each Course

To start, we’ll create an inner query that retrieves the first attempts for each course. We can achieve this by using a subquery or a Common Table Expression (CTE). In our case, we’ll use a CTE to make the query more readable.

WITH FirstAttempts AS (
  SELECT 
    row_number() over (partition by course_id, eval_type order by date) as seq,
    course_id,
    eval_type,
    date,
    passed
  FROM tbl
)
SELECT *
FROM FirstAttempts
WHERE seq = 1 AND passed = 'Y'

In this query:

We create a CTE named FirstAttempts that selects the required columns from the tbl table.
The row_number() function numbers the rows within each partition (each combination of course_id and eval_type), starting from 1 and ordered by date, so the earliest attempt gets seq = 1.
We then select only the rows where seq equals 1 (i.e., the first attempt at each evaluation in each course) and passed equals 'Y'.
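As a quick check of the first-attempt logic, the sketch below runs the same CTE against a few hypothetical rows in an in-memory SQLite database. It assumes a SQLite build with window-function support (version 3.25 or later); the sample data and the helper name passed_first_attempts are illustrative only.

```python
import sqlite3

# Hypothetical sample rows: (course_id, eval_type, date, passed)
ROWS = [
    (1, "test1", "2023-01-10", "Y"),
    (1, "test2", "2023-02-10", "Y"),
    (2, "test1", "2023-01-12", "N"),  # failed first attempt
    (2, "test1", "2023-01-20", "Y"),  # passed on the retake
    (2, "test2", "2023-02-15", "Y"),
]

FIRST_ATTEMPTS_SQL = """
WITH FirstAttempts AS (
  SELECT
    row_number() OVER (PARTITION BY course_id, eval_type ORDER BY date) AS seq,
    course_id, eval_type, date, passed
  FROM tbl
)
SELECT course_id, eval_type
FROM FirstAttempts
WHERE seq = 1 AND passed = 'Y'
ORDER BY course_id, eval_type
"""


def passed_first_attempts(rows):
    """Return (course_id, eval_type) pairs whose first attempt was a pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (course_id INTEGER, eval_type TEXT, date TEXT, passed TEXT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(FIRST_ATTEMPTS_SQL).fetchall()
    conn.close()
    return result


print(passed_first_attempts(ROWS))
```

Course 2 contributes only its test2 row here: its test1 retake was a pass, but the retake has seq = 2 and is filtered out.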

Step 2: Filtering Out Courses with Unpassed Tests

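Now that we have the first attempts, we need to keep only the courses where both test1 and test2 were passed on their first attempt. We can achieve this by modifying the previous query or, as shown here, with a separate query that finds the first attempts using min(date) and GROUP BY instead of a window function.

SELECT t1.course_id
FROM (
  SELECT 
    course_id,
    eval_type,
    min(date) as date
  FROM tbl
  WHERE eval_type IN ('test1', 'test2')
  GROUP BY course_id, eval_type
) t1
WHERE NOT EXISTS (
  SELECT * 
  FROM tbl t2 
  WHERE 
    t2.course_id = t1.course_id
    AND t2.eval_type = t1.eval_type
    AND t2.date = t1.date
    AND t2.passed = 'N'
)
GROUP BY t1.course_id
HAVING COUNT(*) = 2

In this query:

We use a subquery to select the minimum date for each course and evaluation type (using min(date)), restricted to test1 and test2.
The NOT EXISTS condition discards any first attempt for which a row exists with the same course ID, evaluation type, and date, and a passed status equal to 'N'. Matching on eval_type matters: without it, a failed test2 on the same date would wrongly disqualify a passed test1, and vice versa.
The GROUP BY and HAVING COUNT(*) = 2 clauses keep only the courses where both first attempts survive that check. A plain SELECT DISTINCT would also return courses where only one of the two evaluations was passed on the first attempt.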

Combining the Two Queries

To get our final result, we combine the two steps: take the passed first attempts from Step 1, group them by course, keep the courses where both evaluations appear, and count what remains.

WITH FirstAttempts AS (
  SELECT 
    row_number() over (partition by course_id, eval_type order by date) as seq,
    course_id,
    eval_type,
    passed
  FROM tbl
  WHERE eval_type IN ('test1', 'test2')
)
SELECT COUNT(*) as count_passed_courses
FROM (
  SELECT course_id
  FROM FirstAttempts
  WHERE seq = 1 AND passed = 'Y'
  GROUP BY course_id
  HAVING COUNT(*) = 2
) passed_courses

In this query:

The FirstAttempts CTE is the one from Step 1, restricted to test1 and test2. It must be defined in the same statement that uses it.
The inner query groups the passed first attempts by course_id and keeps the courses that have two of them, one per evaluation type.
The outer query counts those courses and returns a single number, rather than one row per course and evaluation type.
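One way to sanity-check the final count is to run a query of this shape against a small data set where the answer is known in advance. The sketch below does that with Python's standard sqlite3 module. The rows are hypothetical, the helper name count_courses is illustrative, and a SQLite build with window functions (3.25 or later) is assumed.

```python
import sqlite3

# Hypothetical sample rows: (course_id, eval_type, date, passed)
ROWS = [
    (1, "test1", "2023-01-10", "Y"), (1, "test2", "2023-02-10", "Y"),
    (2, "test1", "2023-01-12", "N"), (2, "test1", "2023-01-20", "Y"),
    (2, "test2", "2023-02-15", "Y"),
    (3, "test1", "2023-01-15", "Y"), (3, "test2", "2023-02-20", "N"),
    (3, "test2", "2023-03-01", "Y"),
    (4, "test1", "2023-01-18", "Y"), (4, "test2", "2023-02-22", "Y"),
]

COUNT_SQL = """
WITH FirstAttempts AS (
  SELECT
    row_number() OVER (PARTITION BY course_id, eval_type ORDER BY date) AS seq,
    course_id, eval_type, passed
  FROM tbl
  WHERE eval_type IN ('test1', 'test2')
)
SELECT COUNT(*) AS count_passed_courses
FROM (
  SELECT course_id
  FROM FirstAttempts
  WHERE seq = 1 AND passed = 'Y'
  GROUP BY course_id
  HAVING COUNT(*) = 2   -- both test1 and test2 passed on the first attempt
) AS passed_courses
"""


def count_courses(rows):
    """Count courses whose first test1 and first test2 attempts both passed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (course_id INTEGER, eval_type TEXT, date TEXT, passed TEXT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    (count,) = conn.execute(COUNT_SQL).fetchone()
    conn.close()
    return count


print(count_courses(ROWS))
```

In this sample, only courses 1 and 4 qualify: course 2 failed its first test1 attempt and course 3 failed its first test2 attempt, even though both were passed later.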

Conclusion

In conclusion, we’ve broken down the problem of counting courses with both test1 and test2 passed on their first attempt into two steps: identifying the first attempt at each evaluation in each course, and keeping only the courses where both of those first attempts were passes. Combining the two steps gives a single query that returns the count.

Last modified on 2023-06-30