SQL Issues
I am trying to adapt an SQL report previously created, and I keep getting errors that I don't understand. The first has to do with trying to use WHERE (multiple columns) IN ... it comes up as an error. See image of code then error.
The second error comes from trying to use the WITH name AS ...
Lastly, I use the date in the image above to limit the range of results while working, but when I use the date as show above (WHERE start_date > '2018-01-01'), I get an error unless I include a space in front of 2018... is that usual?
-
I think that trying to use the IN operator to compare multiple columns is the thing you can't do in the first example. Generally, a subquery that you use in your SELECT, WHERE, GROUP BY, or ORDER BY clause is expected to return no more than one column.
One alternative is to concatenate the multiple columns into a single string. This example uses a caret character as a delimiter:
WHERE (osc.calculated_at::VARCHAR || '^' || osc.student_id::VARCHAR || '^' || osc.gradebook_id::VARCHAR) IN
(
SELECT (MAX(calculated_at)::VARCHAR || '^' || student_id::VARCHAR || '^' || gradebook_id::VARCHAR)
...Forcing data type conversions before the concatenation with "::VARCHAR" there may not be strictly necessary.
Alternatively--and generally the faster alternative--you could use a join to match on multiple columns. In that case, your FROM and WHERE clauses would look something like this:
FROM gradebook.overall_score_cache osc
INNER JOIN ( -- most recent score in the cache for each student, gradebook, & timeframe
SELECT osc_last.student_id
, osc_last.gradebook_id
, osc_last.timeframe_start_date, osc_last.timeframe_end_date
, MAX(osc_last.calculated_at) last_calculated_at
FROM gradebook.overall_score_cache osc_last
GROUP BY osc_last.student_id
, osc_last.gradebook_id
, osc_last.timeframe_start_date, osc_last.timeframe_end_date
) AS osc_stgb_latest
ON osc.student_id = osc_stgb_latest.student_id
AND osc.gradebook_id = osc_stgb_latest.gradebook_id
AND osc.calculated_at = osc_stgb_latest.last_calculated_at
AND osc.timeframe_start_date = osc_stgb_latest.timeframe_start_date --to match grouping
AND osc.timeframe_end_date = osc_stgb_latest.timeframe_end_date --to match grouping
WHERE
osc_stgb_latest.student_id IS NOT NULL -- redundant with the inner join
AND osc.percentage IS NOT NULLUsing joins for things like that can take advantage of a database engine's query-processing optimizations that just aren't available for the concatenation example.
In your screenshot, you group by timeframe dates (grading period, from what I've seen), but you're not returning those for comparison with the main query. In my example, I added those to the join conditions. If you have multiple grading periods in a scheduling term, and you use those dates for grouping without using them in the comparison, that leaves a possibility you could get some matching records you don't want. If all you want is the most recent grading period represented in each gradebook, then maybe you don't want to group by timeframe dates.
A somewhat more esoteric approach, which I find useful in cases where the row I want to match is more straightforwardly obtained by ordering than by grouping and aggregating, is to use "DISTINCT ON" syntax.
FROM gradebook.overall_score_cache osc
INNER JOIN ( -- most recent score in the cache for each student, gradebook, & timeframe
SELECT
DISTINCT ON (
osc_last.student_id
, osc_last.gradebook_id
, osc_last.timeframe_start_date, osc_last.timeframe_end_date
)
osc_last.student_id
, osc_last.gradebook_id
, osc_last.timeframe_start_date, osc_last.timeframe_end_date
, osc_last.calculated_at last_calculated_at
FROM gradebook.overall_score_cache osc_last
ORDER BY osc_last.student_id
, osc_last.gradebook_id
, osc_last.timeframe_start_date, osc_last.timeframe_end_date
, osc_last.calculated_at DESC NULLS LAST
) AS osc_stgb_latest
ON osc.student_id = osc_stgb_latest.student_id
AND osc.gradebook_id = osc_stgb_latest.gradebook_id
AND osc.calculated_at = osc_stgb_latest.last_calculated_at
AND osc.timeframe_start_date = osc_stgb_latest.timeframe_start_date --to match grouping
AND osc.timeframe_end_date = osc_stgb_latest.timeframe_end_date --to match grouping
WHERE
osc_stgb_latest.student_id IS NOT NULL -- redundant with the inner join
AND osc.percentage IS NOT NULLI don't see a problem with the second screenshot. I do know that Windows ODBC apps, when the ODBC connection is configured as read-only, don't like it when a query doesn't literally start with the word "select". Brian's suggestion would address an issue like that. In a little more verbose form, it would look like this:
SELECT * FROM ( -- the real work
WITH [etc...]
SELECT [and the rest of the main query, omitting any final semicolon]
) AS How_ODBC_Likes_It ;I hope that helps!
-
For your IN problem: Instead of concatenating a bunch of strings together and using IN, I would recommend using EXISTS and move the values you are returning to the WHERE clause of the sub query.
CURRENT:
SELECT * FROM table1
WHERE (a, b, c) IN (
SELECT a, b, c FROM table2
)BETTER:
SELECT * FROM table1
WHERE EXISTS (
SELECT 1 FROM table2
WHERE table2.a = table1.a
AND table2.b = table1.b
AND table2.c = table1.c
) -
@Marianna,
Have you tried actually running the query? Do you get an error when you actually run the query? Those little red markers in Jaspersoft are definitely not to be trusted. You’ll see those little red markers in Jasper all the time, even when the query is perfectly legitimate.
If you click the “Read Fields” button in the top right, do you get an error? If you actually preview the report, do you get an error? If not, then you can completely ignore those little red markers on the side.
If you are getting an error when you actually run the query (not just the little red markers on the side), please provide the entire query in text form (not a screenshot), and I would be more than happy to help you troubleshoot.I think that trying to use the IN operator to compare multiple columns is the thing you can’t do in the first example.
You can definitely have multiple values in a WHERE, IN clause. I see it / use it all the time.
Alternatively—and generally the faster alternative—you could use a join to match on multiple columns.
That would definitely work, but I was always taught to never use JOINS for filtering purposes. JOINS are expensive and unnecessary when you’re not actually selecting any data from the joined table/subquery.
-
Franck, thank you for the correction. I'll be sure to look for uses of (multiple columns) IN (a multi-column subselect) . That definitely renders my suggestions inelegant, at best.
Marianna, I did notice that your first screenshot shows the columns in a different order on one side of the IN versus the other.
Please sign in to leave a comment.
Comments
6 comments