SQL Issues

Comments

6 comments

  • Avatar
    Brian Cortes

    I can't help you with the first error. I will say though that I didn't know that WHERE IN can have multiple values.

    For the second error, try SELECT * FROM (your entire query here) 

     

  • Avatar
    Casey Mirch

    I think that trying to use the IN operator to compare multiple columns is the thing you can't do in the first example. Generally, a subquery that you use in your SELECT, WHERE, GROUP BY, or ORDER BY clause is expected to return no more than one column.

    One alternative is to concatenate the multiple columns into a single string. This example uses a caret character as a delimiter:

    WHERE (osc.calculated_at::VARCHAR || '^' || osc.student_id::VARCHAR || '^' || osc.gradebook_id::VARCHAR) IN 
          (
          SELECT (MAX(calculated_at)::VARCHAR || '^' || student_id::VARCHAR || '^' || gradebook_id::VARCHAR)
          ...

    Forcing data type conversions before the concatenation with "::VARCHAR" there may not be strictly necessary.

    Alternatively--and generally the faster alternative--you could use a join to match on multiple columns. In that case, your FROM and WHERE clauses would look something like this:

    FROM gradebook.overall_score_cache osc
    INNER JOIN (  -- most recent score in the cache for each student, gradebook, & timeframe
      SELECT osc_last.student_id
    , osc_last.gradebook_id
    , osc_last.timeframe_start_date, osc_last.timeframe_end_date
    , MAX(osc_last.calculated_at) last_calculated_at
      FROM gradebook.overall_score_cache osc_last
      GROUP BY osc_last.student_id
    , osc_last.gradebook_id
    , osc_last.timeframe_start_date, osc_last.timeframe_end_date
      ) AS osc_stgb_latest
      ON osc.student_id = osc_stgb_latest.student_id
      AND osc.gradebook_id = osc_stgb_latest.gradebook_id
    AND osc.calculated_at = osc_stgb_latest.last_calculated_at
    AND osc.timeframe_start_date = osc_stgb_latest.timeframe_start_date --to match grouping
    AND osc.timeframe_end_date = osc_stgb_latest.timeframe_end_date --to match grouping

    WHERE
    osc_stgb_latest.student_id IS NOT NULL  -- redundant with the inner join
    AND osc.percentage IS NOT NULL

    Using joins for things like that can take advantage of a database engine's query-processing optimizations that just aren't available for the concatenation example.

    In your screenshot, you group by timeframe dates (grading period, from what I've seen), but you're not returning those for comparison with the main query. In my example, I added those to the join conditions. If you have multiple grading periods in a scheduling term, and you use those dates for grouping without using them in the comparison, that leaves a possibility you could get some matching records you don't want. If all you want is the most recent grading period represented in each gradebook, then maybe you don't want to group by timeframe dates.

    A somewhat more esoteric approach, which I find useful in cases where the row I want to match is more straightforwardly obtained by ordering than by grouping and aggregating, is to use "DISTINCT ON" syntax.

    FROM gradebook.overall_score_cache osc
    INNER JOIN (  -- most recent score in the cache for each student, gradebook, & timeframe
      SELECT
    DISTINCT ON (
    osc_last.student_id
    , osc_last.gradebook_id
    , osc_last.timeframe_start_date, osc_last.timeframe_end_date
    )
    osc_last.student_id
    , osc_last.gradebook_id
    , osc_last.timeframe_start_date, osc_last.timeframe_end_date
        , osc_last.calculated_at last_calculated_at
      FROM gradebook.overall_score_cache osc_last
      ORDER BY osc_last.student_id
    , osc_last.gradebook_id
    , osc_last.timeframe_start_date, osc_last.timeframe_end_date
    , osc_last.calculated_at DESC NULLS LAST
      ) AS osc_stgb_latest
      ON osc.student_id = osc_stgb_latest.student_id
      AND osc.gradebook_id = osc_stgb_latest.gradebook_id
      AND osc.calculated_at = osc_stgb_latest.last_calculated_at
    AND osc.timeframe_start_date = osc_stgb_latest.timeframe_start_date --to match grouping
    AND osc.timeframe_end_date = osc_stgb_latest.timeframe_end_date --to match grouping

    WHERE
    osc_stgb_latest.student_id IS NOT NULL  -- redundant with the inner join
    AND osc.percentage IS NOT NULL

    I don't see a problem with the second screenshot. I do know that Windows ODBC apps, when the ODBC connection is configured as read-only, don't like it when a query doesn't literally start with the word "select". Brian's suggestion would address an issue like that. In a little more verbose form, it would look like this:

    SELECT * FROM (  -- the real work

    WITH [etc...]
    SELECT [and the rest of the main query, omitting any final semicolon]

    ) AS How_ODBC_Likes_It ;

    I hope that helps!

  • Avatar
    Jon Gaydos

    For your IN problem: Instead of concatenating a bunch of strings together and using IN, I would recommend using EXISTS and move the values you are returning to the WHERE clause of the sub query.

    CURRENT:
    SELECT * FROM table1
    WHERE (a, b, c) IN (
      SELECT a, b, c FROM table2
    )

    BETTER:
    SELECT * FROM table1
    WHERE EXISTS (
      SELECT 1 FROM table2
      WHERE table2.a = table1.a
        AND table2.b = table1.b
        AND table2.c = table1.c
    )

  • Avatar
    Franck Reyherme (Edited )

    @Marianna,

    Have you tried actually running the query? Do you get an error when you actually run the query? Those little red markers in Jaspersoft are definitely not to be trusted. You’ll see those little red markers in Jasper all the time, even when the query is perfectly legitimate.

    If you click the “Read Fields” button in the top right, do you get an error? If you actually preview the report, do you get an error? If not, then you can completely ignore those little red markers on the side.

    If you are getting an error when you actually run the query (not just the little red markers on the side), please provide the entire query in text form (not a screenshot), and I would be more than happy to help you troubleshoot.



    I think that trying to use the IN operator to compare multiple columns is the thing you can’t do in the first example.

    You can definitely have multiple values in a WHERE, IN clause. I see it / use it all the time.


    Alternatively—and generally the faster alternative—you could use a join to match on multiple columns.

    That would definitely work, but I was always taught to never use JOINS for filtering purposes. JOINS are expensive and unnecessary when you’re not actually selecting any data from the joined table/subquery.

  • Avatar
    Marianna Ripple

    Using the WITH name AS statement in the second picture, when I read fields, I only show the fields from the SELECT statement below the WITH.  No matter how I try to join the table tm, it won't show.  That is why I thought the error mattered.

  • Avatar
    Casey Mirch

    Franck, thank you for the correction. I'll be sure to look for uses of (multiple columns) IN (a multi-column subselect) . That definitely renders my suggestions inelegant, at best.

    Marianna, I did notice that your first screenshot shows the columns in a different order on one side of the IN versus the other.

     

Please sign in to leave a comment.