Greetings, Illuminate geeks!
This is the first in what I hope will be a series of posts about good tricks for querying the Illuminate back-end with SQL.
A look at Schema Spy or the back-end tables in the Illuminate database reveals a beautifully normalized data model built with obvious care. On the flip side, representing a complicated reality with such attention to good relational design results in a data model with lots of tables. More tables and more joins to include in one’s FROM clauses naturally engenders more questions about whether one is using the right tables and joining them correctly. Wouldn't it be nice to be able to see how Illuminate pulls familiar data together internally, like to see what sort of SQL queries define the data available in Custom Reporting Categories?
Fortunately, that’s entirely possible. Owing to another bit of elegant and clever design, most of the Categories and Columns available as Custom Reporting Metadata are defined by data in your database--right down to the SQL queries. The data model is more than a little bit tricky, but it’s all in the reports schema. To take a simple approach, you can view the data in reports.db_tables. The system_key and description columns do an adequate job of representing the notional entity described in each row. Everything interesting has values in the schema_name and table_name columns, or else in the sql_query column. If the reporting entity only involves a single table, schema_name plus table_name point the way to it. If it’s more complicated, then the sql_query column should contain the relevant SQL statement. (Be warned that many of the entries in the sql_query column start with one or more blank lines. Don’t be confused if your database client only shows the first line of multi-line field values.)
The system_key and description don’t match up with Category names as we know them from Custom Reporting, exactly. It would be nice to connect those table pointers or SQL queries with recognizable Category names. Here’s a query to do that:
-- Returns back-end data defining Custom Reporting Category metadata.
-- Most subcategories reference either a specific schema and table by name, or else a SQL query that defines the category.
-- The SQL queries are useful for understanding how back end tables go together to create recognizable reporting data.
-- There doesn't seem to be a column or table to specify the top-level "Data Type" classification of a reporting Category, so this does some decent guessing.
WHEN Cat.system_key ~ '^national_assessments.' -- Tilde does a regular expression match
THEN 'State Assessments'
WHEN Cat.system_key ~ '^state_data_.'
THEN 'State Assessments'
WHEN Cat.system_key ~ '^ontrack.'
WHEN DT.schema_name = 'dna_repositories'
AND (SELECT DRR.repository_type FROM dna_repositories.repositories DRR WHERE DRR.db_virtual_table_id = VT.db_virtual_table_id LIMIT 1 ) = 'summary_assessment'
THEN 'Summary Assessment'
WHEN DT.schema_name = 'dna_repositories'
AND (SELECT DRR.repository_type FROM dna_repositories.repositories DRR WHERE DRR.db_virtual_table_id = VT.db_virtual_table_id LIMIT 1 ) = 'demographic'
THEN 'Demographics (Non-Core Data)'
WHEN Cat.system_key LIKE 'dna_assessment_%mapping'
THEN 'Assessment' -- maybe? or maybe Assessment View?
WHEN Cat.system_key IS NOT NULL
THEN 'Core Data'
END "CR Data Type"
, Cat.disabled "Category disabled"
, Cat.system_key "Category system_key"
, Cat.new_meta_data "Category new_meta_data"
, DT.system_key --varchar
, DT.description --text
, STRING_AGG( VC.display_name, '; ' ORDER BY VC.sort_name, VC.display_name ) "Category Column list"
, DT.schema_name --varchar
, DT.table_name --varchar
, LTRIM(DT.sql_query, CHR(13)||CHR(10)||CHR(9)||' ') sql_query -- Strips out leading white space before SQL text common in this column
, DT.is_global --bool
, DT.db_table_id --int4
-- , DT.new_meta_data "DB Table new_meta_data" --bool
-- , DT.last_updated --timestamp
reports.data_categories Cat -- Category-level configuration attributes for page=ReportsCreatorManageData. Excludes assessments and repositories.
FULL OUTER JOIN reports.column_category_aff CCA
ON Cat.data_category_id = CCA.data_category_id
FULL OUTER JOIN reports.db_virtual_columns VC -- Column-level configuration attributes for page=ReportsCreatorManageData
ON CCA.db_virtual_column_id = VC.db_virtual_column_id
LEFT OUTER JOIN reports.db_columns DC -- Specifies column names and data types found in source data/output
ON VC.db_column_id = DC.db_column_id
LEFT OUTER JOIN reports.db_tables DT -- Specifies native table or SQL query from which to retrieve source data. Sub-Category level.
ON DC.db_table_id = DT.db_table_id
LEFT OUTER JOIN reports.db_virtual_tables VT -- Looks like data to help Custom Reports join query/table output in reports involving multiple sub-Categories
ON DT.db_table_id = VT.db_table_id
AND VC.db_virtual_table_id = VT.db_virtual_table_id
Cat.data_category_id IS NOT NULL -- Excludes individual assessments and repositories, which have db_virtual_columns but no data_category
-- AND ( Cat.disabled = TRUE OR VC.hide = TRUE ) -- If active, excludes Categories unavailable for use in Custom Reports
-- AND Cat.category_name ILIKE 'Contacts - Basic' -- If active, filters list to the specified front-end Category
, VT.db_virtual_table_id -- Used in Category Type calculation
"CR Data Type"
, Cat.system_key, Cat.category_name
, DT.system_key, DT.db_table_id
Execute that against your Illuminate database and you’ll probably notice that a lot of familiar Category names from Custom Report appear in multiple rows of output from this query. What’s actually defined by a given row in reports.db_tables might be described most accurately as a sub-Category. My theory is that the extra level of specificity in the back-end definition permits some Columns in a Category to be involved in joins only when they’re invoked in a given Custom Report. There are tables in the reports schema that describe how these sub-Categories should be joined with other data, so breaking Categories up into sub-Categories in the back end can help Custom Reports do a better job yielding just the desired number of rows of output and no more. For instance, if you include the 'Contacts - Advanced' Category in custom report, you’ll commonly get more rows for a given group of students if you include phone number information than if you just include contact names without phone numbers. Each Contact can have many phone numbers, but the Custom Report only behaves as if phone numbers participate in a join when those Columns are pulled onto the report’s Manage Columns page.
Just so everyone knows where I’m coming from, this is all based on explorations of the data in one database. I haven’t checked the design with Illuminate developers, so your mileage may vary. Please do comment on the post to let people know how well this works or doesn’t work for you. And if there are any Illuminate devs reading this, by all means set me straight on any inaccuracies.
So here’s a final thought: if all those very functional but not-always-just-what-you-want Categories in Custom Report are just data in the database, what if you could add your own? What if you, the local SQL geek, could use your skills to build homegrown Categories that supplement gaps in the existing metadata, or that return Illuminate data within Custom Reports in ways that are better suited to local quirks of using or thinking about your data? What if you could quit posting requests for Custom Report metadata to the Ideas Portal and just make those things happen? Have a look at my Aha idea for just that; I’d be grateful for a vote or comment.
Leadership Public Schools
Please sign in to leave a comment.