Skip to content

[CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect#4933

Open
xuzifu666 wants to merge 7 commits into
apache:mainfrom
xuzifu666:distinct_on_support
Open

[CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect#4933
xuzifu666 wants to merge 7 commits into
apache:mainfrom
xuzifu666:distinct_on_support

Conversation

@xuzifu666

@xuzifu666 xuzifu666 commented May 11, 2026

Copy link
Copy Markdown
Member

@xuzifu666 xuzifu666 force-pushed the distinct_on_support branch 2 times, most recently from 99a667c to a5629e3 Compare May 12, 2026 02:39
@xuzifu666 xuzifu666 changed the title [CALCITE-7517] Support DISTINCT ON clause in SELECT statements [CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect May 12, 2026
@cjj2010

cjj2010 commented May 12, 2026

Copy link
Copy Markdown
Contributor

"DISTINCT ON" is a unique syntax of PGSQL. Would it be more appropriate to implement the parsing support for this function on the babel parser

@xuzifu666

xuzifu666 commented May 12, 2026

Copy link
Copy Markdown
Member Author

"DISTINCT ON" is a unique syntax of PGSQL. Would it be more appropriate to implement the parsing support for this function on the babel parser

  1. Babel's only extensibility is:

Configuring additional keywords, join types, and binary operators in config.fmpp;

Adding custom parser methods (such as PostgreSQL's BEGIN/COMMIT) to includes/*.ftl.

Babel itself does not have a separate syntax file; all SELECT-related syntax must be defined in coreParser.jj.

  1. Consistent with QUALIFY Implementation

QUALIFY is a syntax specific to Snowflake, which Calcite implements in the core:

The <QUALIFY> token is defined in core.Parser.jj.

The SqlSelect class adds a qualify field in the core.

The Validator/SqlToRelConverter logic is in the core; DISTINCT ON uses the exact same pattern.

  1. Conformance Controls Semantics, Not Syntax

The current Parser unconditionally parses DISTINCT ON, but semantic validation is controlled by SqlConformance:

// Using `LENIENT conformance` in testing
`fixture().withConformance(SqlConformanceEnum.LENIENT)`

This is consistent with the handling of other extensions such as QUALIFY, LATERAL, and TABLESAMPLE: the Parser is as lenient as possible, and the Validator decides whether to allow based on the dialect.

so in my view:The current implementation conforms to Calcite's architectural conventions and does not require migration to Babel. For adjustments, finer-grained switches can be added to SqlConformance to control the availability of DISTINCT ON.

By the way: Besides PG, ClickHouse also supports DISTINCT ON.

@xiedeyantu

Copy link
Copy Markdown
Member

I attempted to verify this across several databases (though my testing may not have been exhaustive) and indeed found that only PostgreSQL supports this syntax; furthermore, DISTINCT ON does not appear to be part of the standard SQL specification. I would suggest implementing this within Babel (specifically, by adding a configuration parameter to control it); we can also wait to see if anyone else has any better suggestions.

@xuzifu666

Copy link
Copy Markdown
Member Author

I attempted to verify this across several databases (though my testing may not have been exhaustive) and indeed found that only PostgreSQL supports this syntax; furthermore, DISTINCT ON does not appear to be part of the standard SQL specification. I would suggest implementing this within Babel (specifically, by adding a configuration parameter to control it); we can also wait to see if anyone else has any better suggestions.

Good suggestion, I had add a config to control it in babel.

Comment thread core/src/main/codegen/config.fmpp Outdated
"parserImpls.ftl"
]

includeDistinctOn: true

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there’s a better place for this parameter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to add a test case similar to "SELECT DISTINCT ON (deptno)" here? I tried executing "SELECT DISTINCT ON (deptno)" and it didn't generate any error

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I had added related test cases in SqlParserTest.java.

@cjj2010 cjj2010 May 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's a problem with my input method. I intended to add"SELECT DISTINCTON (deptno) empno" and "SELECT DISTINCT ON(deptno) empno". Scenarios like this, where text is written consecutively, should normally result in an error. However, the outcomes of the two situations I executed did not meet my expectations.
https://www.postgresql.org/docs/current/queries-select-lists.html#QUERIES-DISTINCT

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, distinct on needs to be followed by an order by statement.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, distinct on needs to be followed by an order by statement.

Perhaps I should elaborate and add test cases similar to "SELECT DISTINCTON (deptno) empno, ename FROM emp ORDER BY deptno, empno"

@sonarqubecloud

Copy link
Copy Markdown

@mihaibudiu

Copy link
Copy Markdown
Contributor

What is the status of this PR?

@xuzifu666

Copy link
Copy Markdown
Member Author

What is the status of this PR?

Currently I should had addressed all comments, maybe need a further review @mihaibudiu If you're interested, please help review this PR~

@BaseMessage("SELECT DISTINCT ON requires an ORDER BY clause")
ExInst<SqlValidatorException> distinctOnRequiresOrderBy();

@BaseMessage("SELECT DISTINCT ON expressions must match initial ORDER BY expressions")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "initial" mean here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be ambiguous, so I deleted the word.

// not in the SELECT list.
final SqlValidatorScope selectScope3 =
select.isDistinct()
(select.isDistinct() && !select.isDistinctOn())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for distinctOn() isDistinct is true as well?
Or you can have SELECT DISTINCT DISTINCT ON(...)?
If yes, I hope there is a test

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then you won't need an aggregate because the result has only 1 row?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then you won't need an aggregate because the result has only 1 row?

Yes, that’s the reason. I also added tests for DISTINCT and DISTINCT ON; in fact, this syntax isn't supported and results in an immediate error.

@xuzifu666 xuzifu666 Jun 19, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT DISTINCT DISTINCT ON(...) is also not support in PostgreSQL, we can refer this link: https://onecompiler.com/postgresql/44sqp4qfd

*
* <ul>
* <li>0: distinct ({@link SqlLiteral})</li>
* <li>0: keywordList ({@link SqlNodeList})</li>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there were some bugs in this comment...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no need. I changed it.

.ok();

// DISTINCT ON with qualified column reference
f.withSql("SELECT DISTINCT ON (e.deptno) e.deptno AS x, e.empno "

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work if the ORDER BY column is not qualified?
Maybe there is such a test already

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It work well, I also added related test in last commit.

includeIntervalWithoutQualifier: true
includeStarExclude: true
includeSelectBy: true
includeDistinctOn: true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to edit the reference.md grammar too, if it's possible to show babel-only extensions

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, done.


if (select.isDistinct()) {
distinctify(bb, true);
if (!select.isDistinctOn()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently it's impossible to have both DISTINCT and DISTINCT ON, so perhaps isDistinct() should return 'false' for DISTINCT ON

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I changed here.

@mihaibudiu

Copy link
Copy Markdown
Contributor

What's going on with this PR?

@xuzifu666

Copy link
Copy Markdown
Member Author

What's going on with this PR?

Sorry! I missed this message earlier. I will fix the address related issues soon. Thank you very much for the review. @mihaibudiu

@xuzifu666 xuzifu666 force-pushed the distinct_on_support branch from f7012ed to 9107937 Compare June 18, 2026 07:10
@xuzifu666

Copy link
Copy Markdown
Member Author

I think I had addressed the comments, PTAL @mihaibudiu

@xuzifu666 xuzifu666 requested a review from mihaibudiu June 18, 2026 17:15
SELECT * REPLACE list contains unknown column(s): DEPTNO
!error

# [CALCITE-5406] Support the SELECT DISTINCT ON statement for PostgreSQL dialect

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I made this suggestion before: if these were validated on postgres, pleas say so in the comment, to spare future reviewers the need to validate these. I have seen tests with incorrect results in the past.

@xuzifu666 xuzifu666 Jun 19, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sincerely apologize; this taught me a valuable lesson. This time, I conducted rigorous testing and cross-verification using PostgreSQL, and the results matched perfectly. Comment had been added and refer to the result: https://onecompiler.com/postgresql/44sqjg9t6

fetch = operand;
break;
case 11:
hints = (SqlNodeList) operand;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a bug you are fixing

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pre-existing bug. While adding case 12 for DISTINCT ON, I also filled in the missing case 11 (hint handling). The code is now correct.

return distinctOn;
}

public void setDistinctOn(@Nullable SqlNodeList distinctOn) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed?
In general immutable data structures are preferred.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is not needed; the corresponding code has been removed.

@xuzifu666

Copy link
Copy Markdown
Member Author

I’ve addressed the recent relevant comments; please help to review them again. @mihaibudiu

@sonarqubecloud

Copy link
Copy Markdown

@mihaibudiu

Copy link
Copy Markdown
Contributor

I have approved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants