Skip to content

Conversation

@owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Jan 13, 2026

The first commit was done by Opus 4.5. I then rebased it after C++: Support models-as-data barriers and barrier guards was merged and deleted most of its work.

@owen-mc owen-mc added the no-change-note-required This PR does not need a change note label Jan 13, 2026
@github-actions github-actions bot added C++ and removed no-change-note-required This PR does not need a change note labels Jan 13, 2026
This commit was done by Opus 4.5 with the following prompt:

In the commit 004d40e I have made it so that C# CodeQL queries which use sinks defined using data extensions (also known as "models-as-data"), which are accessed using `sinkNode(Node node, string kind)`, also use barriers defined using models-as-data, which are accessed using `barrierNode(Node node, string kind)`, with the same `kind` string. Please do the same for C++. If there are any complicated cases then list them at the end for me to do manually.
@owen-mc owen-mc added the no-change-note-required This PR does not need a change note label Jan 21, 2026
@owen-mc owen-mc marked this pull request as ready for review January 21, 2026 16:26
@owen-mc owen-mc requested a review from a team as a code owner January 21, 2026 16:26
Copilot AI review requested due to automatic review settings January 21, 2026 16:26
@owen-mc owen-mc changed the title C: Allow MaD barriers C++: Allow MaD barriers Jan 21, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables Models-as-Data (MaD) barriers for SQL injection detection in C/C++ code. The change allows SQL injection barriers to be defined using the extensible Models-as-Data framework, complementing the existing hardcoded barrier for integral types.

Changes:

  • Added MaD barrier support to the SQL injection query's isBarrier predicate

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jketema
Copy link
Contributor

jketema commented Jan 21, 2026

It seems this might also need a test?

@owen-mc
Copy link
Contributor Author

owen-mc commented Jan 22, 2026

@jketema The PR that added the ability to specify barrier nodes added a test of that (cpp/ql/test/library-tests/dataflow/ir-barrier-guards). So I feel as if the functionality is tested.

With the other languages I've done this for, there have been existing sanitizers that I can convert, so that has acted as a kind of test for these extension points. Now that I look closer, I see that there are two classes extending SqlBarrierFunction that could possibly be converted to MaD. However, I'm not really familiar enough with our C++ library, or how it implements MaD, to do that easily. Do you think you could do it?

@jketema
Copy link
Contributor

jketema commented Jan 22, 2026

But that ir barrier guard test does not exercise the change made here, or does it?

@jketema
Copy link
Contributor

jketema commented Jan 22, 2026

However, I'm not really familiar enough with our C++ library, or how it implements MaD, to do that easily. Do you think you could do it?

I can have a look.

@jketema
Copy link
Contributor

jketema commented Jan 22, 2026

Looks like the MySql one should be easy to rewrite. However, it seems there is something a bit off with the query. The use of isBarrierIn seems incorrect. It seems that all that code needs to be in isBarrier (which it was at some point).

It also seems we have no test cases for the MySql barriers. We have internal tests.

I'll try fixing those two issues first.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

This may be C++ specific, but how do MaD summary models and MaD barrier models interact?

I tried to rewrite the MySQL barrier model to use MaD, but realized that the tests weren't triggering an injection without the model either, because we don't model flow through the escaping functions. So, I though I'd add both a summary and a barrier model:

# partial model of the MySQL api
extensions:
  - addsTo:
      pack: codeql/cpp-all
      extensible: summaryModel
    data: # namespace, type, subtypes, name, signature, ext, input, output, kind, provenance
      - ["", "", False, "mysql_real_escape_string", "", "", "Argument[*2]", "Argument[*1]", "taint", "manual"]
      - ["", "", False, "mysql_real_escape_string_quote", "", "", "Argument[*2]", "Argument[*1]", "taint", "manual"]
  - addsTo:
      pack: codeql/cpp-all
      extensible: barrierModel
    data: # namespace, type, subtypes, name, signature, ext, input, kind, provenance
      - ["", "", False, "mysql_real_escape_string", "", "", "Argument[*2]", "sql-injection", "manual"]
      - ["", "", False, "mysql_real_escape_string_quote", "", "", "Argument[*2]", "sql-injection", "manual"]

Quick-eval'ing isBarrier with you change, I see a results for mysql_real_escape_string calls (I obviously removed the relevant bits of MqSql.qll first). However, the internal queries that test this still give me injection alerts. It somehow seems like the taint model takes preference over the barrier model?

WIP branch: https://github.com/github/codeql/tree/jketema/sql-barrier

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

it seems there is something a bit off with the query. The use of isBarrierIn seems incorrect. It seems that all that code needs to be in isBarrier (which it was at some point).

@owen-mc I merged a fix for this.

@MathiasVP
Copy link
Contributor

MathiasVP commented Jan 23, 2026

Oh! I know why this doesn't work. It comes from a misunderstand of the "Argument" column of the new barrier MaD models.

Looking at the shared code for how these rows are interpreted we see that the column @jketema specified is interpreted as an "output" column and not an "input" column (let's ignore C++ specific details such as indirections for this discussion).

This means that for a call such as mysql_real_escape_string(mySql, to, from, length) the barrier specified by the new MaD barrier model places the barrier on from [post update] and not on from.

However, the summary specified for mysql_real_escape_string gives flow from from to to [post update]. So as the node from [post update] is never part of the dataflow path the barrier doesn't do anything.

If you place the barrier on Argument[*1] instead I expect that things will Just Work.

I think this is a mistake in the design of the new MaD rows. It should be possible to place a barrier not only on output columns, but also on input columns ... but I can also see the benefit of this simple model (i.e., "it's always interpreted as an output"), but it should probably be documented.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

🤦‍♂️ I think using the output argument (1st argument) is fine here. I think I just got confused by a bit too many copy-and-pastes. And I also got confused by the MySql.qll code using both the input and the output.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

@owen-mc I've opened a PR against your branch here: owen-mc#5

C++: Add MySQL MaD taint and barrier models
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming Owen looked my changes over, this LGTM. I'm currently still running DCA, so want to hold off merging until that comes back ok.

@owen-mc
Copy link
Contributor Author

owen-mc commented Jan 23, 2026

Yes, I reviewed your changes. Does Argument[*1] mean "the thing pointed to by Argument[1]"? We have very occsaional need for expressing that in Go models and I've picked a different syntax (Argument[1].Dereference), but it would probably be better to be consistent with C++.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

Does Argument[*1] mean "the thing pointed to by Argument[1]"?

Correct. Note that multiple * can occur, and also an @ symbol. The latter indicates an arbitrary under of indirections (deferences) up to a certain maximum hard coded in the dataflow library.

@jketema
Copy link
Contributor

jketema commented Jan 23, 2026

DCA was uneventful.

@jketema jketema merged commit e360800 into github:main Jan 23, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ documentation no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants