Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add RegExp support to PoolCluster #3451

Merged

Conversation

uPaymeiFixit
Copy link
Contributor

@uPaymeiFixit uPaymeiFixit commented Mar 4, 2025

mysqljs/mysql has been updated to support regular expressions in PoolCluster. This brings most of that code to mysql2.

Closes #3450
Closes #503

Co-authored-by: Weslley Araújo <[email protected]>
@wellwelwel
Copy link
Collaborator

Thanks, @uPaymeiFixit 🙋🏻‍♂️

Do you think it would be possible to include a failing test without your changes? I would try an alternative similar to the original (without using RegExp) before creating regular expressions dynamically.

Copy link

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.08%. Comparing base (e70160b) to head (140228a).
Report is 16 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3451      +/-   ##
==========================================
+ Coverage   88.97%   89.08%   +0.11%     
==========================================
  Files          86       86              
  Lines       13527    13531       +4     
  Branches     1564     1569       +5     
==========================================
+ Hits        12035    12054      +19     
+ Misses       1492     1477      -15     
Flag Coverage Δ
compression-0 89.08% <100.00%> (+0.11%) ⬆️
compression-1 89.08% <100.00%> (+0.11%) ⬆️
static-parser-0 86.66% <100.00%> (+0.11%) ⬆️
static-parser-1 87.44% <100.00%> (+0.11%) ⬆️
tls-0 88.50% <100.00%> (+0.11%) ⬆️
tls-1 88.85% <100.00%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wellwelwel wellwelwel changed the title Add RegExp support to PoolCluster feat: add RegExp support to PoolCluster Mar 4, 2025
@wellwelwel
Copy link
Collaborator

About the lint errors, you can run npm run lint:fix 🙋🏻‍♂️

@uPaymeiFixit
Copy link
Contributor Author

@wellwelwel I'm not sure if I understand what you mean by a failing test without my changes.

Maybe a useful test would be checking that if you run PoolCluster.add('a', poolConfig) and then run PoolCluster.getConnection('aa') a POOL_NOEXIST error is thrown?

@wellwelwel
Copy link
Collaborator

wellwelwel commented Mar 5, 2025

I'm not sure if I understand what you mean by a failing test without my changes.

This PR includes both a feature (support for RexExp selectors for Pool Cluster) and a fix (ensure the start/end of a pattern).

It would be great to include a test covering the RegExp pattern and other one ensuring the selector respect the correct pattern, similar to what you did in #3450:

import mysql from "mysql2/promise";

(async () => {
  const poolCluster = mysql.createPoolCluster();
  poolCluster.add("129", {
    user: "root",
    database: "exampledb",
  });
  poolCluster.add("12", {
    user: "root",
    database: "exampledb",
  });

  let connection = await poolCluster.getConnection("125");
  // connection may be ID 129 or ID 12. Should be POOL_NOEXIST error.
  await connection.query("SELECT 1");

  poolCluster.end();
})();
  • But using a test approach (assertion).

Maybe a useful test would be checking that if you run PoolCluster.add('a', poolConfig) and then run PoolCluster.getConnection('aa') a POOL_NOEXIST error is thrown?

That's a good idea 🙋🏻‍♂️


EDIT: In #3261, we created a new test for Pool Cluster. Just rebasing to trigger it here.

@wellwelwel wellwelwel added the mysqljs/mysql incompatibilities Previously: feligxe-mysql-incompatibilities label Mar 6, 2025
@uPaymeiFixit
Copy link
Contributor Author

@wellwelwel I think I've got the fix-related test down, but I'm struggling a bit with the RegEx test.

I've created a new file, test/unit/pool-cluster/test-regex-pattern.test.cjs with this contents:

'use strict';

const { assert } = require('poku');
const common = require('../../common.test.cjs');
const process = require('node:process');

// TODO: config poolCluster to work with MYSQL_CONNECTION_URL run
if (`${process.env.MYSQL_CONNECTION_URL}`.includes('pscale_pw_')) {
  console.log('skipping test for planetscale');
  process.exit(0);
}

const cluster = common.createPoolCluster();

const poolConfig = common.getConfig();
cluster.add('SLAVE1', poolConfig);

cluster.of(/SLAVE[12]/).getConnection((error, connection) => {
  assert.equal(connection._clusterId, 'SLAVE1', 'should match regex');
});

cluster.end();

And it's throwing this error:

  uncaughtException Error: Can't add new command when connection is in closed state
      at PoolConnection._addCommandClosedState (/Users/joshuagibbs/git/uPaymeiFixit/node-mysql2/lib/base/connection.js:159:17)
      at PoolConnection.end [as _realEnd] (/Users/joshuagibbs/git/uPaymeiFixit/node-mysql2/lib/base/connection.js:904:26)
      at Pool.end (/Users/joshuagibbs/git/uPaymeiFixit/node-mysql2/lib/base/pool.js:124:18)
      at PoolCluster._removeNode (/Users/joshuagibbs/git/uPaymeiFixit/node-mysql2/lib/pool_cluster.js:360:17)
      at PoolCluster._increaseErrorCount (/Users/joshuagibbs/git/uPaymeiFixit/node-mysql2/lib/pool_cluster.js:318:10)
      at /Users/joshuagibbs/git/uPaymeiFixit/node-mysql2/lib/pool_cluster.js:344:14
      at /Users/joshuagibbs/git/uPaymeiFixit/node-mysql2/lib/base/pool.js:37:37
      at process.processTicksAndRejections (node:internal/process/task_queues:77:11) {
    fatal: true
  }

Any ideas?

@wellwelwel
Copy link
Collaborator

And it's throwing this error:

This error is due to the callback not waiting for execution to end the connection. You can end the connection before starting the assertions or use a promise-based connection 🙋🏻‍♂️

@uPaymeiFixit
Copy link
Contributor Author

I've added what I think are valid tests. I'm not familiar with poku or the folder structure for these tests, so it's entirely possible they're in the wrong place. Please let me know if they could be better organized.

@wellwelwel
Copy link
Collaborator

wellwelwel commented Mar 6, 2025

@uPaymeiFixit, assert.fails trigger process.exit(1) instead of throw an error. Is that the intention?

I'm not familiar with poku or the folder structure for these tests

Poku API follows a similar usage as it is for Node.js (poku.io/docs/documentation/assert), for example:

  • 🐢 Node.js
import assert from 'node:assert';
import test from 'node:test';

test('My Test', () => {
  assert(true);
  assert.ok(true);
});


// In Node.js, asynchronous tests do not follow the standard JavaScript flow/behavior
test('My Test', async () => {
  assert(true);
  assert.ok(true);
});
  • 🐷 Poku
import { test, assert } from 'poku';

test('My Test', () => {
  assert(true);
  assert.ok(true);
});

// In Poku, asynchronous tests follow the standard JavaScript flow/behavior
await test('My Test', async () => {
  assert(true);
  assert.ok(true);
});

@wellwelwel
Copy link
Collaborator

I've seen that mysqljs/mysql uses this approach:

  const source = pattern
    .replace(/([.+?^=!:${}()|[\]/\\])/g, '\\$1')
    .replace(/\*/g, '.*');

  return new RegExp(`^${source}$`);

But I'd like to test first if it's possible to get the same result without building RegExp dynamically. However, the current state of this PR looks good to me 🙋🏻‍♂️

@uPaymeiFixit
Copy link
Contributor Author

Off the top of my head I can't think of a clean way to achieve the same result without building the regular expression manually. This comes to mind, but it's not fully compatible with mysqljs/mysql. Are you more worried about performance, or clarity?

--- lib/pool_cluster.js
+++ lib/pool_cluster.js
@@ -269,7 +269,10 @@ class PoolCluster extends EventEmitter {
     let foundNodeIds = this._findCaches[pattern];
 
     if (foundNodeIds === undefined) {
-      const expression = patternRegExp(pattern);
+      let expression = pattern;
+      if (typeof pattern === 'string' && pattern.at(-1) === '*') {
+        expression = patternRegExp(pattern);
+      }
 
       foundNodeIds = this._serviceableNodeIds.filter((id) =>
         id.match(expression)

@wellwelwel
Copy link
Collaborator

wellwelwel commented Mar 6, 2025

Are you more worried about performance, or clarity?

Performance is a consideration, but my concern is particularly about security against ReDoS.

In this project you can see how I usually handle recursions to create safe dynamics "matches" in a similar result of a dynamic RegExp: https://github.com/wellwelwel/jsonc.min/blob/main/src/index.ts.

But it's not a requirement, it's more of a curiosity. I'll try something and I'll come back with some idea or make sure there's no ReDoS possibility.

@wellwelwel
Copy link
Collaborator

wellwelwel commented Mar 7, 2025

I tried something and, performance-wise, I suppose it's better, but complexity without using RexExp dynamically is quite high:

_findNodeIds(pattern, includeOffline) {
  let currentTime = 0;
  let foundNodeIds = this._findCaches[pattern];

  if (foundNodeIds !== undefined) {
    if (includeOffline) {
      return foundNodeIds;
    }

    return foundNodeIds.filter((nodeId) => {
      const node = this._getNode(nodeId);
      if (!node._offlineUntil) return true;

      currentTime = currentTime || getMonotonicMilliseconds();
      return node._offlineUntil <= currentTime;
    });
  }

  if (pattern instanceof RegExp) {
    // RegExp
    foundNodeIds = this._serviceableNodeIds.filter((id) => pattern.test(id));
  } else if (pattern === '*') {
    // all
    foundNodeIds = this._serviceableNodeIds;
  } else if (this._serviceableNodeIds.indexOf(pattern) !== -1) {
    // one
    foundNodeIds = [pattern];
  } else if (pattern.indexOf('*') !== -1) {
    // wild matching
    const parts = pattern.split('*');
    foundNodeIds = this._serviceableNodeIds.filter((id) => {
      let idToCheck = id;

      if (parts[0] && !idToCheck.startsWith(parts[0])) {
        return false;
      }

      idToCheck = idToCheck.slice(parts[0].length);

      for (let i = 1; i < parts.length - 1; i++) {
        const part = parts[i];
        if (part === '') continue;

        const index = idToCheck.indexOf(part);
        if (index === -1) return false;

        idToCheck = idToCheck.slice(index + part.length);
      }

      const lastPart = parts[parts.length - 1];
      if (lastPart && !idToCheck.endsWith(lastPart)) {
        return false;
      }

      return true;
    });
  } else {
    foundNodeIds = [];
  }

  this._findCaches[pattern] = foundNodeIds;

  if (includeOffline) {
    return foundNodeIds;
  }

  return foundNodeIds.filter((nodeId) => {
    const node = this._getNode(nodeId);

    if (!node._offlineUntil) {
      return true;
    }

    if (!currentTime) {
      currentTime = getMonotonicMilliseconds();
    }

    return node._offlineUntil <= currentTime;
  });
}

On the other hand, the dynamic RexExp sanitization has been used for years by mysqljs/mysql without problems, so I think it's a safe alternative (and much simpler).

@uPaymeiFixit and @sidorares, please let me know what do you think 🙋🏻‍♂️

@uPaymeiFixit
Copy link
Contributor Author

I think the only way ReDoS is possible is if the user provides an instanceof RegExp as the pattern (because if they provide a string all of the regex special characters are escaped), in which case in my opinion it's up to them to sanitize the expression.

I can appreciate the effort that went into experimenting with that alternative, but I agree it feels a little complex. Maybe it could be a little easier if some of that logic gets broken out into another function, but I think my vote is still dynamic regular expression.

@wellwelwel wellwelwel added the needs typings Typings for new features label Mar 8, 2025
@wellwelwel
Copy link
Collaborator

wellwelwel commented Mar 10, 2025

Coming back, the concern about security remains. The problem is not the use of Regex as an instance. In this case, the responsibility for sanitizing and validating the regular expression lies with the user who creates their own RegExp.

My concern is with the process of dynamically creating strings into regular expressions, especially considering users rely on this value being assigned to a string and have never been introduced to pre-validating these value as a final RegExp (as this is a new feature being included):

  • getConnection(
    group: string,
    callback: (
    err: NodeJS.ErrnoException | null,
    connection: PoolConnection
    ) => void
    ): void;
    getConnection(
    group: string,
    selector: string,
    callback: (
    err: NodeJS.ErrnoException | null,
    connection: PoolConnection
    ) => void
    ): void;

Still on sanitization, the current logic normalizes RegExp's "magic" characters to guarantee their functionality and doesn't exactly sanitize them:

pattern
    .replace(/([.+?^=!:${}()|[\]/\\])/g, '\\$1')
    .replace(/\*/g, '.*')
  • The normalization step leads CodeQL misunderstanding that the data is in fact being sanitized.

in which case in my opinion it's up to them to sanitize the expression.

I understand it's unusual, but here are two RCE vulnerabilities fixed with severity 9.8 and valid PoCs of situations which, in theory, we expect to be the responsibility not of the final user, but of the developer:

@uPaymeiFixit
Copy link
Contributor Author

The normalization step leads CodeQL misunderstanding that the data is in fact being sanitized.

If that's the case, I agree. Because it certainly has me fooled.

Just to help me understand, can give an example of a string which would not be properly sanitized by .replace(/([.+?^=!:${}()|[\]/\\])/g, '\\$1')? I wonder if there's a regular expression which could properly sanitize it?

@wellwelwel
Copy link
Collaborator

wellwelwel commented Mar 11, 2025

If that's the case, I agree. Because it certainly has me fooled.

My previous comment implied that normalization is unsafe, but what I meant was about nullifying the effects of all special characters. Sorry, that wasn't the intention 🙋🏻‍♂️

In fact, normalization is almost complete. I assume it would be enough to add the * character to the escaped characters and removing the .* replacement. This way, we avoid unexpected behavior by those who used selectors as string:

Screenshot 2025-03-11 at 16 49 40

@wellwelwel
Copy link
Collaborator

wellwelwel commented Mar 11, 2025

@sidorares, do you know why mysqljs/mysql allows the .* for string conversion?

Just found the answer in the docs:

So in that case it makes sense not to normalize *.

Since this behavior is expected and documented in mysqljs/mysql, I believe that this matter can be considered resolved. Thank you for your patience, @uPaymeiFixit 🤝

@wellwelwel wellwelwel merged commit 2d5050d into sidorares:master Mar 11, 2025
100 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement good first contribution mysqljs/mysql incompatibilities Previously: feligxe-mysql-incompatibilities needs typings Typings for new features
Projects
None yet
2 participants