Block access via automation like puppeteer #266

Open
t2ym opened this issue May 19, 2018 · 2 comments

t2ym commented May 19, 2018

Block access via automation like puppeteer

Current Status [email protected] in stack branch merged to develop and master branches

Update README.md for the next release 0.2.0

  • List significant changes on the stack branch
    • Pick up from this issue comments
    • Extract from the changes on the branch
    • Additional conceptual explanation on the changes if required
  • <!-- end of mandatory no-hook scripts -->
  • Wrapped global object access of the main document with whitelists
    • Parameters for wrapped global object access
      • const enableDebugging = false in hook-callback.js: true to enable debugging
        • const devToolsDisabled = true in disable-devtools.js: false to enable debugger
      • const wildcardWhitelist = [] in hook-callback.js: for Chrome new Error().stack format
        • new RegExp('^at (.* [(])?' + origin + '/components/'), // trust the site contents including other components
        • new RegExp('^at ([^(]* [(])?' + 'https://cdnjs.cloudflare.com/ajax/libs/vis/4[.]18[.]1/vis[.]min[.]js'),
        • new RegExp('^at ([^(]* [(])?' + 'https://www.gstatic.com/charts/loader[.]js'),
      • const excludes = new Set() : { 'window.Math' }
    • Return undefined for global objects unless they are in whitelists
      • hook-native-api.js deprecated and moved to hook-callback.js
      • Stack class for contextStack instead of Array
    • Transition to about:blank for non-whitelisted global object access
    • Performance overheads on global object access
    • How to mitigate the performance overheads on global object access
      • const URL = window.URL, RegExp = window.RegExp, ...
  • Hook changes on __hook__ arguments for global objects
  • Use symbols for contexts
    • $hook$.$ context symbol generator
    • const __context_mapper__ object
      • __context_mapper__ = __ + hex(sha256(context + code)) + __
    • __hook__('op', ThisObject, ['prop'], __context_mapper__[1])
    • $hook$.global(__hook__, __context_mapper__[24], 'o', 'let')[__context_mapper__[25]]
    • __hook__[Symbol()] -> context
    • ACL compatibility
      • Context Symbols are converted to their corresponding strings when handed to ACL callbacks
  • Automate generation of cache-bundle.json
    • gulp tasks
      • get-version
      • cache-bundle-automation-json
    • cacheBundleGeneration.js
    • cache-automation.js
      • hooked with the context https://thin-hook.localhost.localdomain/automation.json
    • cache-bundle.json for cache-automation.js
      • serverSecret - one-time build-time-only secret for validating cache-automation.js
  • Merging into the develop branch was a little premature. The plan is to update README.md and related files, then release 0.2.0.
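
As a rough illustration of how the wildcardWhitelist patterns listed above might be applied to Chrome-style stack frames, here is a minimal sketch. Only the RegExp shapes come from the list. The `origin` value, the `fullMatchWhitelist` set, and the `isWhitelistedFrame` helper are hypothetical names used for illustration and are not taken from hook-callback.js.

```javascript
// Assumed origin, for illustration only
const origin = 'https://localhost';

// Pattern shapes copied from the parameter list; the list itself is abbreviated
const wildcardWhitelist = [
  new RegExp('^at (.* [(])?' + origin + '/components/'),
  new RegExp('^at ([^(]* [(])?' + 'https://www.gstatic.com/charts/loader[.]js'),
];

// Frames that matched a pattern once are remembered for an exact-match fast path
const fullMatchWhitelist = new Set();

// Hypothetical helper: decide whether a single stack frame line is trusted
function isWhitelistedFrame(frame) {
  const line = frame.trim();
  if (fullMatchWhitelist.has(line)) {
    return true; // exact match, no RegExp evaluation needed
  }
  for (const pattern of wildcardWhitelist) {
    if (pattern.test(line)) {
      fullMatchWhitelist.add(line);
      return true;
    }
  }
  return false;
}
```

Caching matched frames in a Set is one way to read the "Cache RegExp matching whitelist to full matching whitelist" item; the actual caching strategy may differ.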

Proof of Concept Implementation

  • Hook native global object access
    • Load all mandatory no-hook scripts before the first DomContentLoaded event
      • Insert this exact comment <!-- end of mandatory no-hook scripts --> to original-index.html
      • Effects
        • caches is inaccessible at the first domcontentloaded event
        • Theoretically, the main document is single-threaded, so script loading should not be interrupted by other evaluators such as script evaluation requests from puppeteer. Since script evaluation is impossible at the Page.frameStartedLoading event, there should hopefully be no chance to inspect the main document until domcontentloaded. Whether any events are fired between Page.frameStartedLoading and domcontentloaded has not been verified.
  // puppeteer client
  page.on('domcontentloaded', async function onDomContentLoaded() {
    console.log('domcontentloaded');
    let result = await page.evaluate(function getCaches() {
      return caches.constructor.name;
    });
    console.log('caches = ', result);
  });
[cacheBundleGeneration   ] goto https://localhost/components/thin-hook/demo/
[cacheBundleGeneration   ] domcontentloaded
[cacheBundleGeneration   ] (node:4362) UnhandledPromiseRejectionWarning: 
  Unhandled promise rejection (rejection id: 1): 
  Error: Evaluation failed: TypeError: Cannot read property 'constructor' of undefined
[cacheBundleGeneration   ]     at getCaches (<anonymous>:2:21)
  • All getter/setter functions are wrapped
    • Native window object properties
    • Object.prototype.constructor (=== Object)
    • EventTarget.prototype object properties such as addEventListener
    • Object.prototype object properties such as __lookupGetter__
    • __proto__
    • Non-native window object properties
    • Worker
      • Dedicated Worker
      • Shared Worker
  • Values are also wrapped via getter/setter functions
  • Block access
    • return undefined for any global object access
      • return undefined for own native properties of window
      • return undefined for own non-native properties of window
      • return undefined for globals from no-hook APIs
    • Handle exceptional objects (hook, $hook$, etc.) properly
      • Block access via __hook__ by using Symbols for contexts
        • About 25% slower
      • Block access via hook.eval('__hook__',...)('script') by using Symbol.for('__hook__')
      • Block access to hook.utils.createHash and hook.utils.HTMLParser via automation
    • about:blank transition
    • Clear caches
  • Handle whitelists
    • Reduce overheads
      • Cache RegExp matching whitelist to full matching whitelist
    • Full matching whitelist
    • RegExp matching whitelist
    • Global object access not permitted by the whitelists is blocked
  • Handle async callbacks
    • Handle async callbacks in no-hook scripts properly
      • Currently, async callbacks in no-hook scripts are mistaken for unexpected global object access via automation
        • Workaround: Add no-hook scripts to the whitelist
  • Allow gulp cache-bundle to fetch the bundle exclusively and securely
    • cache-automation.js to automate navigation for cache collection
    • Hook cache-automation.js and move cache operations to cache-bundle.js
      • Note: Now cache-automation.js performs only UI navigations for cache target UIs
  • Eliminate global object access without contextStack in hooked scripts
    • Read access is always required to call a function
// Original 
Function('return 1');
// Hooked: old
__hook__(Function, null, ['return 1'], 'context')
// Hooked: new (omitted)
__hook__($hook$.global(Function)..., null, ['return 1'], 'context')
  • Update tests

    • Add extra $hook$.global(...) in the expected hooked results
    • Symbols as contexts
  • Add puppeteer tests to cacheBundleGeneration.js

    • Read caches
    • Read this.__proto__.__proto__.__proto__.__proto__.constructor (=== Object)
    • Read non-native global property Polymer
    • Read __lookupGetter__ originally from Object.prototype
    • Read addEventListener originally from EventTarget.prototype
    • Read this.__proto__.__proto__.__proto__.__proto__.__lookupGetter__
    • Read Math.__lookupGetter__
    • Read Math.abs.__lookupGetter__
    • Call __hook__('.', this, ['navigator'], 'context')
    • Call __hook__('.', this, ['navigator'], Symbol.for('context'))
    • Call hook.eval('__hook__')('Object')
    • Read hook.utils.createHash.sha256
  • Reduce Overheads

    • Access global properties only once in hook.min.js
      • thin-hook/lib/*.js scripts
      • t2ym/espree
      • t2ym/escodegen
      • acorn
      • Other components
    • Access global properties only once in hook-callback.js
    • Optimize hooking performance for extra $hook$.global()
  • Issues

    • event is mistaken for a global object in setAttribute('onXX', 'event.target')
      • Root Cause: { event: true } is missing in the initial scope of hooking
    • Add enableDebugging flag in hook-callback.js
    • Object.$__proto__$ is unexpectedly defined by __hook__()
      • Root Cause: The statistics object property globalObjectAccess.constructor unexpectedly points to Object, although it is expected to be a normal property of the globalObjectAccess object
    • Object['/components/polymer/lib/mixins/property-accessors.html,script@741,props'] is unexpectedly defined by __hook__()
      • Root Cause: same as above
    • Add comment "Moved to hook-callback.js" to hook-native-api.js
    • Unpredictable hook prefix
      • Example: hook-prefix=_uNpREdiC4aB1e_
    • compact=false is not supported by context symbols
    • In hook.encodeHtml(), </head></html> is inserted at the wrong position, before <!-- end of mandatory no-hook scripts -->
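
To illustrate the getter/setter wrapping described in this proof of concept, the following minimal sketch hides properties of a stand-in object behind accessors that return undefined to untrusted callers. `fakeWindow`, `trustedCaller`, and `wrapGlobalProperty` are hypothetical names. In the real implementation the decision depends on the call context and the whitelists rather than on a simple flag.

```javascript
// Stand-in for window so that the sketch runs outside a browser
const fakeWindow = { caches: { name: 'CacheStorage' }, Math };

// Hypothetical flag; the real check inspects the calling context
let trustedCaller = false;

function wrapGlobalProperty(target, name) {
  let value = target[name]; // captured once, then hidden behind accessors
  Object.defineProperty(target, name, {
    configurable: false, // prevent the wrapper from being redefined
    enumerable: true,
    get() {
      return trustedCaller ? value : undefined;
    },
    set(newValue) {
      if (trustedCaller) {
        value = newValue;
      }
    },
  });
}

for (const name of Object.keys(fakeWindow)) {
  wrapGlobalProperty(fakeWindow, name);
}
```

With such a wrapper in place, an untrusted read of `caches` yields undefined, which is consistent with the "Cannot read property 'constructor' of undefined" error in the puppeteer log above.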

Notes

  • npm run cache-bundle fails as expected for now (automated with cache-automation.js)
[cacheBundleGeneration   ] (node:28710) UnhandledPromiseRejectionWarning: 
  Unhandled promise rejection (rejection id: 1): 
  Error: Evaluation failed: TypeError: Promise is not a constructor
[cacheBundleGeneration   ]     at waitForBundleSetFetched (<anonymous>:2:12)
  • demo/cache-bundle.json can be generated via the ?cache-bundle=save option; the generation can also be automated with cache-automation.js

t2ym commented May 24, 2018

Using Symbols for contexts to avoid disguised string contexts via puppeteer

__hook__('op', ThisObject, ['prop'], __context_mapper__[1])
$hook$.global(__hook__, __context_mapper__[24], 'o', 'let')[__context_mapper__[25]]

Notes:

  • ACL is compatible since Symbols are converted to their corresponding string contexts by __hook__[Symbol()]
  • Limitation: The same script at the same URL cannot be loaded into a single document more than once
  • Names of __context_mapper__ objects cannot be retrieved via window object
    • for (let name in this) { /* name does not match with context mapper names */ }
  • No Symbols can be accessed by puppeteer
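
The following minimal sketch shows why a Symbol context is harder to forge than a string context. It assumes that the symbol-to-context table itself is not reachable by the attacker; `symbolToContext`, `secretContext`, and `resolveContext` are hypothetical names for illustration.

```javascript
// Private table, playing the role of __hook__[Symbol()] -> context
const symbolToContext = {};

// An unregistered symbol cannot be recreated without a reference to it
const secretContext = Symbol();
symbolToContext[secretContext] = 'examples/example2.js,C';

// Hypothetical resolver: only symbols present in the private table
// resolve to a context string
function resolveContext(context) {
  return typeof context === 'symbol' ? symbolToContext[context] : undefined;
}
```

A guessed string, a registry symbol from `Symbol.for(...)`, and a fresh `Symbol()` all fail to resolve, while the original symbol resolves to its string context for ACL callbacks.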

New helper function for Symbols

$hook$.$ = function contextSymbolGenerator(symbolToContext, contexts) {
  let result = [];
  let contextToSymbol = {};
  let hookGlobal = hook.global;
  for (let i = 0; i < contexts.length; i++) {
    symbolToContext[result[i] = _Symbol()] = contexts[i];
    contextToSymbol[contexts[i]] = result[i];
    hookGlobal[result[i]] = contextToSymbol;
  }
  return result;
};

Example (__context_mapper__ is actually __ + hex(sha256(context + code)) + __)

const __context_mapper__ = $hook$.$(__hook__, [
  'examples/example2.js,C',
  '_p_C;examples/example2.js,C',
  'examples/example2.js,C,add',
  'examples/example2.js,C,add,plus'
]);
$hook$.global(__hook__, __context_mapper__[0], 'C', 'class')[__context_mapper__[1]] = class C {
  add(a, b) {
    return __hook__((a = 1, b = 2) => {
      let plus = (...args) => __hook__((x, y) => x + y, null, args, __context_mapper__[3]);
      return __hook__(plus, null, [
        a,
        b
      ], __context_mapper__[2], 0);
    }, null, arguments, __context_mapper__[2]);
  }
};
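
The naming rule `__ + hex(sha256(context + code)) + __` mentioned above could be sketched as follows in Node.js. What exactly is passed as `context` and `code` is not specified in this thread, so the arguments here are assumptions, and `contextMapperName` is a hypothetical helper name.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive an unpredictable-looking mapper variable name
// from a context string and the script source
function contextMapperName(context, code) {
  const digest = crypto.createHash('sha256').update(context + code).digest('hex');
  return '__' + digest + '__';
}

// Assumed inputs, for illustration only
const exampleName = contextMapperName('examples/example2.js', 'class C {}');
```

Because the name depends on the script contents, it changes whenever the code changes, which makes it hard to guess from outside without the source.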


t2ym commented May 26, 2018

cache-automation.js to automate cache collection for cache-bundle.json

  • Put a special one-time cache-bundle.json containing the cache-automation.js contents and serverSecret, a random hex value, as shown below
  • If the one-time serverSecret is lost, the application can no longer accept automation scripts from a special cache-bundle.json. The check sha256(serverSecret + code) === authorization is made against the authorization value from <script src="cache-bundle.js?no-hook=true&authorization={VALUE}">, which is embedded as a fixed value for the build.
// gulpfile.js (extracted)
const serverSecret = crypto.randomFillSync(Buffer.alloc(32)).toString('hex');
const cacheBundlePath = path.join('demo', 'cache-bundle.json');
const cacheAutomationScriptPath = path.join('demo', 'cache-automation.js');
const cacheAutomationScript = fs.readFileSync(cacheAutomationScriptPath, 'UTF-8');
let version = 'version_1';
let authorization; // sha256(serverSecret + cacheAutomationScript)
let hash = hook.utils.createHash('sha256');
hash.update(serverSecret + cacheAutomationScript);
authorization = hash.digest('hex');

gulp.task('get-version', (done) => {
  return gulp.src(['demo/original-index.html'], { base: 'demo' })
    .pipe(through.obj((file, enc, callback) => {
      let html = String(file.contents);
      let versionIndex = html.indexOf('/hook.min.js?version=') + '/hook.min.js?version='.length;
      let versionIndexEnd = html.indexOf('&', versionIndex);
      version = 'version_' + html.substring(versionIndex, versionIndexEnd);
      callback(null, file);
    }))
    .pipe(through.obj((file, enc, callback) => {
      done();
    }));
});

// One-time special cache-bundle.json generation
gulp.task('cache-bundle-automation-json', (done) => {
  fs.writeFileSync(cacheBundlePath, JSON.stringify({
    "version": version,
    "https://thin-hook.localhost.localdomain/automation.json": JSON.stringify({
      "state": "init", // update state in the script to perform operations including reloading
      "serverSecret": serverSecret,
      "script": cacheAutomationScript
    },null,0)
  },null,2))
  done();
});
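
The check `sha256(serverSecret + code) === authorization` can be sketched in Node.js as below. This is a minimal sketch rather than the cache-bundle.js implementation: `sha256Hex` and `isAuthorizedAutomation` are hypothetical helpers, the script string is a placeholder, and Node's built-in crypto module stands in for hook.utils.createHash.

```javascript
const crypto = require('crypto');

function sha256Hex(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

// Build time: a fresh secret and the authorization embedded in the page
const serverSecret = crypto.randomBytes(32).toString('hex');
const script = 'async function automationFunction() {}'; // placeholder script
const authorization = sha256Hex(serverSecret + script);

// Run time (hypothetical validator): accept the automation entry only if
// its secret and script reproduce the embedded authorization value
function isAuthorizedAutomation(entry, expectedAuthorization) {
  return sha256Hex(entry.serverSecret + entry.script) === expectedAuthorization;
}
```

Once the special cache-bundle.json is overwritten, the secret is gone, so no new script can be made to match the fixed authorization value without a rebuild.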
  • Embed hex(sha256(serverSecret + cacheAutomationScriptCode)) as <script context-generator src="cache-bundle.js?no-hook=true&authorization={HERE}"></script>
  • In cacheBundleGeneration.js, wait for the global value __{serverSecret}__ to obtain the raw cache-bundle JSON, then save it as cache-bundle.json after normalization. The one-time serverSecret is lost when cache-bundle.json is overwritten at this point.
  // cacheBundleGeneration.js (extracted)
  let rawCacheBundleJSON;
  while (!rawCacheBundleJSON) {
    try {
      rawCacheBundleJSON = await page.evaluate(new Function(`return async function cacheBundle() {
        try {
          return __${serverSecret}__; // the variable disappears once read
        }
        catch (e) {
          return [][0]; // undefined;
        }
      }`)());
    }
    catch (e) {
      // try again
      console.log(e.message);
    }
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
  console.log('cacheBundle raw length = ', rawCacheBundleJSON.length, ' bytes');
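
One way a global that "disappears once read" could be built is with a configurable getter that deletes itself on first access. This is only a sketch of the idea; how the page actually exposes `__{serverSecret}__` is not shown in this thread, and `exposeOnce` and the sample values are hypothetical.

```javascript
// Hypothetical helper: expose a value under a name for exactly one read
function exposeOnce(target, name, value) {
  Object.defineProperty(target, name, {
    configurable: true, // required so that the getter can delete the property
    enumerable: false,  // keep the name out of for...in enumeration
    get() {
      delete target[name];
      return value;
    },
  });
}

// Assumed sample values, for illustration only
const sampleSecret = 'deadbeef';
const globalName = '__' + sampleSecret + '__';
exposeOnce(globalThis, globalName, '{"version":"version_1"}');
```

After the first read, a bare identifier reference to the name throws a ReferenceError, which matches the try/catch in the polling loop above returning undefined until the value is available.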
  • Example cache-automation.js can be customized for the target application
async function automationFunction() {
  /*
  @license https://github.com/t2ym/thin-hook/blob/master/LICENSE.md
  Copyright (c) 2018, Tetsuya Mori <[email protected]>. All rights reserved.
  */
  const timeoutForBundleSetFetched = 60000; // 60sec
  // wait for bundle-set-fetched event
  await new Promise((resolve, reject) => {
    const start = Date.now();
    let intervalId = setInterval(async () => {
      const now = Date.now();
      if (now - start > timeoutForBundleSetFetched) {
        clearInterval(intervalId);
        reject(new Error('timeout for bundle-set-fetched'));
        return; // skip the lookup below once the timeout has been reported
      }
      try {
        let model = document.querySelector('live-localizer').shadowRoot
          .getElementById('main').shadowRoot
          .getElementById('dialog')
          .querySelector('live-localizer-panel').shadowRoot
          .getElementById('model');
        if (model) {
          clearInterval(intervalId);
          // Note: bundle-set-fetched is the load completion event for live-localizer widget and irrelevant to cache-bundle.json
          model.addEventListener('bundle-set-fetched', (event) => {
            resolve(event.type);
          });
        }
        else {
          // try again
        }
      }
      catch (e) {
        // try again
      }
    }, 1000);
  });

  await new Promise(async (resolve, reject) => {
    try {
      let menuItems = document.querySelector('my-app').shadowRoot
        .children[3] // app-drawer-layout
        .querySelector('app-drawer')
        .querySelector('iron-selector')
        .querySelectorAll('a');
      let result = [];
      for (let i = menuItems.length - 1; i >= 0; i--) {
        menuItems[i].click();
        result.push(menuItems[i].href);
        await new Promise(_resolve => {
          setTimeout(_resolve, 20000); // Note: It is better to wait for specific events or conditions than just for a fixed period
        });
      }
      resolve(result);
    }
    catch (e) {
      reject(e.message);
    }
  });
}
  • Tests for automation attacks via puppeteer are now performed AFTER the cache-bundle.json generation
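
The note in the example script says that waiting for a specific condition is better than waiting for a fixed period. A minimal polling helper along those lines might look like this; `waitFor` and its options are hypothetical and not part of cache-automation.js.

```javascript
// Hypothetical helper: resolve with the first truthy result of condition(),
// or reject when the timeout elapses
function waitFor(condition, { timeout = 60000, interval = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const timer = setInterval(() => {
      let result;
      try {
        result = condition();
      } catch (e) {
        result = undefined; // treat errors as "not ready yet"
      }
      if (result) {
        clearInterval(timer);
        resolve(result);
      } else if (Date.now() - start > timeout) {
        clearInterval(timer);
        reject(new Error('timeout waiting for condition'));
      }
    }, interval);
  });
}
```

In the automation script, the fixed 20000 ms wait after each menu click could then be replaced by something like `await waitFor(() => pageIsReady())`, where `pageIsReady` is an application-specific check that would have to be written for the target UI.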
