
Using enc_to_utf16() in prepare() and valid() functions is not working right #2252

Closed
5 tasks done
jfoug opened this issue Aug 31, 2016 · 13 comments

jfoug (Collaborator) commented Aug 31, 2016

Reference original issue #2243

Known formats using UTF encoding in their prepare()/valid() functions (note: the GPU versions will/should have the same issue):

  • o3logon (valid)
  • mscash (valid and prepare calls valid)
  • mscash2 (valid and prepare calls valid)
  • as400-ssha1 (Convert)
  • oracle (prepare)

These appear to be the only formats currently in the tree that use the enc_to_utf16*() family of functions (I did not check things like strlen16() or other Unicode usages).

magnumripper (Member) commented Aug 31, 2016

Here are some things the current, messy implementation tries to achieve:

  • There are some defaults in the config file: DefaultEncoding, DefaultMSCodepage and DefaultInternalCodepage.
  • Command-line encoding options override config-file ones. BUT when we parse options, we haven't yet loaded a config. That's a mess already.
  • Some formats (e.g. LM) have different config-file defaults than the rest (i.e. DefaultMSCodepage vs. the legacy default of ISO-8859-1). But at the time we are loading the config, we don't yet know what format we'll be using.
  • By the time we do know that we should use DefaultMSCodepage, we still need to keep track of whether some encoding was set with a command-line option, because if it was, that is what we use.
  • Once all three of input, internal and target encodings are known, there is some logic that kicks in if the format uses UTF-16 internally. For example, if we ended up running NT with UTF-8 -> CP850 -> UTF-8, it will silently change it to UTF-8 -> CP850 -> CP850, because the format doesn't care about that difference and this is a huge performance optimization due to fewer conversions back and forth.

The list will grow as I recall more 😵

magnumripper (Member) commented:

Please note that any core changes related to this issue have to go to a topic branch. That includes my own. This is really, really tricky stuff; I know that from the past. There are so many parameters and ways of starting john (let alone resuming).

jfoug (Collaborator, author) commented Aug 31, 2016

> The list will grow as I recall more 😵

I figured as much. This 'seems' like not that big of a deal, but I know that encoding issues are MESSY.

jfoug (Collaborator, author) commented Aug 31, 2016

I agree on the branch. There was no way I was going to make any change, only to have 9 months go by before obscure stuff shows up totally busted.

jfoug (Collaborator, author) commented Sep 1, 2016

From #2243:

> So I propose that dc632b3 is the only thing we actually do for now.

I think if we make sure that all current formats are properly protected from this issue (the formats listed in the OP of this thread), then at least we buy time to look at this. Encoding is very powerful, but damn, it sucks digging through the complexities. It's kind of like having something obscure in dynamic not work: some little-looking issue can end up being 100 lines changed in 8 places.

magnumripper (Member) commented Sep 1, 2016

IRL this is much less of an issue than it may seem. I believe the problems only show up when using input files that are not encoded in UTF-8; at least that was the case in #2243.

All the affected formats are best used with UTF-8 input files. In the unlikely case a user picks some legacy encoding instead, they really should specify it on the command line and not rely on any heuristics (the heuristics are mainly for LM, which is somewhat of a special case for several reasons).

So possibly this is a "solution":

  1. Change the o3logon fix so it emits an informative warning (once) when we see that "negative length", something along the lines of "Input is not UTF-8, you need to specify --input-encoding".
  2. Ensure the other affected formats (listed in the OP) get the same protection against segfaults, plus the warning output.

magnumripper (Member) commented:

Please test cd4cf0d. It now bails out with an error like this:

    $ ../run/john test.in -form:o3logon -w
    Warning: invalid UTF-8 seen reading test.in
    o3logon: Input file is not UTF-8. Please use --input-enc to specify a codepage.

Similar fixes were made to the mscash and oracle formats.

magnumripper (Member) commented Sep 4, 2016

An even simpler fix made to as400-ssha1 in 33e4c9f.

jfoug (Collaborator, author) commented Sep 9, 2016

Are you sure you want to error() in this valid()? I would think a one-time message, and then returning 0 for the bad hashes, would be more proper for valid(). If the user calls john with no -format= option, each format's valid() gets blindly called. It would suck to have an input file that john bailed out on, especially since the format does NOT know whether it is actually being used, or whether it is just being queried for validity.

jfoug (Collaborator, author) commented Sep 10, 2016

9037a83 and 2684366

magnumripper (Member) commented:

You are right. Thanks!

jfoug (Collaborator, author) commented Sep 10, 2016

OK, this 'buys' us a bandage until we can figure out a real solution, OR we close this with documentation stating that encoding usage is not valid until after init() has been called.

magnumripper added a commit that referenced this issue Sep 11, 2016
