Unicode character groups

Mar 5, 2010 at 1:22 PM
Edited Mar 5, 2010 at 6:06 PM

I'm working with xVal with a Regex DataAnnotation that looks like this: "^[\w\s\p{P}]*$"

It works fine on the server side, and .NET recognizes the \p{P} to mean the Unicode character group "Punctuation". xVal doesn't seem to pick that up though... Interestingly, it WILL pick up this: "^[\w\s\p{Punctuation}]*$"

But then .NET won't recognize it. I'm assuming this is because xVal relies on the browser's native JavaScript RegExp implementation for pattern matching, and is therefore at the mercy of whichever regex features a given browser implements. In my case, I'm using Firefox 3.6. Interestingly, Google Chrome doesn't have this issue; I suspect its JavaScript implementation supports the Unicode character groups.
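To illustrate the mismatch described above, here is a minimal sketch of how a JavaScript engine reads the same pattern. It assumes an engine that accepts the `u` flag with Unicode property escapes (ES2018 or later), which the browsers discussed in this thread did not; the variable names are made up for the example.

```javascript
// Without the "u" flag, "\p" is just an escaped letter "p", so the class
// [\w\s\p{P}] contains word characters, whitespace, and the literal
// characters "p", "{", "P" and "}". It does not mean "Punctuation".
const legacy = /^[\w\s\p{P}]*$/;
const legacyAcceptsBraces = legacy.test("{}");            // braces are literal members of the class
const legacyAcceptsComma = legacy.test("Hello, world!");  // "," and "!" are not in the class

// With the "u" flag, \p{P} is treated as the Unicode General_Category
// "Punctuation", which is how .NET reads the same pattern.
const unicodeAware = /^[\w\s\p{P}]*$/u;
const unicodeAcceptsComma = unicodeAware.test("Hello, world!");

console.log(legacyAcceptsBraces, legacyAcceptsComma, unicodeAcceptsComma);
```

On an engine without property-escape support, only the first pattern is usable, and it silently matches something different from what the server validates.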

Has any thought been put into having the xVal library convert character groups to their explicit Unicode equivalents before emitting them into the JavaScript code?

Something like what this project is doing: http://www.codeproject.com/KB/dotnet/UnicodeCharCatHelper.aspx

/ Michael /

Mar 5, 2010 at 7:50 PM

I realize I may be asking a little much to have it fix JavaScript's broken RegEx implementation... I went ahead and just used the app linked above to convert my requirements to a string of explicit Unicode character ranges. It makes for one heck of an ugly regular expression, but it works...

Mar 5, 2010 at 8:35 PM

This is not really an xVal issue; it is more than likely caused by the differences between .NET and JavaScript regular expression syntax. Because of that, the only way around it that I can see is to create your own custom attribute, along with a custom provider for xVal on the server that catches the regex and transforms it into the appropriate output when emitting the validation rules in the script.
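The transformation step such a provider would perform can be sketched roughly as follows. This is only an illustration: it is written in JavaScript for brevity, whereas the real provider would run on the server in .NET, and `CATEGORY_RANGES` and `expandCategories` are hypothetical names, not part of xVal. The table covers only the ASCII portion of the Punctuation category; a complete table would have to be generated from the Unicode data.

```javascript
// Hypothetical lookup table mapping a .NET category name to explicit
// character-class ranges. Only ASCII punctuation is listed here.
const CATEGORY_RANGES = new Map([
  ["P", String.raw`\u0021-\u0023\u0025-\u002a\u002c-\u002f\u003a\u003b\u003f\u0040\u005b-\u005d\u005f\u007b\u007d`],
]);

// Replace each \p{Name} with its explicit ranges. Unknown categories are
// left untouched. Assumes \p{Name} only appears inside a character class
// and that the backslash before "p" is not itself escaped.
function expandCategories(pattern) {
  return pattern.replace(/\\p\{(\w+)\}/g, (match, name) =>
    CATEGORY_RANGES.has(name) ? CATEGORY_RANGES.get(name) : match
  );
}

const dotNetPattern = String.raw`^[\w\s\p{P}]*$`;
const clientPattern = expandCategories(dotNetPattern);
const clientRegex = new RegExp(clientPattern);

console.log(clientPattern);
console.log(clientRegex.test("Hello, world!")); // comma and "!" are ASCII punctuation
console.log(clientRegex.test("a < b"));         // "<" is a Symbol, not Punctuation
```

The server keeps validating against the original `\p{P}` pattern, while the client receives only syntax that any JavaScript engine understands.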

 

Mar 5, 2010 at 8:58 PM

Yeah, I kinda gathered that after reading up on JavaScript's regex engine. Basically, it seems to be hit or miss depending on which browser you're running... Kind of annoying. I guess I could do something like this:

^[\w\s!@#\$%\^&\*\(\)_+-=\{\}|\[\]\\:";\'\<\>\?,\./]*$

which kind of has the same effect as this monstrosity:

^[\w\s\u0021-\u002f\u003a-\u0040\u005b-\u0060\u007b-\u007e\u00a1-\u00a9\u00ab-\u00b1\u00b4\u00b6-\u00b8\u00bb\u00bf\u00d7\u00f7\u02b9\u02ba\u02c2-\u02cf\u02d2-\u02df\u02e5-\u02ed\u0374\u0375\u037e\u0384\u0385\u0387\u0482\u055a-\u055f\u0589\u058a\u05be\u05c0\u05c3\u05f3\u05f4\u060c\u061b\u061f\u066a-\u066d\u06d4\u06e9\u06fd\u06fe\u0700-\u070d\u0964\u0965\u0970\u09f2\u09f3\u09fa\u0b70\u0df4\u0e3f\u0e4f\u0e5a\u0e5b\u0f01-\u0f17\u0f1a-\u0f1f\u0f34\u0f36\u0f38\u0f3a-\u0f3d\u0f85\u0fbe-\u0fc5\u0fc7-\u0fcc\u0fcf\u104a-\u104f\u10fb\u1361-\u1368\u166d\u166e\u169b\u169c\u16eb-\u16ed\u17d4-\u17dc\u1800-\u180a\u1fbd\u1fbf-\u1fc1\u1fcd-\u1fcf\u1fdd-\u1fdf\u1fed-\u1fef\u1ffd\u1ffe\u2010-\u2027\u2030-\u2046\u2048-\u204d\u207a-\u207e\u208a-\u208e\u20a0-\u20af\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116-\u2118\u211e-\u2123\u2125\u2127\u2129\u212e\u2132\u213a\u2190-\u21f3\u2200-\u22f1\u2300-\u237b\u237d-\u239a\u2400-\u2426\u2440-\u244a\u249c-\u24e9\u2500-\u2595\u25a0-\u25f7\u2600-\u2613\u2619-\u2671\u2701-\u2704\u2706-\u2709\u270c-\u2727\u2729-\u274b\u274d\u274f-\u2752\u2756\u2758-\u275e\u2761-\u2767\u2794\u2798-\u27af\u27b1-\u27be\u2800-\u28ff\u2e80-\u2e99\u2e9b-\u2ef3\u2f00-\u2fd5\u2ff0-\u2ffb\u3001-\u3004\u3008-\u3020\u3030\u3036\u3037\u303e\u303f\u309b\u309c\u30fb\u3190\u3191\u3196-\u319f\u3200-\u321c\u322a-\u3243\u3260-\u327b\u327f\u328a-\u32b0\u32c0-\u32cb\u32d0-\u32fe\u3300-\u3376\u337b-\u33dd\u33e0-\u33fe\ua490-\ua4a1\ua4a4-\ua4b3\ua4b5-\ua4c0\ua4c2-\ua4c4\ua4c6\ufb29\ufd3e\ufd3f\ufe30-\ufe44\ufe49-\ufe52\ufe54-\ufe66\ufe68-\ufe6b\uff01-\uff0f\uff1a-\uff20\uff3b-\uff40\uff5b-\uff5e\uff61-\uff65\uffe0-\uffe6\uffe8-\uffee\ufffc\ufffd]*$

The long version is what I am using right now (and it does in fact work). The shorter one might just be missing a few characters...
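One way to check what the shorter pattern misses, at least within the printable ASCII range, is to test every character individually. This is a small sketch with made-up variable names; it says nothing about characters outside ASCII, which the shorter pattern does not attempt to cover.

```javascript
// The shorter pattern from above, copied verbatim into a regex literal.
const shortPattern = /^[\w\s!@#\$%\^&\*\(\)_+-=\{\}|\[\]\\:";\'\<\>\?,\./]*$/;

// Collect every printable ASCII character (0x21 to 0x7e) that the
// pattern rejects when tested on its own.
const missing = [];
for (let code = 0x21; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (!shortPattern.test(ch)) {
    missing.push(ch);
  }
}

console.log(missing); // the backtick and the tilde
```

Note that `+-=` inside the class is parsed as a range from `+` to `=` rather than three separate characters. That range happens to cover `+`, `-` and `=` (along with the digits and a few other characters already allowed), so the pattern still behaves as intended, but escaping the hyphen as `\-` would make the intent explicit.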

/ Michael /