|
I have a problem using your library with std::strings
containing utf-8 encoded text; the library escapes the
utf8 data, because it thinks it is non printable data.
Any way to get around this? (Besides using utf-16 in
std::wstrings) Maybe an option to disable the escaping
or setting the encoding?
|
|
|
|
|
I have never needed to use unicode so I am clearly no expert but I had envisaged people converting unicode strings/files to and from wstrings before using the library. As I say in the documentation "Note that there is no support for reading Unicode files and converting them to wstrings as this is not a task specific to JSON."
Clearly this is not possible if you need to read/write a stream of utf-8. I do not know what a good solution would be. Perhaps allowing the user to provide an "is_printable" function to the writer, or as you say by allowing the user to set the encoding but I don't know much about that.
I will have a think about the problem, and try a few things out. If you have any further thoughts on a solution I would be very happy to hear them. A code patch, with test cases, would be ideal.
Regards
John
|
|
|
|
|
Thinking about this a bit I think you do need wstrings for UTF-8 as UTF-8 uses one to four bytes per code point.
Regards
John
|
|
|
|
|
As long as you keep in mind that std::string.size() returns the
number of bytes and not the number of display characters, you can store utf8
encoded strings in a std::string.
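To make the byte/character distinction concrete, here is a minimal sketch. The helper name count_code_points is made up for illustration and is not part of JSON Spirit; it assumes the string holds well-formed UTF-8 and simply counts every byte that is not a continuation byte (10xxxxxx).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of JSON Spirit: counts Unicode code points
// in a std::string that is assumed to contain well-formed UTF-8.
std::size_t count_code_points( const std::string& utf8 )
{
    std::size_t count = 0;
    for( std::string::size_type i = 0; i < utf8.size(); ++i )
    {
        const unsigned char byte = static_cast< unsigned char >( utf8[ i ] );
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        if( ( byte & 0xC0 ) != 0x80 ) ++count;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (the word "héllo"), size() gives 6 while count_code_points gives 5. Note that code points are still not quite "display characters", since combining marks occupy a code point of their own.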
I will try to make a patch in a few days. I think I will use the is_printable
approach you mentioned. The only problem is that when using multibyte encodings
such as utf8 you will need more than the current character. I can think of two
approaches:
1. using lookahead: when encountering a character that might be the beginning
of a multibyte utf8 character, scan forward to see if it is valid.
2. keep history of the encountered characters and throw an exception when
a multibyte character appears to be invalid.
Do you have any preferences? Because I don't feel like writing a patch that
won't be accepted
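Approach 1 (lookahead) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name utf8_sequence_length is hypothetical, and the check is deliberately simple (it does not reject overlong encodings or surrogate code points).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for the lookahead approach, not part of JSON Spirit.
// Returns the length in bytes (1 to 4) of the UTF-8 sequence starting at
// s[pos], or 0 if the lead byte is invalid or a continuation byte is missing.
// Overlong encodings and surrogates are NOT rejected here.
std::size_t utf8_sequence_length( const std::string& s, std::size_t pos )
{
    if( pos >= s.size() ) return 0;

    const unsigned char lead = static_cast< unsigned char >( s[ pos ] );
    std::size_t len = 0;
    if( lead < 0x80 )                  len = 1;  // 0xxxxxxx: plain ASCII
    else if( ( lead & 0xE0 ) == 0xC0 ) len = 2;  // 110xxxxx
    else if( ( lead & 0xF0 ) == 0xE0 ) len = 3;  // 1110xxxx
    else if( ( lead & 0xF8 ) == 0xF0 ) len = 4;  // 11110xxx
    else return 0;                               // stray continuation or invalid lead byte

    if( pos + len > s.size() ) return 0;         // sequence is truncated

    // Scan forward: every following byte must be a continuation byte 10xxxxxx.
    for( std::size_t i = 1; i < len; ++i )
    {
        const unsigned char next = static_cast< unsigned char >( s[ pos + i ] );
        if( ( next & 0xC0 ) != 0x80 ) return 0;
    }
    return len;
}
```

A writer using this would copy the whole sequence unescaped when the result is non-zero, and fall back to escaping (or throwing) when it is 0.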
|
|
|
|
|
I would very much like the JSON Spirit library to avoid the intricacies of unicode encodings. The idea is for the library to be presented with unicode text held in wchars ( strings or streams ). Is it possible for you to use boost::utf8_codecvt_facet or something similar to get from utf8 to wchars and vice versa?
http://www.boost.org/doc/libs/1_41_0/libs/serialization/doc/codecvt.html
|
|
|
|
|
I am currently looking into using your library (thanks for providing it!), and so I am also concerned with the handling of UTF-8 encoded data.
It is fine that JSON_Spirit can handle wchars, but generally speaking, using wchars for unicode can't be recommended unconditionally. It adds a lot of complexity (and maybe even code bloat) for a very questionable benefit. (Is wchar even capable of representing all characters of the newer unicode standard? I doubt it.) Unless the goal is to do real word processing or similar, the best solution is often just to pass through utf-8 encoded data, without the application even being aware that it is handling anything beyond simple 8-bit chars.
Reading and writing JSON fits nicely with that approach; you don't need to add anything beyond a very basic awareness to your library: within a string element in JSON, any 8-bit char is allowed, with the exception of the chars '"' and '\', which end the string or start an escape sequence respectively.
There is no possibility that a utf-8 sequence collides with this requirement. Multi-byte sequences are specifically crafted so that the highest-order bit of every byte is always set. Thus it's sufficient for your library just to pass them on unaltered.
JSON_Spirit, in the current version, is already able to handle utf-8 using just the normal std::string. The only problem is that the generated JSON isn't easily readable for humans, because of the escape sequences it creates. For my use case, this rather defeats the very reason why I'm looking into JSON: a compact but human-readable and editable format to store serialised object structures. If the escaping of "non printable" chars could somehow be toggled off or controlled by the client, the problem would already be solved.
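The claim that a multi-byte sequence can never contain '"' or '\' can be checked mechanically. The sketch below is illustrative only: encode_utf8 and contains_json_delimiter are made-up names, and the encoder assumes its argument is a code point no larger than 0x10FFFF.

```cpp
#include <string>

// Illustrative encoder (not part of JSON Spirit): converts one Unicode
// code point, assumed to be <= 0x10FFFF, to its UTF-8 byte sequence.
std::string encode_utf8( unsigned long cp )
{
    std::string out;
    if( cp < 0x80 )
    {
        out += static_cast< char >( cp );                             // 0xxxxxxx
    }
    else if( cp < 0x800 )
    {
        out += static_cast< char >( 0xC0 | ( cp >> 6 ) );             // 110xxxxx
        out += static_cast< char >( 0x80 | ( cp & 0x3F ) );           // 10xxxxxx
    }
    else if( cp < 0x10000 )
    {
        out += static_cast< char >( 0xE0 | ( cp >> 12 ) );            // 1110xxxx
        out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
    }
    else
    {
        out += static_cast< char >( 0xF0 | ( cp >> 18 ) );            // 11110xxx
        out += static_cast< char >( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
        out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
    }
    return out;
}

// True if the byte sequence contains a '"' or a '\' byte.
bool contains_json_delimiter( const std::string& bytes )
{
    return bytes.find_first_of( "\"\\" ) != std::string::npos;
}
```

Looping over every code point from 0x80 to 0x10FFFF and calling contains_json_delimiter( encode_utf8( cp ) ) never returns true, because every byte produced by the three multi-byte branches has its top bit set.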
|
|
|
|
|
That's interesting, I hadn't realised that no change is required to handle reading UTF-8. A flag to toggle the escaping of "non printable" chars wouldn't be a problem to add. Would something like the following be sufficient?
template< class String_type >
String_type add_esc_chars( const String_type& s )
{
    typedef typename String_type::const_iterator Iter_type;
    typedef typename String_type::value_type Char_type;
    String_type result;
    const Iter_type end( s.end() );
    for( Iter_type i = s.begin(); i != end; ++i )
    {
        const Char_type c( *i );
        if( add_esc_char( c, result ) ) continue;
#ifdef HANDLE_UTF8
        result += c;
#else
        const wint_t unsigned_c( ( c >= 0 ) ? c : 256 + c );
        if( iswprint( unsigned_c ) )
        {
            result += c;
        }
        else
        {
            result += non_printable_to_string< String_type >( unsigned_c );
        }
#endif
    }
    return result;
}
Regards
John
|
|
|
|
|
yes, this looks OK. The UTF-8 sequences are being passed through unaltered.
I take it that the function add_esc_char(c,result) will somehow manage to handle the normal escape sequences, like \\ \" \n \t etc. (This function would need to have some embedded state, or inspect the rightmost character(s) of result).
But basically that's another question not related to UTF-8 handling.
Regards,
Hermann Vosseler
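On the question of state: if the function only ever sees raw, not yet escaped text, each character can arguably be handled on its own, with no history needed. The following is a guess at what such a helper might look like, not JSON Spirit's actual implementation; the names add_esc_char_sketch and escape_pass_through are invented, and control characters other than the ones listed are passed through untouched.

```cpp
#include <string>

// A guess at what a helper like add_esc_char might do; NOT JSON Spirit's
// actual code. If c needs a JSON escape, the escape is appended to s and
// true is returned; otherwise s is left untouched and false is returned.
bool add_esc_char_sketch( char c, std::string& s )
{
    switch( c )
    {
        case '"':  s += "\\\""; return true;
        case '\\': s += "\\\\"; return true;
        case '\b': s += "\\b";  return true;
        case '\f': s += "\\f";  return true;
        case '\n': s += "\\n";  return true;
        case '\r': s += "\\r";  return true;
        case '\t': s += "\\t";  return true;
        default:   return false;
    }
}

// Escapes a whole string, copying every other byte (including the bytes of
// UTF-8 multi-byte sequences) through unchanged.
std::string escape_pass_through( const std::string& in )
{
    std::string result;
    for( std::string::size_type i = 0; i < in.size(); ++i )
    {
        if( !add_esc_char_sketch( in[ i ], result ) ) result += in[ i ];
    }
    return result;
}
```

Because the decision depends only on the current input character, a backslash in the input always becomes two backslashes in the output, and nothing already written to the result has to be inspected.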
|
|
|
|
|
Hi,
I'm trying to convert a small JSON string to a map,
but with no success.
This is the code I wrote:
const char* res = "{\"responseData\": {\"translatedText\":\"hola mundo\"}, \"responseDetails\": null, \"responseStatus\": 200}";
Value_type value;
String_type in_s( to_str( res ) );
read_or_throw( in_s, value );
if( value.type() == obj_type )
{
    Object_type obj;
    obj = value.get_obj();
    map<String_type,Value_type> mp_obj;
    for( typename Object_type::const_iterator i = obj.begin(); i != obj.end(); ++i )
        mp_obj[ i->name_ ] = i->value_;
}
but I get a compilation error:
error C2039: 'name_' : is not a member of 'std::pair<_Ty1,_Ty2>'
Do you know what could cause the problem?
I've added this code in
"json_spirit::test_reader, run_tests()"
Thanks for any help
|
|
|
|
|
I don't know what the exact type of your Object_type is. If it is a json_spirit::Object then you should not get the error. If it is a json_spirit::mObject then it is already a map and i->first is the pair's name, i->second is the pair's value.
Regards
John
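The difference between the two access styles can be shown with stand-in types. The sketch below does not use the real json_spirit classes: Pair, Object and mObject here are simplified placeholders with int values, chosen only so the example compiles on its own.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in types for illustration only; the real json_spirit::Object and
// json_spirit::mObject hold JSON values, not ints.
struct Pair
{
    std::string name_;
    int value_;
};
typedef std::vector< Pair > Object;            // vector-style object: use name_ / value_
typedef std::map< std::string, int > mObject;  // map-style object: use first / second

// Copies a vector-style object into a map, as the code in the question tries to do.
mObject to_map( const Object& obj )
{
    mObject result;
    for( Object::const_iterator i = obj.begin(); i != obj.end(); ++i )
    {
        result[ i->name_ ] = i->value_;
    }
    return result;
}

// Walks a map-style object; here the iterator points at a std::pair.
int sum_values( const mObject& obj )
{
    int sum = 0;
    for( mObject::const_iterator i = obj.begin(); i != obj.end(); ++i )
    {
        sum += i->second;  // i->first would be the member's name
    }
    return sum;
}
```

The compiler error in the question is what you get when the second loop is written with name_ and value_: a std::pair has no such members.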
|
|
|
|
|
Thanks for your code!
how to parse the string of {"a":1,"c" ,"dog":{"dogId":3,"isHungry" },"e":5,"exp":48810,"farmlandStatus":[{"a":34,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":26,"l":15,"m":15,"n":{"10177679":1,"16513950":1,"19783570":1,"20194034":1,"2175835":1,"25779663" ,"328410816":1,"3663649":1,"3783771":1,"41503729" ,"41892699" ,"43477953" ,"4469715" ,"532102" ,"538815525":1,"649460555" ,"8374675":1},"o" ,"p":[],"q":1260201288,"r":1260489476},{"a":55,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":28,"l":16,"m":16,"n":{"125462791":1,"13331147":1,"16513950":1,"2175835" ,"25779663":1,"294887051":1,"328410816" ,"3663649" ,"3783771" ,"41403882" ,"41503729" ,"41892699":1,"43477953":1,"4469715" ,"63322968":1,"649460555" ,"75031690":1,"8374675" ,"943310677":1},"o" ,"p":[],"q":1260321306,"r":1260492177},{"a":31,"b":1,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k" ,"l" ,"m" ,"n":[],"o" ,"p":[],"q":1260407625,"r":1260598769},{"a":34,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":30,"l":17,"m":17,"n":{"10177679" ,"16513950":1,"19783570":1,"20194034" ,"2175835":1,"25779663" ,"328410816" ,"3663649":1,"3783771":1,"41503729":1,"41892699":1,"43477953":1,"4469715" ,"532102" ,"538815525":1,"63322968":1,"649460555":1,"8374675":1},"o" ,"p":[],"q":1260201287,"r":1260490120},{"a":55,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":26,"l":15,"m":15,"n":{"16513950":1,"25779663" ,"3663649":1,"41503729":1,"41892699":1,"43477953":1,"4469715":1},"o" ,"p":[],"q":1260321278,"r":1260485140},{"a":31,"b":1,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k" ,"l" ,"m" ,"n":[],"o" ,"p":[],"q":1260407626,"r":1260595185},{"a":34,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":31,"l":18,"m":18,"n":{"10177679" ,"16513950":1,"19783570":1,"20194034":1,"2175835" ,"25779663":1,"328410816":1,"3663649":1,"3783771" ,"41503729":1,"41892699":1,"43477953" ,"4469715" ,"532102" ,"538815525" ,"63322968":1,"649460555":1,"8374675":1},"o" ,"p":[],"q":1260201287,"r":1260490120},{"a":55,"b":6,"c" 
,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":25,"l":14,"m":14,"n":{"125462791" ,"13331147":1,"16513950":1,"2175835":1,"25779663" ,"328410816" ,"3663649":1,"3783771":1,"41503729":1,"41892699":1,"43477953" ,"4469715":1,"63322968":1,"649460555":1,"8374675" ,"943310677":1},"o" ,"p":[],"q":1260321277,"r":1260491057},{"a":31,"b":1,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k" ,"l" ,"m" ,"n":[],"o" ,"p":[],"q":1260407626,"r":1260595186},{"a":34,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":32,"l":19,"m":19,"n":{"10177679":1,"16513950":1,"19783570":1,"20194034":1,"25779663":1,"3663649":1,"3783771":1,"41503729":1,"41892699":1,"532102" ,"538815525":1,"649460555":1,"8374675":1},"o" ,"p":[],"q":1260201286,"r":1260483680},{"a":55,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":25,"l":14,"m":14,"n":{"16513950":1,"2175835":1,"25779663":1,"328410816":1,"3663649":1,"3783771" ,"41503729":1,"41892699":1,"43477953":1,"4469715":1,"63322968":1,"649460555" ,"8374675":1},"o" ,"p":[],"q":1260321276,"r":1260490120},{"a":33,"b":1,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k" ,"l" ,"m" ,"n":[],"o" ,"p":[],"q":1260407629,"r":1260595187},{"a":56,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j":2,"k":26,"l":15,"m":15,"n":{"10177679":1,"16513950":1,"260548988":1,"28364441":1,"361486206":1,"3783771":1,"41892699":1,"532102" ,"538815525":1,"649460555":1,"8374675":1},"o" ,"p":[],"q":1260346320,"r":1260461583},{"a":106,"b":6,"c" ,"d" ,"e":1,"f" ,"g" ,"h":1,"i":100,"j" ,"k":28,"l":16,"m":19,"n":{"110933010":1,"260548988":1,"3663649":1,"39206992":1,"41892699" ,"43477953":1,"63322968" ,"649460555" ,"75031690" },"o" ,"p":[],"q":1260407621,"r":1260600939}],"items":{"1":{"itemId":1},"2":{"itemId":98},"3":{"itemId":215},"4":{"itemId":100},"8":{"itemId":80003},"9":{"itemId":90001}},"user":{"healthMode":{"serverTime":1260601148,"set" ,"valid" },"pf":1}}
Best regards!
robustwell
|
|
|
|
|
I am not sure what you are asking, but -O, i.e. a capital letter O preceded by a minus sign (which creates a smiley), is not a valid JSON value, so the string cannot be parsed.
John
|
|
|
|
|
Thanks very much for making this useful library available.
I've found that JSON Spirit doesn't seem to handle double numbers correctly.
I have this in my JSON file:
{
"ground water volume flow rate" : 1234567890.123456789,
}
I read this value with:
...
jsonObj.at("ground water volume flow rate").get_real();
...
The number that I get is: 1.23457e+09
It looks like JSON Spirit handles double numbers as floats (it cuts them off at 6 significant digits).
Is it possible to get numbers as double?
Thanks, Tatjana
|
|
|
|
|
I don't get the same problem. Json Spirit has used doubles since version 2.01, which was released about 2 years ago. It will read to 17 decimal places. The following code outputs:
1234567890.1234567
{"ground water volume flow rate":1234567890.123457}
mValue value;
read( "{\"ground water volume flow rate\" : 1234567890.123456789 }", value );
const mObject& o = value.get_obj();
const double d = o.find("ground water volume flow rate")->second.get_real();
const string s = write( o );
cout << setprecision( 18 ) << d << endl;
cout << s << endl;
John
|
|
|
|
|
Thanks John! It was my mistake. What I needed was setprecision(...), sorry I'm relatively new in C++.
Tatjana
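The effect is independent of JSON Spirit and can be reproduced with the standard streams alone. A small sketch (the function names format_default and format_with_precision are just for this example):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a double with the stream's default precision (6 significant digits).
std::string format_default( double d )
{
    std::ostringstream os;
    os << d;
    return os.str();
}

// Formats a double with the requested number of significant digits.
std::string format_with_precision( double d, int digits )
{
    std::ostringstream os;
    os << std::setprecision( digits ) << d;
    return os.str();
}
```

With d = 1234567890.123456789, format_default( d ) yields "1.23457e+09", while format_with_precision( d, 17 ) yields "1234567890.1234567". The stored double is the same in both cases; only the formatting differs.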
|
|
|
|
|
Thanks very much for making this library available, I've found it extraordinarily useful.
I've found that JSON Spirit can't correctly handle multiple top-level objects in a stream unless they are separated by at least one character. I'm not sure if this is a limitation of JSON Spirit or Boost. Is this a known bug or requirement? Is there any easy way around it? I'm using the latest release of JSON Spirit (4.02) and Boost IOstreams 1.40.0.1.
Example program:
#include "json_spirit.h"
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
int main(int argc, char **argv) {
    assert(argc > 1);  // expects the JSON text as the first argument
    std::string s = argv[1];
    std::istringstream str(s);
    json_spirit::Value v;
    // First top-level object
    bool succ = json_spirit::read(str, v);
    assert(succ);
    printf("Parsed: %s\n", json_spirit::write(v).c_str());
    // Second top-level object, read from the same stream
    succ = json_spirit::read(str, v);
    assert(succ);
    printf("Parsed: %s\n", json_spirit::write(v).c_str());
}
Example output:
$ ./simple_test "{\"test\":1}{\"test\":2}"
Parsed: {"test":1}
Parsed: "test"
$ ./simple_test "{\"test\":1} {\"test\":2}"
Parsed: {"test":1}
Parsed: {"test":2}
Only the second run parsed it correctly, presumably because there is a space between the two objects.
Any guidance would be much appreciated. Thanks!
Jeremy
|
|
|
|
|
Yes, it definitely looks like a bug. Interestingly it only occurs when reading streams; reading multiple top level objects from a string works. I will look into it.
John
|
|
|
|
|
Thanks John. I've been poking around and my working assumption is that it has something to do with multi-pass and looking ahead in the stream, but I haven't made much progress yet.
|
|
|
|
|
The problem does seem to be the multi_pass iterator. Basically, it buffers an extra character from the stream, and when the read_stream call returns, the multi_pass iterators get destroyed, and the character has permanently disappeared from the stream. Here's a simple patch that works for me, which just inserts the last character back into the stream:
<pre>
--- json_spirit_reader.cpp 2009-12-02 17:06:46.000000000 -0800
+++ json_spirit_reader.cpp.new 2009-12-02 17:06:28.000000000 -0800
@@ -621,7 +621,13 @@
{
Multi_pass_iters< Istream_type > mp_iters( is );
- return read_range( mp_iters.begin_, mp_iters.end_, value );
+ bool succ = read_range( mp_iters.begin_, mp_iters.end_, value );
+
+ if( mp_iters.begin_ != mp_iters.end_ ) {
+ is.putback(*(mp_iters.begin_));
+ }
+
+ return succ;
}
template< class Istream_type, class Value_type >
@@ -630,6 +636,11 @@
const Multi_pass_iters< Istream_type > mp_iters( is );
add_posn_iter_and_read_range_or_throw( mp_iters.begin_, mp_iters.end_, value );
+
+ if( mp_iters.begin_ != mp_iters.end_ ) {
+ is.putback(*(mp_iters.begin_));
+ }
+
}
}
</pre>
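The underlying effect can be imitated without Spirit. The toy reader below is not multi_pass and makes no claim about its internals; it just reads one "{...}" object and then, like a buffering iterator, pulls one extra character from the stream. The function name read_object and the restore_lookahead flag are invented for the illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Toy reader, not Spirit's multi_pass: reads one "{...}" object from the
// stream, then pulls one extra character as simulated lookahead.
// If restore_lookahead is true, that extra character is put back.
std::string read_object( std::istream& is, bool restore_lookahead )
{
    std::string text;
    char c;
    while( is.get( c ) )
    {
        text += c;
        if( c == '}' ) break;
    }
    // Simulated lookahead: one character past the end of the object.
    if( is.get( c ) && restore_lookahead )
    {
        is.putback( c );
    }
    return text;
}
```

Reading twice from a std::istringstream holding "{a}{b}" with restore_lookahead set to false returns "{a}" and then "b}", because the second opening brace was swallowed. With "{a} {b}" only the space is lost, which matches the observation that a separator hides the problem. With restore_lookahead set to true both objects come back intact.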
|
|
|
|
|
Thanks, I appreciate you posting the patch. Unfortunately it doesn't work in all cases. Parsing "12 34 56" using the read_or_throw function results in a '1' being put back; in this case the iterator must be pointing one before the end of the object!
Interestingly, multiple objects are parsed correctly if you create the multi_pass iterators once and then use them for the multiple parses, i.e. don't create and destroy the iterators on each parse.
I will continue looking for a complete fix. I assume your fix works for you if you are not using the read_or_throw functions. I have posted a question on the Spirit mailing list, http://sourceforge.net/mailarchive/forum.php?thread_name=190860.69208.qm%40web46015.mail.sp1.yahoo.com&forum_name=spirit-general
Regards
John
|
|
|
|
|
Thanks for the great library; it seems to have everything I need.
For your information, I am using boost 1.80, and everything works fine. (In case you want to update the opening paragraph). I have successfully compiled and linked it into a project (and I ran your test files, which came out successful.)
Edit: I'm using Visual Studio 2008, Express Edition.
|
|
|
|
|
Thanks for the info, which I can confirm ( of course you mean 1.40 ). I am pleased you like the library.
Regards
John
|
|
|
|
|
>of course you mean 1.40
Yep, apologies for the typo. I also compiled it against 1.41.0, just for the sake of completeness. All is working smoothly.
My only question: when you check against types (.get_value()), why do you use assert() instead of throwing an exception on failure? It is difficult to react to asserts higher up in (my) code.
|
|
|
|
|
I suppose there is a case for throwing an exception. For example if you were expecting data in a particular format you might not want to check a value's type before getting the value. It would be better just to receive an exception if the data was invalid. I have thought about parametrising the error reporting mechanism to allow users to specify that they want exceptions while maintaining backwards compatibility. Alternatively users could put a wrapper round the read functions throwing an exception if a value's type isn't the one expected.
John
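The wrapper idea might look something like this. It is a sketch with a stand-in Value class rather than the real json_spirit::Value, and get_int_or_throw is a hypothetical name; the stand-in getter asserts on a type mismatch to mimic the behaviour being discussed.

```cpp
#include <cassert>
#include <stdexcept>

// Minimal stand-in for a JSON value, for illustration only. Its getter
// asserts on a type mismatch, mimicking the behaviour discussed here.
enum Value_type { int_type, null_type };

class Value
{
public:
    Value() : type_( null_type ), int_( 0 ) {}
    explicit Value( int i ) : type_( int_type ), int_( i ) {}
    Value_type type() const { return type_; }
    int get_int() const { assert( type_ == int_type ); return int_; }
private:
    Value_type type_;
    int int_;
};

// Hypothetical wrapper: checks the type first and throws instead of asserting.
int get_int_or_throw( const Value& v )
{
    if( v.type() != int_type )
    {
        throw std::runtime_error( "value is not an integer" );
    }
    return v.get_int();
}
```

Calling code can then catch std::runtime_error at whatever level suits it, and the assert inside the getter is never reached with a value of the wrong type.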
|
|
|
|
|
The asserts aren't a big deal for me; I'll either re-write the relevant code or just re-define the assert macro. (Or wrap the read function, as you suggested. Hadn't thought of that earlier. )
In terms of library design, the decision is of course yours. Perhaps I am coming from too much of a Java mindset, where it's considered ok to just "try it and handle the general case of failure", but stopping the world is frowned upon. Actually, adding an exception-handling function as a parameter to the "read" function sounds pretty safe (that's what you mean when you said parametrising the error reporting mechanism, right?). The default exception-handling function could just be "assert(false)", which should preserve the current behavior.
Thanks again for the response. I'll work around the assert statement in my own way, but I'm looking forward to seeing how you implement a solution in the library's next release.
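For illustration, the handler-as-parameter idea could take roughly this shape. Everything here is hypothetical: the stand-in Value struct, the Error_handler typedef and the function names are invented, and the handler is attached to a getter rather than to "read" purely to keep the sketch short.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in value type for illustration only (not json_spirit::Value).
struct Value
{
    bool is_int_;
    int int_;
};

typedef void ( *Error_handler )( const char* message );

// Default handler: preserves the current "stop the world" behaviour.
void assert_handler( const char* /*message*/ ) { assert( false ); }

// Alternative handler: reports the problem as an exception.
void throwing_handler( const char* message ) { throw std::runtime_error( message ); }

// Hypothetical getter that takes the error handler as a parameter.
int get_int( const Value& v, Error_handler on_error = &assert_handler )
{
    if( !v.is_int_ )
    {
        on_error( "value is not an integer" );
        return 0;  // only reached if the handler returns, e.g. asserts compiled out
    }
    return v.int_;
}
```

Existing callers that pass no handler keep the assert, while new callers can pass &throwing_handler and catch std::runtime_error further up.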
|
|
|
|
|