I had a take-home task. One of the aspects of the task was to create a fast json parser of coinbase feed. The parser should extract 3 json fields: sequence number, ask price and bid price.
I managed to achieve ≈39ns median time (reported by google benchmark), which is as good as the
best (single) L3-cache reference time, but apparently it was a considered as a fail. This was their feedback:
>... area of concern was the JSON parser; the search repetitions and the expense of conversions in methods like `toDouble()` could be optimized.
Can anyone tell me what is wrong with the approach below?
What I have tried:
Search
First of all, we have a bunch of json like this:
{"type":"ticker","sequence":952144829,"product_id":"BTC-USD","price":"17700","open_24h":"5102.64","volume_24h":"146.28196573","low_24h":"4733.36","high_24h":"1000000","volume_30d":"874209.06385166","best_bid":"17700.00","best_bid_size":"96.87946051","best_ask":"17840.24","best_ask_size":"0.00010000","side":"sell","time":"2023-06-09T22:13:08.331784Z","trade_id":65975402,"last_size":"0.0001"}
According to the task, we need to extract only these fields:
* SEQUENCE = "nce""; // "sequence"
* BID = "bid""; // "best\_bid"
* ASK = "ask""; // "best\_ask"
First observation: the position of "sequence" does not change (much) from one json to another. It means we do not need to look for the key from the beginning of the string. Instead I remember the position where the key was found last time, and next time, I start looking for the key from this position.
If I cannot find it at this position, I start looking at pos-1 (1 character to the left), pos+1 (1 character to the right), pos-2, pos+2, etc...
Second observation is that I can use the hash from "rolling hash" search approach. I also need only 4 characters to distinguish and identify necessary keys:
* "nce"" for "sequence"
* "bid"" for "best_bid"
* "ask"" for "best_ask"
So, "
to find a key" just means, precalculate an integer:
(str[pos] << 0) + (str[pos+1] << 5) + (str[pos+2] << 10) + (str[pos+3] << 15)
for the needle, calculate integer for certain position in the string and compare two integers.
toDouble() conversion
Pretty straightforward:
* get the number in `result` until we meet `.` or end of string.
* if there is `.`, continue with the `result`, but also calculate factor (as a power of 10), which we will then use to divide:
static Float toDouble(std::string_view str, StrPos start) {
int64_t result = 0;
int64_t factor = 1;
for(; start != str.size() && str[start] >= '0' && str[start] <= '9'; ++start)
result = result * 10 + (str[start] - '0');
if(start != str.size() && str[start] == '.') [[likely]] {
++start;
for(; start != str.size() && str[start] >= '0' && str[start] <= '9'; ++start) {
result = result * 10 + (str[start] - '0');
factor *= 10;
}
}
return (Float)result / (Float)factor;
}
Full code is here