From what I can tell, the C++ stream system presumes that files are sequences of bytes, not characters, even when you use wide streams. The 'wide' part of a wide stream (AFAICT) describes how the stream object interacts with your C++ code, not the underlying file. Thus, the external (file-side) type of your codecvt facet has to be char.
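To see that for yourself, here is a rough sketch (not from the original question): it writes four raw bytes to a file and reads them back through a std::wifstream with the default "C" locale. The helper name read_back_wide and the file name used with it are made up for illustration. With the default facet I would expect each byte to come back as its own wchar_t, so a two-character UTF-16LE file shows up as four wide characters.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Hypothetical helper: write raw bytes to a file, then read them back
// through a wide stream that uses the default ("C") locale.
std::wstring read_back_wide(const std::string& path, const std::string& bytes)
{
    {
        // Narrow stream in binary mode: the bytes go out unchanged.
        std::ofstream out(path.c_str(), std::ios::binary);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }
    // Wide stream: the file is still read as bytes, and the locale's
    // codecvt facet turns those bytes into wchar_t values.
    std::wifstream in(path.c_str(), std::ios::binary);
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}
```

Calling read_back_wide("wide_demo.bin", std::string("A\0B\0", 4)) should hand back four wide characters rather than the two you might hope for, which is why the facet has to do the byte pairing itself.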
By changing the declaration of your codecvt facet to the one shown below, I was able to get breakpoints set in the replacement facet to be hit.
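#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt

// The external (file-side) type is char: the stream hands the facet raw bytes.
// __CLR_OR_THIS_CALL is an MSVC library macro; drop it on other compilers.
class utf16_codecvt : public std::codecvt<char16_t, char, std::mbstate_t>
{
    typedef std::codecvt<char16_t, char, std::mbstate_t> Base;
    typedef char16_t ElemT;  // internal (program-side) character type
    typedef char ByteT;      // external (file-side) byte type

    // Each override simply forwards to the base class for now, so a
    // breakpoint here shows the stream calling into the facet.
    virtual result __CLR_OR_THIS_CALL do_in(std::mbstate_t& s,
        const ByteT* _First1, const ByteT* _Last1, const ByteT*& _Mid1,
        ElemT* _First2, ElemT* _Last2, ElemT*& _Mid2) const
    {
        return Base::do_in(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
    }

    virtual result __CLR_OR_THIS_CALL do_out(std::mbstate_t& s,
        const ElemT* _First1, const ElemT* _Last1, const ElemT*& _Mid1,
        ByteT* _First2, ByteT* _Last2, ByteT*& _Mid2) const
    {
        return Base::do_out(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
    }

    virtual result __CLR_OR_THIS_CALL do_unshift(std::mbstate_t& s,
        ByteT* _First2, ByteT* _Last2, ByteT*& _Mid2) const
    {
        return Base::do_unshift(s, _First2, _Last2, _Mid2);
    }

    // The state parameter must match your library's base class declaration.
    // The standard uses a non-const std::mbstate_t&; if your headers do too,
    // drop the const here or this function will not override anything.
    virtual int __CLR_OR_THIS_CALL do_length(const std::mbstate_t& s,
        const ByteT* _First1, const ByteT* _Last1, size_t _Count) const
    {
        return Base::do_length(s, _First1, _Last1, _Count);
    }
};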
So your replacement facet has to know that it must read two bytes for every character (and write two bytes per character going the other way, obviously). The best reference for that sort of information is probably Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft, but even then, locales and facets are heavy going in C++ :(
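To make the two-bytes-per-character point concrete, here is a minimal sketch of what do_in and do_out might look like. It is not your facet and not a complete one: the names utf16le_codecvt and decode_utf16le are hypothetical, and it assumes the file is little-endian UTF-16 with no byte order mark. Surrogate pairs are passed through as two separate code units, and the other virtuals (do_unshift, do_length, do_encoding, do_max_length) are left at the base class defaults, which a real facet would probably want to override as well. Also note that this codecvt specialization is deprecated as of C++20, so newer compilers may warn.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: treats the file bytes as UTF-16LE (assumption) and
// packs each pair of bytes into one char16_t code unit, and back again.
class utf16le_codecvt : public std::codecvt<char16_t, char, std::mbstate_t>
{
    typedef std::codecvt<char16_t, char, std::mbstate_t> Base;

public:
    explicit utf16le_codecvt(std::size_t refs = 0) : Base(refs) {}

protected:
    // Bytes from the file -> char16_t for the program.
    result do_in(std::mbstate_t&,
                 const char* from, const char* from_end, const char*& from_next,
                 char16_t* to, char16_t* to_end, char16_t*& to_next) const override
    {
        from_next = from;
        to_next = to;
        while (from_end - from_next >= 2 && to_next != to_end) {
            const unsigned char lo = static_cast<unsigned char>(from_next[0]);
            const unsigned char hi = static_cast<unsigned char>(from_next[1]);
            *to_next++ = static_cast<char16_t>(lo | (hi << 8));
            from_next += 2;
        }
        // Leftover input (an odd trailing byte, or no room left in the
        // output) means the conversion is only partly done.
        return from_next == from_end ? ok : partial;
    }

    // char16_t from the program -> bytes for the file.
    result do_out(std::mbstate_t&,
                  const char16_t* from, const char16_t* from_end,
                  const char16_t*& from_next,
                  char* to, char* to_end, char*& to_next) const override
    {
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_end - to_next >= 2) {
            const char16_t c = *from_next++;
            *to_next++ = static_cast<char>(c & 0xFF);         // low byte first
            *to_next++ = static_cast<char>((c >> 8) & 0xFF);  // then high byte
        }
        return from_next == from_end ? ok : partial;
    }
};

// Hypothetical helper: decode a whole buffer by calling the facet directly.
std::u16string decode_utf16le(const std::string& bytes)
{
    utf16le_codecvt facet;
    std::mbstate_t state = std::mbstate_t();
    std::u16string out(bytes.size() / 2, u'\0');
    const char* from_next = nullptr;
    char16_t* to_next = nullptr;
    facet.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
             &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling the facet directly like this is a handy way to test the conversion without a stream in the picture. To use it with a stream you would normally build a locale that owns the facet, something like std::locale(stream.getloc(), new utf16le_codecvt), and imbue the stream with it before opening the file.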