The question is not whether your result is correct; the whole idea of this "translation" between languages makes no sense as stated. You could make it meaningful if you explained the ultimate purpose of your character calculations (which look strange). Apart from some application context, the question does not make any sense.
Here is why: you are doing seemingly similar operations on very different objects.
Your C characters are 8-bit objects. Moreover, you use signed characters, but as long as you only do bitwise operations, that does not matter. And you use the complement operator '~'. The idea of a complement makes no sense without specifying "complement to what". If you typecast the value to a wider type first, the complement is taken against a wider all-bits-set value, and you get a different result. With the C char type, the complement means a bitwise complement against the value 0xFF. For example, if your character is a blank space (char source = ' ';), the complement gets the value -33, which corresponds to 0xDF in unsigned char form.
In .NET, a character is a Unicode character. In memory, it is represented using the encoding UTF-16LE, which uses one 16-bit word to express a character in the Basic Multilingual Plane (BMP) and a pair of such words (a surrogate pair) to express a character outside the BMP. When you calculate the complement of that very same blank space, you get a "character" 0xFFDF, which is not standardized as a character:
http://www.unicode.org/charts/PDF/UFF00.pdf.
Please see: http://www.unicode.org.
Now, since you wrap all intermediate results to a byte, both will give the identical result: 0xDF. So, up to this point everything is "correct" (if this is really what you want to get), and the problem could be somewhere else. Perhaps your input file is not actually all "Western European", or you interpret it incorrectly. So, to go further, let's see exactly which characters are "wrong". You could easily run this code under the debugger to see the calculation on some specific characters. Please see my comment to the question and answer my question.
As to your idea to "do this without even worrying about the encoding", it strongly resembles the thinking of Monsieur Jourdain, a character in Molière's play Le Bourgeois gentilhomme. This fellow was proud to learn that he had been speaking in prose all along, after his teacher explained it to him. :-)
Please see:
http://en.wikipedia.org/wiki/Prose,
http://en.wikipedia.org/wiki/Le_Bourgeois_gentilhomme.
[EDIT]
Anyway, I decided to try it out. First of all, let me rewrite the code in a literate way (but that does not mean it should work correctly):
using System.IO;
using System.Text;

class Program {
    const string fileName = "input.txt";
    const string outFileName = "output.txt";
    static void Main(string[] args) {
        using (StreamReader reader = new StreamReader(fileName, Encoding.GetEncoding(1252)))
        using (StreamWriter writer = new StreamWriter(outFileName, false, Encoding.GetEncoding(1252))) {
            while (true) {
                int value = reader.Read(); // returns -1 at end of stream
                if (value < 0) break;
                char character = (char)value;
                if (character != '\r' && character != '\n')
                    character = (char)(byte)(~(int)(byte)character); // complement the low byte only
                writer.Write(character);
            }
        }
    }
}
Your text sample is "converted" like this:
??????? ??ßÂßÝ????????Ý
??????? ??ßÂßÝÎÏÏÝß
where each question mark is really a question mark (code point 0x003F). The reason is this: working at the character level through a non-Unicode encoding is incorrect in principle. In this case, your complement operation produces an image of the source character that falls outside the set of code points valid for the encoding, so the encoder replaces it with a question mark.
Here is the background: C characters are not really characters; they are signed bytes and are processed in a bitwise manner, ignoring their cultural meaning. .NET, by contrast, follows the Unicode standard.
Let me tell you that your "1251", as well as the whole idea of a "code page", no longer exists, in a way; they survive only as Microsoft legacy. Look at the result of System.Text.Encoding.GetEncoding — this is the real encoding object. Also, all non-Unicode encodings are only good for legacy purposes (such as ASCII, as a subset of Unicode). If you use any encoding except one of the Unicode UTFs on arbitrary text, a correct result is not guaranteed.
Now, to reproduce the effect of your C code, you need to work with binary bytes, as suggested in Solution 4. This is the only way.
Then again, this is a kind of "obfuscation" which makes no sense whatsoever. If you need encryption, use encryption (and again: why?).
—SA