Introduction: Non Bitwise XOR Curiosity for Encryption


I was searching the Internet for a string based XOR function and I was unable to come up with anything. There were people looking but a lot of  "experts" claimed that there was no such thing. I decided to take it upon myself to see if they were correct or not.

Step 1: What the Heck Is XOR


First for a little background. XOR (exclusive OR) is what's called a bit-wise operation. It breaks down into a truth table as follows:

A(in) B(in) C(out)
 0        0        0
 1        0        1
 0        1        1
 1        1        0

This may seem a little less than impressive but it has larger applications especially in the field of encryption. If you put two bytes together via bit-wise XOR the result and one of the original inputs will return the second input.

Example:
             If A=010101
         and B=101010 then an XOR operation would result
                 C=111111.

             If C=111111
         and B=101010 then an XOR operation would result
                      010101 which is A

the same is true of the XOR of C and A = B.

So there you have an overview of bit-wise XOR operation and the application that I am interested in.

.

Step 2: Non Binary, Non Bit-wise XOR


So the big question is can this type of thing work with digits and not just at a binary level. I believed that it could so I tried to work it out in my head and on paper. This is what I thought would be the result for base10.

A xor B = C --> C xor B = A --> C xor A = B

0123456789 --> 10 digits

10-9=1-3=-2 --> 3 xor 9 = 8 --> 8 xor 9 = 3
10-8=2-3=-1 --> 3 xor 8 = 9 --> 9 xor 8 = 3
10-7=3-3=0 --> 3 xor 7 = 0 --> 0 xor 7 = 3
10-6=4-3=1 --> 3 xor 6 = 1 --> 1 xor 6 = 3
10-5=5-3=2 --> 3 xor 5 = 2 --> 2 xor 5 = 3
10-4=6-3=3 --> 3 xor 4 = 3 --> 3 xor 4 = 3
10-3=7-3=4 --> 3 xor 3 = 4 --> 4 xor 3 = 3
10-2=8-3=5 --> 3 xor 2 = 5 --> 5 xor 2 = 3
10-1=9-3=6 --> 3 xor 1 = 6 --> 6 xor 1 = 3
10-0=10-3=7 --> 3 xor 0 = 7 --> 7 xor 0 = 3

Of course my math doesn't work very well for coding on the computer. Instead it should be positional based with a set. The set is the alphabet so-to-speak. The length of the alphabet and the position of the matches are the important factors for coding. Let's see if we can do a binary version to test the theory. So in binary, the alphabet is easy "0" and "1".

Set="0" "1"
Length=2
Max=1
A=1, B=1, C=?

  Max - position1 - postion2 = position3
     1   -       2          -       2         =    -3

Oh, oh! -3? I decided that the easiest solution to this is to check for a non-positive integer and rewrite the value like I was counting backwards with no carry-over or remainder.

In this binary example there can only be two positions "1" and "2".
If I add the length of the set which is 2 to -3 it comes to -1, add length again and we have 1. On this basis 0 and -2 would evaluate to position 2.

With these conditions in place this will work to emulate the truth table of a bit-wise XOR:

     A(in)   POS(a)    B(in)     POS(b)     C(out)    POS(c)
      0           1               0            1                0             1
      1           2               0            1                1             2
      0           1               1            2                1             2
      1           2               1            2                0             1

.

Step 3: Code for XOR Algorithm in Base[X]


I find that I can't work out all the math in my head fast enough so I decided to switch to the computer. I'm a LINUX guy so I went straight to AWK as fast as I could.

I wrote a short script to at least test the theory on a base[10] set. My original paper version seems to have some issues but in the end I got a truth table that I can live with.

1) I created a 10-line file, one number on each line starting with 0.
2) I wrote a routine to index these numbers, accept an input, run the algorithm, and spit out the result.

----
#!/bin/bash
# baseX-xor.sh

echo -e $1 $2 | awk 'BEGIN {
        CNT=1
        while(getline <  "wheel.txt" > 0)
        {
          n[CNT]=$0 ; i[$0]=CNT ; CNT++
        }
        mylen=CNT-1 ; mymax=CNT-2
        close("wheel.txt")
      }
{
        pos1=i[$1]+0 ; pos2=i[$2]+0

        if(pos1==0) pos1=1
        if(pos2==0) pos2=1

        result=mymax-pos1-pos2

        if(result<=0) result=result+mylen
        if(result<=0) result=result+mylen
        if(result<=0) result=result+mylen

        print n[result]
}'


----

I then ran a command line script to make a truth table matrix:

----

for X in 0 1 2 3 4 5 6 7 8 9
do
  for Y in 0 1 2 3 4 5 6 7 8 9
  do
    O=`./baseX-xor.sh $X $Y` 
    echo -n " $O"
  done
  echo 
done

----

       0 1 2 3 4 5 6 7 8 9

0     6 5 4 3 2 1 0 9 8 7
1     5 4 3 2 1 0 9 8 7 6
2     4 3 2 1 0 9 8 7 6 5
3     3 2 1 0 9 8 7 6 5 4
4     2 1 0 9 8 7 6 5 4 3
5     1 0 9 8 7 6 5 4 3 2
6     0 9 8 7 6 5 4 3 2 1
7     9 8 7 6 5 4 3 2 1 0
8     8 7 6 5 4 3 2 1 0 9
9     7 6 5 4 3 2 1 0 9 8

Step 4: Applying to 7 Bit Ascii (printable Charaters Only)


7-bit ASCII has 94 printable characters (95 if you include the space, which I intend to do). I am looking to use this Xor for encryption so the idea was to get all the characters that might appear in a test document.

I created a new alphabet file starting with space, !, ", #, etc. It created a huge matrix which I attached as a picture. The picture doesn't show the whole thing but the important thing is that the function is reversible as I had hoped.

Example: 
"A"  xor  "a"  =  "x"
"A"  xor  "x"  =  "a"
"a"  xor  "x"  =  "A"

Step 5: Testing the One-time Pad


The one-time pad is the simplest of all the encryption methods. It takes the input text (plain-text) and using bit-wise XOR combines it with another input (key).

If the key is random, as long or longer than the plain-text and the key is only ever used once then the encryption is, in theory, unbreakable.

Plain-text   xor   Key  ==>  Cipher-text

Cipher-text  xor  Key ==> Plain-text

Let's begin:

Plain-text = "Mmm. I love turtles. "

Key            = "This_is_for_me_to_know"
--------------------------------------------
Cipher       = "y E D y < h ( O E 5 C . A F 4 7 O J 9 }"

Key            = "This_is_for_me_to_know"
--------------------------------------------
Plain-text = "Mmm. I love turtles."

So it works for straight one-time pad encryption but it took a while. I had to run the program on each and every character of the key plus the plain-text and then I had to assemble it. I think it would be better if I wrote some code to do this for me.

----
#!/bin/bash

cat test.txt | awk 'BEGIN {
  CHK=0
  CNT=1
  while(getline <  "wheel.txt" > 0)
  {
    n[CNT]=$0 ; i[$0]=CNT ; CNT++
  }
  mylen=CNT-1 ; mymax=CNT-2
  close("wheel.txt")
}
{
  CNT=1
  while(CNT<=80)
  {
    if(CHK==0)
    {
      result[CNT]=substr($0,CNT,1)
    }
    if(CHK!=0)
    {
      Xin=substr($0,CNT,1)
      Yin=result[CNT]

      pos1=i[Xin]
      pos2=i[Yin]

      if(pos1==0) pos1=1
      if(pos2==0) pos2=1

      posx=95-pos1-pos2

      if(posx<=0) posx=posx+96
      if(posx<=0) posx=posx+96
      if(posx<=0) posx=posx+96

      result[CNT]=n[posx]
    }
    CNT++
  }
  CHK=1
}
END {
  OUT=""
  CNT=1
  while(CNT<=80)
  {
    OUT=OUT result[CNT]
    CNT++
  }
  print OUT
}'

----

My other issue is that one-time pad is too simple and has the restriction of not being able to reuse the key. I'd like a method where the same key won't give up it's secrets so easily.

.

Step 6: Development Continues (planning Out the Algorithm)


I'm quite happy with the results so far. I have always been interested in an encryption method that would produce printable cipher-text rather than binary garbage. This method seems like it would make for a fine email encryption utility.

Xor is a very nice start but if I really want a strong algorithm I need to create a cryptographic hash function and I need to tie it all together in some form of a stream or block cipher. There are lots of resources out there so I'm going to borrow some ideas here. Not because I'm lazy and can't think of anything myself. It's more that I'm not qualified as a cryptographer.

I'd like to use methods that are accepted as secure or at least incorporate similar methodology and hope that I don't make a bad choice (i.e. avoid methods already proven insecure). A good algorithm should be open for everyone to see and still stand up to attack. There is a community of "testers" out there that will find vulnerabilities in any algorithm publicly available.

Since my vision is text to text to text I'm going to stick with 7-bit printable ASCII as my alphabet. I'm deciding that a block is 80-characters long (as in one line) with an arbitrary number of rows. I will therefore base everything on this 80 character block including the hash function. For any purists in the audience this is equivalent to somewhere between 546 and 547 binary bits.

So what does a hash function do exactly? It processes an arbitrary amount of data and produces a fixed length output called a hash. For a hash to be secure there are a few requirements.

1) minuscule change in input should reflect a massive change out
2) output is unpredictable compared to input (i.e. irreversible)
3) output should be random in nature (i.e. noisy)
4) the same input must always produce the same output

.

Step 7: To Be Continued

I'm a bit busy these days but I would very much like to continue this as soon as I am able. I have several good ideas about creating a transposition algorithm, 7 bit ASCII "add" function, key hashing and others. Sorry for the tease.