Why does a Burrows-Wheeler Transformation work?
Asked by
Vincentt (
8094)
February 12th, 2008
I’ve read a few explanations of the principle of BWT, however, I still don’t get why it works, i.e. why does the output of BWT contain sequences of the same character that was originally only present at different locations in the input? Why aren’t the characters also scattered across the output?
Observing members:
0
Composing members:
0
4 Answers
Good question. Reading the Wikipedia article, I think this line gets close to your answer:
“So it can be seen that the success of this transform depends upon one value having a high probability of occurring before a sequence, so that in general it needs fairly long samples (a few kilobytes at least) of appropriate data (such as text).”
Basically I think it means that its most successful if a number of sequences repeat in the original sample. To get any deeper I think we’d be getting into some serious statistical mathematics which I’m not currently well versed in.
So the sequentials A’s and N’s in “Banana” would result from “NA” and “AN”, respectively? That sounds credible indeed, thanks!
Yes, by making all the rotations and then sorting them alphanumerically any often repeating patterns (AN or NA in BANANA) should expose themselves in strings of similar values or similar combinations of values. At least that’s my understanding.
Thanks for asking this question, I had never heard of this before and its really very interesting!
Thanks for answering :)
It’s for a project for school about data compression and indeed, it’s really interesting. You might also be interested in Huffman coding and Move-to-front transform, which I’ve also studied :)
Answer this question
This question is in the General Section. Responses must be helpful and on-topic.