Information Visualization
Arc Diagrams
The Program

The Program is based on the work of Martin Wattenberg, who's currently working at the IBM research center in Camebridge. Click here to view his paper about Arc Diagrams in .pdf format.


When to draw Arcs
This is a summary of the rules for repeating strings, which are linked by an arc. Basically there are 3 rules: An essential matching pair is a pair of substrings of S, X and Y, which are:

  • A maximal matching pair not contained in any repetition region,
  • Or, a maximal matching pair contained in the same fundamental substring of any repetition region that contains it
  • Or, two consecutive fundamental substrings for a repetition region.

  • Furtherer more you should know what maximal matching pairs and repetition regions are. A maximal matching pair is a pair of substrings of S, X and Y, which are:

  • Identical. X and Y consist of the same sequence of symbols.
  • Non-overlapping. X and Y do not intersect.
  • Consecutive. X occurs before Y, and there is no substring Z, identical to X and Y, whose beginning falls between the beginning of X and the beginning of Y.
  • Maximal. There do not exist longer identical non-overlapping subsequences X’ and Y’ with X’ containing X and Y’ containing Y’

  • A repetition region R is a substring R of S such that R is made up of a string P repeated two or more times in immediate succession. Each repetition of P is called a fundamental substring for R.
    The definition of repetition regions is fairly unclear if you think about it. Which of the following arcs will be correctly? The ones above or the ones beneath the string?



    So we decided to contact Mr. Wattenberg to get clear information about this special problem. When we got response we received an updated definition about repetition regions:

  • Whichever has the smallest fundamental pattern wins
  • When the fundamental patterns are the same lengths, whichever starts nearest to the beginning of the string wins


  • Implementation

    The program code was implementet in Java. For graphic visualization we used the Piccolo framework and Java Graphics 2D.
    The algorithm to analyze an input string uses several lists to find repeating substrings. Based on the rules mentioned above arcs are drawn to visualize a match.


    Features

    The following list of features were implemented in the Arc Diagrams Program:
  • analysis of strings and midi music
  • Setting the minimal and maximal size of substings that will be searched
  • Reading from a file or typing text directly into the program window
  • Searching only for matching words
  • Case sensivity on or off
  • Ignorance of whitespaces between words
  • Using the soundex algorithm which identifies words that are not equal but sound similar
  • Switching between several tracks of a midi file
  • Fuzzy Factor which draws arcs even if the matching notes do not actually have the same duration
  • Zooming to a selected arc to see more details
  • Displaying real notes instead of notes in midi interpretation format
  • Plugin based code to create new plugins easily (For example a DNA - Analysis plugin)