Spatial Trajectory Topic Modeling

Extracting Movement Patterns from Trajectory Data using NMF

Project Overview

This project implements the methodology from "Extracting Movement-based Topics for Analysis of Space Use" (G. Andrienko, N. Andrienko, D. Hecker, EuroVis Workshop on Visual Analytics, 2023). The implementation was developed for the course 193.166 Visualisierung (VU 4.0), TU Wien, 2025W.

Dataset

Instead of the Milan car movement data used in the original paper, this project uses the Beijing T-Drive taxi trajectory dataset (2008). The focus is on implementing the analytical pipeline and enabling interactive exploration of spatial and movement patterns in urban taxi trajectories.

Key Objectives

Space Tessellation

Extract characteristic points from trajectories and construct Voronoi cells to partition the urban space.
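
As an illustration of what "characteristic points" can mean in practice, the following is a minimal sketch that keeps the start and end of a track plus any point that looks like a stop or a turn. The function names, the (t, x, y) input layout in seconds and metres, and both thresholds are assumptions made for this example, not the project's actual parameters.

```python
import math

def heading(p, q):
    """Direction of travel from p to q in degrees, for planar (x, y) points."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def turn_angle(a, b, c):
    """Absolute change of heading at b, folded into the range [0, 180]."""
    diff = abs(heading(b, c) - heading(a, b)) % 360.0
    return min(diff, 360.0 - diff)

def characteristic_points(track, min_turn_deg=30.0, max_stop_speed=0.5):
    """Keep start, end, stops and turns of a track of (t, x, y) samples.

    Assumed units: t in seconds, x and y in metres (projected coordinates).
    Both thresholds are hypothetical defaults.
    """
    if len(track) < 2:
        return list(track)
    keep = [track[0]]
    for prev, cur, nxt in zip(track, track[1:], track[2:]):
        dt = cur[0] - prev[0]
        dist = math.hypot(cur[1] - prev[1], cur[2] - prev[2])
        speed = dist / dt if dt > 0 else 0.0
        is_stop = speed <= max_stop_speed
        is_turn = turn_angle(prev[1:], cur[1:], nxt[1:]) >= min_turn_deg
        if is_stop or is_turn:
            keep.append(cur)
    keep.append(track[-1])
    return keep

# Toy track: drive east, turn north, crawl for one sample, continue north.
track = [
    (0, 0.0, 0.0),
    (10, 100.0, 0.0),
    (20, 200.0, 0.0),
    (30, 200.0, 100.0),
    (40, 200.0, 100.5),
    (50, 200.0, 200.0),
]
points = characteristic_points(track)
```

In this toy track the sample at t=20 is kept as a turn and the sample at t=40 as a stop. Real GPS data would first need projecting from latitude/longitude to planar coordinates, which this sketch leaves out.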

Topic Modeling

Apply Non-negative Matrix Factorization (NMF) to discover place-based and movement-based topics.
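
To make the factorization concrete, here is a small sketch of NMF on a toy trajectory-by-cell count matrix. The technology stack lists scikit-learn, whose sklearn.decomposition.NMF would be the usual choice; to keep the example dependency-light it instead uses plain NumPy with the classic multiplicative update rules for the Frobenius objective. The function name nmf, the toy matrix, and all parameter values are assumptions for illustration only.

```python
import numpy as np

def nmf(V, n_topics, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (docs x terms) into W (docs x topics)
    and H (topics x terms) using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_terms)) + eps
    for _ in range(n_iter):
        # Each update rescales the factor element-wise, so values stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: rows are trajectories, columns are cells, values are visit counts.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
], dtype=float)

W, H = nmf(V, n_topics=2)

# Rows of H are raw non-negative weights; dividing by the row sum lets each
# topic be read as a distribution over cells.
topic_dist = H / H.sum(axis=1, keepdims=True)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

W then holds the topic weights per trajectory and H the cell weights per topic. The same call applies unchanged to a trajectory-by-transition matrix for movement-based topics.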

Interactive Visualization

Explore discovered patterns through an interactive web-based visualization built with D3.js and Leaflet.

How the Visualization Works

The visualization provides an interactive interface to explore the discovered spatial and movement topics. Built with D3.js and Leaflet, it allows users to examine patterns across different tessellation configurations and topic numbers.

Visualization Features

  • Dataset Selection
    Choose from multiple tessellation configurations (different radii) and topic numbers to explore various granularities of spatial patterns.
  • Dual View Modes
    Toggle between Places visualization (colored Voronoi polygons showing areas) and Moves visualization (curved arcs showing transitions between cells).
  • Topic Filtering
    View all topics at once or focus on a specific topic. Filter features by minimum topic weight to highlight the most significant patterns.
  • Interactive Map Controls
    Hover over features to highlight them, click for detailed information including topic distributions, and zoom/pan to explore different areas of Beijing.
  • Visual Customization
    Adjust opacity, stroke width, color schemes, and toggle hover effects to create the optimal visualization for your analysis needs.

Places Visualization

The places view displays Voronoi cells colored by their primary topic assignment. Each color represents a different spatial topic, revealing areas that tend to be used together in the taxi trajectory patterns.
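
How the primary topic of a cell is chosen is not spelled out here; a natural reading is "the topic with the largest weight". The sketch below assumes exactly that and also applies a minimum-weight filter like the one offered in the interface. The function name, the dictionary layout, and the sample weights are hypothetical.

```python
def primary_topics(cell_weights, min_weight=0.0):
    """Map each cell id to the index of its strongest topic.

    Cells whose strongest weight is below `min_weight` are left out,
    mirroring a minimum-topic-weight filter.
    """
    result = {}
    for cell_id, weights in cell_weights.items():
        best = max(range(len(weights)), key=weights.__getitem__)
        if weights[best] >= min_weight:
            result[cell_id] = best
    return result

# Hypothetical topic weights for three cells and three topics.
cell_weights = {
    "c1": [0.7, 0.1, 0.2],
    "c2": [0.05, 0.15, 0.1],
    "c3": [0.0, 0.3, 0.6],
}
assigned = primary_topics(cell_weights, min_weight=0.25)
```

With these weights, c1 and c3 receive topics 0 and 2, while c2 is filtered out because its strongest weight is below the threshold.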

Figure: Places visualization showing Voronoi cells colored by topic.

Moves Visualization

The moves view shows transitions between cells as curved arcs, colored by movement topic. This reveals common routes and movement patterns in the Beijing taxi network.

Figure: Moves visualization showing movement arcs between cells.

How the Backend Works

The backend consists of a Python pipeline that processes raw trajectory data through multiple stages to extract meaningful spatial and movement patterns.

Pipeline Architecture

  • Data Preprocessing & Filtering
    Load raw taxi trajectory data from the T-Drive dataset. Apply spatial bounding box filtering, temporal filtering (date/time-of-day), and trajectory validation to clean the input data.
  • Characteristic Point Extraction
    Identify significant points in trajectories: stops (low speed, stationary periods), turns (significant direction changes), and transitions. These points capture the essential structure of movement patterns.
  • Spatial Clustering & Tessellation
    Group characteristic points with a configurable radius parameter. Construct Voronoi polygons around cluster centroids to create a tessellation of the urban space. Each cell represents a distinct spatial area.
  • Sequence Generation
    Transform trajectories into sequences of cell visits (places) and cell-to-cell transitions (moves). Create document-term matrices where documents are trajectories and terms are cells or transitions.
  • Topic Modeling with NMF
    Apply Non-negative Matrix Factorization independently to the place and move document-term matrices. This discovers latent topics that represent frequently co-occurring places or movements. Each topic is a non-negative weight vector over cells or transitions; normalized to sum to 1, it can be read as a distribution.
  • Evaluation & Export
    Run ensemble NMF with multiple random seeds. Compute quality metrics (reconstruction error, sparsity). Export the results as GeoJSON for the visualization, including topic assignments and weights for each cell and transition.
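
The sequence generation step can be sketched as follows: each trajectory becomes a list of visited cells and a list of cell-to-cell moves, and both are counted into document-term matrices. This is a simplified illustration. Collapsing consecutive repeats of the same cell is an assumption, and the helper names and toy cell ids are made up for the example.

```python
from collections import Counter

def to_places_and_moves(cell_sequence):
    """Collapse consecutive repeats, then pair successive cells into moves."""
    places = [c for i, c in enumerate(cell_sequence)
              if i == 0 or c != cell_sequence[i - 1]]
    moves = list(zip(places, places[1:]))
    return places, moves

def document_term_matrix(documents):
    """Count how often each term occurs in each document.

    Returns the sorted vocabulary and one row of counts per document.
    """
    vocab = sorted({term for doc in documents for term in doc})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for doc in documents:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix

# Three toy trajectories, already mapped to cell ids.
trajectories = [[1, 1, 2, 3], [2, 3, 3, 4], [1, 2, 3]]
pairs = [to_places_and_moves(t) for t in trajectories]

place_vocab, place_matrix = document_term_matrix([p for p, _ in pairs])
move_vocab, move_matrix = document_term_matrix([m for _, m in pairs])
```

The two matrices have the same rows (trajectories) but different columns: cells for the place matrix and ordered cell pairs for the move matrix. Either can be handed to NMF as-is.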

Technology Stack

NumPy, pandas, SciPy, scikit-learn, Shapely, D3.js, Leaflet.js, OpenStreetMap

Key Files & Scripts

run_tessellation_pipeline.py

Main script for space tessellation. Configurable parameters for radius, subset size, and spatial/temporal filtering.
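
To show how the radius parameter could shape the tessellation, here is a rough sketch that groups points greedily by distance to a running centroid and then assigns any location to its nearest centroid, which is equivalent to asking which Voronoi cell contains it. The greedy grouping rule, the function names, and the planar toy coordinates are assumptions. The script itself may cluster differently and builds explicit polygons.

```python
import math

def group_points(points, radius):
    """Greedy grouping: a point joins the first group whose current centroid
    lies within `radius`, otherwise it starts a new group.
    Returns the final centroid of every group."""
    groups = []  # each entry: [sum_x, sum_y, count]
    for x, y in points:
        for g in groups:
            cx, cy = g[0] / g[2], g[1] / g[2]
            if math.hypot(x - cx, y - cy) <= radius:
                g[0] += x
                g[1] += y
                g[2] += 1
                break
        else:
            groups.append([x, y, 1])
    return [(sx / n, sy / n) for sx, sy, n in groups]

def nearest_cell(point, centroids):
    """Index of the Voronoi cell containing `point`, i.e. the nearest centroid."""
    x, y = point
    return min(
        range(len(centroids)),
        key=lambda i: math.hypot(x - centroids[i][0], y - centroids[i][1]),
    )

# Toy characteristic points in planar coordinates (not latitude/longitude).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (0.0, 0.2)]
centroids = group_points(points, radius=1.0)
cell_of_query = nearest_cell((4.0, 4.0), centroids)
```

A larger radius merges more points into each group and therefore yields fewer, coarser cells; a smaller radius gives a finer partition.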

run_nmf.py

Runs NMF topic modeling on tessellation results. Supports both places and moves with evaluation metrics.
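
The evaluation metrics named in the pipeline, reconstruction error and sparsity, can be computed in a few lines. The definitions below are common choices but remain assumptions: error is taken as the relative Frobenius norm of the residual, and sparsity as the share of entries that are numerically zero. The script may define them differently.

```python
import numpy as np

def reconstruction_error(V, W, H):
    """Relative Frobenius error of approximating V by W @ H (0 means exact)."""
    return float(np.linalg.norm(V - W @ H) / np.linalg.norm(V))

def sparsity(M, tol=1e-6):
    """Fraction of entries in M that are at most `tol`, treated as zero."""
    return float(np.mean(M <= tol))

# Hand-made factors for illustration: 2 documents, 2 topics, 3 terms.
W = np.array([[1.0, 0.0],
              [0.0, 2.0]])
H = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
V_exact = W @ H                 # perfectly reconstructable input
V_noisy = V_exact + 0.5         # the same input with a constant offset

err_exact = reconstruction_error(V_exact, W, H)
err_noisy = reconstruction_error(V_noisy, W, H)
h_sparsity = sparsity(H)
```

When several seeds are run, computing these values per run gives a simple way to compare them, for example by keeping the run with the lowest error.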

run_all_exp.py

Batch experiment runner that tests multiple parameter combinations for systematic exploration.

generate_datasets_list.py

Generates the dataset catalog JSON file consumed by the web visualization.

Getting Started

Quick Start

To run the visualization with existing results:

  1. Start a local web server from the project root: python -m http.server 8000
  2. Open your browser to: http://localhost:8000/visualization
  3. Select a dataset and explore the discovered topics

Running the Full Pipeline

To process your own trajectory data or experiment with different parameters:

  1. Set up the Python environment and install dependencies
  2. Download the T-Drive dataset or prepare your own trajectory data
  3. Run the tessellation pipeline: python run_tessellation_pipeline.py
  4. Run NMF topic modeling: python run_nmf.py
  5. Generate the dataset catalog: python generate_datasets_list.py
  6. Launch the visualization using a local web server