About SteamSifter

Daniel Yoo, a fourth-year computer science student at the University of Texas at Dallas.

The Problem

Popular Steam games can collect thousands of reviews, mixing genuine bug reports, feature praise, jokes, off-topic rants, and review-bombing. Reading all of them by hand to answer "What should we fix?" and "What do our players want more of?" is slow and inconsistent.

SteamSifter takes a game, automatically pulls its reviews, filters out the noise, and returns two ranked views: issues to fix (ranked by impact) and praised features to double down on (ranked by frequency and sentiment).

How It Works

Search and ingest: resolves a game name to its Steam app ID and pulls the most recent reviews from Steam's free public API, each with its recommendation flag (thumbs up or down), helpful-vote count, playtime, and date.
Filter for signal: a relevance classifier separates constructive feedback from off-topic noise, jokes, and review-bomb spam.
Classify: each review is tagged with a sentiment and a category (bug, performance, gameplay, cheating, monetization, UI/UX, content, community, praise) using structured model output. Batches run concurrently to keep analysis fast.
Theme: reviews on each side (issues and praise) are grouped into specific, named themes, either by an LLM pass at typical volumes or by embeddings + k-means clustering at scale; the path is chosen automatically.
Rank by impact: themes are sorted by frequency plus behavioral weight, so issues raised by long-playtime, highly-upvoted reviewers rank above low-effort rage reviews.
Present: a dashboard with a summary scoreboard, a sentiment donut, a sentiment-over-time trend, and an Issues / Praise toggle with ranked theme cards and clickable example quotes, each showing the reviewer's avatar and name plus an English translation for non-English reviews.
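A minimal sketch of the search-and-ingest step, assuming only Python's standard library. It uses the public `storesearch` and `appreviews` endpoints with cursor pagination (`cursor=*` requests the first page, up to 100 reviews per page). The helper names, the chosen query parameters, and the trimmed review dictionary are illustrative assumptions, not SteamSifter's actual code; the `fetch` argument is a hypothetical injection point so the pager can be exercised without network access.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Callable

STORE = "https://store.steampowered.com"


def fetch_json(url: str) -> dict:
    """Default fetcher: GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def resolve_app_id(name: str, fetch: Callable[[str], dict] = fetch_json) -> int | None:
    """Resolve a game name to the app ID of the first storesearch hit."""
    query = urllib.parse.urlencode({"term": name, "l": "english", "cc": "US"})
    items = fetch(f"{STORE}/api/storesearch/?{query}").get("items") or []
    return items[0]["id"] if items else None


def reviews_url(app_id: int, cursor: str, per_page: int = 100) -> str:
    # urlencode percent-escapes the cursor, which may contain '+' or '/'.
    query = urllib.parse.urlencode({
        "json": 1, "filter": "recent", "language": "all",
        "num_per_page": per_page, "cursor": cursor,
    })
    return f"{STORE}/appreviews/{app_id}?{query}"


def fetch_recent_reviews(app_id: int, limit: int,
                         fetch: Callable[[str], dict] = fetch_json) -> list[dict]:
    """Page through the most recent reviews until `limit` or the end of results."""
    reviews: list[dict] = []
    cursor = "*"  # "*" requests the first page
    while len(reviews) < limit:
        page = fetch(reviews_url(app_id, cursor, min(100, limit - len(reviews))))
        batch = page.get("reviews") or []
        for raw in batch:
            reviews.append({
                "id": raw["recommendationid"],
                "text": raw["review"],
                "recommended": raw["voted_up"],
                "helpful_votes": raw["votes_up"],
                "playtime_min": raw["author"]["playtime_forever"],  # minutes
                "created": raw["timestamp_created"],                # Unix seconds
                "language": raw["language"],
            })
        next_cursor = page.get("cursor")
        # An empty page or a repeated cursor means there is nothing more to fetch.
        if not batch or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return reviews[:limit]
```

Passing the cursor through `urlencode` matters because Steam cursors can contain characters such as `+` that must be percent-escaped. Note that `playtime_forever` is reported in minutes, so any hour-based weighting needs a conversion.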
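The filter and classify steps could share one structured-output call. The sketch below assumes that design, adding a hypothetical `relevant` flag next to sentiment and category; the schema shape, field names, batch size, and the `classify_batch` callable (a stand-in for a model call that returns raw JSON text) are all assumptions. Batches fan out over a thread pool, which suits I/O-bound API calls.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

CATEGORIES = ["bug", "performance", "gameplay", "cheating", "monetization",
              "UI/UX", "content", "community", "praise"]

# Hypothetical JSON Schema for one batch. Strict structured output requires every
# property to be listed in "required" and additionalProperties to be false.
LABEL_SCHEMA = {
    "type": "object",
    "properties": {
        "labels": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "index": {"type": "integer"},     # position within the batch
                    "relevant": {"type": "boolean"},  # False for jokes, spam, off-topic
                    "sentiment": {"type": "string", "enum": ["positive", "negative", "mixed"]},
                    "category": {"type": "string", "enum": CATEGORIES},
                },
                "required": ["index", "relevant", "sentiment", "category"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["labels"],
    "additionalProperties": False,
}
RESPONSE_FORMAT = {"type": "json_schema",
                   "json_schema": {"name": "review_labels", "strict": True, "schema": LABEL_SCHEMA}}


def parse_labels(raw: str, batch_len: int) -> List[Dict]:
    """Decode one batch reply and return its labels in input order."""
    by_index = {}
    for item in json.loads(raw)["labels"]:
        if item["category"] not in CATEGORIES:
            raise ValueError(f"unknown category {item['category']!r}")
        by_index[item["index"]] = item
    missing = set(range(batch_len)) - set(by_index)
    if missing:
        raise ValueError(f"model skipped reviews {sorted(missing)}")
    return [by_index[i] for i in range(batch_len)]


def classify_all(texts: List[str], classify_batch: Callable[[List[str]], str],
                 batch_size: int = 20, workers: int = 4) -> List[Dict]:
    """Split reviews into batches, classify them concurrently, keep original order."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    def run(batch: List[str]) -> List[Dict]:
        return parse_labels(classify_batch(batch), len(batch))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_batch = list(pool.map(run, batches))  # map yields results in input order
    return [label for labels in per_batch for label in labels]
```

With the official `openai` Python SDK, `RESPONSE_FORMAT` would be passed as the `response_format` argument of `client.chat.completions.create(...)`. Validating the reply anyway catches reviews the model skipped.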
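The automatic choice between the two theming paths, and the clustering path itself, can be sketched in plain Python. The 300-review cutoff, the `sqrt(n / 2)` rule of thumb for the number of clusters, and the function names are hypothetical, and the embedding vectors are assumed to come from an embeddings model upstream. This is Lloyd's k-means with random initialization, not necessarily the exact variant SteamSifter uses.

```python
import math
import random
from typing import Dict, List, Sequence

LLM_THEMING_MAX = 300  # hypothetical cutoff between the two theming paths


def choose_theming_path(n_reviews: int) -> str:
    """Small volumes fit in one LLM prompt; larger ones go to clustering."""
    return "llm" if n_reviews <= LLM_THEMING_MAX else "kmeans"


def pick_k(n: int, k_max: int = 12) -> int:
    """Rule-of-thumb cluster count, sqrt(n / 2), clamped to [2, k_max]."""
    return max(2, min(k_max, round(math.sqrt(n / 2))))


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Sequence[float]], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: returns a cluster label for each vector."""
    k = min(k, len(vectors))
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: _sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_by_cluster(texts: List[str], labels: List[int]) -> Dict[int, List[str]]:
    """Collect review texts per cluster; each group then gets a theme name."""
    groups: Dict[int, List[str]] = {}
    for text, lab in zip(texts, labels):
        groups.setdefault(lab, []).append(text)
    return groups
```

Clustering only groups reviews; naming each group still needs a short pass over a few representative members, which keeps the LLM prompt small regardless of total volume.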
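The sentiment-over-time trend amounts to bucketing reviews by week and taking the positive share of each bucket. This sketch assumes each review carries a Unix `created` timestamp and uses Steam's recommendation flag as the sentiment signal; the real dashboard may plot the classifier's sentiment instead, and the field names and Monday-start UTC weeks are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple


def weekly_sentiment(reviews: Iterable[Dict]) -> List[Tuple[str, float]]:
    """Share of positive reviews per UTC week, oldest week first."""
    counts = defaultdict(lambda: [0, 0])  # week start -> [positive, total]
    for r in reviews:
        day = datetime.fromtimestamp(r["created"], tz=timezone.utc).date()
        week_start = day - timedelta(days=day.weekday())  # roll back to Monday
        counts[week_start][0] += 1 if r["recommended"] else 0
        counts[week_start][1] += 1
    return [(week.isoformat(), pos / total)
            for week, (pos, total) in sorted(counts.items())]
```

The two columns map directly onto a Chart.js line chart's `labels` array and its first dataset's `data` array.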

"Impact" is an inferred heuristic (frequency, sentiment, playtime, helpful-votes), not ground truth. It is presented as an informed estimate.
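A minimal sketch of one way such a heuristic could combine frequency with behavioral weight: every review counts once (frequency), plus log-damped bonuses for playtime and helpful votes, so a theme raised by a few long-playtime, well-upvoted reviewers can outrank one raised by more low-effort reviews. The weights, the `log1p` damping, and the names are hypothetical, not SteamSifter's actual formula.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ThemeReview:
    playtime_hours: float  # Steam reports playtime in minutes; divide by 60
    helpful_votes: int


def review_weight(r: ThemeReview, playtime_w: float = 0.5, votes_w: float = 0.5) -> float:
    """1 for the mention itself, plus log-damped behavioral bonuses."""
    # log1p keeps a 2,000-hour reviewer from counting hundreds of times more than a 10-hour one.
    return 1.0 + playtime_w * math.log1p(r.playtime_hours) + votes_w * math.log1p(r.helpful_votes)


def rank_themes(themes: dict[str, list[ThemeReview]]) -> list[tuple[str, float]]:
    """Sort themes by summed review weight, highest impact first."""
    scored = [(name, sum(review_weight(r) for r in reviews)) for name, reviews in themes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Damping with a logarithm is a design choice: it rewards engaged reviewers without letting a single outlier with thousands of hours dominate a theme's score.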

Tech Stack

Reviews: Steam public appreviews API (free, no key), plus storesearch for name-to-app-ID lookups
AI: OpenAI (gpt-4.1-mini) with structured JSON output, swappable to free-tier Gemini; concurrent batched classification; and theming that scales via OpenAI embeddings + k-means clustering
Frontend: Steam-styled, mobile-responsive UI with Chart.js sentiment/category charts and a summary scoreboard
Backend: Flask and gunicorn, background jobs with a live progress bar, per-game caching persisted to Redis
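Background jobs with a live progress bar can be sketched with the standard library alone: the analysis runs on a worker thread and reports a completion fraction that a polling endpoint returns. The in-memory `JOBS` registry, the `start_job`/`job_status` names, and the callback protocol are hypothetical. A per-process registry like this only works while a single gunicorn worker serves every request, since each worker process would hold its own copy.

```python
import threading
import uuid
from typing import Any, Callable, Dict, Optional


class Job:
    def __init__(self) -> None:
        self.progress = 0.0            # fraction complete, 0.0 to 1.0
        self.status = "running"        # running | done | failed
        self.result: Any = None
        self.error: Optional[str] = None
        self.finished = threading.Event()


JOBS: Dict[str, Job] = {}  # per-process registry: assumes a single worker
_JOBS_LOCK = threading.Lock()


def start_job(work: Callable[[Callable[[float], None]], Any]) -> str:
    """Run `work(report)` on a daemon thread; `report(fraction)` updates progress."""
    job_id = uuid.uuid4().hex
    job = Job()
    with _JOBS_LOCK:
        JOBS[job_id] = job

    def report(fraction: float) -> None:
        job.progress = max(0.0, min(1.0, fraction))

    def runner() -> None:
        try:
            job.result = work(report)
            job.progress, job.status = 1.0, "done"
        except Exception as exc:  # surface the failure to the polling client
            job.error, job.status = str(exc), "failed"
        finally:
            job.finished.set()

    threading.Thread(target=runner, daemon=True).start()
    return job_id


def job_status(job_id: str) -> Dict[str, Any]:
    """JSON-ready snapshot for a progress endpoint to return."""
    job = JOBS.get(job_id)
    if job is None:
        return {"status": "unknown"}
    return {"status": job.status, "progress": round(job.progress, 2), "error": job.error}
```

A Flask route such as `@app.get("/jobs/<job_id>")` could return `jsonify(job_status(job_id))`, and the front end would poll it to advance the bar.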

Current Limitations

SteamSifter is deployed and open to anyone, but it is a solo project tuned for light traffic: a single worker on a free Render instance sharing one AI key. A first-time analysis takes one to a few minutes depending on review volume, and the embedding-based theming path lets it scale to about 1,500 reviews. Results are cached, so popular titles are re-analyzed only when their review count grows. The main remaining limit is concurrency when many users run analyses at once.
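The re-analyze-on-growth rule can be sketched as a cache keyed by app ID that stores the review total seen at analysis time. The key format, the `min_new_reviews` knob, and the `MemoryStore` stand-in are hypothetical; a `redis.Redis` client from redis-py offers `get` and `set` with the same call shape (returning bytes by default), so it could be passed in directly. The current total could come from the `total_reviews` field of the `appreviews` response's `query_summary`.

```python
import json
from typing import Any, Callable, Dict, Optional


class MemoryStore:
    """In-memory stand-in for a redis.Redis client (same get/set call shape)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> bool:
        self._data[key] = value.encode("utf-8")  # Redis hands values back as bytes
        return True


def cached_analysis(store, app_id: int, current_total: int,
                    analyze: Callable[[int], Any], min_new_reviews: int = 1) -> Any:
    """Reuse a stored analysis unless the game has gained enough new reviews."""
    key = f"steamsifter:analysis:{app_id}"  # hypothetical key scheme
    raw = store.get(key)
    if raw is not None:
        cached = json.loads(raw)
        if current_total - cached["total_reviews"] < min_new_reviews:
            return cached["result"]
    result = analyze(app_id)  # slow path: run the full pipeline
    store.set(key, json.dumps({"total_reviews": current_total, "result": result}))
    return result
```

With the default of one, any new review triggers a fresh run; raising `min_new_reviews` trades freshness for fewer API calls on busy titles.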

As of June 18, 2026, SteamSifter runs on the OpenAI API (gpt-4.1-mini) and can switch to free-tier Gemini when needed.