The Language of Legitimacy: A Clinical Breakdown of Modern Drum Corps Judging Now and in the Future

by Edward Michael Francis (they/them/theirs)
The Marching Revolution
Beyond the Numbers: A Scientific Examination of Drum Corps Judging Systems
The world of drum corps competition is as thrilling as it is enigmatic, with highly trained performers pouring their lives into a single 12-minute performance, all to be evaluated by a series of numbers—sometimes decided in the span of moments. For many performers, designers, and educators, those numbers can feel like gospel. But how do they actually get made? What do the sheets ask for? And do they provide a fair, consistent, and meaningful measure of excellence?
We set out to investigate the core design of the DCI judging system, using the 2012 DCI Judging Sheets (the most recent publicly available version we could find) as our foundational reference point. These documents—though now over a decade old—still underpin many of the current adjudication philosophies in the activity. Because DCI does not make its current sheets accessible via its website, the 2012 sheets remain the best available reference for our analysis, despite the evolution of programming, technology, and pedagogy since their release.
To compare, we also examined the 2012 sheets from Drum Corps Associates (DCA), the now-defunct all-age drum corps circuit, whose judging model shared some similarities with DCI’s but diverged in some key ways. To contextualize the lived reality of these systems, we also analyzed judging commentary from multiple circuits, including standout examples from DCI and DCA championship events.

Boxed In: The Scoring Language Problem
One of the most immediate issues in both DCI's and DCA's scoring systems is the ambiguous language embedded in their box descriptors. These boxes, ranging from Box 1 (lowest achievement) to Box 5 (highest), include terms such as "rarely," "sometimes," "usually," and "consistently." These are not scientific metrics; they are vague linguistic placeholders whose interpretations vary wildly between adjudicators.
How rarely is "rarely"? Once per rep? Once per show? Once per season?
Even with training and community calibration efforts, the subjectivity inherent in this language creates potential inconsistency in scoring. One judge may place a given performance solidly in Box 4, while another may feel it brushes the edge of Box 5. Without standardized definitions or statistically anchored thresholds, the box language can become little more than a Rorschach test.
Beyond the vague descriptors, another problematic layer comes from the use of profile numbers—informal or implied ranges judges are asked to stay within depending on the time of season or competitive tier. Especially in regional or local marching band circuits, it's common practice for a chief judge to ask adjudicators to "stay out of the lower boxes" at championship events, or to remain within a designated profile range to ensure comparability across shows. These instructions, whether explicitly stated or subtly implied, override the scoring boxes' intended neutrality, reducing judges' autonomy and masking potentially valuable critical differentiation between groups.
This can result in score compression, where every ensemble ends up bunched into the same narrow band, or score inflation, where ensembles are rewarded for being present at the right time of year rather than for specific qualitative achievement. This practice further highlights the subjective weakness of the current system—revealing how even ostensibly scientific score sheets can be bent by politics, scheduling, or strategic manipulation.
Performance order only complicates things further. Judges, knowing they must leave space for upcoming groups or avoid over-awarding early on, may subconsciously hold back numbers—especially early in a block. A corps performing in the first half of the evening often faces invisible ceilings, not because of any stated policy, but because of the unspoken psychology of numeric relativity. As the night progresses, those numbers tend to drift upward—regardless of comparative quality. This phenomenon, known as "slotting" by insiders, is as old as the activity itself and casts further doubt on the meritocratic facade of the sheets.

DCI vs. DCA (2012): A Tale of Two Paradigms
While both circuits in 2012 used caption-based sheets divided into achievement and content (or repertoire and excellence), DCA placed a stronger emphasis on ensemble readability and performer contribution across a wider range of talent and age. DCI's sheets—while similar in form—were tailored toward high-end, youth-centered performance, where demand and achievement were often assumed to track closely.
DCA's Visual Effect sheet, for example, explicitly referenced coordination between all visual elements and overall continuity. DCI’s version placed more emphasis on the “emotional contour” of the program and its construction—but both remained broadly interpretive.
There was also a practical difference in adjudication philosophy. DCA’s model seemed to allow for more democratic evaluation: smaller ensembles with excellent communication could rise higher on the sheets. In contrast, DCI—especially at the World Class level—implicitly privileged scale, demand, and show complexity.
Commentary in Practice: Joe Allison, Doyle Gammill, and the DCA Judges
If the sheets are the skeleton of the system, the commentary tapes are its bloodstream—real-time thoughts captured during a performance that feed directly into score decisions. And the range of commentary style is… dramatic.
Joe Allison (DCI 2013, Music Effect): Allison’s now-legendary tape for Carolina Crown exemplifies what effect commentary can aspire to. His performance analysis is thorough, highly specific, emotionally engaged, and full of musical insight. He references sonority, texture, orchestration, pacing, audience reaction, and even his own tears—all in real time. The tape is passionate but never shallow, deeply invested in both design and performer achievement.
Quotes like “you make the impossible relevant” and “every ounce is a pound” point to his investment in both the analytical and emotional impacts of the performance. There are technical observations (“octave alignment,” “temporary presence is happening,” “clarity of vertical sonority”) paired with aesthetic responses (“layers of concept and achievement,” “suspension of disbelief”).
This is what the sheets hint at but cannot articulate: a fusion of artistry and critique.
Doyle Gammill (Unknown Band, Year Unknown): In contrast, a now-viral judge tape by Doyle Gammill represents the extreme other end. His comments range from folksy meandering (“I just keep losing my pencils”) to openly unhelpful (“That’s a tall tuba line”). While perhaps well-meaning, the tape is almost devoid of useful feedback.
Much of his commentary revolves around personal anecdotes, subjective emotional impressions, or irrelevant details like how the band looks or how tall members appear. The result is an unstructured, confusing audio track that offers little actionable insight to staff or performers.
This disparity highlights a critical flaw in the system: even with a shared sheet, two judges with vastly different approaches can offer commentaries that are wildly divergent in depth, tone, and value.
Unnamed DCA Judges (DCA Finals, Percussion): To complete the triangle, we also reviewed two final-year commentary tapes from DCA’s championship event. While the tapes feature different voices and styles, both were highly musical, enthusiastic, and densely packed with feedback.
The first judge emphasized “vertical orchestration,” “dynamic layering,” “unexpected rhythmic punctuation,” and “soundscape construction.” His vocabulary was contemporary and his engagement consistent, though he occasionally drifted into personal reaction (“tasty!”) territory.
The second judge leaned heavily into technical critique, commenting on “fortissimo strokes,” “dynamic slurs,” “intervallic tuning,” and “orchestration texture.” He referred to front-to-back alignment, player consistency, and rhythm density with specificity and musical awareness. He even quoted from the sheet's criteria in his summary, clearly showing he used the sheet as a rubric, not just a score range.
Both judges provided insight at a championship level and delivered clear, evaluative observations tied to performance metrics. Their professionalism and use of the adjudication language grounded their reactions and helped legitimize their scores. These tapes reinforce that commentary quality depends heavily on the training and philosophy of the individual judge—not the sheet alone.
From Policy to Practice: What DCI Actually Requires
For those wondering what formal expectations DCI places on its adjudicators, the 2024 Bylaws, Policies, and Procedures manual offers insight.
According to this document:
Judges must undergo orientation sessions and pass certification tests annually.
Each judge is required to operate within their assigned caption and remain "current" through continuous training.
Adjudicators are assigned by the DCI Judging Coordinator.
Language of impartiality, fairness, and performer respect is emphasized.
Judges are not allowed to compare corps across captions—each performance is to be evaluated independently, on its own merit.
This formal structure provides a controlled foundation, but the real-world commentary examples make clear that enforcement and interpretation vary widely.
Why This Matters
This is not just a curiosity for fans. These systems materially affect:
Recruitment: Higher placement equals better visibility.
Funding: Sponsors and donors often respond to competitive success.
Program Design: Designers may alter creative decisions based on perceived judging trends.
Performer Experience: Especially in youth activity, scores can impact how students value their efforts.
When scoring systems are inconsistent, opaque, or dependent on personal style rather than shared standards, they risk undermining the very excellence they seek to reward.

A New Architecture: Decentralized Judging and the End of Ranked Competition
The problems we've cataloged so far—vague box language, profile number manipulation, performance order bias, wildly inconsistent commentary—are real. But they're symptoms, not the disease.
The disease is this: we've built an adjudication system that asks judges to be omniscient arbiters of artistic merit, forces them into false precision to justify rankings, and moralizes effect instead of measuring it. We've created a competitive structure that manufactures scarcity, escalates demand beyond artistic necessity, and punishes performers for staff anxiety they didn't create.
You can't fix that with better training. You can't patch it with clearer box descriptors. You have to redesign the architecture.
Here's how.
The Central Problem: Judges Can't Do What We're Asking Them To Do
Let's start with the obvious: no human being can be expected to recognize every literary reference, catch every pop culture allusion, understand every musical quotation, identify every visual motif, and parse every conceptual throughline embedded in a modern drum corps production—especially when shows are drawing from increasingly obscure source material, academic theory, niche subcultures, and hyper-specific artistic traditions.
And yet that's exactly what we ask judges to do when we tell them to evaluate effect. We expect them to assess whether a design "worked" emotionally and intellectually, to measure its impact, to determine if it achieved its artistic goals—all while having no shared vocabulary for what those goals even are, and no guarantee they'll recognize the references that unlock the show's meaning.
It's absurd. And it's burning people out.
Judges are human. They react first—emotionally, instinctively, physically—and then they translate those reactions into the vocabulary of the activity. That's not a flaw. That's how perception works. The problem is we've built a system that pretends otherwise, one that demands judges possess encyclopedic cultural knowledge they don't have and then blames them when the scores don't make sense.
The result? Judges resort to mental gymnastics to justify placements. They moralize effect—calling moments "good" or "bad," "worthy" or "unworthy"—instead of describing what actually happened. They're forced to compare the incomparable, rank the unrankable, and defend decisions that were always going to be subjective.
Meanwhile, staff escalate demand to chase an ever-receding horizon. If World Class is the top, and you're already there, where do you go? Intergalactic? So you add more notes, more tosses, more drill moves, more everything—not because the design needs it, but because the system demands differentiation. And performers absorb the cost: longer rehearsals, higher injury rates, more anxiety, less joy.

Reframing the Judge's Role: Reaction Translators, Not Omniscient Arbiters
So what should judges do?
Judge the craft. Judge the training systems. Judge the technique, the clarity, the consistency, the vocabulary control. Judge what can actually be observed, measured, and articulated within a specific caption's expertise.
Judges are reaction translators. They experience a performance, notice what happens in their bodies and minds, and then translate those reactions into the technical language of the marching arts. A brass judge hears a chord and feels it resonate—then identifies whether that resonance came from intonation, blend, dynamic shaping, or all three. A visual judge watches a rifle toss and registers whether it felt clean—then names whether the issue was timing, placement, or execution.
This is valuable work. It requires deep expertise, years of training, and disciplined observation. But it does not require judges to assess whether the show's effect "worked" in some universal, objective sense.
Because effect doesn't belong to the judges. It belongs to the audience.
And the audience—the actual humans in the stands who are experiencing the performance in real time, across hundreds of different vantage points, with hundreds of different backgrounds and reference pools—are the only ones who can tell you what they felt, what they understood, and what landed.
The Box System Without Hierarchy: Classification, Not Competition
Before we talk about the audience, we need to dismantle ranked competition.
Here's the move: keep the box system. Lose the rankings.
Boxes are descriptive tools. They describe levels of achievement, consistency, and mastery. Box 1 describes rarely achieved vocabulary control. Box 5 describes consistently achieved mastery. These are useful distinctions. They help organizations, educators, and performers understand where they are and what growth looks like.
But boxes don't need to create a ladder. They can create categories.
Here's the proposal:

World Class is defined by mastery and control, not escalation. It's not "harder than Open Class"—it's more consistent. The vocabulary is clearer, the execution is cleaner, the training systems are more refined. That's it. That's the destination.
Multiple groups can occupy the same box simultaneously. Multiple groups can be World Class in the same season. Achievement is contextualized, not weaponized. There is no first place. There is no second place. There is only: "This is where you are. This is what you've mastered. This is what you're working toward."
Groups move between classes season-to-season based on their box placement, not their prestige or their budget or their brand. Classification remains functional without becoming a moral ladder.
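To make the mechanism concrete, here's a minimal sketch in Python, assuming an invented median-box rule; the class names and cut points are placeholders for illustration, not a proposal for actual thresholds.

```python
# Minimal sketch of box-based classification without rankings. The median-box
# rule and the class boundaries below are invented for illustration only.
def classify(season_boxes: list[int]) -> str:
    """Assign a class from a season's box placements; co-occupancy is expected."""
    typical = sorted(season_boxes)[len(season_boxes) // 2]  # median-ish box
    return {1: "Class A", 2: "Class A", 3: "Open Class",
            4: "Open Class", 5: "World Class"}[typical]


# Several groups can land in the same class; nothing here orders them.
groups = {
    "Corps A": [5, 5, 4, 5],
    "Corps B": [5, 4, 5, 5],
    "Corps C": [3, 4, 3, 4],
}
print({name: classify(boxes) for name, boxes in groups.items()})
# {'Corps A': 'World Class', 'Corps B': 'World Class', 'Corps C': 'Open Class'}
```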
Excellence exists without domination. Scarcity logic is removed without lowering standards.
The ladder ends. The horizon appears.
Audience as Effect Architects: The Grandma Esther Principle
Now we get to the good part.

If effect belongs to the audience, then the audience should be part of the evaluation system. Not as passive observers whose reactions judges try to intuit from the press box—but as active participants whose feedback is collected, aggregated, and analyzed.
Here's how it works:
Audience members register before the event. They indicate their area of expertise or experience—former brass player, current guard instructor, visual designer, front ensemble educator, casual fan, first-timer. They complete a short training module: a 30-minute video covering caption expectations, how to recognize technical excellence, and what kind of feedback is constructive. Not "louder trumpets," but "melodic blend in the ballad fades behind battery in the second impact."
After that, they're eligible to provide feedback—not on the entire show, but on the caption they know best. If you spun rifle in high school, maybe you comment on timing clarity and toss height in the flag feature. If you're a former front ensemble player, you might flag ensemble clarity issues in the pit. If you're a show designer, you're perfectly positioned to analyze pacing, transitions, and staging.
And if you're Grandma Esther, who came to watch her granddaughter spin in a show inspired by Georgia O'Keeffe's desert paintings, and you studied modern art for 40 years? You have insight too. Maybe not on toss technique, but on whether that production honored O'Keeffe's vision, whether the staging evoked the desert's vastness, whether the emotional arc landed. General Effect is for everyone with a heart and a brain. Your experience counts.
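As a rough sketch of how caption-scoped eligibility could be represented in software, consider the following; the caption list, field names, and eligibility rule are assumptions made for the example, not features of any existing DCI or DCA system.

```python
# Hypothetical sketch of caption-scoped audience feedback eligibility.
from dataclasses import dataclass, field
from enum import Enum


class Caption(Enum):
    BRASS = "brass"
    PERCUSSION = "percussion"
    COLOR_GUARD = "color_guard"
    VISUAL = "visual"
    GENERAL_EFFECT = "general_effect"  # open to every trained viewer


@dataclass
class Reviewer:
    name: str
    expertise: set[Caption] = field(default_factory=set)  # self-reported background
    completed_training: bool = False  # finished the short orientation module

    def can_comment(self, caption: Caption) -> bool:
        """Reviewers may comment on captions they know, plus General Effect."""
        if not self.completed_training:
            return False
        return caption == Caption.GENERAL_EFFECT or caption in self.expertise


# Grandma Esther has no technical caption background, but General Effect is hers too.
esther = Reviewer(name="Esther", completed_training=True)
assert esther.can_comment(Caption.GENERAL_EFFECT)
assert not esther.can_comment(Caption.COLOR_GUARD)
```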
This kind of audience engagement has been suggested before, but the technology now exists to make it real. AI systems can identify repeated patterns of feedback, cluster similar responses, and flag potential manipulation attempts. Multiple viewers note "pit ensemble off from battery"? That's flagged as a trending issue. Excessively negative or inappropriate comments? Filtered or rephrased before submission.
Caption-based audience panels provide specific, focused insight—giving performers a fuller picture of their success across multiple expert perspectives, not just one judge's interpretation.
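A toy version of that aggregation step might look like this; a real system would presumably use semantic clustering rather than the crude string normalization shown here, and the reporting threshold is arbitrary.

```python
# Toy illustration of flagging trending feedback across many submissions.
from collections import Counter


def normalize(comment: str) -> str:
    """Crude stand-in for semantic clustering: lowercase and strip punctuation."""
    return "".join(c for c in comment.lower() if c.isalnum() or c.isspace()).strip()


def trending_issues(comments: list[str], min_reports: int = 3) -> list[tuple[str, int]]:
    """Return feedback themes reported by at least `min_reports` viewers."""
    counts = Counter(normalize(c) for c in comments)
    return [(theme, n) for theme, n in counts.most_common() if n >= min_reports]


feedback = [
    "Pit ensemble off from battery",
    "pit ensemble off from battery!",
    "Pit ensemble off from battery",
    "Great rifle work on the 50",
]
print(trending_issues(feedback))  # [('pit ensemble off from battery', 3)]
```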
Engagement Heat Maps and Spatialized Feedback: Measuring What Actually Happens
Here's where it gets revolutionary.
Instead of asking judges to guess whether a moment "worked," we can measure audience engagement in real time and map it spatially across the venue.
Imagine this: as the show progresses, audience members submit reactions through a simple interface. Not scores. Not rankings. Just raw feedback: engaged, excited, confused, bored, moved. The system timestamps each reaction and maps it to the audience member's location in the venue.
The result is an engagement heat map—a time-based visualization showing when engagement rises, when it drops, where confusion appears, and which moments land hardest.
This isn't punitive. It's diagnostic. Boredom isn't a moral judgment—it's a design signal. If engagement drops at minute 4:32 across sections 3 through 7, that's actionable information. Maybe the staging doesn't read from those angles. Maybe the sound balance shifts. Maybe the pacing lags. Whatever the cause, the data points to where and when the issue occurs, giving designers concrete information to work with.
And here's the key: the system uses a universal, venue-agnostic spatial mapping model. Not specific seat numbers, but a normalized perception grid:
Horizontal axis: Side 1 → 50-yard line → Side 2
Vertical axis: near (front rows) → far (back rows)
This works in stadiums, gymnasiums, performing arts centers, any venue. It focuses on relative position and sightlines, not arbitrary seating charts. The system can identify which zones reacted most or least, revealing side-specific clarity problems, diagonal sightline issues, depth-dependent effects, and staging imbalances that a single judge in a press box would never catch.
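Here's one possible sketch of that normalized grid and a time-bucketed engagement map; the zone counts, bucket size, and reaction labels are made-up parameters, not a finished specification.

```python
# Sketch of a venue-agnostic perception grid and a time-bucketed engagement map.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reaction:
    t_seconds: float   # time into the show when the reaction was submitted
    horizontal: float  # 0.0 = Side 1, 0.5 = 50-yard line, 1.0 = Side 2
    depth: float       # 0.0 = front rows, 1.0 = back rows
    label: str         # "engaged", "confused", "bored", "moved", ...


def zone(r: Reaction, h_bins: int = 3, d_bins: int = 2) -> tuple[int, int]:
    """Bucket a normalized position into a coarse, venue-agnostic zone."""
    h = min(int(r.horizontal * h_bins), h_bins - 1)
    d = min(int(r.depth * d_bins), d_bins - 1)
    return (h, d)


def heat_map(reactions: list[Reaction], bucket_s: float = 15.0) -> dict:
    """Count each reaction label per (time bucket, zone)."""
    grid: dict[tuple, int] = defaultdict(int)
    for r in reactions:
        grid[(int(r.t_seconds // bucket_s), zone(r), r.label)] += 1
    return grid


sample = [
    Reaction(272, 0.15, 0.8, "confused"),  # ~4:32, Side 1, back rows
    Reaction(275, 0.20, 0.9, "confused"),
    Reaction(278, 0.85, 0.2, "moved"),     # Side 2, front rows
]
for key, count in heat_map(sample).items():
    print(key, count)
```

Queries against a structure like this are exactly the kind of diagnostic described above: confusion clustering in the back rows on Side 1 around the 4:32 mark, while the front rows on Side 2 report being moved.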
This is spatialized audience insight. It validates what effect judges might miss. It shows where design reads differently depending on vantage point. And it treats the audience as the distributed perceptual network they actually are.
Operationalizing Effect: From Moralization to Measurement
This shift—from judges guessing at effect to audiences reporting their actual experience—fundamentally changes how we talk about what "works" in a show.
Right now, when a moment doesn't land, the language is moralizing: "You weren't committed." "The performers didn't sell it." "It felt empty." This puts the burden on the performers to somehow generate a feeling the judge didn't have—a feeling that may have been impossible given the design, the staging, or the judge's own reference pool.
In the new system, the language is operational: "Engagement dropped in zones 2, 5, and 8 between minutes 4:15 and 4:50." "Audience members in the back third of the venue reported confusion during the transition." "Side 1 registered significantly higher engagement during the ballad than Side 2."
This is information designers can act on. It's not about trying harder or being more committed. It's about editing the design, adjusting the staging, rethinking the pacing. Problems are solved through craft, not pressure.
Boredom becomes a design signal, not a moral failure.
Dynamic Caption Awards: Organic Recognition of Real Impact
And here's where the system becomes truly responsive.
Caption awards don't have to be static. They don't have to be predetermined by a committee six months before the season starts. They can be customized in real time—shaped by the audience itself, emerging from the actual moments that mattered most.
If a performer takes a catastrophic spill and recovers with such grace and presence that it lights up the feedback stream? Suddenly there's a Best Recovery Award. If a ballad hits so devastatingly hard that twenty people in section H start crying? Maybe that corps walks away with Most Emotionally Devastating Moment.
The algorithm doesn't just track performance—it listens. It reflects. It adapts to what the crowd cares about. These aren't participation trophies. They're signal flares. Organic recognitions of the real moments that made that night unforgettable.
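One way to prototype that "listening" step, sketched here under heavy assumptions: tag each piece of feedback with a theme, count the tags per corps, and surface anything that crosses a popularity threshold. The tagging, the threshold, and the award-naming rule are all invented for illustration.

```python
# Hypothetical sketch of awards emerging from the audience feedback stream.
from collections import Counter


def emergent_awards(tagged_feedback: list[tuple[str, str]], threshold: int = 20):
    """tagged_feedback holds (corps, theme) pairs, e.g. ("Corps A", "recovery").
    Any pair reported by at least `threshold` viewers becomes an award."""
    counts = Counter(tagged_feedback)
    return [
        (f"Audience Award: {theme.title()}", corps)
        for (corps, theme), n in counts.items()
        if n >= threshold
    ]


stream = [("Corps A", "recovery")] * 23 + [("Corps B", "emotional ballad")] * 7
print(emergent_awards(stream))  # [('Audience Award: Recovery', 'Corps A')]
```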
The audience becomes not just an observer, but a responsive, feeling part of the adjudication fabric.
What This Solves: Benefits Across the Ecosystem
Let's be clear about what this architecture accomplishes.
For judges: Less burnout. More honesty. Higher credibility. They're no longer asked to be omniscient experts on every cultural reference or to justify placements through mental gymnastics. They judge what they can actually see, hear, and articulate. Their expertise is respected, not weaponized.
For staff: Clearer diagnostics. Less anxiety-driven escalation. Instead of chasing an ever-receding "World Class" horizon by piling on more demand, designers get concrete feedback about what's working and what isn't—spatially, temporally, and emotionally. They can make informed creative decisions instead of guessing what judges want.
For performers: Less punishment. More transparency. Safer culture. The system stops blaming them for design problems or moralizing their commitment. Achievement is celebrated without domination. Multiple groups can be excellent simultaneously.
For audiences: Active participation rather than passive consumption. Their reactions matter. Their expertise is valued. They're no longer props in someone else's competitive drama—they're co-creators of meaning.
For the art: Clearer. Kinder. Braver. More honest. The work can take risks without fear of arbitrary penalization. Design innovation is rewarded with real feedback, not suppressed by judges who don't understand the references. The art grows because the system finally measures what actually happens instead of pretending to measure what should happen.

Conclusion: The Future Is Watching
The judging systems we've inherited are built on tradition, vocabulary, and best intentions—but they're not unbreakable, and they're certainly not sacred. The inconsistencies are real.
The frustrations are valid.
But we're no longer stuck in the press box. Not anymore.
We have the tools, the will, and the people. Not just a few at the top with clipboards and headsets, but thousands of passionate, informed, diverse individuals who live and breathe this art form. They want to be involved. They want to give back. And they're already sitting in the stands.
The system we've outlined here—decentralized judging, box-based classification without rankings, audience-driven effect measurement, spatialized feedback, engagement heat maps, dynamic caption awards—isn't science fiction. The technology exists. The expertise exists. The desire exists.
All we have to do is stop pretending that a handful of judges in a press box can capture the totality of a performance's impact. All we have to do is trust the audience to tell us what they actually experienced. All we have to do is shift from moralization to measurement, from hierarchy to honesty, from the ladder to the horizon.
Let's make everyone a judge. Let's create the smartest, kindest, most electrifying adjudication system the marching arts has ever seen.
And let's start with the simplest truth:
They're not just watching. They're ready.


