about / lucid

Notes on a system that tries to make the scroll legible.

LUCID is a proof-of-concept tool that scores short-form video for six research-grounded manipulation tactics. This page explains why it exists, how the rubric was built, and what it can and cannot tell you.

§ 01 — overview

Something is being done to your attention, and you don’t have a vocabulary for it.

Open TikTok at 11 p.m., close it at 1 a.m., and try to name five things you watched. Most people can’t. The posts blur. What’s left is the feeling of having been acted on rather than the feeling of having chosen.

Short-form video platforms rank posts by engagement, and creators have adapted their craft to the specific psychological levers those rankings reward. The levers are real, and they’ve been studied for decades: curiosity gaps, variable-ratio reinforcement, outrage-based sharing, scarcity framing. But they’re usually invisible at the post level. A single TikTok isn’t labeled as manipulative, and most viewers don’t have the vocabulary to describe which lever is being pulled on them at which moment.

LUCID is a small attempt at that vocabulary. Paste a TikTok URL, and it returns a 0–100 Scroll Trap Score with a per-dimension breakdown: how much of what you’re about to watch is outrage bait, how much is a curiosity gap, how much is surface-level dopamine design. It will not tell you a creator’s intent. It will not tell you whether a post is true. It will tell you, as a statistical estimate over a rubric rooted in peer-reviewed research, what rhetorical moves the post is making.

§ 02 — the landmark case

This isn’t only an academic question. It’s being argued in court right now.

Meta, TikTok / ByteDance, Snap, YouTube, and Alphabet are all defendants in a consolidated multidistrict litigation in the Northern District of California: In re: Social Media Adolescent Addiction / Personal Injury Products Liability Litigation, MDL No. 3047, before Judge Yvonne Gonzalez Rogers (case 4:22-md-03047-YGR). The MDL consolidates thousands of individual personal-injury suits, hundreds of school-district actions, and attorney-general filings from more than forty states. The plaintiffs’ core theory is straightforward: the products were designed to maximize engagement in a way the defendants knew produced addictive use patterns in minors, and were marketed as safe anyway.

On October 24, 2023, a multi-state coalition of forty-two attorneys general filed parallel actions against Meta alleging that Instagram and Facebook were deliberately engineered to addict young users while the company publicly denied doing so. Thirty-three AGs joined a joint federal complaint in the Northern District of California; the remaining states filed in their own state courts. The filings allege violations of the federal Children’s Online Privacy Protection Act and state consumer-protection statutes, and they name specific product mechanics (infinite scroll, push notifications, recommendation-driven feeds) as the designed features causing harm (NJ AG press release, 2023; NY AG press release, 2023; Allyn, NPR, 2023).

The record the AGs are drawing on was not discovered in court. In September 2021, the Wall Street Journal published the “Facebook Files” based on internal Meta research leaked by a former employee, Frances Haugen. One slide from a 2019 internal presentation read, verbatim: “We make body image issues worse for one in three teen girls.” Another reported that thirty-two percent of teen girls said that when they felt bad about their bodies, Instagram made them feel worse. Haugen identified herself publicly on 60 Minutes on October 3, 2021, and testified before the U.S. Senate Commerce Subcommittee two days later. Meta contested the framing of the research but not, by and large, its existence (Vanian, CNBC, 2021; Wells, NYT, 2021; Allyn, NPR, 2021).

I’m flagging this up front because the concept underneath LUCID is no longer just a researcher’s hypothesis. The manipulation of attention at the post level is measurable, platforms have internal knowledge of the machinery, and the question is now a live federal case with more than forty state governments on one side.

§ 03 — the six dimensions

How the rubric was built, and why it has six axes rather than one.

The rubric is the part of the project I took most seriously. A score is only as meaningful as the taxonomy underneath it, and a single “manipulation score” collapses distinctions that matter. Outrage bait and a curiosity gap both raise engagement, but they do it by pressing on entirely different cognitive mechanisms, and a viewer who can name which lever a post is pulling is in a different relationship to the post than one who can’t. So LUCID scores six dimensions, each of which traces to at least one established line of behavioral research.

01 · Outrage Bait (dim.outrage)
Framing designed primarily to provoke anger, moral indignation, or tribal reaction rather than to inform.
Crockett, 2017; Brady et al., 2017 / 2021

02 · FOMO Trigger (dim.fomo)
Manufactured urgency, scarcity, or social comparison used to make the viewer feel like opting out is a loss.
Przybylski et al., 2013; Cialdini, 2009

03 · Engagement Bait (dim.engagement)
Explicit prompts to tag, comment, share, or follow whose purpose is to inflate algorithmic signals, not to host a real conversation.
Meta, 2017; Munger, 2020; Mathur et al., 2019

04 · Emotional Manipulation (dim.emotional)
Guilt, pity, or shame used as a substitute for evidence. Emotional pressure that stands in for an argument.
Small, Loewenstein, & Slovic, 2007; Kramer et al., 2014

05 · Curiosity Gap (dim.curiosity)
Deliberate withholding of a key referent or outcome to force the viewer to click, scroll, or watch through.
Loewenstein, 1994; Blom & Hansen, 2015

06 · Dopamine Design (dim.dopamine)
Surface-level salience hooks (ALL CAPS, rapid cuts, emoji spam, variable-reward pacing) that capture attention before the content is evaluated.
Skinner, 1953; Alter, 2017; Montag et al., 2019

Two design choices in the scoring that I want to be explicit about.

The severity scale is ordinal, not binary. Each dimension is scored 0 / 1 / 2 (absent, moderate, severe), and the composite Scroll Trap Score is a 0–100 aggregation of the six. Manipulation is gradient; a post using a single mild outrage hook is doing something qualitatively different from one stacking outrage, scarcity, and guilt in the same ten seconds. Binary labels would hide that. Three levels are coarse enough to be teachable to both a human labeler and a language-model judge, and fine enough to distinguish rhetorical intensity.
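This page does not spell out how the six ordinal scores are combined into the 0–100 composite. As a minimal sketch, assuming an unweighted sum rescaled to 0–100 (an assumption made for illustration, not a description of the deployed aggregation), the arithmetic could look like the following. The function name `scroll_trap_score` is hypothetical; the dimension keys reuse the `dim.*` identifiers from the rubric.

```python
# Hypothetical sketch: an unweighted aggregation of six 0/1/2 severity
# scores into a 0-100 composite. The real Scroll Trap Score may weight
# dimensions differently; this only illustrates the scale.
DIMENSIONS = ("outrage", "fomo", "engagement", "emotional", "curiosity", "dopamine")
MAX_SEVERITY = 2


def scroll_trap_score(severity: dict[str, int]) -> float:
    """Rescale the sum of per-dimension severities to the range 0-100."""
    for dim in DIMENSIONS:
        if severity[dim] not in (0, 1, 2):
            raise ValueError(f"{dim} must be 0, 1, or 2, got {severity[dim]!r}")
    total = sum(severity[dim] for dim in DIMENSIONS)
    return 100.0 * total / (MAX_SEVERITY * len(DIMENSIONS))


# One mild outrage hook versus outrage, scarcity, and guilt stacked together.
mild = dict.fromkeys(DIMENSIONS, 0) | {"outrage": 1}
stacked = dict.fromkeys(DIMENSIONS, 0) | {"outrage": 2, "fomo": 2, "emotional": 2}
print(round(scroll_trap_score(mild), 1))     # 8.3
print(round(scroll_trap_score(stacked), 1))  # 50.0
```

Under this assumed aggregation the two posts from the paragraph above land far apart on the composite, which is the distinction a binary label would erase.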

The score is a property of the text, not of the creator. LUCID evaluates what the post is doing at the level of fused caption, audio transcript, and on-screen overlay. It does not claim to measure intent, and it doesn’t try to. A nonprofit using emotional appeals to recruit foster parents and a con artist using the same techniques for an info-product will both score high on Emotional Manipulation, because the rhetorical move is the same on the page. The judgment about intent is the reader’s.

Finally, a note on why the rubric is fixed rather than learned. A clustering approach would surface whatever structure the data happens to have; a fixed rubric commits up front to a set of categories that are defensible to a non-ML reader. For a tool intended to help people articulate what a post is doing to them, the second property matters more. The taxonomy is one defensible cut of the space, not the only one.

§ 04 — how the labels were made

And why I sat down and hand-labeled a hundred of them myself.

The deployed model is a fine-tuned DistilBERT, which is a small (66M-parameter) encoder transformer. It needs labeled training data in its target format. The issue is that no one has ever labeled three and a half thousand short-form-video captions on a six-dimension ordinal rubric that I made up. The data doesn’t exist.

The pragmatic solution is what the literature has started calling LLM-as-a-judge. Claude Sonnet 4.5 is given the full rubric (the one in §03 above) along with eight few-shot examples spanning 0 / 1 / 2 severity per dimension, and it labels every item in the training corpus. DistilBERT is then trained on those labels. The framing is borrowed from Anthropic’s Constitutional AI work and formalized for evaluation use by Zheng et al. (2023) on MT-Bench. The idea, in plain terms: a larger language model trained on human-written principles can act as a consistent labeler at scale for a smaller model to learn from.
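To make the labeling step concrete, here is a rough sketch of how a judge prompt might be assembled and how a reply might be validated before it is used as a training label. Everything in it is an assumption for illustration: the function names, the JSON reply format, and the prompt layout are hypothetical, and the actual call to the model is deliberately left out. Only the 0 / 1 / 2 scale, the few-shot setup, and the “when in doubt, pick the lower score” instruction come from this page.

```python
import json

DIMENSIONS = ("outrage", "fomo", "engagement", "emotional", "curiosity", "dopamine")


def build_judge_prompt(rubric: str, few_shot: list[dict], item_text: str) -> str:
    """Assemble rubric, worked examples, and the item to label into one prompt."""
    parts = [
        rubric.strip(),
        "Score each dimension 0 (absent), 1 (moderate), or 2 (severe).",
        "When in doubt, pick the lower score.",
        "Answer with a single JSON object keyed by dimension name.",
    ]
    for example in few_shot:
        parts.append(f"Text: {example['text']}\nLabels: {json.dumps(example['labels'])}")
    parts.append(f"Text: {item_text}\nLabels:")
    return "\n\n".join(parts)


def parse_judge_reply(reply: str) -> dict[str, int]:
    """Validate a judge reply; reject anything that is not six 0/1/2 scores."""
    labels = json.loads(reply)
    if not isinstance(labels, dict) or set(labels) != set(DIMENSIONS):
        raise ValueError("reply must be a JSON object with exactly the six dimensions")
    for dim, score in labels.items():
        if type(score) is not int or score not in (0, 1, 2):
            raise ValueError(f"{dim}: severity must be 0, 1, or 2, got {score!r}")
    return {dim: labels[dim] for dim in DIMENSIONS}


# Hypothetical few-shot example and reply, invented for this sketch.
shot = {"text": "Only 3 spots left, link in bio", "labels": dict.fromkeys(DIMENSIONS, 0) | {"fomo": 2}}
prompt = build_judge_prompt("<rubric text goes here>", [shot], "Tag a friend who needs this")
reply = json.dumps(dict.fromkeys(DIMENSIONS, 0) | {"engagement": 1})
print(parse_judge_reply(reply)["engagement"])  # 1
```

Rejecting malformed replies outright, rather than coercing them, keeps silent parsing errors from leaking into the training labels.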

The honest worry about this approach is that it’s circular. You defined manipulation one way; you gave that definition to an LLM; the LLM produced labels that reflect your definition; a model trained on those labels ends up saying what you already believed. If all you do is train and ship, you’re not measuring anything about the world. You’re measuring the consistency of your own rubric applied by a proxy.

The way out is to treat the Claude labels as a noisy oracle, not ground truth, and to calibrate them against something external. Which is why I hand-labeled 100 items sampled from the corpus with a fixed seed of 42, through a small Gradio interface I built for the purpose. Same rubric, same severity levels, no Claude output visible during labeling. The point of the exercise is not that one person’s labels are the truth. The point is that if you’re going to build a labeling pipeline at scale, you need to do the labeling yourself at least once, on a representative sample, and see whether the pipeline agrees with you in places that are easy and disagrees with you in places that are hard. Otherwise you don’t actually know what you shipped.
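The page states only the sample size (100) and the seed (42); the sampling routine itself is not described. One way such a reproducible draw could be done, sketched here with Python's standard `random` module and placeholder item ids (both assumptions), is:

```python
import random


def draw_gold_sample(item_ids: list[str], k: int = 100, seed: int = 42) -> list[str]:
    """Return k item ids drawn without replacement, reproducibly."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    # Sort first so the result does not depend on the order the ids arrive in.
    return rng.sample(sorted(item_ids), k)


# Placeholder ids standing in for the 3,491-item training corpus.
corpus_ids = [f"item-{i:04d}" for i in range(3491)]
gold = draw_gold_sample(corpus_ids)

# Same seed, same sample, even if the corpus is loaded in a different order.
assert gold == draw_gold_sample(list(reversed(corpus_ids)))
print(len(gold), len(set(gold)))  # 100 100
```

Fixing the seed means anyone with the same corpus can regenerate the same 100 items and relabel them independently.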

The metrics that come out of that comparison are per-dimension Spearman rank correlation (how well the two labelers order severity) and Krippendorff’s α (an ordinal agreement coefficient), alongside exact-match and within-one-step accuracy. Here is what the 100-item pass produced.

Claude-vs-human agreement on the 100-item gold set, per dimension.
Dimension	Spearman ρ	Krippendorff α	Exact	Within 1
Outrage Bait	+0.463	+0.388	0.81	0.90
FOMO Trigger	+0.019	+0.019	0.82	0.94
Engagement Bait	+0.403	+0.332	0.81	0.95
Emotional Manipulation	−0.056	−0.053	0.89	0.96
Curiosity Gap	+0.238	+0.239	0.68	0.96
Dopamine Design	+0.176	+0.166	0.84	1.00
Macro average	+0.207	+0.182	0.808	0.952
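For readers who want the two accuracy columns pinned down, here is a small sketch of how exact-match and within-one-step agreement can be computed, followed by a check that the macro-average row is the plain mean of the six rows above it. The toy label lists are hypothetical and are not taken from the gold set; the table values are copied from the rows above. Krippendorff’s α is omitted here because it needs a fuller implementation.

```python
def exact_match(human, judge):
    """Share of items where both labelers chose the same severity."""
    return sum(h == j for h, j in zip(human, judge)) / len(human)


def within_one(human, judge):
    """Share of items where the two severities differ by at most one step."""
    return sum(abs(h - j) <= 1 for h, j in zip(human, judge)) / len(human)


# Hypothetical toy labels on the 0/1/2 scale.
human = [0, 0, 1, 2, 0, 1]
judge = [0, 1, 1, 0, 0, 2]
print(exact_match(human, judge))           # 0.5
print(round(within_one(human, judge), 3))  # 0.833

# Per-dimension rows from the table: (Spearman rho, Krippendorff alpha, exact, within 1).
ROWS = {
    "Outrage Bait": (0.463, 0.388, 0.81, 0.90),
    "FOMO Trigger": (0.019, 0.019, 0.82, 0.94),
    "Engagement Bait": (0.403, 0.332, 0.81, 0.95),
    "Emotional Manipulation": (-0.056, -0.053, 0.89, 0.96),
    "Curiosity Gap": (0.238, 0.239, 0.68, 0.96),
    "Dopamine Design": (0.176, 0.166, 0.84, 1.00),
}
macro = [round(sum(col) / len(col), 3) for col in zip(*ROWS.values())]
print(macro)  # [0.207, 0.182, 0.808, 0.952]
```

The recomputed means match the macro-average row, so each dimension carries equal weight in that summary regardless of how many non-zero items it has.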

Two things stand out. Within-one-step agreement averages 0.95, which means that on a 0 / 1 / 2 scale Claude and I rarely disagreed by more than a single severity level. Exact-match agreement averages 0.81, so at the coarse “present or absent” decision we agree most of the time. At the same time, the rank-correlation numbers on the rarer dimensions sit close to zero. That is a class-imbalance artifact, not a labeling failure. When only 7 of 100 items are non-zero on Emotional Manipulation, a single borderline disagreement can swing Spearman sharply because the formula has almost no variance to work with. The dimensions with real variance behave as expected: Outrage Bait and Engagement Bait both land around ρ = 0.4, which is a moderate correlation typical of one-human ordinal agreement with an LLM judge on a multi-dimensional rubric.
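The fragility of rank correlation on sparse labels can be illustrated with a toy sketch. It uses invented binary labels (a human firing on 7 of 100 items and a judge firing on 4), not the real gold set. Spearman’s ρ is implemented by hand here as Pearson correlation on tie-averaged ranks, and the helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def sparse_case(overlap, n=100, human_hits=7, judge_hits=4):
    """Toy labels: `overlap` of the judge's hits coincide with the human's."""
    human = [1] * human_hits + [0] * (n - human_hits)
    judge = [0] * n
    for i in range(overlap):               # hits both labelers share
        judge[i] = 1
    for i in range(judge_hits - overlap):  # hits only the judge makes
        judge[human_hits + i] = 1
    return human, judge


for overlap in (0, 1, 4):
    human, judge = sparse_case(overlap)
    exact = sum(h == j for h, j in zip(human, judge)) / len(human)
    print(overlap, exact, round(spearman(human, judge), 3))
# 0 0.89 -0.056
# 1 0.91 0.144
# 4 0.97 0.744
```

In this toy setup, moving a single item from disagreement to agreement shifts ρ from −0.056 to +0.144 while exact match moves only from 0.89 to 0.91. That is the sense in which the correlation has almost no variance to work with.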

A third pattern is that Claude is systematically more conservative than I was. On almost every dimension, Claude fires non-zero roughly half as often as I do. That is consistent with the “when in doubt, pick the lower score” instruction in the labeling prompt, and it pushes the downstream model toward precision at the cost of recall. Read the agreement table as a lower bound on the agreement I would expect from a less conservative labeling prompt.

The broader point: if LUCID’s labels came solely from a language model, this would be a scalable-oversight move with a known failure mode. The human-validation pass is what makes it defensible. Where the agreement numbers come back weak, that is useful information. It shows which part of the rubric needs to be tightened or dropped.

§ 05 — why a model, not a lecture

What the pipeline actually does when you paste a URL.

Most media-literacy work is essays. Essays describe the machinery in the abstract and leave you to spot it in the wild, which is the hard part. A model that scores a specific post on specific dimensions turns a vague intuition into something you can point at.

Here’s what happens when you paste a TikTok URL, described without equations:

The video is downloaded and its caption pulled from metadata.
The audio is transcribed to text by Whisper, a widely used speech-recognition model.
Four evenly-spaced keyframes are pulled from the video, and a vision-language model (Claude Vision) reads any on-screen overlay text from those frames. This is the closest thing in the stack to “watching” the video.
The three streams (caption, transcript, overlay) are concatenated into one fused text blob. From the model’s perspective, a TikTok is just that blob.
The fused text is passed to a fine-tuned DistilBERT classifier with a multi-output head: six per-dimension probabilities plus a composite. That’s the Scroll Trap Score you see.
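Two of the steps above are simple enough to sketch in a few lines: choosing four evenly spaced keyframes and fusing the three text streams. This is an illustration under stated assumptions, not the project's code. “Evenly spaced” is read here as the centres of four equal segments, the bracketed stream tags are invented separators, and dropping repeated overlay strings is a design choice of this sketch. Downloading, Whisper transcription, the vision model, and the classifier are all left out.

```python
def keyframe_indices(total_frames: int, k: int = 4) -> list[int]:
    """Pick k frame indices at the centres of k equal-length segments."""
    if total_frames < k:
        raise ValueError("video has fewer frames than requested keyframes")
    return [int((i + 0.5) * total_frames / k) for i in range(k)]


def fuse_streams(caption: str, transcript: str, overlay_texts: list[str]) -> str:
    """Concatenate caption, transcript, and overlay text into one string."""
    # Overlay text often repeats across keyframes; keep the first occurrence only.
    cleaned = (text.strip() for text in overlay_texts)
    overlay = " ".join(dict.fromkeys(text for text in cleaned if text))
    parts = [
        ("caption", caption.strip()),
        ("transcript", transcript.strip()),
        ("overlay", overlay),
    ]
    # Empty streams are skipped so a silent video does not add a bare tag.
    return " ".join(f"[{name}] {text}" for name, text in parts if text)


print(keyframe_indices(240))  # [30, 90, 150, 210]
print(fuse_streams(
    "wait for it",
    "you will not believe this",
    ["PART 1", "PART 1", "", "LINK IN BIO"],
))
# [caption] wait for it [transcript] you will not believe this [overlay] PART 1 LINK IN BIO
```

Because the classifier only ever sees the fused string, anything lost at this step (a missed overlay, a garbled transcript) is invisible to the score.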

The model was trained on 3,491 items sampled from two established clickbait corpora and a small TikTok scrape, all relabeled against the six-dimension rubric. That sample size is explicitly small; it would not satisfy a commercial trust-and-safety team. The full technical report, with metrics, confusion matrices, a noise-robustness experiment, and an error analysis, lives in the project repository on GitHub. The fine-tuned model weights are on the Hugging Face Hub.

§ 06 — ethics & limitations

Four things this tool is not, stated plainly.

It is not ground truth. The training labels come from a language model applying a rubric I wrote. A different labeler with a different taxonomy would produce different numbers. The human-validation pass in §04 bounds how much to trust the labels, but it doesn’t make them authoritative. Treat the scores as an informed estimate, not a measurement.

It does not read minds. The model scores surface features of a post’s fused text. It says nothing about what the creator intended, whether the underlying claim is true, or how any specific viewer will feel after watching. A documentary, a fundraiser, and a scam can all score high on Emotional Manipulation if they use the same rhetorical moves. The score is a signal, not a verdict.

It is a small research dataset. 3,491 items, heavily weighted toward English-language clickbait headlines plus a modest TikTok scrape. This is enough to compare naive, classical, and deep approaches on a fixed rubric. It is not enough to underwrite a commercial moderation product, and I don’t claim otherwise.

It is one cut of the space. The six-dimension taxonomy is defensible, since every axis is grounded in at least one line of peer-reviewed research, but it is not the only defensible taxonomy. A researcher working primarily in misinformation or in persuasion studies might carve up the same content space differently. The rubric is a starting point for making the invisible legible, not the final word on what “manipulation” means.

§ 07 — references

Every claim above sourced.

Grouped by section. News and court filings first, then behavioral-research citations underpinning the rubric, then the machine-learning literature.

§ 02 — legal & journalism

U.S. District Court, Northern District of California. In re: Social Media Adolescent Addiction / Personal Injury Products Liability Litigation, Case 4:22-md-03047-YGR (MDL No. 3047). cand.uscourts.gov.
New Jersey Office of the Attorney General (2023, October 24). AG Platkin, 41 other attorneys general sue Meta for harms to youth from Instagram, Facebook. njoag.gov.
Office of the New York State Attorney General (2023, October 24). Attorney General James and multistate coalition sue Meta for harming youth. ag.ny.gov.
Allyn, B. (2023, October 24). States sue Meta, claiming Instagram, Facebook fueled youth mental health crisis. NPR. npr.org.
Vanian, J. (2021, September 14). Facebook documents show how toxic Instagram is for teens, Wall Street Journal reports. CNBC. cnbc.com.
Wells, G. (2021, October 5). Teenage girls say Instagram’s mental-health impacts are no surprise. The New York Times. nytimes.com.
Allyn, B. (2021, October 5). Whistleblower’s testimony has resurfaced Facebook’s Instagram problem. NPR. npr.org.

§ 03 — rubric, behavioral research

Crockett, M. J. (2017). Moral outrage in the digital age. Nature Human Behaviour, 1(11), 769–771. doi:10.1038/s41562-017-0213-3.
Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Van Bavel, J. J. (2017). Emotion shapes the diffusion of moralized content in social networks. PNAS, 114(28), 7313–7318. doi:10.1073/pnas.1618923114.
Brady, W. J., McLoughlin, K., Doan, T. N., & Crockett, M. J. (2021). How social learning amplifies moral outrage expression in online social networks. Science Advances, 7(33), eabe5641. doi:10.1126/sciadv.abe5641.
Przybylski, A. K., Murayama, K., DeHaan, C. R., & Gladwell, V. (2013). Motivational, emotional, and behavioral correlates of fear of missing out. Computers in Human Behavior, 29(4), 1841–1848. doi:10.1016/j.chb.2013.02.014.
Cialdini, R. B. (2009). Influence: Science and Practice (5th ed.). Pearson.
Meta Newsroom (2017, December 18). Fighting engagement bait on Facebook. about.fb.com.
Munger, K. (2020). All the news that’s fit to click: The economics of clickbait media. Political Communication, 37(3), 376–397. doi:10.1080/10584609.2019.1687626.
Mathur, A., Acar, G., Friedman, M. J., Lucherini, E., Mayer, J., Chetty, M., & Narayanan, A. (2019). Dark patterns at scale: Findings from a crawl of 11K shopping websites. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW). doi:10.1145/3359183.
Small, D. A., Loewenstein, G., & Slovic, P. (2007). Sympathy and callousness: The impact of deliberative thought on donations to identifiable and statistical victims. Organizational Behavior and Human Decision Processes, 102(2), 143–153. doi:10.1016/j.obhdp.2006.01.005.
Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. PNAS, 111(24), 8788–8790. doi:10.1073/pnas.1320040111.
Loewenstein, G. (1994). The psychology of curiosity: A review and reinterpretation. Psychological Bulletin, 116(1), 75–98. doi:10.1037/0033-2909.116.1.75.
Blom, J. N., & Hansen, K. R. (2015). Click bait: Forward-reference as lure in online news headlines. Journal of Pragmatics, 76, 87–100. doi:10.1016/j.pragma.2014.11.010.
Skinner, B. F. (1953). Science and Human Behavior. Macmillan.
Alter, A. (2017). Irresistible: The Rise of Addictive Technology and the Business of Keeping Us Hooked. Penguin Press.
Montag, C., Lachmann, B., Herrlich, M., & Zweig, K. (2019). Addictive features of social media/messenger platforms and freemium games against the background of psychological and economic theories. International Journal of Environmental Research and Public Health, 16(14), 2612. doi:10.3390/ijerph16142612.

§ 04–05 — machine learning

Bai, Y., Kadavath, S., Kundu, S., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073. arxiv.org/abs/2212.08073.
Zheng, L., Chiang, W.-L., Sheng, Y., et al. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. NeurIPS Datasets and Benchmarks Track. arxiv.org/abs/2306.05685.
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. NeurIPS EMC² Workshop. arxiv.org/abs/1910.01108.