18
@0xdrej
Andrej
Skipped detailed analysis: Personal account of a founder/team member speaking in first person, not a project or protocol account.
AI Analysis: neutral
Confidence: 30%
Recent tweets
wait a second we did this in a week
Imagine telling someone in 2024 that not over-relying on synthetic data is a good thing.
The reality is LLM outputs are biased towards the high-density regions of the training distribution.
Synthetic data is a lossy reconstruction of reality. That's great for augmentation and distillation. But if models increasingly train on the outputs of other models, we're effectively recompressing a compression. The first thing that disappears is the long tail (rare observations and genuinely novel information), and much of what we call intelligence is the ability to reason about those exact things.
Training data is still widely misunderstood. Some notes on what we're seeing in practice in the post below.
fun fact: the entire AI economy right now relies on the ability to access the web by scraping a search engine.
the only way to do this today is through residential proxy networks like @grass (or others, which do not always get users’ consent to route through their IPs)
Just saw a data labeling company calling itself a “human cloud”. What a time to be alive
Signal Timeline
@DefiIgnas followed
Score breakdown (0–100)
🎯 Scout quality: +16.2 / 25
📚 Signal stack: 0 / 30
🪪 Profile: +14 / 15
✍️ Content: +3 / 10
🤖 AI verdict: +10 / 20
⚠️ Penalties: -25 / 20
Total: 18 (below threshold of 70)
Watching for additional signals.
Followers: 18.0K
Account age: 4.3y
Scouts: 0
First seen: 1d ago