30 January 2022
What is the best Wordle starting word?
You might be sick of hearing about Wordle at this point, but here’s how I found the best starting word with a simple Python script and “HTML view source”. We’ll also dive into game design decisions and some gray area analysis.
tl;dr - you might think AEROS, but after analyzing just the solution set of words, it’s ROATE (jump ahead if you dare!)
The research
For starters, I wanted to see how the game processes the guesses and if I could find the game’s dictionary. While viewing the network panel in the developer console, I kind of expected to see AJAX requests, but instead it’s a gorgeously minimal static website: mostly just an HTML doc and a single 177 kB JavaScript file. I opened the JS file and scrolled through to see what I could find. While Wordle is described as a love story, it created a brief conflict in my own relationship, as my partner happened to see the current day’s answer on the screen and then another word after it, which was the following day’s answer! Without looking too closely at the contents, I identified the answer list and the remaining dictionary words in the source code.
With the entire dictionary of 5-letter words at my disposal, I could see there are 12,972 words total and 2,315 answers (6.33 years of answers). The next step was to analyze them.
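For anyone curious how the lists could be pulled out of the script without reading them, here is a minimal sketch. It assumes the words are stored as plain array literals of quoted five-letter lowercase strings; the function name `extract_word_lists` and the variable names in the sample string are made up for illustration and are not taken from the real file.

```python
import re

# Matches a JavaScript array literal made up only of quoted five-letter
# lowercase words, e.g. ["apple","bread","crane"].
WORD_ARRAY = re.compile(r'\[\s*"[a-z]{5}"(?:\s*,\s*"[a-z]{5}")*\s*\]')


def extract_word_lists(js_source: str) -> list[list[str]]:
    """Return every five-letter word array found in the JavaScript text."""
    return [
        re.findall(r'"([a-z]{5})"', match.group(0))
        for match in WORD_ARRAY.finditer(js_source)
    ]


# Made-up stand-in for the real bundle; variable names are hypothetical.
sample_js = 'var answers=["apple","bread","crane"],others=["aahed","aalii"];'
word_lists = extract_word_lists(sample_js)
```

Checking only `len()` of each returned list is enough to tell the answers from the rest of the dictionary, so nothing has to be printed to the screen.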
First step was to score a guess against an answer. Funny enough, while I was at the Recurse Center, I partook in and later ran a pair programming workshop with Alicia—one of the facilitators—where she had suggested we do a Mastermind prompt. We had our own example code (along with some fun slides) and since Wordle is basically Mastermind with words, I grabbed the evaluate_guess code and modified it to work with letters.
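A Mastermind-style scorer for letters can be sketched like this. It is not the workshop code itself, just one common way to write it: greens are position matches, and yellows are whatever shared letters remain once the greens are taken out, so duplicate letters are not over-counted.

```python
from collections import Counter


def evaluate_guess(guess: str, answer: str) -> tuple[int, int]:
    """Return (greens, yellows) for a guess scored against an answer."""
    # Greens: same letter in the same position.
    greens = sum(g == a for g, a in zip(guess, answer))
    # Letters the two words share, counted with multiplicity and
    # ignoring position (multiset intersection).
    shared = sum((Counter(guess) & Counter(answer)).values())
    # Every shared letter that is not green must be yellow.
    return greens, shared - greens
```

For example, `evaluate_guess("aeros", "arose")` gives one green (the leading A) and four yellows.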
I had my script run through each word in the dictionary, score it against all others, and sort by score. The score is a 3-tuple:
- the sum of all green & yellow squares for each guess against the dictionary
- the sum of just greens
- the sum of just yellows
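The 3-tuple and the sort could look roughly like the sketch below. `score_word` and `rank_words` are hypothetical names, and skipping the word itself when it appears among the targets is my reading of “against all others”. The targets are passed separately from the guesses so the same function can score against a different list.

```python
from collections import Counter


def evaluate_guess(guess, answer):
    """Return (greens, yellows) for a guess scored against an answer."""
    greens = sum(g == a for g, a in zip(guess, answer))
    shared = sum((Counter(guess) & Counter(answer)).values())
    return greens, shared - greens


def score_word(guess, targets):
    """Return (sum of greens and yellows, sum of greens, sum of yellows)."""
    greens = yellows = 0
    for target in targets:
        if target == guess:  # assumption: a word is not scored against itself
            continue
        g, y = evaluate_guess(guess, target)
        greens += g
        yellows += y
    return greens + yellows, greens, yellows


def rank_words(guesses, targets):
    """Sort guesses best-first; tuples compare on total, then greens, then yellows."""
    return sorted(((score_word(w, targets), w) for w in guesses), reverse=True)


words = ["arose", "fuffy", "later", "alert"]
ranking = rank_words(words, words)
```

Because Python compares tuples element by element, ties on the total are broken by the greens count, which matches how the tables in this post are ordered.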
I’d love to hear other ideas on optimal scoring heuristics. Also, the code isn’t really optimized for performance and comparing every word against every other word is O(N²), but the script processes the whole dataset in under 6 minutes on my laptop. That’s sufficient for something that only needs to run once; any testing was done against a small sample of the data.
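To put the quadratic cost in numbers, and to show one way a small test sample might be drawn, here is a short sketch. The helper names, the sample size of 200, and the fixed seed are all assumptions for illustration.

```python
import random


def comparison_count(n_guesses: int, n_targets: int) -> int:
    """Number of guess/target pairs an all-against-all run has to score."""
    return n_guesses * n_targets


def sample_words(words, size, seed=0):
    """Draw a reproducible random subset for quick test runs."""
    rng = random.Random(seed)  # fixed seed so repeated test runs match
    return rng.sample(list(words), min(size, len(words)))


full_run = comparison_count(12_972, 12_972)  # 168,272,784 pairs
sample_run = comparison_count(200, 200)      # 40,000 pairs
```

A 200-word sample needs roughly 4,000 times fewer comparisons than the full dictionary, which keeps the edit-and-rerun loop fast.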
The results
Top Wordle starting words based on the full dictionary:
word Σ all Σ green Σ yellow
1. aeros 24791 8219 16572
2. soare 24791 7138 17653
3. arose 24791 4708 20083
4. reais 24469 8330 16139
5. raise 24469 5980 18489
6. serai 24469 5745 18724
7. arise 24469 4766 19703
8. aesir 24469 4451 20018
9. aloes 23996 8714 15282
10. lares 23994 10323 13671
The winner, AEROS (plural noun: “(relating to) aircraft or aeronautics”), doesn’t appear in a lot of dictionaries, but it is in the Collins Scrabble Words list, which is used in the UK (of course the URL for the game ends in .co.uk).
Anagramming in at number two, SOARE (noun, obsolete: “a young hawk”) is another fringe word missing from Scrabble dictionaries other than Collins.
Rounding out the same mix of letters, AROSE (verb: past tense of arise) is a much more common word that was able to rise to the challenge. However, we can see its greens hit rate is *significantly* lower (4,708 versus 8,219 for AEROS).
Also, honorable mention to the absolute worst-scoring word from the 12,972 words list: FUFFY.
Digging deeper
There are a number of articles/posts from the past couple weeks dissecting what your Wordle play style and starting word choice says about you.
Maybe this says something about me… but I think it’s more helpful to think less about me and more about what decisions the game’s author made while designing it. The answer list is a small subset of the entire dictionary, so what decisions were made in selecting that list? Is it possible to answer these questions without cheating? This is kind of a gray area, so only read on below the screenshot of articles du jour if you’re OK with seeing a little analysis of the answers list.
↑ Lol, this screenshot looks like an ad
Analysis of the answers list
Rerunning the script, but this time comparing every word in the dictionary (all valid guesses) against only the correct answers, we get a different result.
Top Wordle starting words when comparing the full dictionary to valid answers only:
word Σ all Σ green Σ yellow
1. roate 4142 1254 2888
2. orate 4142 1178 2964
3. oater 4142 986 3156
4. realo 4123 874 3249
5. taler 4117 1095 3022
6. later 4117 1033 3084
7. ratel 4117 994 3123
8. artel 4117 993 3124
9. alter 4117 983 3134
10. alert 4117 924 3193
The winner, ROATE—verb: to learn by repetition, also ROTE—is another Collins Scrabble Words list-only term.
These words aren’t guaranteed to be correct answers since they can be from anywhere in the dictionary, but—since they were only scored against valid answers—they’ll provide the greatest coverage on average for a first guess. Spoiler alert: “ROATE” is not in the answers list. So if you’re hoping for that fanciful correct first guess, then it’s not for you. It seems that more obscure words were left out in order to make the game more playable/enjoyable for people (our first insight into game design choices!)
Another game design insight is that “S” is conspicuously missing from all the top words above. Without looking too closely, it appears all pluralizations of 4-letter words are excluded from the answer set. This is a huge knock to the importance of S, which becomes even more apparent when we analyze the occurrence rate of each letter in the two different lists.
Occurrence rate of letters:
full_list answers_list
1. S 10.28% E 10.65%
2. E 10.27% A 8.46%
3. A 9.24% R 7.77%
4. O 6.84% O 6.51%
5. R 6.41% T 6.30%
6. I 5.80% L 6.21%
7. L 5.20% I 5.80%
8. T 5.08% S 5.78%
9. N 4.55% N 4.97%
10. U 3.87% C 4.12%
11. D 3.78% U 4.03%
12. Y 3.20% Y 3.67%
13. C 3.13% D 3.40%
14. P 3.11% H 3.36%
15. M 3.05% P 3.17%
16. H 2.71% M 2.73%
17. G 2.53% G 2.69%
18. B 2.51% B 2.43%
19. K 2.32% F 1.99%
20. F 1.72% K 1.81%
21. W 1.60% W 1.68%
22. V 1.07% V 1.32%
23. Z 0.67% Z 0.35%
24. J 0.45% X 0.32%
25. X 0.44% Q 0.25%
26. Q 0.17% J 0.23%
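Each column above sums to 100%, so the rates read as each letter’s share of all letters in the list. A minimal sketch of that calculation, with `letter_rates` as a hypothetical name:

```python
from collections import Counter


def letter_rates(words):
    """Return (letter, percent of all letters) pairs, most common first."""
    counts = Counter("".join(words).upper())
    total = sum(counts.values())
    return [(letter, 100 * n / total) for letter, n in counts.most_common()]


# Tiny made-up list, just to show the shape of the output.
rates = letter_rates(["later", "alert", "sassy"])
```

Running it once on the full dictionary and once on the answers list would produce the two columns of the table.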
With that information in hand, we can make better follow-on guesses by combining what we learn from how our first guess is scored with the next most common letters in the answers list:
L 6.21%; I 5.80%; S 5.78%; N 4.97%; C 4.12%; U 4.03%; Y 3.67%; D 3.40%; H 3.36%; P 3.17%
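One way to use those letter rates for a second guess is to prefer words that cover the most common letters not yet tried. The sketch below does only that part and ignores the green/yellow feedback from the first guess. The rates are copied from the answers_list column above; `coverage` and `best_follow_up` are hypothetical names.

```python
# Answers-list letter rates (percent), copied from the table in this post.
ANSWER_LETTER_RATES = {
    "E": 10.65, "A": 8.46, "R": 7.77, "O": 6.51, "T": 6.30, "L": 6.21,
    "I": 5.80, "S": 5.78, "N": 4.97, "C": 4.12, "U": 4.03, "Y": 3.67,
    "D": 3.40, "H": 3.36, "P": 3.17, "M": 2.73, "G": 2.69, "B": 2.43,
    "F": 1.99, "K": 1.81, "W": 1.68, "V": 1.32, "Z": 0.35, "X": 0.32,
    "Q": 0.25, "J": 0.23,
}


def coverage(word, tried):
    """Sum the rates of the word's distinct letters that were not tried yet."""
    new_letters = set(word.upper()) - set(tried.upper())
    return sum(ANSWER_LETTER_RATES[letter] for letter in new_letters)


def best_follow_up(candidates, first_guess):
    """Pick the candidate that adds the most untried letter coverage."""
    return max(candidates, key=lambda word: coverage(word, first_guess))


second = best_follow_up(["later", "slice", "snail", "fuffy"], "roate")
```

Counting each letter once per word means repeated letters earn nothing extra, which is usually what you want from an exploratory second guess.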
There could be more game design decisions hidden in the answers, but I don’t want to actually read them and this is far enough for now!
I also created a visualization of Wordle emoji matches for various guesses against all past answers (through Jan 30, 2022):
- ROATE: #1 - my top pick given the answers distribution
- AEROS: #18 - the top pick only if the entire dictionary were used
- TRAIN: #576 - another common choice
- SAINT: #1,464 - I saw this mentioned in another post
- FUFFY: #12,966 - our lovable worst pick from the first analysis
- IMMIX: #12,972 - a new worst guess after doing the answers-only analysis
Where to go next?
The current list of answers will keep us busy until October 25, 2027. You can go with my pick and run ROATE, or maybe you found an insight that can lead to faster solutions. It also doesn’t really matter, and it’s fun to try out different words to keep the game feeling fresh.
Then again, if it’s getting too easy and you want to add a little chaos to your gameplay, try this Wordle starting word randomizer.
Source code on GitHub.
Other projects:
- jliszka/wordle - a Wordle solver that selects each word with the highest information entropy
- feel free to add more in the comments!