30 January 2022

What is the best Wordle starting word?

WORDLE

You might be sick of hearing about Wordle at this point, but here’s how I found the best starting word with a simple Python script and “HTML view source”. We’ll also dive into game design decisions and some gray area analysis.

tl;dr - you might think AEROS, but after analyzing just the solution set of words, it’s ROATE (jump ahead if you dare!)

The research

For starters, I wanted to see how the game processes guesses and whether I could find the game’s dictionary. While viewing the network panel in the developer console, I kind of expected to see AJAX requests, but instead it’s a gorgeously minimal static website: mostly just an HTML doc and a single 177 kB JavaScript file. I opened the JS file and scrolled through to see what I could find. While Wordle is described as a love story, it created a brief conflict in my own relationship: my partner happened to see the current day’s answer on the screen, followed by another word, which was the next day’s answer! Without looking too closely at the contents, I identified the answer list and the remaining dictionary words in the source code.

With the entire dictionary of 5-letter words at my disposal, I could see there are 12,972 words total and 2,315 answers (6.33 years of answers). Next, it was time to analyze them.
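Here is a minimal sketch of pulling the lists out of the bundle. It assumes, hypothetically, that each list sits in the minified JS as an array literal of quoted lowercase 5-letter strings; the real bundle’s variable names and layout aren’t shown in this post, so the regexes and the `extract_word_lists` helper are illustrations, not necessarily how the original script did it.

```python
import re

# Assumption: each word list is a JS array literal of quoted, lowercase,
# 5-letter strings with no whitespace, as a minifier would typically emit it.
ARRAY_RE = re.compile(r'\[(?:"[a-z]{5}",?)+\]')
WORD_RE = re.compile(r'"([a-z]{5})"')


def extract_word_lists(js_source):
    """Return each array of quoted 5-letter words found in the JS source, in order."""
    return [WORD_RE.findall(m.group(0)) for m in ARRAY_RE.finditer(js_source)]


# Usage on a tiny stand-in for the real bundle (variable names are made up).
sample = 'var Ma=["cigar","rebut","sissy"],Oa=["aahed","aalii"];'
answers, others = extract_word_lists(sample)
dictionary = answers + others
print(len(answers), len(dictionary))  # 3 5
```

In this sketch the full dictionary is simply the answers array plus the array of other allowed words.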

The first step was to score a guess against an answer. Funnily enough, while I was at the Recurse Center, I took part in, and later ran, a pair programming workshop with Alicia, one of the facilitators, who had suggested we do a Mastermind prompt. We had our own example code (along with some fun slides), and since Wordle is basically Mastermind with words, I grabbed the evaluate_guess code and modified it to work with letters.
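The original evaluate_guess isn’t reproduced in this post, so here is a hedged sketch of the same Mastermind idea adapted to letters; the name `score_guess` is hypothetical. Greens are exact position matches; yellows are the remaining guess letters that appear elsewhere in the answer, each credited at most as many times as that letter is still unmatched in the answer.

```python
from collections import Counter


def score_guess(guess, answer):
    """Mastermind-style score for one guess vs. one answer: (greens, yellows)."""
    greens = sum(g == a for g, a in zip(guess, answer))
    # Only letters outside the green positions can count as yellows.
    guess_rest = Counter(g for g, a in zip(guess, answer) if g != a)
    answer_rest = Counter(a for g, a in zip(guess, answer) if g != a)
    # A repeated guess letter is credited at most as often as the answer has it.
    yellows = sum(min(n, answer_rest[letter]) for letter, n in guess_rest.items())
    return greens, yellows


print(score_guess("roate", "alter"))  # (0, 4)
print(score_guess("later", "alter"))  # (3, 2)
print(score_guess("speed", "abide"))  # (0, 2): only one E can be credited
```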

I had my script run through each word in the dictionary, score it against all others, and sort by score. The score is a 3-tuple:

  1. the sum of all green & yellow squares for each guess against the dictionary
  2. the sum of just greens
  3. the sum of just yellows
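Ranking then amounts to summing those pairs over a list of target words and sorting on the 3-tuple. This is a sketch under stated assumptions: `rank_guesses` is a hypothetical helper, it includes each word’s comparison with itself, and it breaks complete ties by reverse alphabetical order, any of which may differ from the original script.

```python
from collections import Counter


def score_guess(guess, answer):
    """(greens, yellows) for one guess vs. one answer, Mastermind-style."""
    greens = sum(g == a for g, a in zip(guess, answer))
    guess_rest = Counter(g for g, a in zip(guess, answer) if g != a)
    answer_rest = Counter(a for g, a in zip(guess, answer) if g != a)
    return greens, sum(min(n, answer_rest[c]) for c, n in guess_rest.items())


def rank_guesses(guesses, targets):
    """Return (Σ all, Σ green, Σ yellow, word) rows, best first."""
    rows = []
    for guess in guesses:
        total_green = total_yellow = 0
        for target in targets:
            greens, yellows = score_guess(guess, target)
            total_green += greens
            total_yellow += yellows
        rows.append((total_green + total_yellow, total_green, total_yellow, guess))
    # Tuples compare element by element: Σ all first, then greens, then yellows.
    rows.sort(reverse=True)
    return rows


words = ["arose", "soare", "fuffy", "alter"]
for row in rank_guesses(words, words):
    print(row)
# (13, 7, 6, 'arose')
# (13, 6, 7, 'soare')
# (11, 6, 5, 'alter')
# (5, 5, 0, 'fuffy')
```

Passing the full dictionary as both arguments mirrors the first run; the number of calls to `score_guess` is `len(guesses) * len(targets)`.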

I’d love to hear other ideas for scoring heuristics. Also, the code isn’t really optimized for performance, and scoring every word against every other word is an O(N²) problem, but the script processes the whole dataset in under 6 minutes on my laptop. That’s fast enough for something that only needs to run once, and I did any testing against a small sample of the data.
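For a sense of scale, here is the back-of-the-envelope arithmetic behind that N², assuming every one of the 12,972 words is scored against every word, itself included:

```python
DICTIONARY_SIZE = 12_972
MAX_SECONDS = 6 * 60  # the "under 6 minutes" run time, as an upper bound

# Every guess scored against every word in the dictionary: N * N pairs.
full_pairs = DICTIONARY_SIZE * DICTIONARY_SIZE
print(f"{full_pairs:,} comparisons")  # 168,272,784 comparisons
# Finishing in under 6 minutes implies at least this many comparisons per second.
print(f"at least {full_pairs / MAX_SECONDS:,.0f} per second")  # at least 467,424 per second
```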

The results

Top Wordle starting words based on the full dictionary:

    word    Σ all   Σ green Σ yellow
1.  aeros   24791   8219    16572
2.  soare   24791   7138    17653
3.  arose   24791   4708    20083
4.  reais   24469   8330    16139
5.  raise   24469   5980    18489
6.  serai   24469   5745    18724
7.  arise   24469   4766    19703
8.  aesir   24469   4451    20018
9.  aloes   23996   8714    15282
10. lares   23994   10323   13671

The winner, AEROS (plural noun: relating to aircraft or aeronautics), doesn’t appear in a lot of dictionaries, but it is in the Collins Scrabble Words list, which is used in the UK (of course the URL for the game ends in .co.uk).

Anagramming in at number two, SOARE (noun, obsolete: a young hawk) is another fringe word missing from Scrabble dictionaries other than Collins.

Rounding out this mix of letters, AROSE (verb: past tense of arise) is a much more common word that was able to rise to the challenge. However, its green hit rate is *significantly* lower.
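The three-way tie on Σ all (and the five-way tie among the RAISE anagrams) isn’t a coincidence. Under Mastermind-style scoring, for each letter the greens it earns plus the yellows credited to it equal min(count in guess, count in answer), so greens plus yellows ignores letter order entirely, and anagrams can only be separated by the green/yellow split. A quick check of that claim, reusing a compact copy of the scoring sketch (`letter_overlap` is a hypothetical helper):

```python
from collections import Counter


def score_guess(guess, answer):
    """(greens, yellows) for one guess vs. one answer, Mastermind-style."""
    greens = sum(g == a for g, a in zip(guess, answer))
    guess_rest = Counter(g for g, a in zip(guess, answer) if g != a)
    answer_rest = Counter(a for g, a in zip(guess, answer) if g != a)
    return greens, sum(min(n, answer_rest[c]) for c, n in guess_rest.items())


def letter_overlap(guess, answer):
    """Greens + yellows computed from letter counts alone; order is ignored."""
    answer_counts = Counter(answer)
    return sum(min(n, answer_counts[c]) for c, n in Counter(guess).items())


# The position-aware score and the order-free count agree on the total,
# including when letters repeat (SPEED, FUFFY).
pairs = [("aeros", "alter"), ("soare", "alter"), ("arose", "alter"),
         ("speed", "abide"), ("fuffy", "puffy")]
for guess, answer in pairs:
    greens, yellows = score_guess(guess, answer)
    assert greens + yellows == letter_overlap(guess, answer)
    print(guess, answer, greens, yellows)
```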

Also, an honorable mention goes to the absolute worst-scoring word in the 12,972-word list: FUFFY.

Digging deeper

There have been a number of articles and posts over the past couple of weeks dissecting what your Wordle play style and starting-word choice say about you.

Maybe this says something about me… but I think it’s more helpful to think less about me and more about the decisions the game’s author made while designing it. The answer list is a small subset of the entire dictionary, so what decisions went into selecting that list? Is it possible to answer these questions without cheating? This is kind of a gray area, so only read on below the screenshot of articles du jour if you’re OK with seeing a little analysis of the answers list.

[Screenshot: search results for “what does your wordle play style say about you”]

↑ Lol, this screenshot looks like an ad

Analysis of the answers list

Rerunning the script, this time comparing every word in the dictionary (all valid guesses) against only the correct answers, gives a different result.

Top Wordle starting words when comparing the full dictionary to valid answers only:

    word    Σ all   Σ green Σ yellow
1.  roate   4142    1254    2888
2.  orate   4142    1178    2964
3.  oater   4142    986     3156
4.  realo   4123    874     3249
5.  taler   4117    1095    3022
6.  later   4117    1033    3084
7.  ratel   4117    994     3123
8.  artel   4117    993     3124
9.  alter   4117    983     3134
10. alert   4117    924     3193
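Because each guess was scored against the 2,315 answers, dividing the sums by 2,315 turns them into per-game averages, assuming every answer is equally likely. This small sketch uses the numbers from the table; it shows how thin the margins are, since ROATE colors about 1.79 tiles per game versus 1.78 for LATER.

```python
NUM_ANSWERS = 2_315

# (Σ all, Σ green, Σ yellow) copied from the answers-only table.
TOP_SCORES = {"roate": (4142, 1254, 2888), "later": (4117, 1033, 3084)}


def per_answer(total):
    """Average tiles per answer, rounded to two decimals."""
    return round(total / NUM_ANSWERS, 2)


for word, (total, greens, yellows) in TOP_SCORES.items():
    print(word, per_answer(total), per_answer(greens), per_answer(yellows))
# roate 1.79 0.54 1.25
# later 1.78 0.45 1.33
```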

The winner, ROATE (verb: to learn by repetition, also ROTE), is another term found only in the Collins Scrabble Words list.

These words aren’t guaranteed to be correct answers, since they can come from anywhere in the dictionary, but because they were scored only against valid answers, they’ll provide the greatest coverage on average for a first guess. Spoiler alert: “ROATE” is not in the answers list. So if you’re hoping for that fanciful correct first guess, it’s not for you. It seems that more obscure words were left out to make the game more playable and enjoyable (our first insight into game design choices!).

Another game design insight is that “S” is conspicuously missing from all the top words above. Without looking too closely, it appears all plurals of 4-letter words are excluded from the answer set. This is a huge knock to the importance of S, which becomes even more apparent when we compare the occurrence rate of each letter in the two lists.

Occurrence rate of letters (each letter’s share of all letters in the list):

    full_list   answers_list    
1.  S   10.28%  E   10.65%
2.  E   10.27%  A   8.46%
3.  A   9.24%   R   7.77%
4.  O   6.84%   O   6.51%
5.  R   6.41%   T   6.30%
6.  I   5.80%   L   6.21%
7.  L   5.20%   I   5.80%
8.  T   5.08%   S   5.78%
9.  N   4.55%   N   4.97%
10. U   3.87%   C   4.12%
11. D   3.78%   U   4.03%
12. Y   3.20%   Y   3.67%
13. C   3.13%   D   3.40%
14. P   3.11%   H   3.36%
15. M   3.05%   P   3.17%
16. H   2.71%   M   2.73%
17. G   2.53%   G   2.69%
18. B   2.51%   B   2.43%
19. K   2.32%   F   1.99%
20. F   1.72%   K   1.81%
21. W   1.60%   W   1.68%
22. V   1.07%   V   1.32%
23. Z   0.67%   Z   0.35%
24. J   0.45%   X   0.32%
25. X   0.44%   Q   0.25%
26. Q   0.17%   J   0.23%
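Assuming the occurrence rate is each letter’s share of all letters in a list (the full-list column sums to 100%), computing it takes only a few lines; `letter_rates` is a hypothetical helper, shown here on a toy list rather than the real data.

```python
from collections import Counter


def letter_rates(words):
    """Each letter's share of all letter occurrences, as a percentage, most common first."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    return [(letter.upper(), round(100 * n / total, 2)) for letter, n in counts.most_common()]


# Toy example: 15 letters in total, 4 of them E.
print(letter_rates(["arose", "alter", "speed"])[:2])  # [('E', 26.67), ('A', 13.33)]
```

Running it once on the full dictionary and once on the answers list would produce the two columns above.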

With that information in hand, we can plan follow-on guesses using both the feedback from how our first guess is scored and the next most common letters in the answers list after R, O, A, T, and E:

L 6.21%; I 5.80%; S 5.78%; N 4.97%; C 4.12%; U 4.03%; Y 3.67%; D 3.40%; H 3.36%; P 3.17%
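That list is just the answers-list ranking from the occurrence table with the first guess’s letters filtered out. A small sketch, where `next_letters` and the hard-coded ranking string are illustrative:

```python
# Answers-list letter ranking, most common first, copied from the occurrence table.
ANSWER_LETTER_RANK = "EAROTLISNCUYDHPMGBFKWVZXQJ"


def next_letters(first_guess, rank=ANSWER_LETTER_RANK, k=10):
    """The k most common answer letters not already tried in the first guess."""
    tried = set(first_guess.upper())
    return [letter for letter in rank if letter not in tried][:k]


print(next_letters("roate"))
# ['L', 'I', 'S', 'N', 'C', 'U', 'Y', 'D', 'H', 'P']
```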

There could be more game design decisions hidden in the answers, but I don’t want to actually read them, so this is far enough for now!

I also created a visualization of Wordle emoji matches for various guesses against all past answers (through Jan 30, 2022):

  1. ROATE - #1 - my top pick given the answers distribution
  2. AEROS - #18 - the top pick only if the entire dictionary were used
  3. TRAIN - #576 - another common choice
  4. SAINT - #1464 - I saw this mentioned in another post
  5. FUFFY - #12,966 - our lovable worst pick from the first analysis
  6. IMMIX - #12,972 - a new worst guess after doing the answers-only analysis
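Each row of emoji in that kind of visualization can be generated per guess/answer pair. This is a sketch of my own, not the code behind the visualization, and it assumes the usual Wordle duplicate-letter rules: greens are assigned first, then yellows left to right only while unmatched copies of that letter remain in the answer. `emoji_row` and the tile constants are hypothetical names.

```python
from collections import Counter

GREEN, YELLOW, GRAY = "🟩", "🟨", "⬛"


def emoji_row(guess, answer):
    """Wordle-style tiles for one guess against one answer."""
    tiles = [GRAY] * len(guess)
    # Answer letters still available for yellows once exact matches are removed.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = GREEN
    # Assign yellows left to right, consuming one unmatched copy each time.
    for i, g in enumerate(guess):
        if tiles[i] != GREEN and unmatched[g] > 0:
            tiles[i] = YELLOW
            unmatched[g] -= 1
    return "".join(tiles)


print(emoji_row("roate", "alter"))  # 🟨⬛🟨🟨🟨
print(emoji_row("speed", "abide"))  # ⬛⬛🟨⬛🟨 (only the first E is yellow)
```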

Where to go next?

The current list of answers will keep us busy until October 25, 2027. You can go with my pick, ROATE, or maybe you’ve found an insight that leads to faster solutions. It also doesn’t really matter much, and it’s fun to try out different words to keep the game feeling fresh.

Then again, if it’s getting too easy and you want to add a little chaos to your gameplay, try this Wordle starting word randomizer.

Source code on GitHub.

Other projects:

  • jliszka/wordle - a Wordle solver that selects each word with the highest information entropy
  • feel free to add more in the comments!



Did you find this helpful or fun? paypal.me/mrcoles

Peter Coles is a software engineer living in NYC who is building Superset 💪 and also created GoFullPage 📸
