08 August 2023

Inverting parse_qs in Python


Incorrectly escaping characters is one of the more common errors I see developers run into. If you’re parsing, manipulating, or serializing URLs in code, you should try to use standard libraries that cover a lot of these nuances for you.

Python 3 has urllib.parse. Its documentation covers nearly everything you need to know, but sometimes you hope to find an answer without reading the docs. In my case, I had parsed a query string using parse_qs and was looking for the inverse function to turn the resulting dict back into a string. Here’s the answer…

Inverting parse_qs

Parse a query string with parse_qs. The function expects the query string without the leading "?":

from urllib.parse import parse_qs

parse_qs('foo=bar&foo=baz&bing=bong')

# {'foo': ['bar', 'baz'], 'bing': ['bong']}
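To make the "no ?" point concrete, here is a small sketch of what happens if the "?" is left on, plus one way to pull the query out of a full URL first. The example.com URL is made up for illustration, and urlsplit is just one option for isolating the query:

```python
from urllib.parse import parse_qs, urlsplit

# A leading "?" is not stripped; it becomes part of the first key.
with_mark = parse_qs('?foo=bar&bing=bong')
# {'?foo': ['bar'], 'bing': ['bong']}

# Hypothetical URL for illustration; urlsplit isolates the query component,
# without the "?" and without the "#top" fragment.
url = 'https://example.com/search?foo=bar&foo=baz&bing=bong#top'
query = urlsplit(url).query
# 'foo=bar&foo=baz&bing=bong'

params = parse_qs(query)
# {'foo': ['bar', 'baz'], 'bing': ['bong']}
```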

Wrong way: convert it back using plain urlencode. Without doseq, each list of values is serialized as a single value (the percent-encoded string form of the whole list):

from urllib.parse import urlencode

urlencode({'foo': ['bar', 'baz'], 'bing': ['bong']})

# 'foo=%5B%27bar%27%2C+%27baz%27%5D&bing=%5B%27bong%27%5D'

Correct way: convert it back using urlencode with doseq=True:

from urllib.parse import urlencode

urlencode({'foo': ['bar', 'baz'], 'bing': ['bong']}, doseq=True)

# 'foo=bar&foo=baz&bing=bong'
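As a quick sanity check, here is a sketch of the full round trip. It also shows a few cases where the result is not byte-for-byte identical to the input. The sample strings are my own, and the grouping behavior assumes Python 3.7+, where dicts keep insertion order:

```python
from urllib.parse import parse_qs, urlencode

original = 'foo=bar&foo=baz&bing=bong'
round_tripped = urlencode(parse_qs(original), doseq=True)
# 'foo=bar&foo=baz&bing=bong'

# Values are decoded by parse_qs and percent-encoded again by urlencode
# (spaces come back as "+" because quote_via defaults to quote_plus).
encoded = urlencode(parse_qs('q=a+b%26c'), doseq=True)
# 'q=a+b%26c'

# Caveat: duplicate keys that were interleaved get grouped together.
regrouped = urlencode(parse_qs('foo=bar&bing=bong&foo=baz'), doseq=True)
# 'foo=bar&foo=baz&bing=bong'

# Caveat: parse_qs drops blank values unless keep_blank_values=True.
lossy = urlencode(parse_qs('foo=&bing=bong'), doseq=True)
# 'bing=bong'
kept = urlencode(parse_qs('foo=&bing=bong', keep_blank_values=True), doseq=True)
# 'foo=&bing=bong'
```

So doseq=True inverts parse_qs for the parsed data, but the exact original string is only guaranteed back when it had no blank values and no interleaved duplicate keys.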

Via help(urlencode):

urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=<function quote_plus>)

Encode a dict or sequence of two-element tuples into a URL query string.

If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.

Further reading

  1. urllib.parse.parse_qsl - this alternative to parse_qs makes a list of tuples instead of a dict, which means duplicate keys can safely exist at the top level. As a result, using urlencode with this format works as you’d expect regardless of whether you set doseq or not, since none of the values in the tuples are sequences.

  2. urllib.parse.urlparse - this is a nice way to parse an entire URL. The ParseResult object is immutable, but you can use _replace and geturl to create a new URL: parsed._replace(query=new_query).geturl()

  3. URLSearchParams - if you’re working with query strings in the browser, then this JavaScript interface is your friend and it has good browser support too.

  4. mrcoles.com/urlparse - an old blog post where you can paste in any URL and it will pretty-print the URL, query string, and hash separately.
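The first two items above can be sketched together: parse_qsl for the round trip, then urlparse with _replace and geturl to rebuild a URL with an edited query. The example.com URL and the extra "page" parameter are made up for illustration:

```python
from urllib.parse import parse_qsl, urlencode, urlparse

# parse_qsl keeps duplicate keys as separate (key, value) tuples.
pairs = parse_qsl('foo=bar&foo=baz&bing=bong')
# [('foo', 'bar'), ('foo', 'baz'), ('bing', 'bong')]

# No doseq needed, since every value is a plain string.
same = urlencode(pairs)
# 'foo=bar&foo=baz&bing=bong'

# Hypothetical URL: parse it, add a parameter, and build a new URL.
parsed = urlparse('https://example.com/search?foo=bar&bing=bong#results')
query_pairs = parse_qsl(parsed.query)
query_pairs.append(('page', '2'))

# ParseResult is a namedtuple, so _replace returns a modified copy.
new_url = parsed._replace(query=urlencode(query_pairs)).geturl()
# 'https://example.com/search?foo=bar&bing=bong&page=2#results'
```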





Peter Coles is a software engineer living in NYC who is building Superset 💪 and also created GoFullPage 📸