Pyro

A dynamically-typed, garbage-collected scripting language.

Version 0.19.2

Strings



A string, str, is an immutable array of bytes.

$str(arg: any) -> str

Stringifies the argument — i.e. returns its default string representation. If the argument has a $str() method, the output of this method will be returned.

Note that calling $str() on an f64 prints its value to 6 decimal digits of precision, stripping trailing zeros after the decimal point.

Pyro strings have methods that let you manipulate them as ASCII or as UTF-8 but the string type itself is agnostic about its encoding — a string can contain any sequence of byte values including null bytes or invalid UTF-8.

String Literals

String literals come in two flavours — regular (double-quoted) and raw (backticked).

Regular string literals use double quotes:

var foo = "a string";

var bar = "a string
with multiple
linebreaks";

Regular string literals process the following backslashed escapes:

\\ backslash
\0 null byte
\" double quote
\' single quote
\$ dollar symbol
\b backspace
\e escape
\n newline
\r carriage return
\t tab
\x## 8-bit hex-encoded byte value
\u#### 16-bit hex-encoded unicode code point (output as UTF-8)
\U######## 32-bit hex-encoded unicode code point (output as UTF-8)

Raw string literals use backticks:

var foo = `a raw string`;

var bar = `a raw string
with multiple
linebreaks`;

Raw string literals ignore backslashed escapes. The only character a raw string literal can't contain is a backtick as this would end the string.

String Interpolation

You can interpolate the value of an expression into a double-quoted string using ${}, e.g.

var value = "xyz";
assert "abc ${value} def" == `abc xyz def`;
assert "abc ${value:to_upper()} def" == `abc XYZ def`;

You can interpolate the value of any expression into a string using ${}. If the value of the expression isn't a string, it will be automatically stringified — this is equivalent to calling $str() on the value, e.g.

var value = 123;
assert "abc ${value} def" == `abc 123 def`;
assert "abc ${value + 1} def" == `abc 124 def`;

You can backslash-escape a $ symbol in a double-quoted string to prevent it being treated as the opening of an interpolated expression, e.g.

var value = 123;
assert "abc \${value} def" == `abc ${value} def`;

Interpolated expressions can be nested arbitrarily — i.e. an interpolated expression can contain a double-quoted string containing an interpolated expression containing a double-quoted string containing an interpolated expression, etc.

You can format the value of an interpolated expression by supplying a format-specifier after a semicolon, e.g.

var value = 123;
assert "${value;05d}" == `00123`;

See the string formatting documentation for the syntax of format-specifiers.

Equality

Strings compare as equal using the == operator if they have the same content, e.g.

var foo = "foobar";
var bar = "foobar";
assert foo == bar;

Comparisons

You can compare strings using the comparison operators, <, <=, >, >=, e.g.

assert "abc" < "def";

Strings are compared lexicographically by byte value, e.g.

assert "a" < "aa";
assert "aa" < "aaa";

Concatenating

You can concatenate two strings using the + operator, e.g.

assert "abc" + "def" == "abcdef";

You can multiply a string by an integer n to produce a new string containing n copies of the original, e.g.

assert "foo" * 3 == "foofoofoo"

Iterating

A string is an immutable sequence of bytes. You can iterate over this sequence in three different ways.

  1. You can iterate over a string directly. This iterates over the individual byte values, returning each value as a single-byte string, e.g.

>>> for char in "foo" {
...     echo $debug(char);
... }
"f"
"o"
"o"
  1. You can iterate over the string's byte values as integers using the :bytes() method, e.g.

>>> for byte in "foo":bytes() {
...     echo $debug(byte);
... }
102
111
111
  1. You can iterate over the string's rune values, i.e. UTF-8 encoded Unicode code points, using the :runes() method, e.g.

>>> for rune in "foo":runes() {
...     echo $debug(rune);
... }
'f'
'o'
'o'

Indexing

You can index into a string to get (but not set) individual byte values. Each byte value is returned as a single-byte string, e.g.

assert "foobar"[0] == "f";
assert "foobar"[1] == "o";

Indices are zero-based. A negative index counts backwards from the end of the string, e.g.

assert "foobar"[-1] == "r";
assert "foobar"[-2] == "a";

Use the :byte() method to access individual byte values as integers, e.g.

assert "foobar":byte(0) == 102;
assert "foobar":byte(1) == 111;

Use the :rune() method to access individual UTF-8 encoded code points, e.g.

assert "foobar":rune(0) == 'f';
assert "foobar":rune(1) == 'o';

Containment

You can check if a string contains a substring using the in operator:

assert "foo" in "foobar";

You can also use the in operator to check if a string contains a rune:

assert 'b' in "foobar";

This is equivalent to calling the string's :contains() method.

Methods

:byte(index: i64) -> i64

Returns the byte value at index as an integer in the range [0, 255].

A negative index counts backwards from the end of the string.

:byte_count() -> i64

Returns the number of bytes in the string.

:bytes() -> iter[i64]

Returns an iterator over the string's individual byte values, returning each value as an integer.

:contains(target: str|rune) -> bool

Returns true if the string contains the substring or (UTF-8 encoded) rune target.

(Note that every string contains the empty string "" as the empty string is a valid substring of every string.)

:count() -> i64

Returns the number of bytes in the string. This method is an alias for :byte_count().

:ends_with(suffix: str) -> bool

Returns true if the string ends with the string suffix, otherwise false.

:index_of(target: str) -> i64|err
:index_of(target: str, start_index: i64) -> i64|err

Returns the byte index of the next matching instance of the string target. Starts searching at start_index, which defaults to 0 if not specified. Returns an err if target is not found.

:is_ascii() -> bool

Returns true if the string contains only byte values in the range [0, 127].

Returns false if the string is empty.

:is_ascii_alpha() -> bool

Returns true if the string contains only ASCII characters in the range [a-z] or [A-Z].

Returns false if the string is empty.

:is_ascii_decimal() -> bool

Returns true if the string contains only ASCII decimal digits, i.e. characters in the range [0-9].

Returns false if the string is empty.

:is_ascii_hex() -> bool

Returns true if the string contains only ASCII hexadecimal digits, i.e. characters in the range [0-9], [a-f], or [A-F].

Returns false if the string is empty.

:is_ascii_octal() -> bool

Returns true if the string contains only ASCII octal digits, i.e. characters in the range [0-7].

Returns false if the string is empty.

:is_ascii_printable() -> bool

Returns true if the string contains only printable ASCII characters, i.e. byte values in the range [32, 126].

Returns false if the string is empty.

:is_ascii_ws() -> bool

Returns true if the string contains only ASCII whitespace characters.

Returns false if the string is empty.

:is_empty() -> bool

Returns true if the string is empty, i.e. if its length is zero.

:is_utf8() -> bool

Returns true if the string contains only valid UTF-8. (This is a potentially expensive method as it needs to traverse the string.)

Returns false if the string is empty.

:is_utf8_ws() -> bool

Returns true if the string contains only UTF-8 encoded whitespace characters, as defined by the Unicode standard.

Returns false if the string is empty.

:iter() -> iter[str]

Returns an iterator over the string's individual byte values, returning each value as a single-byte string.

:join(items: iterable) -> str

Creates a new string by joining the stringified elements of the iterable argument using the receiver string as the separator. Elements are automatically stringified — this is equivalent to calling $str() on each element.

Returns an empty string if the iterator is empty or exhausted.

:lines() -> iter[str]

Returns an iterator over the string's lines. Recognised line breaks are \n and \r\n. Strips the line breaks from the returned strings.

If this method is called on an empty string, the iterator will be empty, i.e. will return zero elements.

:match(target: str, index: i64) -> bool

Returns true if the string target matches at byte index index.

:replace(old: str, new: str) -> str

Returns a new string with all instances of the string old replaced by the string new.

:rune(index: i64) -> rune

Returns the rune at index, where index is a zero-based offset into the sequence of UTF-8 encoded Unicode code points in the string.

Does not support negative indexing.

This is a potentially expensive method as it needs to seek forward from the beginning of the string.

This method will panic if it encounters invalid UTF-8.

:rune_count() -> i64

Returns the number of UTF-8 encoded Unicode code points in the string.

This is a potentially expensive method as it needs to traverse the entire string.

This method will panic if the string contains invalid UTF-8.

:runes() -> iter[rune]

Returns an iterator over the string's rune values, i.e. UTF-8 encoded Unicode code points.

This method will panic if the string contains invalid UTF-8.

:slice(start_index: i64) -> str
:slice(start_index: i64, length: i64) -> str

Copies a slice of the source string and returns it as a new string. The source string is left unchanged. The slice starts at byte index start_index and contains length bytes.

If start_index is negative, counts backwards from the end of the string — i.e. a start_index of -1 refers to to the last byte in the string.

If length is omitted, copies to the end of the source string.

Panics if either argument is out of range.

:split() -> vec[str]
:split(sep: str) -> vec[str]

Splits the string on instances of the delimiter string sep. Returns a vector of strings.

If no argument is specified, this method acts as an alias for :split_on_ascii_ws().

If this method is called on an empty string, it will return a vector containing a single empty string.

:split_on_ascii_ws() -> vec[str]

This method splits the string on contiguous sequences of ASCII whitespace characters. Leading and trailing whitespace is ignored. Returns a vector of strings.

If this method is called on an empty string, it will return a vector containing a single empty string.

:starts_with(prefix: str) -> bool

Returns true if the string starts with the string prefix, otherwise false.

:strip() -> str
:strip(bytes: str) -> str

When called with no arguments, this method returns the new string formed by stripping leading and trailing ASCII whitespace characters from the string. (In this case it functions as an alias for :strip_ascii_ws().)

When called with a single string argument, this method returns the new string formed by stripping any leading or trailing bytes that occur in bytes. (In this case it functions as an alias for :strip_bytes().)

:strip_ascii_ws() -> str

Returns the new string formed by stripping leading and trailing ASCII whitespace characters.

:strip_bytes(bytes: str) -> str

Returns the new string formed by stripping any leading or trailing bytes that occur in bytes.

:strip_runes(runes: str) -> str

Returns the new string formed by stripping any leading or trailing UTF-8 encoded codepoints that occur in runes.

:strip_prefix(prefix: str) -> str

Returns a new string with the leading string prefix stripped if present. (Only a single instance of prefix will be stripped.)

:strip_prefix_bytes(bytes: str) -> str

Returns the new string formed by stripping any leading bytes that occur in bytes.

:strip_prefix_runes(runes: str) -> str

Returns the new string formed by stripping any leading UTF-8 encoded codepoints that occur in runes.

:strip_suffix(suffix: str) -> str

Returns a new string with the trailing string suffix stripped if present. (Only a single instance of suffix will be stripped.)

:strip_suffix_bytes(bytes: str) -> str

Returns the new string formed by stripping any trailing bytes that occur in bytes.

:strip_suffix_runes(runes: str) -> str

Returns the new string formed by stripping any trailing UTF-8 encoded codepoints that occur in runes.

:strip_utf8_ws() -> str

Returns the new string formed by stripping leading and trailing UTF-8 encoded whitespace characters, as defined by the Unicode standard.

:to_ascii_lower() -> str

Returns a new string with all ASCII uppercase characters converted to lowercase.

:to_ascii_upper() -> str

Returns a new string with all ASCII lowercase characters converted to uppercase.

:to_hex() -> str

Returns a new string containing the hex-escaped byte values from the original string — e.g. `foo` becomes `\x66\x6F\x6F`. Useful for inspecting and debugging Unicode.