Mojo struct
StringSlice
@register_passable(trivial)
struct StringSlice[mut: Bool, //, origin: Origin[mut]]
A non-owning view to encoded string data.
This type is guaranteed to have the same ABI (size, alignment, and field layout) as the llvm::StringRef type.
Notes: TODO: The underlying string data is guaranteed to be encoded using UTF-8.
Parameters
- mut (Bool): Whether the slice is mutable.
- origin (Origin[mut]): The origin of the underlying string data.
Implemented traits
AnyType, Boolable, CollectionElement, CollectionElementNew, Copyable, EqualityComparable, ExplicitlyCopyable, FloatableRaising, Hashable, IntableRaising, Movable, PathLike, Representable, Sized, Stringable, UnknownDestructibility, Writable, _CurlyEntryFormattable
Methods
__init__
@implicit
__init__(lit: StringLiteral) -> StringSlice[StaticConstantOrigin]
Construct a new StringSlice from a StringLiteral.
Args:
- lit (StringLiteral): The literal to construct this StringSlice from.
__init__(*, owned unsafe_from_utf8: Span[SIMD[uint8, 1], origin]) -> Self
Construct a new StringSlice from a sequence of UTF-8 encoded bytes.
Safety:
unsafe_from_utf8 MUST be valid UTF-8 encoded data.
Args:
- unsafe_from_utf8 (Span[SIMD[uint8, 1], origin]): A Span[Byte] encoded in UTF-8.
__init__(*, unsafe_from_utf8_strref: StringRef) -> Self
Construct a new StringSlice from a StringRef pointing to UTF-8 encoded bytes.
Safety:
- unsafe_from_utf8_strref MUST point to data that is valid for origin.
- unsafe_from_utf8_strref MUST be valid UTF-8 encoded data.
Args:
- unsafe_from_utf8_strref (StringRef): A StringRef of bytes encoded in UTF-8.
__init__(*, ptr: UnsafePointer[SIMD[uint8, 1]], length: UInt) -> Self
Construct a StringSlice from a pointer to a sequence of UTF-8 encoded bytes and a length.
Safety:
- ptr MUST point to at least length bytes of valid UTF-8 encoded data.
- ptr MUST point to data that is live for the duration of origin.
Args:
- ptr (UnsafePointer[SIMD[uint8, 1]]): A pointer to a sequence of bytes encoded in UTF-8.
- length (UInt): The number of bytes of encoded data.
@implicit
__init__[O: ImmutableOrigin, //](ref [O] value: String) -> StringSlice[O]
Construct an immutable StringSlice that views the contents of a String.
Parameters:
- O (ImmutableOrigin): The immutable origin.
Args:
- value (String): The string value.
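The implicit String constructor is easiest to see in a short program. The following is a minimal sketch. It assumes the collections.string import path and the testing assert helpers used in the other examples on this page, and the variable names are illustrative. It borrows a String as a StringSlice and checks that the view reports the same byte length as the String it borrows.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    # An owned string whose buffer the slice will borrow.
    var text = String("hello mojo")

    # The implicit constructor borrows `text` immutably. No bytes are
    # copied, so `view` is only valid while `text` is alive.
    var view = StringSlice(text)

    assert_equal(view.byte_length(), 10)
    assert_equal(len(view), len(text))
```

The slice only borrows the String's buffer, so the String must outlive every slice taken from it. The origin parameter is what lets the compiler track that relationship.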
__bool__
__bool__(self) -> Bool
Check if a string slice is non-empty.
Returns:
True if a string slice is non-empty, False otherwise.
__getitem__
__getitem__[I: Indexer](self, idx: I) -> String
Gets the character at the specified position.
Parameters:
- I (Indexer): A type that can be used as an index.
Args:
- idx (I): The index value.
Returns:
A new string containing the character at the specified position.
__lt__
__lt__(self, rhs: StringSlice[origin]) -> Bool
Verify if the StringSlice bytes are strictly less than the input in overlapping content.
Args:
- rhs (StringSlice[origin]): The other StringSlice to compare against.
Returns:
True if the StringSlice bytes are strictly less than the input in overlapping content.
__eq__
__eq__(self, rhs_same: Self) -> Bool
Verify if a StringSlice is equal to another StringSlice with the same origin.
Args:
- rhs_same (Self): The StringSlice to compare against.
Returns:
True if the StringSlice is equal to the input in length and contents.
__eq__(self, rhs: StringSlice[origin]) -> Bool
Verify if a StringSlice is equal to another StringSlice.
Args:
- rhs (StringSlice[origin]): The StringSlice to compare against.
Returns:
True if the StringSlice is equal to the input in length and contents.
__ne__
__ne__(self, rhs_same: Self) -> Bool
Verify if a StringSlice is not equal to another StringSlice with the same origin.
Args:
- rhs_same (Self): The StringSlice to compare against.
Returns:
True if the StringSlice is not equal to the input in length and contents.
__ne__(self, rhs: StringSlice[origin]) -> Bool
Verify if a StringSlice is not equal to another StringSlice.
Args:
- rhs (StringSlice[origin]): The StringSlice to compare against.
Returns:
True if the StringSlice is not equal to the input in length and contents.
__contains__
__contains__(ref self, substr: StringSlice[origin]) -> Bool
Returns True if the substring is contained within the current string.
Args:
- substr (StringSlice[origin]): The substring to check.
Returns:
True if the string contains the substring.
__mul__
__mul__(self, n: Int) -> String
Concatenates the string n times.
Args:
- n (Int): The number of times to concatenate the string.
Returns:
The string concatenated n times.
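Here is a short sketch of repetition with the * operator (__mul__). It uses the same import assumptions as the other examples on this page. Note that the result is an owned String, not a StringSlice.

```mojo
from collections.string import StringSlice
from testing import assert_equal, assert_true


def main():
    var unit = StringSlice("ab")

    # `*` returns a new String; `unit` itself is not modified.
    var repeated = unit * 3

    assert_equal(len(repeated), 6)
    assert_true(repeated == "ababab")
    assert_equal(unit.byte_length(), 2)
```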
copy
copy(self) -> Self
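Explicitly construct a copy of the provided StringSlice. Because StringSlice is a non-owning view, the copy refers to the same underlying string data rather than duplicating it.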
Returns:
A copy of the value.
from_utf8
static from_utf8(from_utf8: Span[SIMD[uint8, 1], origin]) -> Self
Construct a new StringSlice from a buffer containing UTF-8 encoded data.
Args:
- from_utf8 (Span[SIMD[uint8, 1], origin]): A span of bytes containing UTF-8 encoded data.
Returns:
A new validated StringSlice pointing to the provided buffer.
Raises:
An exception is raised if the provided buffer byte values do not form valid UTF-8 encoded codepoints.
__str__
__str__(self) -> String
Convert this StringSlice to a String.
Notes: This will allocate a new string that copies the string contents from the provided string slice.
Returns:
A new String.
__repr__
__repr__(self) -> String
Return a Mojo-compatible representation of this string slice.
Returns:
A representation of this string slice, written in Mojo string literal syntax.
__len__
__len__(self) -> Int
Get the string length in bytes.
This function returns the number of bytes in the underlying UTF-8 representation of the string.
To get the number of Unicode codepoints in a string, use len(str.chars()).
Examples
Query the length of a string, in bytes and Unicode codepoints:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("ನಮಸ್ಕಾರ")
assert_equal(len(s), 21)
assert_equal(len(s.chars()), 7)
Strings containing only ASCII characters have the same byte and Unicode codepoint length:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("abc")
assert_equal(len(s), 3)
assert_equal(len(s.chars()), 3)
Returns:
The string length in bytes.
write_to
write_to[W: Writer](self, mut writer: W)
Formats this string slice to the provided Writer.
Parameters:
- W (Writer): A type conforming to the Writer trait.
Args:
- writer (W): The object to write to.
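Here is a minimal sketch of write_to. It assumes that String can serve as the Writer, as it does in recent Mojo standard libraries, and uses the same imports as the other examples on this page. The slice's bytes are appended to whatever the writer already holds.

```mojo
from collections.string import StringSlice
from testing import assert_true


def main():
    # String implements the Writer trait, so it can be the destination.
    var buffer = String("greeting: ")

    # write_to appends the slice's bytes to the writer.
    StringSlice("hello").write_to(buffer)

    assert_true(buffer == "greeting: hello")
```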
__hash__
__hash__(self) -> UInt
Hash the underlying buffer using builtin hash.
Returns:
A 64-bit hash value. This value is not suitable for cryptographic uses. Its intended usage is for data structures. See the hash builtin documentation for more details.
__fspath__
__fspath__(self) -> String
Return the file system path representation of this string.
Returns:
The file system path representation as a string.
__iter__
__iter__(self) -> _StringSliceIter[origin]
Iterate over the string, returning immutable references.
Returns:
An iterator of references to the string elements.
__reversed__
__reversed__(self) -> _StringSliceIter[origin, False]
Iterate backwards over the string, returning immutable references.
Returns:
A reversed iterator of references to the string elements.
__int__
__int__(self) -> Int
Parses the given string as a base-10 integer and returns that value. If the string cannot be parsed as an int, an error is raised.
Returns:
An integer value that represents the string, or otherwise raises.
__float__
__float__(self) -> SIMD[float64, 1]
Parses the string as a floating-point number and returns that value. If the string cannot be parsed as a float, an error is raised.
Returns:
A float value that represents the string, or otherwise raises.
strip
strip(self, chars: StringSlice[origin]) -> Self
Return a copy of the string with leading and trailing characters removed.
Examples:
print("himojohi".strip("hi")) # "mojo"
Args:
- chars (StringSlice[origin]): A set of characters to be removed (the no-argument overload removes whitespace).
Returns:
A copy of the string with no leading or trailing characters.
strip(self) -> Self
Return a copy of the string with leading and trailing whitespace removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".
Examples:
print(" mojo ".strip()) # "mojo"
Returns:
A copy of the string with no leading or trailing whitespace.
rstrip
rstrip(self, chars: StringSlice[origin]) -> Self
Return a copy of the string with trailing characters removed.
Examples:
print("mojohi".rstrip("hi")) # "mojo"
Args:
- chars (StringSlice[origin]): A set of characters to be removed (the no-argument overload removes whitespace).
Returns:
A copy of the string with no trailing characters.
rstrip(self) -> Self
Return a copy of the string with trailing whitespace removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".
Examples:
print("mojo ".rstrip()) # "mojo"
Returns:
A copy of the string with no trailing whitespace.
lstrip
lstrip(self, chars: StringSlice[origin]) -> Self
Return a copy of the string with leading characters removed.
Examples:
print("himojo".lstrip("hi")) # "mojo"
Args:
- chars (StringSlice[origin]): A set of characters to be removed (the no-argument overload removes whitespace).
Returns:
A copy of the string with no leading characters.
lstrip(self) -> Self
Return a copy of the string with leading whitespace removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".
Examples:
print(" mojo".lstrip()) # "mojo"
Returns:
A copy of the string with no leading whitespace.
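The strip family treats chars as a set of characters, not as a literal prefix or suffix. The following sketch uses the same imports as the examples above. It checks the byte length of each result to show exactly what was removed. The input strings are illustrative.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    # 3 leading spaces and 1 trailing space around "mojo" (8 bytes).
    var padded = StringSlice("   mojo ")
    assert_equal(padded.strip().byte_length(), 4)   # "mojo"
    assert_equal(padded.lstrip().byte_length(), 5)  # "mojo "
    assert_equal(padded.rstrip().byte_length(), 7)  # "   mojo"

    # "hi" is a set: any run of 'h' or 'i' is removed, in any order.
    var framed = StringSlice("ihmojohi")
    assert_equal(framed.strip("hi").byte_length(), 4)   # "mojo"
    assert_equal(framed.lstrip("hi").byte_length(), 6)  # "mojohi"
    assert_equal(framed.rstrip("hi").byte_length(), 6)  # "ihmojo"
```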
chars
chars(self) -> CharsIter[origin]
Returns an iterator over the Char values (Unicode codepoints) encoded in this string slice.
Examples
Print the characters in a string:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("abc")
var iter = s.chars()
assert_equal(iter.__next__(), Char.ord("a"))
assert_equal(iter.__next__(), Char.ord("b"))
assert_equal(iter.__next__(), Char.ord("c"))
assert_equal(iter.__has_next__(), False)
chars() iterates over Unicode codepoints and supports multibyte codepoints:
from collections.string import StringSlice
from testing import assert_equal
# A visual character composed of a combining sequence of 2 codepoints.
var s = StringSlice("á")
assert_equal(s.byte_length(), 3)
var iter = s.chars()
assert_equal(iter.__next__(), Char.ord("a"))
# U+0301 Combining Acute Accent
assert_equal(iter.__next__().to_u32(), 0x0301)
assert_equal(iter.__has_next__(), False)
Returns:
An iterator type that returns successive Char values stored in this string slice.
char_slices
char_slices(self) -> _StringSliceIter[origin]
Iterate over the string, returning immutable references.
Returns:
An iterator of references to the string elements.
as_bytes
as_bytes(self) -> Span[SIMD[uint8, 1], origin]
Get the sequence of encoded bytes of the underlying string.
Returns:
A slice containing the underlying sequence of encoded bytes.
unsafe_ptr
unsafe_ptr(self) -> UnsafePointer[SIMD[uint8, 1], mut=mut, origin=origin]
Gets a pointer to the first element of this string slice.
Returns:
A pointer pointing at the first element of this string slice.
byte_length
byte_length(self) -> Int
Get the length of this string slice in bytes.
Returns:
The length of this string slice in bytes.
char_length
char_length(self) -> UInt
Returns the length in Unicode codepoints.
This returns the number of Char codepoint values encoded in the UTF-8 representation of this string.
Note: To get the length in bytes, use StringSlice.byte_length().
Examples
Query the length of a string, in bytes and Unicode codepoints:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("ನಮಸ್ಕಾರ")
assert_equal(s.char_length(), 7)
assert_equal(len(s), 21)
Strings containing only ASCII characters have the same byte and Unicode codepoint length:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("abc")
assert_equal(s.char_length(), 3)
assert_equal(len(s), 3)
The character length of a string with visual combining characters is the length in Unicode codepoints, not grapheme clusters:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("á")
assert_equal(s.char_length(), 2)
assert_equal(s.byte_length(), 3)
Returns:
The length in Unicode codepoints.
get_immutable
get_immutable(self) -> StringSlice[(muttoimm origin._mlir_origin)]
Return an immutable version of this string slice.
Returns:
A string slice covering the same elements, but without mutability.
startswith
startswith(self, prefix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool
Verify if the StringSlice starts with the specified prefix between start and end positions.
Args:
- prefix (StringSlice[origin]): The prefix to check.
- start (Int): The start offset from which to check.
- end (Int): The end offset up to which to check.
Returns:
True if self[start:end] is prefixed by the input prefix.
endswith
endswith(self, suffix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool
Verify if the StringSlice ends with the specified suffix between start and end positions.
Args:
- suffix (StringSlice[origin]): The suffix to check.
- start (Int): The start offset from which to check.
- end (Int): The end offset up to which to check.
Returns:
True if self[start:end] is suffixed by the input suffix.
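Here is a small sketch of startswith and endswith, including the start offset. It relies only on the default end value, so it makes no assumption about how non-default end values are interpreted. The imports follow the other examples on this page.

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    var s = StringSlice("hello world")

    assert_true(s.startswith("hello"))
    assert_false(s.startswith("world"))
    # With start=6, the prefix check begins at byte offset 6.
    assert_true(s.startswith("world", 6))

    assert_true(s.endswith("world"))
    assert_false(s.endswith("hello"))
```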
format
format[*Ts: _CurlyEntryFormattable](self, *args: *Ts) -> String
Format a template with *args.
Examples:
# Manual indexing:
print("{0} {1} {0}".format("Mojo", 1.125)) # Mojo 1.125 Mojo
# Automatic indexing:
print("{} {}".format(True, "hello world")) # True hello world
Parameters:
- *Ts (_CurlyEntryFormattable): The types of substitution values that implement Representable and Stringable (to be changed and made more flexible).
Args:
- *args (*Ts): The substitution values.
Returns:
The template with the given values substituted.
find
find(ref self, substr: StringSlice[origin], start: Int = 0) -> Int
Finds the offset of the first occurrence of substr starting at start. If not found, returns -1.
Args:
- substr (StringSlice[origin]): The substring to find.
- start (Int): The offset from which to find.
Returns:
The offset of substr relative to the beginning of the string.
rfind
rfind(self, substr: StringSlice[origin], start: Int = 0) -> Int
Finds the offset of the last occurrence of substr starting at start. If not found, returns -1.
Args:
- substr (StringSlice[origin]): The substring to find.
- start (Int): The offset from which to find.
Returns:
The offset of substr relative to the beginning of the string.
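The sketch below shows find, rfind, and the in operator (__contains__) on an ASCII string, so byte offsets and character positions coincide. All slices here come from string literals, so they share the same origin, which matches the StringSlice[origin] argument types in the signatures above. The imports follow the other examples on this page.

```mojo
from collections.string import StringSlice
from testing import assert_equal, assert_true, assert_false


def main():
    var s = StringSlice("mojo")

    # Offsets are byte offsets from the start of the slice.
    assert_equal(s.find("o"), 1)     # first "o"
    assert_equal(s.rfind("o"), 3)    # last "o"
    assert_equal(s.find("o", 2), 3)  # search begins at offset 2
    assert_equal(s.find("z"), -1)    # not found

    # `substr in s` calls s.__contains__(substr).
    assert_true("jo" in s)
    assert_false("py" in s)
```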
isspace
isspace(self) -> Bool
Determines whether every character in the given StringSlice is a Python whitespace character. This corresponds to Python's universal separators: " \t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".
Examples:
Check if a string contains only whitespace:
from collections.string import StringSlice
from testing import assert_true, assert_false
# An empty string is not considered to contain only whitespace chars:
assert_false(StringSlice("").isspace())
# ASCII space characters
assert_true(StringSlice(" ").isspace())
assert_true(StringSlice(" ").isspace())
# Contains non-space characters
assert_false(StringSlice(" abc ").isspace())
Returns:
True if the whole StringSlice is made up of whitespace characters listed above, otherwise False.
isnewline
isnewline[single_character: Bool = False](self) -> Bool
Determines whether every character in the given StringSlice is a Python newline character. This corresponds to Python's universal newlines: "\r\n" and "\t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".
Parameters:
- single_character (Bool): Whether to evaluate the StringSlice as a single Unicode character (avoids overhead when already iterating).
Returns:
True if the whole StringSlice is made up of the newline characters listed above, otherwise False.
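Here is a sketch of isnewline with the default single_character=False, where every character in the slice must be one of the listed newline characters. The inputs are illustrative, and the imports follow the other examples on this page.

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    assert_true(StringSlice("\n").isnewline())
    # "\r\n" consists of two characters, both of which are newline characters.
    assert_true(StringSlice("\r\n").isnewline())
    # A single non-newline character makes the result False.
    assert_false(StringSlice("a\n").isnewline())
```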
splitlines
splitlines[O: ImmutableOrigin, //](self: StringSlice[O], keepends: Bool = False) -> List[StringSlice[O]]
Split the string at line boundaries. This corresponds to Python's universal newlines: "\r\n" and "\t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".
Parameters:
- O (ImmutableOrigin): The immutable origin.
Args:
- keepends (Bool): If True, line breaks are kept in the resulting strings.
Returns:
A List of StringSlices containing the input split by line boundaries.
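Here is a sketch of splitlines on input that mixes "\n" and "\r\n" line endings. It checks the byte lengths of the resulting slices rather than comparing strings. The input text is illustrative, and the imports follow the other examples on this page.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    var text = StringSlice("one\ntwo\r\nthree")

    # "\r\n" counts as a single line boundary.
    var lines = text.splitlines()
    assert_equal(len(lines), 3)
    assert_equal(lines[0].byte_length(), 3)  # "one"
    assert_equal(lines[2].byte_length(), 5)  # "three"

    # With keepends=True, each line keeps its terminator.
    var with_ends = text.splitlines(keepends=True)
    assert_equal(len(with_ends), 3)
    assert_equal(with_ends[1].byte_length(), 5)  # "two\r\n"
```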
count
count(self, substr: StringSlice[origin]) -> Int
Return the number of non-overlapping occurrences of substring substr in the string.
If substr is empty, returns the number of empty strings between characters, which is the length of the string plus one.
Args:
- substr (StringSlice[origin]): The substring to count.
Returns:
The number of occurrences of substr.
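Here is a sketch of count that covers an ordinary match, the non-overlapping rule, and the empty-substring case described above. The inputs are illustrative, and the imports follow the other examples on this page.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    assert_equal(StringSlice("banana").count("an"), 2)
    # Matches do not overlap: "aaaa" holds two "aa", not three.
    assert_equal(StringSlice("aaaa").count("aa"), 2)
    # An empty substring yields the length of the string plus one.
    assert_equal(StringSlice("abc").count(""), 4)
```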