Mojo struct
StringSlice
@register_passable(trivial)
struct StringSlice[mut: Bool, //, origin: Origin[mut]]
A non-owning view to encoded string data.
This type is guaranteed to have the same ABI (size, alignment, and field layout) as the llvm::StringRef type.
Notes: TODO: The underlying string data is guaranteed to be encoded using UTF-8.
Parameters
- mut (Bool): Whether the slice is mutable.
- origin (Origin[mut]): The origin of the underlying string data.
Implemented traits
AnyType, Boolable, CollectionElement, CollectionElementNew, Copyable, EqualityComparable, ExplicitlyCopyable, FloatableRaising, Hashable, IntableRaising, Movable, PathLike, Representable, Sized, Stringable, UnknownDestructibility, Writable, _CurlyEntryFormattable
Methods
__init__
@implicit
__init__(lit: StringLiteral) -> StringSlice[StaticConstantOrigin]
Construct a new StringSlice from a StringLiteral.
Args:
- lit (StringLiteral): The literal to construct this StringSlice from.
__init__(*, owned unsafe_from_utf8: Span[SIMD[uint8, 1], origin]) -> Self
Construct a new StringSlice from a sequence of UTF-8 encoded bytes.
Safety:
unsafe_from_utf8 MUST be valid UTF-8 encoded data.
Args:
- unsafe_from_utf8 (Span[SIMD[uint8, 1], origin]): A Span[Byte] encoded in UTF-8.
__init__(*, unsafe_from_utf8_ptr: UnsafePointer[SIMD[uint8, 1]]) -> Self
Construct a new StringSlice from an UnsafePointer[Byte] pointing to null-terminated UTF-8 encoded bytes.
Safety:
- unsafe_from_utf8_ptr MUST point to data that is valid for origin.
- unsafe_from_utf8_ptr MUST be valid UTF-8 encoded data.
- unsafe_from_utf8_ptr MUST be null terminated.
Args:
- unsafe_from_utf8_ptr (UnsafePointer[SIMD[uint8, 1]]): An UnsafePointer[Byte] of null-terminated bytes encoded in UTF-8.
__init__(*, unsafe_from_utf8_cstr_ptr: UnsafePointer[SIMD[int8, 1]]) -> Self
Construct a new StringSlice from an UnsafePointer[c_char] pointing to null-terminated UTF-8 encoded bytes.
Safety:
- unsafe_from_utf8_cstr_ptr MUST point to data that is valid for origin.
- unsafe_from_utf8_cstr_ptr MUST be valid UTF-8 encoded data.
- unsafe_from_utf8_cstr_ptr MUST be null terminated.
Args:
- unsafe_from_utf8_cstr_ptr (UnsafePointer[SIMD[int8, 1]]): An UnsafePointer[c_char] of null-terminated bytes encoded in UTF-8.
__init__(*, ptr: UnsafePointer[SIMD[uint8, 1]], length: UInt) -> Self
Construct a StringSlice from a pointer to a sequence of UTF-8 encoded bytes and a length.
Safety:
- ptr MUST point to at least length bytes of valid UTF-8 encoded data.
- ptr MUST point to data that is live for the duration of origin.
Args:
- ptr (UnsafePointer[SIMD[uint8, 1]]): A pointer to a sequence of bytes encoded in UTF-8.
- length (UInt): The number of bytes of encoded data.
@implicit
__init__[O: ImmutableOrigin, //](ref [O] value: String) -> StringSlice[O]
Construct an immutable StringSlice.
Parameters:
- O (ImmutableOrigin): The immutable origin.
Args:
- value (String): The string value.
__bool__
__bool__(self) -> Bool
Check if a string slice is non-empty.
Returns:
True if a string slice is non-empty, False otherwise.
__getitem__
__getitem__(self, span: Slice) -> Self
Gets the sequence of characters at the specified positions.
Args:
- span (Slice): A slice that specifies positions of the new substring.
Returns:
A new StringSlice containing the substring at the specified positions.
__getitem__[I: Indexer](self, idx: I) -> String
Gets the character at the specified position.
Parameters:
- I (Indexer): A type that can be used as an index.
Args:
- idx (I): The index value.
Returns:
A new string containing the character at the specified position.
__lt__
__lt__(self, rhs: StringSlice[origin]) -> Bool
Verify if the StringSlice bytes are strictly less than the input in overlapping content.
Args:
- rhs (StringSlice[origin]): The other StringSlice to compare against.
Returns:
True if the StringSlice bytes are strictly less than the input in overlapping content.
__eq__
__eq__(self, rhs_same: Self) -> Bool
Verify if a StringSlice is equal to another StringSlice with the same origin.
Args:
- rhs_same (Self): The StringSlice to compare against.
Returns:
True if the StringSlice is equal to the input in length and contents.
__eq__(self, rhs: StringSlice[origin]) -> Bool
Verify if a StringSlice is equal to another StringSlice.
Args:
- rhs (StringSlice[origin]): The StringSlice to compare against.
Returns:
True if the StringSlice is equal to the input in length and contents.
__ne__
__ne__(self, rhs_same: Self) -> Bool
Verify if a StringSlice is not equal to another StringSlice with the same origin.
Args:
- rhs_same (Self): The StringSlice to compare against.
Returns:
True if the StringSlice is not equal to the input in length and contents.
__ne__(self, rhs: StringSlice[origin]) -> Bool
Verify if a StringSlice is not equal to another StringSlice.
Args:
- rhs (StringSlice[origin]): The StringSlice to compare against.
Returns:
True if the StringSlice is not equal to the input in length and contents.
__contains__
__contains__(ref self, substr: StringSlice[origin]) -> Bool
Returns True if the substring is contained within the current string.
Args:
- substr (StringSlice[origin]): The substring to check.
Returns:
True if the string contains the substring.
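A minimal sketch of a containment check, assuming the same module paths used by the other examples on this page (collections.string and testing). The `in` operator is assumed to dispatch to __contains__ on the right-hand operand, and the main wrapper is added only so the snippet stands alone as a program.

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    var s = StringSlice("hello world")
    # The literal on the left is converted to a StringSlice implicitly.
    assert_true("world" in s)
    assert_false("mojo" in s)
```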
__mul__
__mul__(self, n: Int) -> String
Concatenates the string n times.
Args:
- n (Int): The number of times to concatenate the string.
Returns:
The string concatenated n times.
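As an illustration of the repetition described above, the following sketch multiplies a slice by an integer. It assumes the module paths used elsewhere on this page; the main wrapper is only there to make the snippet a standalone program.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    var s = StringSlice("ab")
    # The result is a newly allocated String, not a StringSlice.
    assert_equal(s * 3, "ababab")
```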
copy
copy(self) -> Self
Explicitly construct a deep copy of the provided StringSlice.
Returns:
A copy of the value.
from_utf8
static from_utf8(from_utf8: Span[SIMD[uint8, 1], origin]) -> Self
Construct a new StringSlice from a buffer containing UTF-8 encoded data.
Args:
- from_utf8 (Span[SIMD[uint8, 1], origin]): A span of bytes containing UTF-8 encoded data.
Returns:
A new validated StringSlice pointing to the provided buffer.
Raises:
An exception is raised if the provided buffer byte values do not form valid UTF-8 encoded codepoints.
__str__
__str__(self) -> String
Convert this StringSlice to a String.
Notes: This will allocate a new string that copies the string contents from the provided string slice.
Returns:
A new String.
__repr__
__repr__(self) -> String
Return a Mojo-compatible representation of this string slice.
Returns:
Representation of this string slice as a Mojo string literal input form syntax.
__len__
__len__(self) -> Int
Get the string length in bytes.
This function returns the number of bytes in the underlying UTF-8 representation of the string.
To get the number of Unicode codepoints in a string, use len(str.codepoints()).
Examples
Query the length of a string, in bytes and Unicode codepoints:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("ನಮಸ್ಕಾರ")
assert_equal(len(s), 21)
assert_equal(len(s.codepoints()), 7)
Strings containing only ASCII characters have the same byte and Unicode codepoint length:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("abc")
assert_equal(len(s), 3)
assert_equal(len(s.codepoints()), 3)
Returns:
The string length in bytes.
write_to
write_to[W: Writer](self, mut writer: W)
Formats this string slice to the provided Writer.
Parameters:
- W (Writer): A type conforming to the Writer trait.
Args:
- writer (W): The object to write to.
__hash__
__hash__(self) -> UInt
Hash the underlying buffer using builtin hash.
Returns:
A 64-bit hash value. This value is not suitable for cryptographic uses. Its intended usage is for data structures. See the hash builtin documentation for more details.
__hash__[H: _Hasher](self, mut hasher: H)
Updates hasher with the underlying bytes.
Parameters:
- H (_Hasher): The hasher type.
Args:
- hasher (H): The hasher instance.
__fspath__
__fspath__(self) -> String
Return the file system path representation of this string.
Returns:
The file system path representation as a string.
__iter__
__iter__(self) -> CodepointSliceIter[origin]
Iterate over the string, returning immutable references.
Returns:
An iterator of references to the string elements.
__reversed__
__reversed__(self) -> CodepointSliceIter[origin, False]
Iterate backwards over the string, returning immutable references.
Returns:
A reversed iterator of references to the string elements.
__int__
__int__(self) -> Int
Parses the given string as a base-10 integer and returns that value. If the string cannot be parsed as an int, an error is raised.
Returns:
An integer value that represents the string, or otherwise raises.
__float__
__float__(self) -> SIMD[float64, 1]
Parses the string as a floating-point number and returns that value. If the string cannot be parsed as a float, an error is raised.
Returns:
A float value that represents the string, or otherwise raises.
split
split[sep_mut: Bool, sep_origin: Origin[$0], //](self, sep: StringSlice[sep_origin], maxsplit: Int = -1) -> List[String]
Split the string by a separator.
Examples:
# Splitting a space
_ = StringSlice("hello world").split(" ") # ["hello", "world"]
# Splitting adjacent separators
_ = StringSlice("hello,,world").split(",") # ["hello", "", "world"]
# Splitting with maxsplit
_ = StringSlice("1,2,3").split(",", 1) # ['1', '2,3']
Parameters:
- sep_mut (Bool): Mutability of the sep string slice.
- sep_origin (Origin[$0]): Origin of the sep string slice.
Args:
- sep (StringSlice[sep_origin]): The string to split on.
- maxsplit (Int): The maximum amount of items to split from String. Defaults to unlimited.
Returns:
A List of Strings containing the input split by the separator.
Raises:
If the separator is empty.
split(self, sep: NoneType = NoneType(None), maxsplit: Int = -1) -> List[StringSlice[origin]]
Split the string by every whitespace separator.
Examples:
# Splitting an empty string or filled with whitespaces
_ = StringSlice(" ").split() # []
_ = StringSlice("").split() # []
# Splitting a string with leading, trailing, and middle whitespaces
_ = StringSlice(" hello world ").split() # ["hello", "world"]
# Splitting adjacent universal newlines:
_ = StringSlice(
    "hello \t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029world"
).split() # ["hello", "world"]
Args:
- sep (NoneType): None.
- maxsplit (Int): The maximum amount of items to split from String. Defaults to unlimited.
Returns:
A List of StringSlices containing the input split by whitespace.
strip
strip(self, chars: StringSlice[origin]) -> Self
Return a copy of the string with leading and trailing characters removed.
Examples:
print("himojohi".strip("hi")) # "mojo"
Args:
- chars (StringSlice[origin]): A set of characters to be removed. Defaults to whitespace.
Returns:
A copy of the string with no leading or trailing characters.
strip(self) -> Self
Return a copy of the string with leading and trailing whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".
Examples:
print(" mojo ".strip()) # "mojo"
Returns:
A copy of the string with no leading or trailing whitespaces.
rstrip
rstrip(self, chars: StringSlice[origin]) -> Self
Return a copy of the string with trailing characters removed.
Examples:
print("mojohi".rstrip("hi")) # "mojo"
Args:
- chars (StringSlice[origin]): A set of characters to be removed. Defaults to whitespace.
Returns:
A copy of the string with no trailing characters.
rstrip(self) -> Self
Return a copy of the string with trailing whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".
Examples:
print("mojo ".rstrip()) # "mojo"
Returns:
A copy of the string with no trailing whitespaces.
lstrip
lstrip(self, chars: StringSlice[origin]) -> Self
Return a copy of the string with leading characters removed.
Examples:
print("himojo".lstrip("hi")) # "mojo"
Args:
- chars (StringSlice[origin]): A set of characters to be removed. Defaults to whitespace.
Returns:
A copy of the string with no leading characters.
lstrip(self) -> Self
Return a copy of the string with leading whitespaces removed. This only takes ASCII whitespace into account: " \t\n\v\f\r\x1c\x1d\x1e".
Examples:
print(" mojo".lstrip()) # "mojo"
Returns:
A copy of the string with no leading whitespaces.
codepoints
codepoints(self) -> CodepointsIter[origin]
Returns an iterator over the Codepoints encoded in this string slice.
Examples
Iterate over the codepoints in a string:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("abc")
var iter = s.codepoints()
assert_equal(iter.__next__(), Codepoint.ord("a"))
assert_equal(iter.__next__(), Codepoint.ord("b"))
assert_equal(iter.__next__(), Codepoint.ord("c"))
assert_equal(iter.__has_next__(), False)
codepoints() iterates over Unicode codepoints, and supports multibyte codepoints:
from collections.string import StringSlice
from testing import assert_equal
# A visual character composed of a combining sequence of 2 codepoints.
var s = StringSlice("á")
assert_equal(s.byte_length(), 3)
var iter = s.codepoints()
assert_equal(iter.__next__(), Codepoint.ord("a"))
# U+0301 Combining Acute Accent
assert_equal(iter.__next__().to_u32(), 0x0301)
assert_equal(iter.__has_next__(), False)
Returns:
An iterator type that returns successive Codepoint values stored in this string slice.
codepoint_slices
codepoint_slices(self) -> CodepointSliceIter[origin]
Iterate over the string, returning immutable references.
Returns:
An iterator of references to the string elements.
as_bytes
as_bytes(self) -> Span[SIMD[uint8, 1], origin]
Get the sequence of encoded bytes of the underlying string.
Returns:
A slice containing the underlying sequence of encoded bytes.
unsafe_ptr
unsafe_ptr(self) -> UnsafePointer[SIMD[uint8, 1], mut=mut, origin=origin]
Gets a pointer to the first element of this string slice.
Returns:
A pointer pointing at the first element of this string slice.
byte_length
byte_length(self) -> Int
Get the length of this string slice in bytes.
Returns:
The length of this string slice in bytes.
char_length
char_length(self) -> UInt
Returns the length in Unicode codepoints.
This returns the number of Codepoint values encoded in the UTF-8 representation of this string.
Note: To get the length in bytes, use StringSlice.byte_length().
Examples
Query the length of a string, in bytes and Unicode codepoints:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("ನಮಸ್ಕಾರ")
assert_equal(s.char_length(), 7)
assert_equal(len(s), 21)
Strings containing only ASCII characters have the same byte and Unicode codepoint length:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("abc")
assert_equal(s.char_length(), 3)
assert_equal(len(s), 3)
The character length of a string with visual combining characters is the length in Unicode codepoints, not grapheme clusters:
from collections.string import StringSlice
from testing import assert_equal
var s = StringSlice("á")
assert_equal(s.char_length(), 2)
assert_equal(s.byte_length(), 3)
Returns:
The length in Unicode codepoints.
is_codepoint_boundary
is_codepoint_boundary(self, index: UInt) -> Bool
Returns True if index is the position of the first byte in a UTF-8 codepoint sequence, or is at the end of the string.
A byte position is considered a codepoint boundary if a valid subslice of the string would end (noninclusive) at index.
Positions 0 and len(self) are considered to be codepoint boundaries.
Positions beyond the length of the string slice will return False.
Examples
Check if particular byte positions are codepoint boundaries:
from collections.string import StringSlice
from testing import assert_equal, assert_true
var abc = StringSlice("abc")
assert_equal(len(abc), 3)
assert_true(abc.is_codepoint_boundary(0))
assert_true(abc.is_codepoint_boundary(1))
assert_true(abc.is_codepoint_boundary(2))
assert_true(abc.is_codepoint_boundary(3))
Only the index of the first byte in a multi-byte codepoint sequence is considered a codepoint boundary:
from collections.string import StringSlice
from testing import assert_equal, assert_true, assert_false
var thumb = StringSlice("👍")
assert_equal(len(thumb), 4)
assert_true(thumb.is_codepoint_boundary(0))
assert_false(thumb.is_codepoint_boundary(1))
assert_false(thumb.is_codepoint_boundary(2))
assert_false(thumb.is_codepoint_boundary(3))
Visualization showing which bytes are considered codepoint boundaries, within a piece of text that includes codepoints whose UTF-8 representation requires, respectively, 1, 2, 3, and 4 bytes. The codepoint boundary byte indices are indicated by a vertical arrow (↑).
For example, this diagram shows that a slice of bytes formed by the half-open range starting at byte 3 and extending up to but not including byte 6 ([3, 6)) is a valid UTF-8 sequence.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                 a©➇𝄞                 ┃ String
┣━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┫
┃97┃  169  ┃   10119   ┃    119070     ┃ Unicode Codepoints
┣━━╋━━━┳━━━╋━━━┳━━━┳━━━╋━━━┳━━━┳━━━┳━━━┫
┃97┃194┃169┃226┃158┃135┃240┃157┃132┃158┃ UTF-8 Bytes
┗━━┻━━━┻━━━┻━━━┻━━━┻━━━┻━━━┻━━━┻━━━┻━━━┛
0  1   2   3   4   5   6   7   8   9   10
↑  ↑       ↑           ↑               ↑
The following program verifies the above diagram:
from collections.string import StringSlice
from testing import assert_true, assert_false
var text = StringSlice("a©➇𝄞")
assert_true(text.is_codepoint_boundary(0))
assert_true(text.is_codepoint_boundary(1))
assert_false(text.is_codepoint_boundary(2))
assert_true(text.is_codepoint_boundary(3))
assert_false(text.is_codepoint_boundary(4))
assert_false(text.is_codepoint_boundary(5))
assert_true(text.is_codepoint_boundary(6))
assert_false(text.is_codepoint_boundary(7))
assert_false(text.is_codepoint_boundary(8))
assert_false(text.is_codepoint_boundary(9))
assert_true(text.is_codepoint_boundary(10))
Args:
- index (UInt): An index into the underlying byte representation of the string.
Returns:
A boolean indicating if index gives the position of the first byte in a UTF-8 codepoint sequence, or is at the end of the string.
get_immutable
get_immutable(self) -> StringSlice[(muttoimm origin._mlir_origin)]
Return an immutable version of this string slice.
Returns:
A string slice covering the same elements, but without mutability.
startswith
startswith(self, prefix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool
Verify if the StringSlice starts with the specified prefix between start and end positions.
The start and end positions must be offsets given in bytes, and must be codepoint boundaries.
Args:
- prefix (StringSlice[origin]): The prefix to check.
- start (Int): The start offset in bytes from which to check.
- end (Int): The end offset in bytes up to which to check.
Returns:
True if self[start:end] is prefixed by the input prefix.
endswith
endswith(self, suffix: StringSlice[origin], start: Int = 0, end: Int = -1) -> Bool
Verify if the StringSlice ends with the specified suffix between start and end positions.
The start and end positions must be offsets given in bytes, and must be codepoint boundaries.
Args:
- suffix (StringSlice[origin]): The suffix to check.
- start (Int): The start offset in bytes from which to check.
- end (Int): The end offset in bytes up to which to check.
Returns:
True if self[start:end] is suffixed by the input suffix.
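The prefix and suffix checks can be sketched as follows. This is an illustration only, assuming the module paths used by the other examples on this page; the sample text is ASCII, so every byte offset is also a codepoint boundary. The main wrapper is added to make the snippet a standalone program.

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    var s = StringSlice("hello world")
    assert_true(s.startswith("hello"))
    assert_true(s.endswith("world"))
    # Byte offset 6 is where "world" begins.
    assert_true(s.startswith("world", 6))
    # Restricting the checked range to bytes [0, 5) leaves only "hello".
    assert_true(s.endswith("hello", 0, 5))
    assert_false(s.endswith("world", 0, 5))
```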
format
format[*Ts: _CurlyEntryFormattable](self, *args: *Ts) -> String
Format a template with *args.
Examples:
# Manual indexing:
print("{0} {1} {0}".format("Mojo", 1.125)) # Mojo 1.125 Mojo
# Automatic indexing:
print("{} {}".format(True, "hello world")) # True hello world
Parameters:
- *Ts (_CurlyEntryFormattable): The types of substitution values that implement Representable and Stringable (to be changed and made more flexible).
Args:
- *args (*Ts): The substitution values.
Returns:
The template with the given values substituted.
find
find(ref self, substr: StringSlice[origin], start: Int = 0) -> Int
Finds the offset in bytes of the first occurrence of substr starting at start. If not found, returns -1.
Args:
- substr (StringSlice[origin]): The substring to find.
- start (Int): The offset in bytes from which to find. Must be a codepoint boundary.
Returns:
The offset in bytes of substr relative to the beginning of the string.
rfind
rfind(self, substr: StringSlice[origin], start: Int = 0) -> Int
Finds the offset in bytes of the last occurrence of substr starting at start. If not found, returns -1.
Args:
- substr (StringSlice[origin]): The substring to find.
- start (Int): The offset in bytes from which to find. Must be a valid codepoint boundary.
Returns:
The offset in bytes of substr relative to the beginning of the string.
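A short sketch contrasting find and rfind on the same text, assuming the module paths used elsewhere on this page. The offsets are byte offsets from the beginning of the string, including when a start offset is supplied. The main wrapper is added only to make the snippet standalone.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    var s = StringSlice("hello world")
    # First and last occurrence of "o".
    assert_equal(s.find("o"), 4)
    assert_equal(s.rfind("o"), 7)
    # Searching from byte offset 5 skips the "o" in "hello".
    assert_equal(s.find("o", 5), 7)
    # A missing substring yields -1.
    assert_equal(s.find("xyz"), -1)
```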
isspace
isspace(self) -> Bool
Determines whether every character in the given StringSlice is a Python whitespace character. This corresponds to Python's universal separators: " \t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".
Examples:
Check if a string contains only whitespace:
from collections.string import StringSlice
from testing import assert_true, assert_false
# An empty string is not considered to contain only whitespace chars:
assert_false(StringSlice("").isspace())
# ASCII space characters
assert_true(StringSlice(" ").isspace())
assert_true(StringSlice(" ").isspace())
# Contains non-space characters
assert_false(StringSlice(" abc ").isspace())
Returns:
True if the whole StringSlice is made up of whitespace characters listed above, otherwise False.
isnewline
isnewline[single_character: Bool = False](self) -> Bool
Determines whether every character in the given StringSlice is a Python newline character. This corresponds to Python's universal newlines: "\r\n" and "\t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".
Parameters:
- single_character (Bool): Whether to evaluate the StringSlice as a single Unicode character (avoids overhead when already iterating).
Returns:
True if the whole StringSlice is made up of whitespace characters listed above, otherwise False.
splitlines
splitlines[O: ImmutableOrigin, //](self: StringSlice[O], keepends: Bool = False) -> List[StringSlice[O]]
Split the string at line boundaries. This corresponds to Python's universal newlines: "\r\n" and "\t\n\v\f\r\x1c\x1d\x1e\x85\u2028\u2029".
Parameters:
- O (ImmutableOrigin): The immutable origin.
Args:
- keepends (Bool): If True, line breaks are kept in the resulting strings.
Returns:
A List of StringSlices containing the input split by line boundaries.
count
count(self, substr: StringSlice[origin]) -> Int
Return the number of non-overlapping occurrences of substring substr in the string.
If substr is empty, returns the number of empty strings between characters, which is the length of the string plus one.
Args:
- substr (StringSlice[origin]): The substring to count.
Returns:
The number of occurrences of substr.
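The counting rules above, including the non-overlapping and empty-substring cases, can be sketched like this. The module paths are assumed to match the other examples on this page, and the main wrapper is added only so the snippet is a standalone program.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    # "an" occurs at byte offsets 1 and 3.
    assert_equal(StringSlice("banana").count("an"), 2)
    # Matches do not overlap: "aaaa" holds two "aa", not three.
    assert_equal(StringSlice("aaaa").count("aa"), 2)
    # An empty substring counts the gaps: length plus one.
    assert_equal(StringSlice("abc").count(""), 4)
```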
is_ascii_digit
is_ascii_digit(self) -> Bool
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
Note that this currently only works with ASCII strings.
Returns:
True if all characters are digits and it's not empty else False.
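A minimal sketch of the digit check, assuming the module paths used elsewhere on this page. It covers the three cases the description names: all digits, a mixed string, and the empty string. The main wrapper is added to make the snippet standalone.

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    assert_true(StringSlice("2025").is_ascii_digit())
    # One non-digit character is enough to fail the check.
    assert_false(StringSlice("20a5").is_ascii_digit())
    # The empty string has no characters, so it is not a digit string.
    assert_false(StringSlice("").is_ascii_digit())
```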
isupper
isupper(self) -> Bool
Returns True if all cased characters in the string are uppercase and there is at least one cased character.
Returns:
True if all cased characters in the string are uppercase and there is at least one cased character, False otherwise.
islower
islower(self) -> Bool
Returns True if all cased characters in the string are lowercase and there is at least one cased character.
Returns:
True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise.
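The two case predicates can be exercised side by side, as in this sketch. It assumes the module paths used by the other examples on this page and sticks to plain ASCII letters. The main wrapper is added only to make the snippet standalone.

```mojo
from collections.string import StringSlice
from testing import assert_true, assert_false


def main():
    assert_true(StringSlice("MOJO").isupper())
    assert_true(StringSlice("mojo").islower())
    # Mixed case satisfies neither predicate.
    assert_false(StringSlice("Mojo").isupper())
    assert_false(StringSlice("Mojo").islower())
```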
lower
lower(self) -> String
Returns a copy of the string with all cased characters converted to lowercase.
Returns:
A new string where cased letters have been converted to lowercase.
upper
upper(self) -> String
Returns a copy of the string with all cased characters converted to uppercase.
Returns:
A new string where cased letters have been converted to uppercase.
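As a brief illustration of the case conversions, the sketch below converts one ASCII slice both ways. The module paths are assumed to match the other examples on this page; the main wrapper is added to make the snippet standalone. Both methods return a new String and leave the slice untouched.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    var s = StringSlice("Hello Mojo")
    assert_equal(s.lower(), "hello mojo")
    assert_equal(s.upper(), "HELLO MOJO")
```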
is_ascii_printable
is_ascii_printable(self) -> Bool
Returns True if all characters in the string are ASCII printable.
Note that this currently only works with ASCII strings.
Returns:
True if all characters are printable else False.
rjust
rjust(self, width: Int, fillchar: StringLiteral = " ") -> String
Returns the string right justified in a string of specified width.
Args:
- width (Int): The width of the field containing the string.
- fillchar (StringLiteral): Specifies the padding character.
Returns:
Returns right justified string, or self if width is not bigger than self length.
ljust
ljust(self, width: Int, fillchar: StringLiteral = " ") -> String
Returns the string left justified in a string of specified width.
Args:
- width (Int): The width of the field containing the string.
- fillchar (StringLiteral): Specifies the padding character.
Returns:
Returns left justified string, or self if width is not bigger than self length.
center
center(self, width: Int, fillchar: StringLiteral = " ") -> String
Returns the string center justified in a string of specified width.
Args:
- width (Int): The width of the field containing the string.
- fillchar (StringLiteral): Specifies the padding character.
Returns:
Returns center justified string, or self if width is not bigger than self length.
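The three justification methods (rjust, ljust, and center) can be compared on the same input, as in this sketch. It assumes the module paths used by the other examples on this page, and it uses an even amount of padding for center so the split between left and right is unambiguous. The main wrapper is added only to make the snippet standalone.

```mojo
from collections.string import StringSlice
from testing import assert_equal


def main():
    var s = StringSlice("mojo")
    assert_equal(s.rjust(6, "*"), "**mojo")
    assert_equal(s.ljust(6, "*"), "mojo**")
    assert_equal(s.center(8, "*"), "**mojo**")
    # A width no larger than the string leaves it unchanged.
    assert_equal(s.rjust(3), "mojo")
```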