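io: Core tools for working with streams

Source code: Lib/io.py


Overview

The io module provides Python's main facilities for dealing with various types of I/O. There are three main types of I/O: text I/O, binary I/O and raw I/O. These are generic categories, and various backing stores can be used for each of them. A concrete object belonging to any of these categories is called a file object. Other common terms are stream and file-like object.

Independent of its category, each concrete stream object also has various capabilities: it can be read-only, write-only, or read-write. It can also allow arbitrary random access (seeking forwards or backwards to any location), or only sequential access (for example in the case of a socket or pipe).

All streams are careful about the type of data you give to them. For example, giving a str object to the write() method of a binary stream will raise a TypeError. So will giving a bytes object to the write() method of a text stream.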
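As a minimal sketch of this type strictness, the following uses the in-memory BytesIO and StringIO classes described later on this page; the variable names are only illustrative.

```python
import io

binary_stream = io.BytesIO()
text_stream = io.StringIO()

binary_stream.write(b"raw bytes")   # bytes-like data is accepted by a binary stream
text_stream.write("some text")      # str data is accepted by a text stream

# Passing the wrong type to either stream raises TypeError.
raised = []
for stream, wrong_data in ((binary_stream, "text"), (text_stream, b"bytes")):
    try:
        stream.write(wrong_data)
    except TypeError as exc:
        raised.append(type(exc).__name__)
```

Neither failed call changes the contents already written to the streams.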

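Changed in version 3.3: Operations that used to raise IOError now raise OSError, since IOError is now an alias of OSError.

Text I/O

Text I/O expects and produces str objects. This means that whenever the backing store is natively made of bytes (such as in the case of a file), encoding and decoding of data is done transparently, as is the optional translation of platform-specific newline characters.

The easiest way to create a text stream is with open(), optionally specifying an encoding: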

f = open("myfile.txt", "r", encoding="utf-8")

In-memory text streams are also available as StringIO objects:

f = io.StringIO("some initial text data")

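The text stream API is described in detail in the documentation of TextIOBase.

Binary I/O

Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects. No encoding, decoding, or newline translation is performed. This category of streams can be used for all kinds of non-text data, and also when manual control over the handling of text data is desired.

The easiest way to create a binary stream is with open() with 'b' in the mode string: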

f = open("myfile.jpg", "rb")

In-memory binary streams are also available as BytesIO objects:

f = io.BytesIO(b"some initial binary data: \x00\x01")

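The binary stream API is described in detail in the documentation of BufferedIOBase.

Other library modules may provide additional ways to create text or binary streams. See socket.socket.makefile() for example.

Raw I/O

Raw I/O (also called unbuffered I/O) is generally used as a low-level building block for binary and text streams; it is rarely useful to directly manipulate a raw stream from user code. Nevertheless, you can create a raw stream by opening a file in binary mode with buffering disabled: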

f = open("myfile.jpg", "rb", buffering=0)

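The raw stream API is described in detail in the documentation of RawIOBase.

Text Encoding

The default encoding of TextIOWrapper and open() is locale-specific (locale.getencoding()).

However, many developers forget to specify the encoding when opening text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, etc.) since most Unix platforms use a UTF-8 locale by default. This causes bugs because the locale encoding is not UTF-8 for most Windows users. For example: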

# May not work on Windows when the file contains non-ASCII characters.
with open("README.md") as f:
    long_description = f.read()

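Accordingly, it is highly recommended that you specify the encoding explicitly when opening text files. If you want to use UTF-8, pass encoding="utf-8". To use the current locale encoding, encoding="locale" is supported since Python 3.10.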
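The recommendation can be sketched as follows. The file name and sample text are assumptions made for illustration; the file is created in a temporary directory so the snippet is self-contained.

```python
import os
import tempfile

text = "naïve café\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    # Writing and reading with the same explicit encoding gives a
    # platform-independent round trip.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```

Because both calls name the encoding, the result does not depend on the locale of the machine running the code.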

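See also

Python UTF-8 Mode

Python UTF-8 Mode can be used to change the default encoding from the locale-specific encoding to UTF-8.

PEP 686

Python 3.15 will make Python UTF-8 Mode the default.

Opt-in EncodingWarning

Added in version 3.10: See PEP 597 for more details.

To find where the default locale encoding is used, you can enable the -X warn_default_encoding command line option or set the PYTHONWARNDEFAULTENCODING environment variable. An EncodingWarning is then emitted whenever the default encoding is used.

If you are providing an API that uses open() or TextIOWrapper and passes encoding=None as a parameter, you can use text_encoding() so that callers of the API will emit an EncodingWarning if they don't pass an encoding. However, please consider using UTF-8 by default (i.e. encoding="utf-8") for new APIs.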

High-level Module Interface

io.DEFAULT_BUFFER_SIZE

An int containing the default buffer size used by the module's buffered I/O classes. open() uses the file's blksize (as obtained by os.stat()) if possible.

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

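This is an alias for the builtin open() function.

This function raises an auditing event open with arguments path, mode and flags.

io.open_code(path)

Opens the provided file with mode 'rb'. This function should be used when the intent is to treat the contents as executable code.

path should be a str and an absolute path.

The behavior of this function may be overridden by an earlier call to PyFile_SetOpenCodeHook(). However, assuming that path is a str and an absolute path, open_code(path) should always behave the same as open(path, 'rb'). Overriding the behavior is intended for additional validation or preprocessing of the file.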

Added in version 3.8.

io.text_encoding(encoding, stacklevel=2, /)

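This is a helper function for callables that use open() or TextIOWrapper and have an encoding=None parameter.

This function returns encoding if it is not None. Otherwise, it returns "locale" or "utf-8" depending on UTF-8 Mode.

This function emits an EncodingWarning if sys.flags.warn_default_encoding is true and encoding is None. stacklevel specifies where the warning is emitted. For example: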

import io

def read_text(path, encoding=None):
    encoding = io.text_encoding(encoding)  # stacklevel=2
    # Pass encoding by keyword: the second positional argument of open() is mode.
    with open(path, encoding=encoding) as f:
        return f.read()

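In this example, an EncodingWarning is emitted for the caller of read_text().

See Text Encoding for more information.

Added in version 3.10.

Changed in version 3.11: text_encoding() returns "utf-8" when UTF-8 mode is enabled and encoding is None.

exception io.BlockingIOError

This is a compatibility alias for the builtin BlockingIOError exception.

exception io.UnsupportedOperation

An exception inheriting OSError and ValueError that is raised when an unsupported operation is called on a stream.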
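A small sketch of how this exception can be observed: an in-memory BytesIO has no OS-level file descriptor, so asking for one is an unsupported operation. The variable names are illustrative only.

```python
import io

stream = io.BytesIO(b"in-memory data")
try:
    stream.fileno()          # BytesIO is not backed by a file descriptor
except io.UnsupportedOperation as exc:
    caught = exc

# The same exception can be handled as either of its base classes.
is_os_error = isinstance(caught, OSError)
is_value_error = isinstance(caught, ValueError)
```

Code that already catches OSError or ValueError therefore also handles UnsupportedOperation.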

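See also

sys

contains the standard IO streams: sys.stdin, sys.stdout, and sys.stderr.

Class hierarchy

The implementation of I/O streams is organized as a hierarchy of classes. First come the abstract base classes (ABCs), which are used to specify the various categories of streams, then the concrete classes providing the standard stream implementations.

Note

The abstract base classes also provide default implementations of some methods in order to help implementation of concrete stream classes. For example, BufferedIOBase provides unoptimized implementations of readinto() and readline().

At the top of the I/O hierarchy is the abstract base class IOBase. It defines the basic interface to a stream. Note, however, that there is no separation between reading and writing to streams; implementations are allowed to raise UnsupportedOperation if they do not support a given operation.

The RawIOBase ABC extends IOBase. It deals with the reading and writing of bytes to a stream. FileIO subclasses RawIOBase to provide an interface to files in the machine's file system.

The BufferedIOBase ABC extends IOBase. It deals with buffering on a raw binary stream (RawIOBase). Its subclasses, BufferedWriter, BufferedReader, and BufferedRWPair buffer raw binary streams that are writable, readable, and both readable and writable, respectively. BufferedRandom provides a buffered interface to seekable streams. Another BufferedIOBase subclass, BytesIO, is a stream of in-memory bytes.

The TextIOBase ABC extends IOBase. It deals with streams whose bytes represent text, and handles encoding and decoding to and from strings. TextIOWrapper, which extends TextIOBase, is a buffered text interface to a buffered raw stream (BufferedIOBase). Finally, StringIO is an in-memory stream for text.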
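The relationships described above can be checked with issubclass(). This is only a sketch that restates the hierarchy from the preceding paragraphs as (concrete class, ABC) pairs.

```python
import io

# Each concrete class paired with the ABC it is documented to implement.
pairs = [
    (io.FileIO, io.RawIOBase),
    (io.BufferedReader, io.BufferedIOBase),
    (io.BufferedWriter, io.BufferedIOBase),
    (io.BufferedRandom, io.BufferedIOBase),
    (io.BufferedRWPair, io.BufferedIOBase),
    (io.BytesIO, io.BufferedIOBase),
    (io.TextIOWrapper, io.TextIOBase),
    (io.StringIO, io.TextIOBase),
]

matches_abc = all(issubclass(concrete, abc) for concrete, abc in pairs)
all_are_iobase = all(issubclass(concrete, io.IOBase) for concrete, _ in pairs)
```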

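Argument names are not part of the specification, and only the arguments of open() are intended to be used as keyword arguments.

The following table summarizes the ABCs provided by the io module: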

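ABC: IOBase

  • Inherits: (none)

  • Stub methods: fileno, seek, and truncate

  • Mixin methods and properties: close, closed, __enter__, __exit__, flush, isatty, __iter__, __next__, readable, readline, readlines, seekable, tell, writable, and writelines

ABC: RawIOBase

  • Inherits: IOBase

  • Stub methods: readinto and write

  • Mixin methods and properties: inherited IOBase methods, read, and readall

ABC: BufferedIOBase

  • Inherits: IOBase

  • Stub methods: detach, read, read1, and write

  • Mixin methods and properties: inherited IOBase methods, readinto, and readinto1

ABC: TextIOBase

  • Inherits: IOBase

  • Stub methods: detach, read, readline, and write

  • Mixin methods and properties: inherited IOBase methods, encoding, errors, and newlines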

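I/O Base Classes

class io.IOBase

The abstract base class for all I/O classes.

This class provides empty abstract implementations for many methods that derived classes can override selectively; the default implementations represent a file that cannot be read, written or seeked.

Even though IOBase does not declare read() or write() because their signatures will vary, implementations and clients should consider those methods part of the interface. Also, implementations may raise a ValueError (or UnsupportedOperation) when operations they do not support are called.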

The basic type used for binary data read from or written to a file is bytes. Other bytes-like objects are accepted as method arguments too. Text I/O classes work with str data.

Note that calling any method (even inquiries) on a closed stream is undefined. Implementations may raise ValueError in this case.

IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings). See readline() below.
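A short sketch of the iterator protocol on both kinds of stream, using in-memory objects so that no file is needed:

```python
import io

# Iterating a binary stream yields bytes lines.
binary_lines = list(io.BytesIO(b"first\nsecond\nthird"))

# Iterating a text stream yields str lines.
text_lines = list(io.StringIO("first\nsecond\nthird"))
```

In both cases each line keeps its terminator, and the final line has none because the data does not end with one.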

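IOBase is also a context manager and therefore supports the with statement. In this example, file is closed after the with statement's suite is finished, even if an exception occurs: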

with open('spam.txt', 'w') as file:
    file.write('Spam and eggs!')

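IOBase provides these data attributes and methods:

close()

Flush and close this stream. This method has no effect if the file is already closed. Once the file is closed, any operation on the file (e.g. reading or writing) will raise a ValueError.

As a convenience, it is allowed to call this method more than once; only the first call, however, will have an effect.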
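The behaviour of close() can be sketched with an in-memory text stream; StringIO is used here purely as a convenient example.

```python
import io

stream = io.StringIO("data")
stream.close()
stream.close()            # a repeated call is allowed and has no effect

try:
    stream.read()         # any further operation fails
except ValueError as exc:
    message = str(exc)
```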

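closed

True if the stream is closed.

fileno()

Return the underlying file descriptor (an integer) of the stream if it exists. An OSError is raised if the IO object does not use a file descriptor.

flush()

Flush the write buffers of the stream if applicable. This does nothing for read-only and non-blocking streams.

isatty()

Return True if the stream is interactive (i.e., connected to a terminal/tty device).

readable()

Return True if the stream can be read from. If False, read() will raise OSError.

readline(size=-1, /)

Read and return one line from the stream. If size is specified, at most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

readlines(hint=-1, /)

Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

hint values of 0 or less, as well as None, are treated as no hint.

Note that it's already possible to iterate on file objects using for line in file: ... without calling file.readlines().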
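The effect of hint can be sketched with four 4-byte lines and a hint of 5; the sample data is an assumption chosen so that the cut-off does not fall exactly on a line boundary.

```python
import io

data = b"aaa\nbbb\nccc\nddd\n"   # four lines of 4 bytes each

all_lines = io.BytesIO(data).readlines()

# After the second line, 8 bytes have been read, which exceeds the hint,
# so no further lines are read.
some_lines = io.BytesIO(data).readlines(5)

# Plain iteration gives the same lines as readlines() without a hint.
iterated = [line for line in io.BytesIO(data)]
```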

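seek(offset, whence=os.SEEK_SET, /)

Change the stream position to the given byte offset, interpreted relative to the position indicated by whence, and return the new absolute position. Values for whence are:

  • os.SEEK_SET or 0 -- start of the stream (the default); offset should be zero or positive

  • os.SEEK_CUR or 1 -- current stream position; offset may be negative

  • os.SEEK_END or 2 -- end of the stream; offset is usually negative

Added in version 3.1: The SEEK_* constants.

Added in version 3.3: Some operating systems could support additional values, like os.SEEK_HOLE or os.SEEK_DATA. The valid values for a file could depend on it being open in text or binary mode.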
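The three standard whence values can be sketched on an in-memory binary stream of ten bytes:

```python
import io
import os

stream = io.BytesIO(b"0123456789")

pos_set = stream.seek(4)                  # absolute position from the start
first = stream.read(2)                    # reads "45", position is now 6

pos_cur = stream.seek(-3, os.SEEK_CUR)    # three bytes back from position 6
second = stream.read(1)

pos_end = stream.seek(-2, os.SEEK_END)    # two bytes before the end
third = stream.read()
```

Each call returns the new absolute position, so the results can be compared regardless of which whence value was used.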

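seekable()

Return True if the stream supports random access. If False, seek(), tell() and truncate() will raise OSError.

tell()

Return the current stream position.

truncate(size=None, /)

Resize the stream to the given size in bytes (or the current position if size is not specified). The current stream position isn't changed. This resizing can extend or reduce the current file size. In case of extension, the contents of the new file area depend on the platform (on most systems, additional bytes are zero-filled). The new file size is returned.

Changed in version 3.5: Windows will now zero-fill files when extending.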
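A minimal sketch of shrinking a stream with truncate(), using BytesIO as the example stream. Extension is not shown because its result depends on the stream type and platform.

```python
import io

stream = io.BytesIO(b"abcdef")
stream.seek(5)

new_size = stream.truncate(3)        # shrink to three bytes
position = stream.tell()             # the position is left where it was
contents = stream.getvalue()

stream.seek(2)
size_at_position = stream.truncate() # no size given: cut at the current position
```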

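writable()

Return True if the stream supports writing. If False, write() and truncate() will raise OSError.

writelines(lines, /)

Write a list of lines to the stream. Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.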
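The absence of automatic separators can be sketched like this; the sample strings are arbitrary.

```python
import io

out = io.StringIO()

# No separator is inserted between items.
out.writelines(["alpha", "beta"])

# The caller supplies line separators where they are wanted.
out.writelines([line + "\n" for line in ["gamma", "delta"]])

result = out.getvalue()
```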

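__del__()

Prepare for object destruction. IOBase provides a default implementation of this method that calls the instance's close() method.

class io.RawIOBase

Base class for raw binary streams. It inherits from IOBase.

Raw binary streams typically provide low-level access to an underlying OS device or API, and do not try to encapsulate it in high-level primitives (this functionality is done at a higher level in buffered binary streams and text streams, described later in this page).

RawIOBase provides these methods in addition to those from IOBase:

read(size=-1, /)

Read up to size bytes from the object and return them. As a convenience, if size is unspecified or -1, all bytes until EOF are returned. Otherwise, only one system call is ever made. Fewer than size bytes may be returned if the operating system call returns fewer than size bytes.

If 0 bytes are returned, and size was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, None is returned.

The default implementation defers to readall() and readinto().

readall()

Read and return all the bytes from the stream until EOF, using multiple calls to the stream if necessary.

readinto(b, /)

Read bytes into a pre-allocated, writable bytes-like object b, and return the number of bytes read. For example, b might be a bytearray. If the object is in non-blocking mode and no bytes are available, None is returned.

write(b, /)

Write the given bytes-like object, b, to the underlying raw stream, and return the number of bytes written. This can be less than the length of b in bytes, depending on specifics of the underlying raw stream, and especially if it is in non-blocking mode. None is returned if the raw stream is set not to block and no single byte could be readily written to it. The caller may release or mutate b after this method returns, so the implementation should only access b during the method call.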
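To illustrate how read() and readall() defer to readinto(), here is a sketch of a raw stream subclass. ChunkedRaw is a hypothetical class invented for this example; it deliberately hands out at most 3 bytes per readinto() call to imitate short reads from an operating system.

```python
import io

class ChunkedRaw(io.RawIOBase):
    """Hypothetical raw stream returning at most 3 bytes per readinto() call."""

    def __init__(self, payload):
        super().__init__()
        self._payload = payload
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._payload[self._pos:self._pos + min(len(b), 3)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)          # 0 signals end of file

raw = ChunkedRaw(b"abcdefgh")
one_call = raw.read(10)            # one readinto() call, so a short result
the_rest = raw.readall()           # repeated reads until end of file

# A buffered wrapper keeps reading until the request is satisfied.
buffered = io.BufferedReader(ChunkedRaw(b"abcdefgh"))
full = buffered.read(8)
```

Only readable() and readinto() are implemented here; the inherited read() and readall() are built on top of them.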

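class io.BufferedIOBase

Base class for binary streams that support some kind of buffering. It inherits from IOBase.

The main difference with RawIOBase is that methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call.

In addition, those methods can raise BlockingIOError if the underlying raw stream is in non-blocking mode and cannot take or give enough data; unlike their RawIOBase counterparts, they will never return None.

Besides, the read() method does not have a default implementation that defers to readinto().

A typical BufferedIOBase implementation should not inherit from a RawIOBase implementation, but wrap one, like BufferedWriter and BufferedReader do.

BufferedIOBase provides or overrides these data attributes and methods in addition to those from IOBase:

raw

The underlying raw stream (a RawIOBase instance) that BufferedIOBase deals with. This is not part of the BufferedIOBase API and may not exist on some implementations.

detach()

Separate the underlying raw stream from the buffer and return it.

After the raw stream has been detached, the buffer is in an unusable state.

Some buffers, like BytesIO, do not have the concept of a single raw stream to return from this method. They raise UnsupportedOperation.

Added in version 3.1.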

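read(size=-1, /)

Read and return up to size bytes. If the argument is omitted, None, or negative, data is read and returned until EOF is reached. An empty bytes object is returned if the stream is already at EOF.

If the argument is positive, and the underlying raw stream is not interactive, multiple raw reads may be issued to satisfy the byte count (unless EOF is reached first). But for interactive raw streams, at most one raw read will be issued, and a short result does not imply that EOF is imminent.

A BlockingIOError is raised if the underlying raw stream is in non-blocking mode, and has no data available at the moment.

read1(size=-1, /)

Read and return up to size bytes, with at most one call to the underlying raw stream's read() (or readinto()) method. This can be useful if you are implementing your own buffering on top of a BufferedIOBase object.

If size is -1 (the default), an arbitrary number of bytes are returned (more than zero unless EOF is reached).

readinto(b, /)

Read bytes into a pre-allocated, writable bytes-like object b and return the number of bytes read. For example, b might be a bytearray.

Like read(), multiple reads may be issued to the underlying raw stream, unless the latter is interactive.

A BlockingIOError is raised if the underlying raw stream is in non-blocking mode, and has no data available at the moment.

readinto1(b, /)

Read bytes into a pre-allocated, writable bytes-like object b, using at most one call to the underlying raw stream's read() (or readinto()) method. Return the number of bytes read.

A BlockingIOError is raised if the underlying raw stream is in non-blocking mode, and has no data available at the moment.

Added in version 3.5.

write(b, /)

Write the given bytes-like object, b, and return the number of bytes written (always equal to the length of b in bytes, since if the write fails an OSError will be raised). Depending on the actual implementation, these bytes may be readily written to the underlying stream, or held in a buffer for performance and latency reasons.

When in non-blocking mode, a BlockingIOError is raised if the data needed to be written to the raw stream but it couldn't accept all the data without blocking.

The caller may release or mutate b after this method returns, so the implementation should only access b during the method call.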

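Raw File I/O

class io.FileIO(name, mode='r', closefd=True, opener=None)

A raw binary stream representing an OS-level file containing bytes data. It inherits from RawIOBase.

The name can be one of two things:

  • a character string or bytes object representing the path to the file which will be opened. In this case closefd must be True (the default), otherwise an error will be raised.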

  • an integer representing the number of an existing OS-level file descriptor to which the resulting FileIO object will give access. When the FileIO object is closed this fd will be closed as well, unless closefd is set to False.

The mode can be 'r', 'w', 'x' or 'a' for reading (default), writing, exclusive creation or appending. The file will be created if it doesn't exist when opened for writing or appending; it will be truncated when opened for writing. FileExistsError will be raised if it already exists when opened for creating. Opening a file for creating implies writing, so this mode behaves in a similar way to 'w'. Add a '+' to the mode to allow simultaneous reading and writing.

The read() (when called with a positive argument), readinto() and write() methods on this class will only make one system call.

A custom opener can be used by passing a callable as opener. The underlying file descriptor for the file object is then obtained by calling opener with (name, flags). opener must return an open file descriptor (passing os.open as opener results in functionality similar to passing None).

The newly created file is non-inheritable.

See the open() built-in function for examples on using the opener parameter.

Changed in version 3.3: The opener parameter was added. The 'x' mode was added.

Changed in version 3.4: The file is now non-inheritable.
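Both forms of name, and a custom opener, can be sketched as follows. The private_opener function, the file name and the permission mask are assumptions made for this illustration.

```python
import io
import os
import tempfile

def private_opener(path, flags):
    # Hypothetical opener: like the default, but with an explicit permission mask.
    return os.open(path, flags, 0o600)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")   # hypothetical file name

    # name given as a path, opened through the custom opener
    with io.FileIO(path, "w", opener=private_opener) as raw:
        written = raw.write(b"payload")

    # name given as an existing file descriptor that FileIO must not close
    fd = os.open(path, os.O_RDONLY)
    with io.FileIO(fd, "r", closefd=False) as raw:
        data = raw.read()
        name_is_fd = raw.name == fd
    os.close(fd)   # still open here, because closefd=False was used
```

With closefd=False the caller stays responsible for the descriptor, which is why the explicit os.close() call is needed.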

FileIO provides these data attributes in addition to those from RawIOBase and IOBase:

mode

The mode as given in the constructor.

name

The file name. This is the file descriptor of the file when no name is given in the constructor.

Buffered Streams

Buffered I/O streams provide a higher-level interface to an I/O device than raw I/O does.

class io.BytesIO(initial_bytes=b'')

A binary stream using an in-memory bytes buffer. It inherits from BufferedIOBase. The buffer is discarded when the close() method is called.

The optional argument initial_bytes is a bytes-like object that contains initial data.

BytesIO provides or overrides these methods in addition to those from BufferedIOBase and IOBase:

getbuffer()

Return a readable and writable view over the contents of the buffer without copying them. Also, mutating the view will transparently update the contents of the buffer:

>>> b = io.BytesIO(b"abcdef")
>>> view = b.getbuffer()
>>> view[2:4] = b"56"
>>> b.getvalue()
b'ab56ef'

Note

As long as the view exists, the BytesIO object cannot be resized or closed.

Added in version 3.2.
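The restriction mentioned in the note can be sketched like this: while a view is exported, a write to the BytesIO object is refused, and it works again once the view has been released.

```python
import io

b = io.BytesIO(b"abc")
view = b.getbuffer()

try:
    b.write(b"defgh")          # refused while the view is alive
except BufferError:
    blocked = True
else:
    blocked = False

view.release()                 # give up the view explicitly
b.seek(0, io.SEEK_END)
b.write(b"def")                # now the buffer can grow
final = b.getvalue()
```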

getvalue()

Return bytes containing the entire contents of the buffer.

read1(size=-1, /)

In BytesIO, this is the same as read().

Changed in version 3.7: The size argument is now optional.

readinto1(b, /)

In BytesIO, this is the same as readinto().

Added in version 3.5.

class io.BufferedReader(raw, buffer_size=DEFAULT_BUFFER_SIZE)

A buffered binary stream providing higher-level access to a readable, non seekable RawIOBase raw binary stream. It inherits from BufferedIOBase.

When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads.

The constructor creates a BufferedReader for the given readable raw stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE is used.

BufferedReader provides or overrides these methods in addition to those from BufferedIOBase and IOBase:

peek(size=0, /)

Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.
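A short sketch of peek(), wrapping an in-memory BytesIO in place of a real raw stream. Since the number of bytes returned may differ from the request, only the leading bytes are inspected.

```python
import io

reader = io.BufferedReader(io.BytesIO(b"hello world"))

peeked = reader.peek(2)               # may return more or fewer than 2 bytes
position_after_peek = reader.tell()   # peeking does not advance the position
head = reader.read(5)                 # the peeked bytes are still there to read
```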

read(size=-1, /)

Read and return size bytes, or if size is not given or negative, until EOF or if the read call would block in non-blocking mode.

read1(size=-1, /)

Read and return up to size bytes with only one call on the raw stream. If at least one byte is buffered, only buffered bytes are returned. Otherwise, one raw stream read call is made.

Changed in version 3.7: The size argument is now optional.

class io.BufferedWriter(raw, buffer_size=DEFAULT_BUFFER_SIZE)

A buffered binary stream providing higher-level access to a writeable, non seekable RawIOBase raw binary stream. It inherits from BufferedIOBase.

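When writing to this object, data is normally placed into an internal buffer. The buffer will be written out to the underlying RawIOBase object under various conditions, including:

  • when the buffer gets too small for all pending data;

  • when flush() is called;

  • when a seek() is requested (for BufferedRandom objects);

  • when the BufferedWriter object is closed or destroyed.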

The constructor creates a BufferedWriter for the given writeable raw stream. If the buffer_size is not given, it defaults to DEFAULT_BUFFER_SIZE.

BufferedWriter provides or overrides these methods in addition to those from BufferedIOBase and IOBase:

flush()

Force bytes held in the buffer into the raw stream. A BlockingIOError should be raised if the raw stream blocks.

write(b, /)

Write the bytes-like object, b, and return the number of bytes written. When in non-blocking mode, a BlockingIOError is raised if the buffer needs to be written out but the raw stream blocks.
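The role of flush() can be sketched with a BufferedWriter over an in-memory BytesIO, used here as a stand-in for a raw stream. Whether a small write reaches the underlying stream before flush() is an implementation detail, so the sketch only relies on the state after flushing.

```python
import io

raw = io.BytesIO()
writer = io.BufferedWriter(raw, buffer_size=16)

count = writer.write(b"abc")     # small enough to stay in the internal buffer
writer.flush()                   # force buffered bytes into the raw stream
after_flush = raw.getvalue()
```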

class io.BufferedRandom(raw, buffer_size=DEFAULT_BUFFER_SIZE)

A buffered binary stream providing higher-level access to a seekable RawIOBase raw binary stream. It inherits from BufferedReader and BufferedWriter.

The constructor creates a reader and writer for a seekable raw stream, given in the first argument. If the buffer_size is omitted it defaults to DEFAULT_BUFFER_SIZE.

BufferedRandom is capable of anything BufferedReader or BufferedWriter can do. In addition, seek() and tell() are guaranteed to be implemented.

class io.BufferedRWPair(reader, writer, buffer_size=DEFAULT_BUFFER_SIZE, /)

A buffered binary stream providing higher-level access to two non seekable RawIOBase raw binary streams---one readable, the other writeable. It inherits from BufferedIOBase.

reader and writer are RawIOBase objects that are readable and writeable respectively. If the buffer_size is omitted it defaults to DEFAULT_BUFFER_SIZE.

BufferedRWPair implements all of BufferedIOBase's methods except for detach(), which raises UnsupportedOperation.

Warning

BufferedRWPair does not attempt to synchronize accesses to its underlying raw streams. You should not pass it the same object as reader and writer; use BufferedRandom instead.
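A minimal sketch of BufferedRWPair with two distinct objects, as the warning requires. The in-memory BytesIO objects and the sample payloads stand in for the two ends of something like a pipe pair.

```python
import io

incoming = io.BytesIO(b"request")   # readable side
outgoing = io.BytesIO()             # writable side

pair = io.BufferedRWPair(incoming, outgoing)

received = pair.read()              # reads come from the first stream
pair.write(b"response")             # writes go to the second stream
pair.flush()
sent = outgoing.getvalue()
```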

Text I/O

class io.TextIOBase

Base class for text streams. This class provides a character and line based interface to stream I/O. It inherits from IOBase.

TextIOBase provides or overrides these data attributes and methods in addition to those from IOBase:

encoding

The name of the encoding used to decode the stream's bytes into strings, and to encode strings into bytes.

errors

The error setting of the decoder or encoder.

newlines

A string, a tuple of strings, or None, indicating the newlines translated so far. Depending on the implementation and the initial constructor flags, this may not be available.

buffer

The underlying binary buffer (a BufferedIOBase instance) that TextIOBase deals with. This is not part of the TextIOBase API and may not exist in some implementations.

detach()

Separate the underlying binary buffer from the TextIOBase and return it.

After the underlying buffer has been detached, the TextIOBase is in an unusable state.

Some TextIOBase implementations, like StringIO, may not have the concept of an underlying buffer and calling this method will raise UnsupportedOperation.

Added in version 3.1.

read(size=-1, /)

Read and return at most size characters from the stream as a single str. If size is negative or None, reads until EOF.

readline(size=-1, /)

Read until newline or EOF and return a single str. If the stream is already at EOF, an empty string is returned.

If size is specified, at most size characters will be read.

seek(offset, whence=SEEK_SET, /)

Change the stream position to the given offset. Behaviour depends on the whence parameter. The default value for whence is SEEK_SET.

  • SEEK_SET or 0: seek from the start of the stream (the default); offset must either be a number returned by TextIOBase.tell(), or zero. Any other offset value produces undefined behaviour.

  • SEEK_CUR or 1: "seek" to the current position; offset must be zero, which is a no-operation (all other values are unsupported).

  • SEEK_END or 2: seek to the end of the stream; offset must be zero (all other values are unsupported).

Return the new absolute position as an opaque number.

Added in version 3.1: The SEEK_* constants.

tell()

Return the current stream position as an opaque number. The number does not usually represent a number of bytes in the underlying binary storage.

write(s, /)

Write the string s to the stream and return the number of characters written.

class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False)

A buffered text stream providing higher-level access to a BufferedIOBase buffered binary stream. It inherits from TextIOBase.

encoding gives the name of the encoding that the stream will be decoded or encoded with. It defaults to locale.getencoding(). encoding="locale" can be used to specify the current locale's encoding explicitly. See Text Encoding for more information.

errors is an optional string that specifies how encoding and decoding errors are to be handled. Pass 'strict' to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass 'ignore' to ignore errors. (Note that ignoring encoding errors can lead to data loss.) 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data. 'backslashreplace' causes malformed data to be replaced by a backslashed escape sequence. When writing, 'xmlcharrefreplace' (replace with the appropriate XML character reference) or 'namereplace' (replace with \N{...} escape sequences) can be used. Any other error handling name that has been registered with codecs.register_error() is also valid.
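The decoding side of these handlers can be sketched as follows. The helper decode_with and the sample bytes are assumptions for illustration; 0xff was chosen because it can never appear in valid UTF-8.

```python
import io

bad = b"caf\xff!"     # contains one byte that is invalid in UTF-8

def decode_with(errors):
    # Hypothetical helper: decode the same bytes with a given error handler.
    wrapper = io.TextIOWrapper(io.BytesIO(bad), encoding="utf-8", errors=errors)
    return wrapper.read()

replaced = decode_with("replace")            # replacement marker inserted
ignored = decode_with("ignore")              # malformed byte dropped
escaped = decode_with("backslashreplace")    # malformed byte shown as an escape

try:
    decode_with("strict")
except ValueError as exc:                    # UnicodeDecodeError is a ValueError
    strict_error = type(exc).__name__
```

When decoding, the replacement marker is the Unicode replacement character U+FFFD.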

newline controls how line endings are handled. It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If newline is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If newline has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
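The reading and writing rules above can be sketched with a TextIOWrapper over in-memory bytes. The helper read_lines and the sample data, which mixes all three line endings, are assumptions for illustration.

```python
import io

data = b"one\r\ntwo\rthree\n"

def read_lines(newline):
    # Hypothetical helper: read the same bytes under a given newline setting.
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline)
    return wrapper.readlines()

universal = read_lines(None)     # all endings recognized and translated to "\n"
untranslated = read_lines("")    # all endings recognized, returned unchanged
only_lf = read_lines("\n")       # only "\n" ends a line

sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="ascii", newline="\r\n")
writer.write("a\nb\n")           # each "\n" is written as "\r\n"
writer.flush()
written = sink.getvalue()
```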

If line_buffering is True, flush() is implied when a call to write contains a newline character or a carriage return.

If write_through is True, calls to write() are guaranteed not to be buffered: any data written on the TextIOWrapper object is immediately passed to its underlying binary buffer.
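Both options can be sketched with in-memory BytesIO objects as the underlying binary buffers; the variable names are illustrative.

```python
import io

# write_through: data reaches the binary buffer without an explicit flush().
sink = io.BytesIO()
through = io.TextIOWrapper(sink, encoding="utf-8", write_through=True)
through.write("abc")
seen_immediately = sink.getvalue()

# line_buffering: a write containing a newline implies a flush().
line_sink = io.BytesIO()
line = io.TextIOWrapper(line_sink, encoding="utf-8", newline="\n",
                        line_buffering=True)
line.write("partial")
line.write(" line\n")
seen_after_newline = line_sink.getvalue()
```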

Changed in version 3.3: The write_through argument has been added.

Changed in version 3.3: The default encoding is now locale.getpreferredencoding(False) instead of locale.getpreferredencoding(). Don't temporarily change the locale encoding using locale.setlocale(); use the current locale encoding instead of the user preferred encoding.

Changed in version 3.10: The encoding argument now supports the "locale" dummy encoding name.

TextIOWrapper provides these data attributes and methods in addition to those from TextIOBase and IOBase:

line_buffering

Whether line buffering is enabled.

write_through

Whether writes are passed immediately to the underlying binary buffer.

Added in version 3.7.

reconfigure(*, encoding=None, errors=None, newline=None, line_buffering=None, write_through=None)

Reconfigure this text stream using new settings for encoding, errors, newline, line_buffering and write_through.

Parameters not specified keep current settings, except errors='strict' is used when encoding is specified but errors is not specified.

It is not possible to change the encoding or newline if some data has already been read from the stream. On the other hand, changing encoding after write is possible.

This method does an implicit stream flush before setting the new parameters.

Added in version 3.7.

Changed in version 3.11: The method supports encoding="locale" option.
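Changing the encoding after a write can be sketched as follows. The stream and sample text are assumptions for illustration; the same text is written once as Latin-1 and once as UTF-8.

```python
import io

raw = io.BytesIO()
txt = io.TextIOWrapper(raw, encoding="latin-1", newline="\n")

txt.write("caf\xe9\n")
txt.reconfigure(encoding="utf-8")     # implicit flush, then the new encoding applies
after_reconfigure = raw.getvalue()

txt.write("caf\xe9\n")
txt.flush()
final = raw.getvalue()
```

The bytes written before the call keep their original encoding; only later writes use the new one.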

seek(cookie, whence=os.SEEK_SET, /)

Set the stream position. Return the new stream position as an int.

Four operations are supported, given by the following argument combinations:

  • seek(0, SEEK_SET): Rewind to the start of the stream.

  • seek(cookie, SEEK_SET): Restore a previous position; cookie must be a number returned by tell().

  • seek(0, SEEK_END): Fast-forward to the end of the stream.

  • seek(0, SEEK_CUR): Leave the current stream position unchanged.

Any other argument combinations are invalid, and may raise exceptions.

tell()

Return the stream position as an opaque number. The return value of tell() can be given as input to seek(), to restore a previous stream position.
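Saving and restoring a position with tell() and seek() can be sketched like this. The sample text is an assumption and contains multi-byte characters, which is one reason the cookie should be treated as opaque rather than as a character count.

```python
import io

buf = io.BytesIO("héllo\nwörld\nend\n".encode("utf-8"))
text = io.TextIOWrapper(buf, encoding="utf-8")

first = text.readline()
cookie = text.tell()          # opaque position after the first line
second = text.readline()

text.seek(cookie)             # restore the saved position
again = text.readline()

text.seek(0, io.SEEK_END)     # fast-forward to the end of the stream
at_end = text.read()
```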

class io.StringIO(initial_value='', newline='\n')

A text stream using an in-memory text buffer. It inherits from TextIOBase.

The text buffer is discarded when the close() method is called.

The initial value of the buffer can be set by providing initial_value. If newline translation is enabled, newlines will be encoded as if by write(). The stream is positioned at the start of the buffer which emulates opening an existing file in a w+ mode, making it ready for an immediate write from the beginning or for a write that would overwrite the initial value. To emulate opening a file in an a+ mode ready for appending, use f.seek(0, io.SEEK_END) to reposition the stream at the end of the buffer.

The newline argument works like that of TextIOWrapper, except that when writing output to the stream, if newline is None, newlines are written as \n on all platforms.
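The difference between the default position and the append-style position can be sketched as follows:

```python
import io

# The stream starts at position 0, so a write overwrites the initial value.
overwrite = io.StringIO("hello world")
overwrite.write("J")
overwritten = overwrite.getvalue()

# Seeking to the end first emulates opening a file for appending.
append = io.StringIO("hello")
append.seek(0, io.SEEK_END)
append.write(" world")
appended = append.getvalue()
```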

StringIO provides this method in addition to those from TextIOBase and IOBase:

getvalue()

Return a str containing the entire contents of the buffer. Newlines are decoded as if by read(), although the stream position is not changed.

Example usage:

import io

output = io.StringIO()
output.write('First line.\n')
print('Second line.', file=output)

# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()

# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()

class io.IncrementalNewlineDecoder

A helper codec that decodes newlines for universal newlines mode. It inherits from codecs.IncrementalDecoder.

Performance

This section discusses the performance of the provided concrete I/O implementations.

Binary I/O

By reading and writing only large chunks of data even when the user asks for a single byte, buffered I/O hides any inefficiency in calling and executing the operating system's unbuffered I/O routines. The gain depends on the OS and the kind of I/O which is performed. For example, on some modern OSes such as Linux, unbuffered disk I/O can be as fast as buffered I/O. The bottom line, however, is that buffered I/O offers predictable performance regardless of the platform and the backing device. Therefore, it is almost always preferable to use buffered I/O rather than unbuffered I/O for binary data.

Text I/O

Text I/O over a binary storage (such as a file) is significantly slower than binary I/O over the same storage, because it requires conversions between Unicode and binary data using a character codec. This can become noticeable when handling huge amounts of text data like large log files. Also, tell() and seek() are both quite slow due to the reconstruction algorithm used.

StringIO, however, is a native in-memory Unicode container and will exhibit similar speed to BytesIO.

Multi-threading

FileIO objects are thread-safe to the extent that the operating system calls (such as read(2) under Unix) they wrap are thread-safe too.

Binary buffered objects (instances of BufferedReader, BufferedWriter, BufferedRandom and BufferedRWPair) protect their internal structures using a lock; it is therefore safe to call them from multiple threads at once.

TextIOWrapper objects are not thread-safe.

Reentrancy

Binary buffered objects (instances of BufferedReader, BufferedWriter, BufferedRandom and BufferedRWPair) are not reentrant. While reentrant calls will not happen in normal situations, they can arise from doing I/O in a signal handler. If a thread tries to re-enter a buffered object which it is already accessing, a RuntimeError is raised. Note this doesn't prohibit a different thread from entering the buffered object.

The above implicitly extends to text files, since the open() function will wrap a buffered object inside a TextIOWrapper. This includes standard streams and therefore affects the built-in print() function as well.