Cannot enlarge string buffer containing XX bytes by XX more bytes

Our ELK-based database alerting system flagged the following error on one of our machines:

2018-12-04 18:55:26.842 CST,"XXX","XXX",21106,"XXX",5c065c3d.5272,4,"idle",2018-12-04 18:51:41 CST,117/0,0,ERROR,54000,"out of memory","Cannot enlarge string buffer containing 0 bytes by 1342177281 more bytes.",,,,,,,"enlargeStringInfo, stringinfo.c:268",""

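When I saw an out-of-memory error, I assumed the whole database instance was in trouble, but a check of the production system showed the database was running normally. After some reading I learned that PostgreSQL limits the length of a single query statement: if it exceeds 1 GB, the error above is raised.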

The 1342177281 bytes in the log above is the length of the query, roughly 1.25 GB.
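To make the numbers in the log concrete, here is a minimal C sketch of the comparison that produces this error. The constant has the same value as MaxAllocSize in the memutils.h excerpt quoted in this article. The helper name `exceeds_string_limit` is hypothetical and is not a PostgreSQL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as MaxAllocSize in PostgreSQL 9.6: 1 gigabyte - 1. */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical helper (not part of PostgreSQL): returns true when a
 * buffer already holding 'len' bytes cannot grow by 'needed' more bytes.
 */
bool
exceeds_string_limit(int len, int needed)
{
    /* Assumption: the current length is always within the limit. */
    assert(len >= 0 && (size_t) len <= MAX_ALLOC_SIZE);

    if (needed < 0)
        return true;            /* out-of-range request */
    return (size_t) needed >= MAX_ALLOC_SIZE - (size_t) len;
}
```

With the values from the log, `exceeds_string_limit(0, 1342177281)` is true, because 1342177281 is larger than the limit of 1073741823 bytes.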

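COPY often reports a similar error. In that case, use the error message to locate the offending line and check whether a quoting or escaping problem left the line unterminated, or whether a single line really is larger than 1 GB.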
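One of the two cases above, a single physical line larger than the limit, can be checked before running COPY. The following is only a sketch under that assumption: `first_overlong_line` is a hypothetical helper, it counts bytes between newline characters, and it does not detect quoting or escaping problems.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical pre-check (not part of PostgreSQL): return the 1-based
 * number of the first line in 'fp' that is longer than 'limit' bytes,
 * or 0 if every line fits. The trailing newline is not counted.
 */
long
first_overlong_line(FILE *fp, unsigned long limit)
{
    long          lineno = 1;
    unsigned long len = 0;
    int           c;

    assert(fp != NULL);

    while ((c = getc(fp)) != EOF)
    {
        if (c == '\n')
        {
            lineno++;           /* start counting the next line */
            len = 0;
        }
        else if (++len > limit)
            return lineno;      /* this line is too long */
    }
    return 0;
}
```

For a real data file the limit would be 0x3fffffff, the MaxAllocSize value shown in the source excerpt in this article.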

Below are the relevant parts of the PostgreSQL 9.6 source code:

Read together with its comments, the PostgreSQL source is easy to follow.

src/include/utils/memutils.h

/*
 * MaxAllocSize, MaxAllocHugeSize
 *      Quasi-arbitrary limits on size of allocations.
 *
 * Note:
 *      There is no guarantee that smaller allocations will succeed, but
 *      larger requests will be summarily denied.
 *
 * palloc() enforces MaxAllocSize, chosen to correspond to the limiting size
 * of varlena objects under TOAST.  See VARSIZE_4B() and related macros in
 * postgres.h.  Many datatypes assume that any allocatable size can be
 * represented in a varlena header.  This limit also permits a caller to use
 * an "int" variable for an index into or length of an allocation.  Callers
 * careful to avoid these hazards can access the higher limit with
 * MemoryContextAllocHuge().  Both limits permit code to assume that it may
 * compute twice an allocation's size without overflow.
 */
#define MaxAllocSize    ((Size) 0x3fffffff)     /* 1 gigabyte - 1 */

src/backend/lib/stringinfo.c

/*
* enlargeStringInfo
*
* Make sure there is enough space for 'needed' more bytes
* ('needed' does not include the terminating null).
*
* External callers usually need not concern themselves with this, since
* all stringinfo.c routines do it automatically.  However, if a caller
* knows that a StringInfo will eventually become X bytes large, it
* can save some palloc overhead by enlarging the buffer before starting
* to store data in it.
*
* NB: because we use repalloc() to enlarge the buffer, the string buffer
* will remain allocated in the same memory context that was current when
* initStringInfo was called, even if another context is now current.
* This is the desired and indeed critical behavior!
*/
void
enlargeStringInfo(StringInfo str, int needed)
{
   int         newlen;

   /*
    * Guard against out-of-range "needed" values.  Without this, we can get
    * an overflow or infinite loop in the following.
    */
   if (needed < 0)             /* should not happen */
       elog(ERROR, "invalid string enlargement request size: %d", needed);
   if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
       ereport(ERROR,
               (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                errmsg("out of memory"),
                errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
                          str->len, needed)));

   needed += str->len + 1;     /* total space required now */

   /* Because of the above test, we now have needed <= MaxAllocSize */

   if (needed <= str->maxlen)
       return;                 /* got enough space already */

   /*
    * We don't want to allocate just a little more space with each append;
    * for efficiency, double the buffer size each time it overflows.
    * Actually, we might need to more than double it if 'needed' is big...
    */
   newlen = 2 * str->maxlen;
   while (needed > newlen)
       newlen = 2 * newlen;

   /*
    * Clamp to MaxAllocSize in case we went past it.  Note we are assuming
    * here that MaxAllocSize <= INT_MAX/2, else the above loop could
    * overflow.  We will still have newlen >= needed.
    */
   if (newlen > (int) MaxAllocSize)
       newlen = (int) MaxAllocSize;

   str->data = (char *) repalloc(str->data, newlen);

   str->maxlen = newlen;
}
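The sizing arithmetic in enlargeStringInfo can be tried in isolation. The sketch below copies only the doubling and clamping steps into a hypothetical helper, `next_capacity`, which does no allocation. It assumes the caller has already passed the limit check, so `needed` is the total space required, including the terminating null.

```c
#include <assert.h>

#define MAX_ALLOC_SIZE 0x3fffffff   /* same value as MaxAllocSize */

/*
 * Hypothetical helper mirroring only the sizing logic of
 * enlargeStringInfo: given the current capacity 'maxlen' and the total
 * space 'needed', return the capacity the buffer would be grown to.
 */
int
next_capacity(int maxlen, int needed)
{
    int newlen;

    assert(maxlen > 0);
    assert(needed > 0 && needed <= MAX_ALLOC_SIZE);

    if (needed <= maxlen)
        return maxlen;          /* got enough space already */

    /* Double until the request fits; cannot overflow, see asserts. */
    newlen = 2 * maxlen;
    while (needed > newlen)
        newlen = 2 * newlen;

    /* Clamp to the limit; newlen is still >= needed. */
    if (newlen > MAX_ALLOC_SIZE)
        newlen = MAX_ALLOC_SIZE;
    return newlen;
}
```

For example, a 1024-byte buffer that needs 5000 bytes grows to 8192 bytes, not 5000, which keeps the number of reallocations low for repeated appends.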

src/include/lib/stringinfo.h

Below is the struct used to store strings:

/*-------------------------
 * StringInfoData holds information about an extensible string.
 *      data    is the current buffer for the string (allocated with palloc).
 *      len     is the current string length.  There is guaranteed to be
 *              a terminating '\0' at data[len], although this is not very
 *              useful when the string holds binary data rather than text.
 *      maxlen  is the allocated size in bytes of 'data', i.e. the maximum
 *              string size (including the terminating '\0' char) that we can
 *              currently store in 'data' without having to reallocate
 *              more space.  We must always have maxlen > len.
 *      cursor  is initialized to zero by makeStringInfo or initStringInfo,
 *              but is not otherwise touched by the stringinfo.c routines.
 *              Some routines use it to scan through a StringInfo.
 *-------------------------
 */
typedef struct StringInfoData
{
    char       *data;
    int         len;
    int         maxlen;
    int         cursor;
} StringInfoData;

typedef StringInfoData *StringInfo;

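The StringInfoData struct, which holds both text and binary data, shows why PostgreSQL string types do not support \u0000. Strings in PostgreSQL are C strings, terminated by \0. That character is called NUL in ASCII, is written \u0000 in Unicode notation, and is 0x00 in hexadecimal. If a string contains \0, PostgreSQL treats it as the end of the string.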

PostgreSQL strings cannot contain the NUL character (0x00). This is clearly different from the SQL NULL value, which PostgreSQL does support.
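The effect of an embedded NUL on C string handling can be shown with a small sketch. The helper `visible_length` is hypothetical. It reports how many bytes of a buffer are seen by code that stops at the first \0, which is how C string functions behave.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return how many of the 'len' bytes in 'data'
 * are visible to code that treats the buffer as a C string, that is,
 * the number of bytes before the first NUL (or 'len' if there is none).
 */
size_t
visible_length(const char *data, size_t len)
{
    const char *nul;

    assert(data != NULL);

    nul = memchr(data, '\0', len);
    return nul ? (size_t) (nul - data) : len;
}
```

For the 7-byte buffer "abc\0def" the visible length is 3, so the trailing "def" would be silently lost if the value were handled as a C string.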

In practice, you can strip \u0000 from the data before importing it into PostgreSQL.

When importing from another database into PostgreSQL, you can replace it like this:

regexp_replace(stringWithNull, '\\u0000', '', 'g')

Replacing it in a Java program:

str = str.replaceAll("\u0000", "");

Replacing it in vim:

:%s/\%x00//g
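The same cleanup can be done on a raw byte buffer before the data is handed to a client library. This is a minimal sketch, and `strip_nul_bytes` is a hypothetical helper name. Because the buffer may contain NUL bytes, its length is passed explicitly instead of relying on strlen.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: remove every 0x00 byte from buf[0..len) in
 * place and return the new length. The result is not null-terminated;
 * the caller must use the returned length.
 */
size_t
strip_nul_bytes(char *buf, size_t len)
{
    size_t src;
    size_t dst = 0;

    assert(buf != NULL || len == 0);

    for (src = 0; src < len; src++)
    {
        if (buf[src] != '\0')
            buf[dst++] = buf[src];  /* keep every non-NUL byte */
    }
    return dst;
}
```

Working in place avoids a second allocation, which matters when the input is already close to the size limit discussed above.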

References:

src/backend/lib/stringinfo.c

src/include/lib/stringinfo.h

src/include/utils/memutils.h

https://en.wikipedia.org/wiki/Null-terminated_string

https://stackoverflow.com/questions/1347646/postgres-error-on-insert-error-invalid-byte-sequence-for-encoding-utf8-0x0?rq=1
