Archive

Monthly Archives: December 2015

Go deeper into BDR and patched-PostgreSQL 9.4.5, eventually I find out the root cause. When new record INSERT-ed, BDR replicates it to other nodes. After simple_heap_insert, BDR calls UserTableUpdateOpenIndexes. For index on epression written in SQL, BDR code routes to GetActiveSnapshot() which fires Segmentation fault because ActiveSnapShot is invalid.

Snapshot
GetActiveSnapshot(void)
{
...
return ActiveSnapshot->as_snap;//invalid ActiveSnapshot fires Segmentation fault
}

Goes to bdr_apply.c in BDR package, I found out that ActiveSnapshot is set and cleared with PushActiveSnapshot and PopActiveSnapshot calls respectively in process_remote_update and process_remote_delete functions but not in process_remote_insert.

Then, I apply the function pair in process_remote_insert at the point before and after UserTableUpdateOpenIndexes calls :

...
PushActiveSnapshot(GetTransactionSnapshot());
if (conflict)
{

...
if (apply_update)
{

...
UserTableUpdateOpenIndexes(estate, newslot);
...

}

}
else
{

simple_heap_insert(rel->rel, newslot->tts_tuple);
UserTableUpdateOpenIndexes(estate, newslot);
...

}
PopActiveSnapshot();
...

Rebuild and reinstall the BDR package. Now no more replication crashes caused by updating index on expression written in SQL language. Case closed.

Advertisements

What a sleepless night investigating why BDR crashes when updating index on expression with Segmentation Fault error. I agree with akretschmer‘s comment on my previous post that replacing functional index with column index is not considered as a solution. So I take a closer look at function associated with the problematic expression index:

The function is defined in SQL language:

CREATE OR REPLACE FUNCTION mybdr.funcidx(a smallint, b integer, c smallint, d text)
RETURNS text AS
$BODY$
SELECT ($1+1)::text || '.' || '5' || (CASE WHEN $3=2::smallint THEN '1' ELSE '' END) || '.' || LPAD($2::text, 6, '0') || '/' || $4;
$BODY$
LANGUAGE sql IMMUTABLE STRICT;

Then I find out that by converting it to PLPGSQL language, no more BDR crashes:

CREATE OR REPLACE FUNCTION mybdr.funcidx(a smallint, b integer, c smallint, d text)
RETURNS text AS
$BODY$
BEGIN
RETURN (a+1)::text || '.' || '5' || (CASE WHEN c=2::smallint THEN '1' ELSE '' END) || '.' || LPAD(b::text, 6, '0') || '/' || d;
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE STRICT;

Need a deep investigation on that.

Giving BDR version 0.9.3 and patched-PostgreSQL 9.4.5 replication a shot. All runs smooth until I add an index expression to a replicated table then insert a record into. BDR crashes with error message:

LOG: worker process: bdr (6229655055721591121,1,16386,)->bdr (6229655055721591121,1, (PID 750) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: bdr_apply: BEGIN origin(source, orig_lsn, timestamp): 0/9D50AD0, 2015-12-27 10:13:29.916096+07

Followed by series of:
FATAL: mismatch in worker state, got 0, expected 1

Dig into BDR C codes, I found out that the error is emitted from a point in bdr_apply.c:

function: process_remote_insert
simple_heap_insert(rel->rel, newslot->tts_tuple); --ok
UserTableUpdateOpenIndexes(estate, newslot); --failed

Go deeper into BDR-patched-PostgreSQL’s index.c, it is clear that BDR only crashes when updating index on expression (no problem with column index).

function: FormIndexDatum
iDatum = slot_getattr(slot, keycol, &isNull); --ok

but
iDatum = ExecEvalExprSwitchContext((ExprState *) lfirst(indexpr_item), GetPerTupleExprContext(estate),&isNull,NULL); --failed

Simple solution is removing the index expression and promote field (or new field) for regular column index instead.