Monday, November 25, 2013

ACID

From Wikipedia, the free encyclopedia
In computer scienceACID (AtomicityConsistencyIsolationDurability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. The chosen initials refer to the acid test.[citation needed]
Jim Gray defined these properties of a reliable transaction system in the late 1970s and developed technologies to achieve them automatically.[1][2][3]
In 1983, Andreas Reuter and Theo Härder coined the acronym ACID to describe them.[4]

Characteristics[edit]

Atomicity[edit]

Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.

Consistency[edit]

The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraintscascadestriggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors do not violate any defined rules.

Isolation[edit]

The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction.[citation needed]

Durability[edit]

Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.

Examples[edit]

The following examples further illustrate the ACID properties. In these examples, the database table has two columns, A and B. An integrity constraint requires that the value in A and the value in B must sum to 100. The following SQL code creates a table as described above:
CREATE TABLE acidtest (A INTEGER, B INTEGER CHECK (A + B = 100));

Atomicity failure[edit]

Assume that a transaction attempts to subtract 10 from A and add 10 to B. This is a valid transaction, since the data continue to satisfy the constraint after it has executed. However, assume that after removing 10 from A, the transaction is unable to modify B. If the database retained A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction, or neither, be complete.

Consistency failure[edit]

Consistency is a very general term which demands that the data must meet all validation rules. In the previous example, the validation is a requirement that A + B = 100. Also, it may be inferred that both A and B must be integers. A valid range for A and B may also be inferred. All validation rules must be checked to ensure consistency.
Assume that a transaction attempts to subtract 10 from A without altering B. Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins. If the transaction removes 10 from A successfully, atomicity will be achieved. However, a validation check will show that A + B = 90, which is inconsistent with the rules of the database. The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state. If there had been other constraints, triggers, or cascades, every single change operation would have been checked in the same way as above before the transaction was committed.

Isolation failure[edit]

To demonstrate isolation, we assume two transactions execute at the same time, each attempting to modify the same data. One of the two must wait until the other completes in order to maintain isolation.
Consider two transactions. T1 transfers 10 from A to B. T2 transfers 10 from B to A. Combined, there are four actions:
  • T1 subtracts 10 from A.
  • T1 adds 10 to B.
  • T2 subtracts 10 from B.
  • T2 adds 10 to A.
If these operations are performed in order, isolation is maintained, although T2 must wait. Consider what happens if T1 fails half-way through. The database eliminates T1's effects, and T2 sees only valid data.
By interleaving the transactions, the actual order of actions might be:
  • T1 subtracts 10 from A.
  • T2 subtracts 10 from B.
  • T2 adds 10 to A.
  • T1 adds 10 to B.
Again, consider what happens if T1 fails halfway through. By the time T1 fails, T2 has already modified A; it cannot be restored to the value it had before T1 without leaving an invalid database. This is known as a write-write failure,[citation needed] because two transactions attempted to write to the same data field. In a typical system, the problem would be resolved by reverting to the last known good state, canceling the failed transaction T1, and restarting the interrupted transaction T2 from the good state.

Durability failure[edit]

Assume that a transaction transfers 10 from A to B. It removes 10 from A. It then adds 10 to B. At this point, a "success" message is sent to the user. However, the changes are still queued in the disk buffer waiting to be committed to the disk. Power fails and the changes are lost. The user assumes (understandably) that the changes have been made.

Implementation[edit]

Processing a transaction often requires a sequence of operations that is subject to failure for a number of reasons. For instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time.
There are two popular families of techniques: write ahead logging and shadow paging. In both cases, locks must be acquired on all information that is updated, and depending on the level of isolation, possibly on all data that is read as well. In write ahead logging, atomicity is guaranteed by copying the original (unchanged) data to a log before changing the database.[dubious ]That allows the database to return to a consistent state in the event of a crash.
In shadowing, updates are applied to a partial copy of the database, and the new copy is activated when the transaction commits.

Locking vs multiversioning[edit]

Many databases rely upon locking to provide ACID capabilities. Locking means that the transaction marks the data that it accesses so that the DBMS knows not to allow other transactions to modify it until the first transaction succeeds or fails. The lock must always be acquired before processing data, including data that are read but not modified. Non-trivial transactions typically require a large number of locks, resulting in substantial overhead as well as blocking other transactions. For example, if user A is running a transaction that has to read a row of data that user B wants to modify, user B must wait until user A's transaction completes. Two phase locking is often applied to guarantee full isolation.[citation needed]
An alternative to locking is multiversion concurrency control, in which the database provides each reading transaction the prior, unmodified version of data that is being modified by another active transaction. This allows readers to operate without acquiring locks, i.e. writing transactions do not block reading transactions, and readers do not block writers. Going back to the example, when user A's transaction requests data that user B is modifying, the database provides A with the version of that data that existed when user B started his transaction. User A gets a consistent view of the database even if other users are changing data. One implementation, namely snapshot isolation, relaxes the isolation property.

Distributed transactions[edit]

Guaranteeing ACID properties in a distributed transaction across a distributed database where no single node is responsible for all data affecting a transaction presents additional complications. Network connections might fail, or one node might successfully complete its part of the transaction and then be required to roll back its changes, because of a failure on another node. The two-phase commit protocol (not to be confused with two-phase locking) provides atomicity for distributed transactions to ensure that each participant in the transaction agrees on whether the transaction should be committed or not.[citation needed] Briefly, in the first phase, one node (the coordinator) interrogates the other nodes (the participants) and only when all reply that they are prepared does the coordinator, in the second phase, formalize the transaction.

See also[edit]

Database transaction

From Wikipedia, the free encyclopedia
transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the program's outcome are possibly erroneous.
A database transaction, by definition, must be atomicconsistentisolated and durable.[1] Database practitioners often refer to these properties of database transactions using the acronym ACID.
Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.

Purpose[edit]

Databases and other data stores which treat the integrity of data as paramount often include the ability to handle transactions to maintain the integrity of data. A single transaction consists of one or more independent units of work, each reading and/or writing information to a database or other data store. When this happens it is often important to ensure that all such processing leaves the database or data store in a consistent state.
Examples from double-entry accounting systems often illustrate the concept of transactions. In double-entry accounting every debit requires the recording of an associated credit. If one writes a check for $100 to buy groceries, a transactional double-entry accounting system must record the following two entries to cover the single transaction:
  1. Debit $100 to Groceries Expense Account
  2. Credit $100 to Checking Account
A transactional system would make both entries pass or both entries would fail. By treating the recording of multiple entries as an atomic transactional unit of work the system maintains the integrity of the data recorded. In other words, nobody ends up with a situation in which a debit is recorded but no associated credit is recorded, or vice versa.

Transactional databases[edit]

transactional database is a DBMS where write transactions on the database are able to be rolled back if they are not completed properly (e.g. due to power or connectivity loss).
Most modern relational database management systems fall into the category of databases that support transactions.
In a database system a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database. Users of database systemsconsider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL wrapped in a transaction, using a pattern similar to the following:
  1. Begin the transaction
  2. Execute a set of data manipulations and/or queries
  3. If no errors occur then commit the transaction and end it
  4. If errors occur then rollback the transaction and end it
If no errors occurred during the execution of the transaction then the system commits the transaction. A transaction commit operation applies all data manipulations within the scope of the transaction and persists the results to the database. If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database. In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state.
Internally, multi-user databases store and process transactions, often by using a transaction ID or XID.
There are multiple varying ways for transactions to be implemented other than the simple way documented above. Nested transactions, for example, are transactions which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are similar but have a few extra properties[citation needed]. Another type of transaction is the compensating transaction.

In SQL[edit]

SQL is inherently transactional, and a transaction is automatically started when another ends. Some databases extend SQL and implement a START TRANSACTION statement, but while seemingly signifying the start of the transaction it merely deactivates autocommit.[citation needed]
The result of any work done after this point will remain invisible to other database-users until the system processes a COMMIT statement. A ROLLBACK statement can also occur, which will undo any work performed since the last transaction. Both COMMIT and ROLLBACK will end the transaction, and start anew. If autocommit was disabled using START TRANSACTION, autocommit will often also be reenabled.
Some database systems allow the synonyms BEGINBEGIN WORK and BEGIN TRANSACTION, and may have other options available.

Distributed transactions[edit]

Database systems implement distributed transactions as transactions against multiple applications or hosts. A distributed transaction enforces the ACID properties over multiple systems or data stores, and might include systems such as databases, file systems, messaging systems, and other applications. In a distributed transaction a coordinating service ensures that all parts of the transaction are applied to all relevant systems. As with database and other transactions, if any part of the transaction fails, the entire transaction is rolled back across all affected systems.

Transactional filesystems[edit]

The Namesys Reiser4 filesystem for Linux[2] supports transactions, and as of Microsoft Windows Vista, the Microsoft NTFS filesystem[3] supports distributed transactions across networks.

See also[edit]

References[edit]

  1. Jump up^ A transaction is a group of operations that are atomic, consistent, isolated, and durable (ACID).
  2. Jump up^ http://namesys.com/v4/v4.html#committing
  3. Jump up^ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/portal.asp

X/Open XA

From Wikipedia, the free encyclopedia
In computing, the XA standard is a specification by The Open Group for distributed transaction processing (DTP). It describes the interface between the global transaction manager and the local resource manager. The goal of XA is to allow multiple resources (such as databases, application servers, message queues, transactional caches, etc.) to be accessed within the same transaction, thereby preserving the ACID properties across applications. XA uses a two-phase commit to ensure that all resources either commit or rollback any particular transaction consistently (all do the same).
XA stands for "eXtended Architecture" and is an X/Open group standard for executing a "global transaction" that accesses more than one back-end data-store. XA specifies how a transaction manager will roll up the transactions against the different data-stores into an "atomic" transaction and execute this with the two-phase commit (2PC) protocol for the transaction. Thus, XA is a type of transaction coordination, often among databases. ACID Transactions are a key feature of databases, but typically databases only provide the ACID guarantees for activities that happen inside a single database. XA coordination allows many resources (again, often databases) to participate in a single, coordinated, atomic update operation.
The XA specification describes what a resource manager must do to support transactional access. Resource managers that follow this specification are said to be XA-compliant.
The XA specification was based on an interface used in the Tuxedo system developed in the 1980s, but adopted by several systems since then.[1]

See also[edit]

References[edit]

  1. Jump up^ Philip A. Bernstein; Eric Newcomer (2009). Principles of transaction processing. Morgan Kaufmann. pp. 330–336. ISBN 978-1-55860-623-4.

External links[edit]

Java Transaction API

From Wikipedia, the free encyclopedia
The Java Transaction API (JTA), one of the Java Enterprise Edition (Java EE) APIs, enables distributed transactions to be done across multiple X/Open XA resources in a Java environment. JTA is a specification developed under the Java Community Process as JSR 907. JTA provides for:

X/Open XA architecture[edit]

In the X/Open XA architecture, a transaction manager or transaction processing monitor (TP monitor) coordinates the transactions across multiple resources such as databases and message queues. Each resource has its own resource manager. The resource manager typically has its own API for manipulating the resource, for example the JDBC API used by relational databases. In addition, the resource manager allows a TP monitor to coordinate a distributed transaction between its own and other resource managers. Finally, there is the application which communicates with the TP monitor to begin, commit or rollback the transactions. The application also communicates with the individual resources using their own API to modify the resource.

JTA implementation of the X/Open XA architecture[edit]

The JTA API consists of classes in two Java packages:
The JTA is modelled on the X/Open XA architecture, but it defines two different APIs for demarcating transaction boundaries. It distinguishes between an application server such as an EJB server and an application component. It provides an interface, javax.transaction.TransactionManager, that is used by the application server itself to begin, commit and rollback the transactions. It provides a different interface, the javax.transaction.UserTransaction, that is used by general client code such as a servlet or an EJB to manage the transactions.
The JTA architecture requires that each resource manager must implement the javax.transaction.xa.XAResource interface in order to be managed by the TP monitor. As stated previously, each resource will have its own specific API, for instance:

Java Transaction API[edit]

The Java Transaction API consists of three elements: a high-level application transaction demarcation interface, a high-level transaction manager interface intended for an application server, and a standard Java mapping of the X/Open XA protocol intended for a transactional resource manager.

UserTransaction interface[edit]

The javax.transaction.UserTransaction interface provides the application the ability to control transaction boundaries programmatically. This interface may be used by Java client programs or EJB beans.
The UserTransaction.begin() method starts a global transaction and associates the transaction with the calling thread. The transaction-to-thread association is managed transparently by the Transaction Manager.
Support for nested transactions is not required. The UserTransaction.begin method throws the NotSupportedException when the calling thread is already associated with a transaction and the transaction manager implementation does not support nested transactions.
Transaction context propagation between application programs is provided by the underlying transaction manager implementations on the client and server machines. The transaction context format used for propagation is protocol dependent and must be negotiated between the client and server hosts. For example, if the transaction manager is an implementation of the JTSspecification, it will use the transaction context propagation format as specified in the CORBA OTS 1.1 specification. Transaction propagation is transparent to application programs..

UserTransaction support in EJB server[edit]

EJB servers are required to support the UserTransaction interface for use by EJB beans with the BEAN value in the javax.ejb.TransactionManagement annotation (this is called bean-managed transactions or BMT). The UserTransaction interface is exposed to EJB components through either the EJBContext interface using the getUserTransaction method, or directly via injection using the general @Resource annotation. Thus, an EJB application does not interface with the Transaction Manager directly for transaction demarcation; instead, the EJB bean relies on the EJB server to provide support for all of its transaction work as defined in the Enterprise JavaBeans Specification. (The underlying interaction between the EJB Server and the TM is transparent to the application; the burden of implementing transaction management is on the EJB container and server provider.[1])
The code sample below illustrates the usage of UserTransaction via bean-managed transactions in an EJB session bean:
@Stateless
@TransactionManagement(BEAN)
public class ExampleBean {
 
    @Resource
    private UserTransaction utx;
 
    public void foo() {
        // start a transaction
        utx.begin();
 
        // Do work
 
        // Commit it
        utx.commit();
    }
}
Alternatively, the UserTransaction can be obtained from the SessionContext:
@Stateless
@TransactionManagement(BEAN)
public class ExampleBean {
 
    @Resource
    private SessionContext ctx;
 
    public void foo() {
        UserTransaction utx = ctx.getUserTransaction();
 
        // start a transaction
        utx.begin();
 
        // Do work
 
        // Commit it
        utx.commit();
    }
}
Note though that in the example above if the @TransactionManagement(BEAN) annotation is omitted, a JTA transaction is automatically started whenever foo() is called and is automatically committed or rolled back when foo() is exited. Making use of a UserTransaction is thus not necessary in EJB programming, but might be needed for very specialized code.

Distributed transaction

From Wikipedia, the free encyclopedia
distributed transaction is an operations bundle, in which two or more network hosts are involved. Usually, hosts provide transactional resources, while the transaction manager is responsible for creating and managing a global transaction that encompasses all operations against such resources. Distributed transactions, as any other transactions, must have all four ACID (atomicity, consistency, isolation, durability) properties, where atomicity guarantees all-or-nothing outcomes for the unit of work (operations bundle).
Open Group, a vendor consortium, proposed the X/Open Distributed Transaction Processing (DTP) Model (X/Open XA), which became a de facto standard for behavior of transaction model components.
Databases are common transactional resources and, often, transactions span a couple of such databases. In this case, a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases which are distributed among different physical locations. The isolation property (the I of ACID) poses a special challenge for multi database transactions, since the (global) serializability property could be violated, even if each database provides it (see also global serializability). In practice most commercial database systems use strong strict two phase locking (SS2PL) for concurrency control, which ensures global serializability, if all the participating databases employ it. (see also commitment ordering for multidatabases.)
A common algorithm for ensuring correct completion of a distributed transaction is the two-phase commit (2PC). This algorithm is usually applied for updates able to commit in a short period of time, ranging from couple of milliseconds to couple of minutes.
There are also long-lived distributed transactions, for example a transaction to book a trip, which consists of booking a flight, a rental car and a hotel. Since booking the flight might take up to a day to get a confirmation, two-phase commit is not applicable here, it will lock the resources for this long. In this case more sophisticated techniques that involve multiple undo levels are used. The way you can undo the hotel booking by calling a desk and cancelling the reservation, a system can be designed to undo certain operations (unless they are irreversibly finished).
In practice, long-lived distributed transactions are implemented in systems based on Web Services. Usually these transactions utilize principles of Compensating transactions, Optimism and Isolation Without Locking. X/Open standard does not cover long-lived DTP.
Several modern technologies, including Enterprise Java Beans (EJBs) and Microsoft Transaction Server (MTS) fully support distributed transaction standards.

No comments: