J.U.C_ConcurrentHashMap

img

preface

​ 做为办公室的游泳健将,鱼是不可能不摸的. 今天在查找接口文档的时候,发现一些业务,像商详、秒杀,这些抗流都是使用的本地缓存, 第二层再是redis.

​ 那么本地缓存一般会使用的都是Map结构(例如key为skuId), 现在使用最多的就是Guava Cache, 然而Spring5即将放弃掉Guava Cache作为缓存机制,而改用Caffeine作为新的本地Cache的组件.(至于为啥,不是一两句能够讲清的 主要是性能,其次是语言方面 函数式编程) 这些实现都是基于今天要讲的CHM

​ PS:以下内容基于JDK 1.8

Map Implementations

CHM是Map 的一种实现, 这里再简单说说Map的其他实现. Map的实现分为通用实现(General-Purpose Map Implementations),专用实现(Special-Purpose Map Implementations)和并发实现(Concurrent Map Implementations)。

More see here (Map接口在1.8有挺多改动的,例如complete 、merge)

General-Purpose Map Implementations

The three general-purpose Map implementations are HashMap, TreeMap and LinkedHashMap.If you need SortedMap operations or key-ordered Collection-view iteration, use TreeMap; if you want maximum speed and don’t care about iteration order, use HashMap; if you want near-HashMap performance and insertion-order iteration, use LinkedHashMap

TreeMap 用于排序

LinkedHashMap 额外维护一个队列, 可以用它来简单实现一个LRU算法

Special-Purpose Map Implementations

There are three special-purpose Map implementations — EnumMap, WeakHashMap and IdentityHashMap. EnumMap, which is internally implemented as an array, is a high-performance Map implementation for use with enum keys. This implementation combines the richness and safety of the Map interface with a speed approaching that of an array. If you want to map an enum to a value, you should always use an EnumMap in preference to an array.

WeakHashMap is an implementation of the Map interface that stores only weak references to its keys. Storing only weak references allows a key-value pair to be garbage-collected when its key is no longer referenced outside of the WeakHashMap. This class provides the easiest way to harness the power of weak references. It is useful for implementing “registry-like” data structures, where the utility of an entry vanishes when its key is no longer reachable by any thread.

IdentityHashMap is an identity-based Map implementation based on a hash table. This class is useful for topology-preserving object graph transformations, such as serialization or deep-copying. To perform such transformations, you need to maintain an identity-based “node table” that keeps track of which objects have already been seen. Identity-based maps are also used to maintain object-to-meta-information mappings in dynamic debuggers and similar systems. Finally, identity-based maps are useful in thwarting “spoof attacks” that are a result of intentionally perverse equals methods because IdentityHashMap never invokes the equals method on its keys. An added benefit of this implementation is that it is fast

IdentityHashMap 不用调用equals ,而是使用==判断

Concurrent Map Implementations

ConcurrentHashMap is a highly concurrent, high-performance implementation backed up by a hash table. This implementation never blocks when performing retrievals and allows the client to select the concurrency level for updates. It is intended as a drop-in replacement for Hashtable: in addition to implementing ConcurrentMap, it supports all the legacy methods peculiar to Hashtable. Again, if you don’t need the legacy operations, be careful to manipulate it with the ConcurrentMap interface.

​ CHM是Hashtable的代替者, 至于原因嘛, 当然是性能啦. Hashtable 的实现是简单的在方法上面加synchronized ,这样会导致锁的竞争对象是实例本身.(如果是static 方法 锁的是整个类对象)

ConcurrentHashMap

通常我们应该站在实现者的角度,而不仅仅是使用者的角度, 通过猜测作者的意图, 换位思考,换成我来实现我会怎么做

Overview


/*
     * Overview:
     *
     * The primary design goal of this hash table is to maintain
     * concurrent readability (typically method get(), but also
     * iterators and related methods) while minimizing update
     * contention. Secondary goals are to keep space consumption about
     * the same or better than java.util.HashMap, and to support high
     * initial insertion rates on an empty table by many threads.
     *
     * This map usually acts as a binned (bucketed) hash table.  Each
     * key-value mapping is held in a Node.  Most nodes are instances
     * of the basic Node class with hash, key, value, and next
     * fields. However, various subclasses exist: TreeNodes are
     * arranged in balanced trees, not lists.  TreeBins hold the roots
     * of sets of TreeNodes. ForwardingNodes are placed at the heads
     * of bins during resizing. ReservationNodes are used as
     * placeholders while establishing values in computeIfAbsent and
     * related methods.  The types TreeBin, ForwardingNode, and
     * ReservationNode do not hold normal user keys, values, or
     * hashes, and are readily distinguishable during search etc
     * because they have negative hash fields and null key and value
     * fields. (These special nodes are either uncommon or transient,
     * so the impact of carrying around some unused fields is
     * insignificant.)
     *
     * The table is lazily initialized to a power-of-two size upon the
     * first insertion.  Each bin in the table normally contains a
     * list of Nodes (most often, the list has only zero or one Node).
     * Table accesses require volatile/atomic reads, writes, and
     * CASes.  Because there is no other way to arrange this without
     * adding further indirections, we use intrinsics
     * (sun.misc.Unsafe) operations.
     *
     * We use the top (sign) bit of Node hash fields for control
     * purposes -- it is available anyway because of addressing
     * constraints.  Nodes with negative hash fields are specially
     * handled or ignored in map methods.
     *
     * Insertion (via put or its variants) of the first node in an
     * empty bin is performed by just CASing it to the bin.  This is
     * by far the most common case for put operations under most
     * key/hash distributions.  Other update operations (insert,
     * delete, and replace) require locks.  We do not want to waste
     * the space required to associate a distinct lock object with
     * each bin, so instead use the first node of a bin list itself as
     * a lock. Locking support for these locks relies on builtin
     * "synchronized" monitors.
     *
     * Using the first node of a list as a lock does not by itself
     * suffice though: When a node is locked, any update must first
     * validate that it is still the first node after locking it, and
     * retry if not. Because new nodes are always appended to lists,
     * once a node is first in a bin, it remains first until deleted
     * or the bin becomes invalidated (upon resizing).
     *
     * The main disadvantage of per-bin locks is that other update
     * operations on other nodes in a bin list protected by the same
     * lock can stall, for example when user equals() or mapping
     * functions take a long time.  However, statistically, under
     * random hash codes, this is not a common problem.  Ideally, the
     * frequency of nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average, given the resizing threshold
     * of 0.75, although with a large variance because of resizing
     * granularity. Ignoring variance, the expected occurrences of
     * list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
     * first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     *
     * Lock contention probability for two threads accessing distinct
     * elements is roughly 1 / (8 * #elements) under random hashes.
     *
     * Actual hash code distributions encountered in practice
     * sometimes deviate significantly from uniform randomness.  This
     * includes the case when N > (1<<30), so some keys MUST collide.
     * Similarly for dumb or hostile usages in which multiple keys are
     * designed to have identical hash codes or ones that differs only
     * in masked-out high bits. So we use a secondary strategy that
     * applies when the number of nodes in a bin exceeds a
     * threshold. These TreeBins use a balanced tree to hold nodes (a
     * specialized form of red-black trees), bounding search time to
     * O(log N).  Each search step in a TreeBin is at least twice as
     * slow as in a regular list, but given that N cannot exceed
     * (1<<64) (before running out of addresses) this bounds search
     * steps, lock hold times, etc, to reasonable constants (roughly
     * 100 nodes inspected per operation worst case) so long as keys
     * are Comparable (which is very common -- String, Long, etc).
     * TreeBin nodes (TreeNodes) also maintain the same "next"
     * traversal pointers as regular nodes, so can be traversed in
     * iterators in the same way.
     *
     * The table is resized when occupancy exceeds a percentage
     * threshold (nominally, 0.75, but see below).  Any thread
     * noticing an overfull bin may assist in resizing after the
     * initiating thread allocates and sets up the replacement array.
     * However, rather than stalling, these other threads may proceed
     * with insertions etc.  The use of TreeBins shields us from the
     * worst case effects of overfilling while resizes are in
     * progress.  Resizing proceeds by transferring bins, one by one,
     * from the table to the next table. However, threads claim small
     * blocks of indices to transfer (via field transferIndex) before
     * doing so, reducing contention.  A generation stamp in field
     * sizeCtl ensures that resizings do not overlap. Because we are
     * using power-of-two expansion, the elements from each bin must
     * either stay at same index, or move with a power of two
     * offset. We eliminate unnecessary node creation by catching
     * cases where old nodes can be reused because their next fields
     * won't change.  On average, only about one-sixth of them need
     * cloning when a table doubles. The nodes they replace will be
     * garbage collectable as soon as they are no longer referenced by
     * any reader thread that may be in the midst of concurrently
     * traversing table.  Upon transfer, the old table bin contains
     * only a special forwarding node (with hash field "MOVED") that
     * contains the next table as its key. On encountering a
     * forwarding node, access and update operations restart, using
     * the new table.
     *
     * Each bin transfer requires its bin lock, which can stall
     * waiting for locks while resizing. However, because other
     * threads can join in and help resize rather than contend for
     * locks, average aggregate waits become shorter as resizing
     * progresses.  The transfer operation must also ensure that all
     * accessible bins in both the old and new table are usable by any
     * traversal.  This is arranged in part by proceeding from the
     * last bin (table.length - 1) up towards the first.  Upon seeing
     * a forwarding node, traversals (see class Traverser) arrange to
     * move to the new table without revisiting nodes.  To ensure that
     * no intervening nodes are skipped even when moved out of order,
     * a stack (see class TableStack) is created on first encounter of
     * a forwarding node during a traversal, to maintain its place if
     * later processing the current table. The need for these
     * save/restore mechanics is relatively rare, but when one
     * forwarding node is encountered, typically many more will be.
     * So Traversers use a simple caching scheme to avoid creating so
     * many new TableStack nodes. (Thanks to Peter Levart for
     * suggesting use of a stack here.)
     *
     * The traversal scheme also applies to partial traversals of
     * ranges of bins (via an alternate Traverser constructor)
     * to support partitioned aggregate operations.  Also, read-only
     * operations give up if ever forwarded to a null table, which
     * provides support for shutdown-style clearing, which is also not
     * currently implemented.
     *
     * Lazy table initialization minimizes footprint until first use,
     * and also avoids resizings when the first operation is from a
     * putAll, constructor with map argument, or deserialization.
     * These cases attempt to override the initial capacity settings,
     * but harmlessly fail to take effect in cases of races.
     *
     * The element count is maintained using a specialization of
     * LongAdder. We need to incorporate a specialization rather than
     * just use a LongAdder in order to access implicit
     * contention-sensing that leads to creation of multiple
     * CounterCells.  The counter mechanics avoid contention on
     * updates but can encounter cache thrashing if read too
     * frequently during concurrent access. To avoid reading so often,
     * resizing under contention is attempted only upon adding to a
     * bin already holding two or more nodes. Under uniform hash
     * distributions, the probability of this occurring at threshold
     * is around 13%, meaning that only about 1 in 8 puts check
     * threshold (and after resizing, many fewer do so).
     *
     * TreeBins use a special form of comparison for search and
     * related operations (which is the main reason we cannot use
     * existing collections such as TreeMaps). TreeBins contain
     * Comparable elements, but may contain others, as well as
     * elements that are Comparable but not necessarily Comparable for
     * the same T, so we cannot invoke compareTo among them. To handle
     * this, the tree is ordered primarily by hash value, then by
     * Comparable.compareTo order if applicable.  On lookup at a node,
     * if elements are not comparable or compare as 0 then both left
     * and right children may need to be searched in the case of tied
     * hash values. (This corresponds to the full list search that
     * would be necessary if all elements were non-Comparable and had
     * tied hashes.) On insertion, to keep a total ordering (or as
     * close as is required here) across rebalancings, we compare
     * classes and identityHashCodes as tie-breakers. The red-black
     * balancing code is updated from pre-jdk-collections
     * (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)
     * based in turn on Cormen, Leiserson, and Rivest "Introduction to
     * Algorithms" (CLR).
     *
     * TreeBins also require an additional locking mechanism.  While
     * list traversal is always possible by readers even during
     * updates, tree traversal is not, mainly because of tree-rotations
     * that may change the root node and/or its linkages.  TreeBins
     * include a simple read-write lock mechanism parasitic on the
     * main bin-synchronization strategy: Structural adjustments
     * associated with an insertion or removal are already bin-locked
     * (and so cannot conflict with other writers) but must wait for
     * ongoing readers to finish. Since there can be only one such
     * waiter, we use a simple scheme using a single "waiter" field to
     * block writers.  However, readers need never block.  If the root
     * lock is held, they proceed along the slow traversal path (via
     * next-pointers) until the lock becomes available or the list is
     * exhausted, whichever comes first. These cases are not fast, but
     * maximize aggregate expected throughput.
     *
     * Maintaining API and serialization compatibility with previous
     * versions of this class introduces several oddities. Mainly: We
     * leave untouched but unused constructor arguments refering to
     * concurrencyLevel. We accept a loadFactor constructor argument,
     * but apply it only to initial table capacity (which is the only
     * time that we can guarantee to honor it.) We also declare an
     * unused "Segment" class that is instantiated in minimal form
     * only when serializing.
     *
     * Also, solely for compatibility with previous versions of this
     * class, it extends AbstractMap, even though all of its methods
     * are overridden, so it is just useless baggage.
     *
     * This file is organized to make things a little easier to follow
     * while reading than they might otherwise: First the main static
     * declarations and utilities, then fields, then main public
     * methods (with a few factorings of multiple public methods into
     * internal ones), then sizing methods, trees, traversers, and
     * bulk operations.
     */

以上内容摘自 java.util.concurrent.ConcurrentHashMap 的Overview 部分,下面是基于这段的一个简单翻译

Design golds

  1. The primary design goal of this hash table is to maintain concurrent readability (typically method get(), but also iterators and related methods) while minimizing update contention(竞争).
  2. Secondary goals are to keep space consumption(消耗) about the same or better tha java.util.HashMap, and to support high initial insertion rates on an empty table by many threads.

Node

/*
 * This map usually acts as a binned (bucketed) hash table.  Each
 * key-value mapping is held in a Node.  Most nodes are instances
 * of the basic Node class with hash, key, value, and next
 * fields. However, various subclasses exist: TreeNodes are
 * arranged in balanced trees, not lists.  TreeBins hold the roots
 * of sets of TreeNodes. ForwardingNodes are placed at the heads
 * of bins during resizing. ReservationNodes are used as
 * placeholders while establishing values in computeIfAbsent and
 * related methods.  The types TreeBin, ForwardingNode, and
 * ReservationNode do not hold normal user keys, values, or
 * hashes, and are readily distinguishable during search etc
 * because they have negative hash fields and null key and value
 * fields. (These special nodes are either uncommon or transient,
 * so the impact of carrying around some unused fields is
 * insignificant.)
 */

其底层实现形式也是bucket数组的形式,且<key, value>存储在Node中。Node存在以下子类,包括TreeBin、TreeNode、ForwardingNode、ReservationNode,后三种一般有其特殊用途。

/*
 * The table is lazily initialized to a power-of-two size upon the
 * first insertion.  Each bin in the table normally contains a
 * list of Nodes (most often, the list has only zero or one Node).
 * Table accesses require volatile/atomic reads, writes, and
 * CASes.  Because there is no other way to arrange this without
 * adding further indirections, we use intrinsics
 * (sun.misc.Unsafe) operations.
 */

ConcurrentHashMap使用延迟初始化策略,在第一次insert时,才分配一个2^n大小的bucket数组,一般的bucket为list形式,通过Node中next属性来实现链表(通常链表为空或者只有一个)。

通过volatie 和CAS 来控制其原子性,底层使用的是sun.misc.Unsafe

hash conflict

/*
 * We use the top (sign) bit of Node hash fields for control
 * purposes -- it is available anyway because of addressing
 * constraints.  Nodes with negative hash fields are specially
 * handled or ignored in map methods.
 */

使用bit的高位来存储的目的是为了减少冲突

/*
 * Insertion (via put or its variants) of the first node in an
 * empty bin is performed by just CASing it to the bin.  This is
 * by far the most common case for put operations under most
 * key/hash distributions.  Other update operations (insert,
 * delete, and replace) require locks.  We do not want to waste
 * the space required to associate a distinct lock object with
 * each bin, so instead use the first node of a bin list itself as
 * a lock. Locking support for these locks relies on builtin
 * "synchronized" monitors.
 *
 * Using the first node of a list as a lock does not by itself
 * suffice though: When a node is locked, any update must first
 * validate that it is still the first node after locking it, and
 * retry if not. Because new nodes are always appended to lists,
 * once a node is first in a bin, it remains first until deleted
 * or the bin becomes invalidated (upon resizing).
 */

​ 当在一个空bin中insert第一个node时,其使用CAS操作来同步,而对于其他的update操作(insert,delete, and replace)则需要使用锁来同步。而在每一个bucket中,一般使用此bin中第一个node作为这个bin的锁,锁住整个bucket。(因为新放入bin的node总会添加到list的末尾,故除了delete掉第一个节点或resize数组之外,这个节点总是此bin的第一个node,具有稳定性)。

/*
 * The main disadvantage of per-bin locks is that other update
 * operations on other nodes in a bin list protected by the same
 * lock can stall, for example when user equals() or mapping
 * functions take a long time.  However, statistically, under
 * random hash codes, this is not a common problem.  Ideally, the
 * frequency of nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average, given the resizing threshold
 * of 0.75, although with a large variance because of resizing
 * granularity. Ignoring variance, the expected occurrences of
 * list size k are (exp(-0.5)  * pow(0.5, k) / factorial(k)). The
 * first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 *
 * Lock contention probability for two threads accessing distinct
 * elements is roughly 1 / (8  * #elements) under random hashes.
 */

​ 当使用锁,锁住bin节点的第一个节点的时候, 最大的问题是其他访问这个bin需要等待(例如equals 和mapping 的过程花费了很长时间 ) 然而, 其实就像上面提到过的通常链表为空或者只有一个. 因为冲突的概率很低

/*
 * Actual hash code distributions encountered in practice
 * sometimes deviate significantly from uniform randomness.  This
 * includes the case when N > (1<<30), so some keys MUST collide.
 * Similarly for dumb or hostile usages in which multiple keys are
 * designed to have identical hash codes or ones that differs only
 * in masked-out high bits. So we use a secondary strategy that
 * applies when the number of nodes in a bin exceeds a
 * threshold. These TreeBins use a balanced tree to hold nodes (a
 * specialized form of red-black trees), bounding search time to
 * O(log N).  Each search step in a TreeBin is at least twice as
 * slow as in a regular list, but given that N cannot exceed
 * (1<<64) (before running out of addresses) this bounds search
 * steps, lock hold times, etc, to reasonable constants (roughly
 * 100 nodes inspected per operation worst case) so long as keys
 * are Comparable (which is very common -- String, Long, etc).
 * TreeBin nodes (TreeNodes) also maintain the same "next"
 * traversal pointers as regular nodes, so can be traversed in
 * iterators in the same way.
 */

在实际使用中,哈希码分布有时会明显偏离均匀随机性。像一些错误使用,使用相同的hash codes,或低位不同会导致该问题.为了解决该问题,引入了红黑树

resized

/*
 * The table is resized when occupancy exceeds a percentage
 * threshold (nominally, 0.75, but see below).  Any thread
 * noticing an overfull bin may assist in resizing after the
 * initiating thread allocates and sets up the replacement array.
 * However, rather than stalling, these other threads may proceed
 * with insertions etc.  The use of TreeBins shields us from the
 * worst case effects of overfilling while resizes are in
 * progress.  Resizing proceeds by transferring bins, one by one,
 * from the table to the next table. However, threads claim small
 * blocks of indices to transfer (via field transferIndex) before
 * doing so, reducing contention.  A generation stamp in field
 * sizeCtl ensures that resizings do not overlap. Because we are
 * using power-of-two expansion, the elements from each bin must
 * either stay at same index, or move with a power of two
 * offset. We eliminate unnecessary node creation by catching
 * cases where old nodes can be reused because their next fields
 * won't change.  On average, only about one-sixth of them need
 * cloning when a table doubles. The nodes they replace will be
 * garbage collectable as soon as they are no longer referenced by
 * any reader thread that may be in the midst of concurrently
 * traversing table.  Upon transfer, the old table bin contains
 * only a special forwarding node (with hash field "MOVED") that
 * contains the next table as its key. On encountering a
 * forwarding node, access and update operations restart, using
 * the new table.
 */

重新扩容当负载因子超过0.75 , 扩容通常是power-of-two ,渐进式的方式,多次扩容(通过transferIndex) ,

使用字段sizeCtl来确保不会重复扩容.

这种扩容通常是相同的索引,或者移动power-of-two 位置(1/6的概率需要cloning) 旧表只包含forwarding node(具有散列字段“MOVED”)做为新表的key

/*
 * Each bin transfer requires its bin lock, which can stall
 * waiting for locks while resizing. However, because other
 * threads can join in and help resize rather than contend for
 * locks, average aggregate waits become shorter as resizing
 * progresses.  The transfer operation must also ensure that all
 * accessible bins in both the old and new table are usable by any
 * traversal.  This is arranged in part by proceeding from the
 * last bin (table.length - 1) up towards the first.  Upon seeing
 * a forwarding node, traversals (see class Traverser) arrange to
 * move to the new table without revisiting nodes.  To ensure that
 * no intervening nodes are skipped even when moved out of order,
 * a stack (see class TableStack) is created on first encounter of
 * a forwarding node during a traversal, to maintain its place if
 * later processing the current table. The need for these
 * save/restore mechanics is relatively rare, but when one
 * forwarding node is encountered, typically many more will be.
 * So Traversers use a simple caching scheme to avoid creating so
 * many new TableStack nodes. (Thanks to Peter Levart for
 * suggesting use of a stack here.)
 */

​ 调整大小时,每个bin移动的时候都需要对bin锁定。但是,因为其他线程可以加入并帮助调整大小而不是争用锁,总耗时会更多(多线程帮助扩容)。扩容时还必须确保旧表和新表中的所有可访问的bin都可以被任何遍历使用更多细节看TableStackTraverser

/*
 * The element count is maintained using a specialization of
 * LongAdder. We need to incorporate a specialization rather than
 * just use a LongAdder in order to access implicit
 * contention-sensing that leads to creation of multiple
 * CounterCells.  The counter mechanics avoid contention on
 * updates but can encounter cache thrashing if read too
 * frequently during concurrent access. To avoid reading so often,
 * resizing under contention is attempted only upon adding to a
 * bin already holding two or more nodes. Under uniform hash
 * distributions, the probability of this occurring at threshold
 * is around 13%, meaning that only about 1 in 8 puts check
 * threshold (and after resizing, many fewer do so).
 */

​ 使用LongAdder 来统计元素的个数

TreeBins

/*
 * TreeBins use a special form of comparison for search and
 * related operations (which is the main reason we cannot use
 * existing collections such as TreeMaps). TreeBins contain
 * Comparable elements, but may contain others, as well as
 * elements that are Comparable but not necessarily Comparable for
 * the same T, so we cannot invoke compareTo among them. To handle
 * this, the tree is ordered primarily by hash value, then by
 * Comparable.compareTo order if applicable.  On lookup at a node,
 * if elements are not comparable or compare as 0 then both left
 * and right children may need to be searched in the case of tied
 * hash values. (This corresponds to the full list search that
 * would be necessary if all elements were non-Comparable and had
 * tied hashes.) On insertion, to keep a total ordering (or as
 * close as is required here) across rebalancings, we compare
 * classes and identityHashCodes as tie-breakers. The red-black
 * balancing code is updated from pre-jdk-collections
 * (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)
 * based in turn on Cormen, Leiserson, and Rivest "Introduction to
 * Algorithms" (CLR).
 * 
 * TreeBins also require an additional locking mechanism.  While
 * list traversal is always possible by readers even during
 * updates, tree traversal is not, mainly because of tree-rotations
 * that may change the root node and/or its linkages.  TreeBins
 * include a simple read-write lock mechanism parasitic on the
 * main bin-synchronization strategy: Structural adjustments
 * associated with an insertion or removal are already bin-locked
 * (and so cannot conflict with other writers) but must wait for
 * ongoing readers to finish. Since there can be only one such
 * waiter, we use a simple scheme using a single "waiter" field to
 * block writers.  However, readers need never block.  If the root
 * lock is held, they proceed along the slow traversal path (via
 * next-pointers) until the lock becomes available or the list is
 * exhausted, whichever comes first. These cases are not fast, but
 * maximize aggregate expected throughput.
 */

​ TreeBins , On insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers.

​ TreeBins还需要额外的锁定机制。 尽管在更新期间readers总是可以进行列表遍历,但是树遍历不是,主要是因为可以改变根节点和/或其链接的树旋转。使用读写锁实现,读不会阻塞

baggage

/*     
 * Maintaining API and serialization compatibility with previous
 * versions of this class introduces several oddities. Mainly: We
 * leave untouched but unused constructor arguments refering to
 * concurrencyLevel. We accept a loadFactor constructor argument,
 * but apply it only to initial table capacity (which is the only
 * time that we can guarantee to honor it.) We also declare an
 * unused "Segment" class that is instantiated in minimal form
 * only when serializing.
 *
 * Also, solely for compatibility with previous versions of this
 * class, it extends AbstractMap, even though all of its methods
 * are overridden, so it is just useless baggage.
 *
 */

一些兼容API , concurrencyLevel ,loadFactor,Segment (1.7的实现分段锁) , 继承了AbstractMap,但重写了所有方法

Sources Code

/*     
 * This file is organized to make things a little easier to follow
 * while reading than they might otherwise: First the main static
 * declarations and utilities, then fields, then main public
 * methods (with a few factorings of multiple public methods into
 * internal ones), then sizing methods, trees, traversers, and
 * bulk operations.
 */

​ ConcurrentHashMap 的源码可读性非常高,按照如上方式组织代码

main declarations

//默认bin数组的大小,必须为2的n次方
private static final int DEFAULT_CAPACITY = 16;            

//默认装填因子
private static final float LOAD_FACTOR = 0.75f;

//当bin中大于8个node时,将此bin由list转换为tree
static final int TREEIFY_THRESHOLD = 8;

//tree退化
static final int UNTREEIFY_THRESHOLD = 6;

//大于64个bucket了,才会考虑tree化
static final int MIN_TREEIFY_CAPACITY = 64;

//to place bounds on some sizings
static final int NCPU = Runtime.getRuntime().availableProcessors();


/** For serialization compatibility.  兼容*/
private static final ObjectStreamField[] serialPersistentFields = {
   new ObjectStreamField("segments", Segment[].class),
   new ObjectStreamField("segmentMask", Integer.TYPE),
   new ObjectStreamField("segmentShift", Integer.TYPE)
};
//兼容
private static final int DEFAULT_CONCURRENCY_LEVEL = 16;
/*
 * Encodings for Node hash fields. See above for explanation.
 */
static final int MOVED     = -1; // hash for forwarding nodes
static final int TREEBIN   = -2; // hash for roots of trees
static final int RESERVED  = -3; // hash for transient reservations
static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

Node

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;
    volatile Node<K,V> next;

    Node(int hash, K key, V val, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.val = val;
        this.next = next;
    }

    public final K getKey()       { return key; }
    public final V getValue()     { return val; }
    public final int hashCode()   { return key.hashCode() ^ val.hashCode(); }
    public final String toString(){ return key + "=" + val; }
    public final V setValue(V value) {
        throw new UnsupportedOperationException();
    }

    public final boolean equals(Object o) {
        Object k, v, u; Map.Entry<?,?> e;
        return ((o instanceof Map.Entry) &&
                (k = (e = (Map.Entry<?,?>)o).getKey()) != null &&
                (v = e.getValue()) != null &&
                (k == key || k.equals(key)) &&
                (v == (u = val) || v.equals(u)));
    }

    /**
     * Virtualized support for map.get(); overridden in subclasses.
     */
    Node<K,V> find(int h, Object k) {
        Node<K,V> e = this;
        if (k != null) {
            do {
                K ek;
                if (e.hash == h &&
                    ((ek = e.key) == k || (ek != null && k.equals(ek))))
                    return e;
            } while ((e = e.next) != null);
        }
        return null;
    }
}

Static utilities

/**
 * Spreads (XORs) higher bits of hash to lower and also forces top
 * bit to 0. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int spread(int h) {
    return (h ^ (h >>> 16)) & HASH_BITS;
}

/**
 * Returns a power of two table size for the given desired capacity.
 * See Hackers Delight, sec 3.2
 */
private static final int tableSizeFor(int c) {
    int n = c - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

/**
 * Returns x's Class if it is of the form "class C implements
 * Comparable<C>", else null.
 */
static Class<?> comparableClassFor(Object x) {
    if (x instanceof Comparable) {
        Class<?> c; Type[] ts, as; Type t; ParameterizedType p;
        if ((c = x.getClass()) == String.class) // bypass checks
            return c;
        if ((ts = c.getGenericInterfaces()) != null) {
            for (int i = 0; i < ts.length; ++i) {
                if (((t = ts[i]) instanceof ParameterizedType) &&
                    ((p = (ParameterizedType)t).getRawType() ==
                     Comparable.class) &&
                    (as = p.getActualTypeArguments()) != null &&
                    as.length == 1 && as[0] == c) // type arg is c
                    return c;
            }
        }
    }
    return null;
}

/**
 * Returns k.compareTo(x) if x matches kc (k's screened comparable
 * class), else 0.
 */
@SuppressWarnings({"rawtypes","unchecked"}) // for cast to Comparable
static int compareComparables(Class<?> kc, Object k, Object x) {
    return (x == null || x.getClass() != kc ? 0 :
            ((Comparable)k).compareTo(x));
}

Main public methods

Construct
/**
 * Creates a new, empty map with the default initial table size (16).
 */
public ConcurrentHashMap() {
}

/**
 * Creates a new, empty map with an initial table size
 * accommodating the specified number of elements without the need
 * to dynamically resize.
 */
public ConcurrentHashMap(int initialCapacity) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException();
    int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
               MAXIMUM_CAPACITY :
               tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
    this.sizeCtl = cap;
}

/**
 * Creates a new map with the same mappings as the given map.
 */
public ConcurrentHashMap(Map<? extends K, ? extends V> m) {
    this.sizeCtl = DEFAULT_CAPACITY;
    putAll(m);
}

/**
 * Creates a new, empty map with an initial table size based on
 * the given number of elements ({@code initialCapacity}) and
 * initial table density ({@code loadFactor}).
 *
 * @param initialCapacity the initial capacity. The implementation
 * performs internal sizing to accommodate this many elements,
 * given the specified load factor.
 * @param loadFactor the load factor (table density) for
 * establishing the initial table size
 * @throws IllegalArgumentException if the initial capacity of
 * elements is negative or the load factor is nonpositive
 *
 * @since 1.6
 */
public ConcurrentHashMap(int initialCapacity, float loadFactor) {
    this(initialCapacity, loadFactor, 1);
}

/**
 * Creates a new, empty map with an initial table size based on
 * the given number of elements ({@code initialCapacity}), table
 * density ({@code loadFactor}), and number of concurrently
 * updating threads ({@code concurrencyLevel}).
 *
 * @param initialCapacity the initial capacity. The implementation
 * performs internal sizing to accommodate this many elements,
 * given the specified load factor.
 * @param loadFactor the load factor (table density) for
 * establishing the initial table size
 * @param concurrencyLevel the estimated number of concurrently
 * updating threads. The implementation may use this value as
 * a sizing hint.
 * @throws IllegalArgumentException if the initial capacity is
 * negative or the load factor or concurrencyLevel are
 * nonpositive
 */
public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (initialCapacity < concurrencyLevel)   // Use at least as many bins
        initialCapacity = concurrencyLevel;   // as estimated threads
    long size = (long)(1.0 + (long)initialCapacity / loadFactor);
    int cap = (size >= (long)MAXIMUM_CAPACITY) ?
        MAXIMUM_CAPACITY : tableSizeFor((int)size);
    this.sizeCtl = cap;
}
Size
public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}
/* ---------------- Counter support -------------- */

/**
 * A padded cell for distributing counts.  Adapted from LongAdder
 * and Striped64.  See their internal docs for explanation.
 */
@sun.misc.Contended static final class CounterCell {
    volatile long value;
    CounterCell(long x) { value = x; }
}

final long sumCount() {
    CounterCell[] as = counterCells; CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}
Get
public boolean containsKey(Object key) {
    return get(key) != null;
}

public V get(Object key) {
    Node<K,V>[] tab; 
    Node<K,V> e, p; 
    int n, eh; 
    K ek;
    //通过位运算取得bit高位来使key分布更均匀,当key=null时,会抛出空指针异常
    int h = spread(key.hashCode());    
                                                            
    //e 取得bin中的first node
    if ((tab = table) != null && (n = tab.length) > 0 && (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {                                      
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;

        //遍历链表
        while ((e = e.next) != null) {      
            if (e.hash == h && ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}
Put
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
            //延迟初始化
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();   
                        //当向empty bin 中添加node时,并不使用锁,而使用CAS操作来添加
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   
        }
        //处于扩容状态,帮助扩容
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
                //锁住bin节点
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                       value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            //先插入再转换成tree
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

可以看到ConcurrentHashMap 为了提高性能做了很多事情

  • ​ 只有哈希冲突才会锁住bin,否则使用CAS
  • ​ 多线程帮助扩容
  • ​ 当链表过长树化,提高查询速度,防止锁住时间过长

More details to continues …..


转载请注明来源,欢迎对文章中的引用来源进行考证,欢迎指出任何有错误或不够清晰的表达。可以在下面评论区评论,也可以邮件至 951488791@qq.com

文章标题:J.U.C_ConcurrentHashMap

字数:7.1k

本文作者:zhengyumin

发布时间:2019-06-24, 19:03:29

最后更新:2020-04-14, 15:33:05

原始链接:http://zyumin.github.io/2019/06/24/J-U-C-ConcurrentHashMap/

版权声明: "署名-非商用-相同方式共享 4.0" 转载请保留原文链接及作者。