preface
做为办公室的游泳健将,鱼是不可能不摸的. 今天在查找接口文档的时候,发现一些业务,像商详、秒杀,这些抗流都是使用的本地缓存, 第二层再是redis.
那么本地缓存一般会使用的都是Map结构(例如key为skuId), 现在使用最多的就是Guava Cache, 然而Spring5即将放弃掉Guava Cache作为缓存机制,而改用Caffeine作为新的本地Cache的组件.(至于为啥,不是一两句能够讲清的 主要是性能,其次是语言方面 函数式编程) 这些实现都是基于今天要讲的CHM
PS:以下内容基于JDK 1.8
Map Implementations
CHM是Map 的一种实现, 这里再简单说说Map的其他实现. Map的实现分为通用实现(General-Purpose Map Implementations),专用实现(Special-Purpose Map Implementations)和并发实现(Concurrent Map Implementations)。
More see here (Map接口在1.8有挺多改动的,例如complete 、merge)
General-Purpose Map Implementations
The three general-purpose
Map
implementations areHashMap
,TreeMap
andLinkedHashMap
.If you needSortedMap
operations or key-orderedCollection
-view iteration, useTreeMap
; if you want maximum speed and don’t care about iteration order, useHashMap
; if you want near-HashMap
performance and insertion-order iteration, useLinkedHashMap
TreeMap 用于排序
LinkedHashMap 额外维护一个队列, 可以用它来简单实现一个LRU算法
Special-Purpose Map Implementations
There are three special-purpose Map implementations —
EnumMap
,WeakHashMap
andIdentityHashMap
.EnumMap
, which is internally implemented as anarray
, is a high-performanceMap
implementation for use with enum keys. This implementation combines the richness and safety of theMap
interface with a speed approaching that of an array. If you want to map an enum to a value, you should always use anEnumMap
in preference to an array.
WeakHashMap
is an implementation of theMap
interface that stores only weak references to its keys. Storing only weak references allows a key-value pair to be garbage-collected when its key is no longer referenced outside of theWeakHashMap
. This class provides the easiest way to harness the power of weak references. It is useful for implementing “registry-like” data structures, where the utility of an entry vanishes when its key is no longer reachable by any thread.
IdentityHashMap
is an identity-basedMap
implementation based on a hash table. This class is useful for topology-preserving object graph transformations, such as serialization or deep-copying. To perform such transformations, you need to maintain an identity-based “node table” that keeps track of which objects have already been seen. Identity-based maps are also used to maintain object-to-meta-information mappings in dynamic debuggers and similar systems. Finally, identity-based maps are useful in thwarting “spoof attacks” that are a result of intentionally perverseequals
methods becauseIdentityHashMap
never invokes theequals
method on its keys. An added benefit of this implementation is that it is fast
IdentityHashMap 不用调用equals ,而是使用==判断
Concurrent Map Implementations
ConcurrentHashMap
is a highly concurrent, high-performance implementation backed up by a hash table. This implementation never blocks when performing retrievals and allows the client to select the concurrency level for updates. It is intended as a drop-in replacement forHashtable
: in addition to implementingConcurrentMap
, it supports all the legacy methods peculiar toHashtable
. Again, if you don’t need the legacy operations, be careful to manipulate it with theConcurrentMap
interface.
CHM是Hashtable的代替者, 至于原因嘛, 当然是性能啦. Hashtable 的实现是简单的在方法上面加synchronized ,这样会导致锁的竞争对象是实例本身.(如果是static 方法 锁的是整个类对象)
ConcurrentHashMap
通常我们应该站在实现者的角度,而不仅仅是使用者的角度, 通过猜测作者的意图, 换位思考,换成我来实现我会怎么做
Overview
/*
* Overview:
*
* The primary design goal of this hash table is to maintain
* concurrent readability (typically method get(), but also
* iterators and related methods) while minimizing update
* contention. Secondary goals are to keep space consumption about
* the same or better than java.util.HashMap, and to support high
* initial insertion rates on an empty table by many threads.
*
* This map usually acts as a binned (bucketed) hash table. Each
* key-value mapping is held in a Node. Most nodes are instances
* of the basic Node class with hash, key, value, and next
* fields. However, various subclasses exist: TreeNodes are
* arranged in balanced trees, not lists. TreeBins hold the roots
* of sets of TreeNodes. ForwardingNodes are placed at the heads
* of bins during resizing. ReservationNodes are used as
* placeholders while establishing values in computeIfAbsent and
* related methods. The types TreeBin, ForwardingNode, and
* ReservationNode do not hold normal user keys, values, or
* hashes, and are readily distinguishable during search etc
* because they have negative hash fields and null key and value
* fields. (These special nodes are either uncommon or transient,
* so the impact of carrying around some unused fields is
* insignificant.)
*
* The table is lazily initialized to a power-of-two size upon the
* first insertion. Each bin in the table normally contains a
* list of Nodes (most often, the list has only zero or one Node).
* Table accesses require volatile/atomic reads, writes, and
* CASes. Because there is no other way to arrange this without
* adding further indirections, we use intrinsics
* (sun.misc.Unsafe) operations.
*
* We use the top (sign) bit of Node hash fields for control
* purposes -- it is available anyway because of addressing
* constraints. Nodes with negative hash fields are specially
* handled or ignored in map methods.
*
* Insertion (via put or its variants) of the first node in an
* empty bin is performed by just CASing it to the bin. This is
* by far the most common case for put operations under most
* key/hash distributions. Other update operations (insert,
* delete, and replace) require locks. We do not want to waste
* the space required to associate a distinct lock object with
* each bin, so instead use the first node of a bin list itself as
* a lock. Locking support for these locks relies on builtin
* "synchronized" monitors.
*
* Using the first node of a list as a lock does not by itself
* suffice though: When a node is locked, any update must first
* validate that it is still the first node after locking it, and
* retry if not. Because new nodes are always appended to lists,
* once a node is first in a bin, it remains first until deleted
* or the bin becomes invalidated (upon resizing).
*
* The main disadvantage of per-bin locks is that other update
* operations on other nodes in a bin list protected by the same
* lock can stall, for example when user equals() or mapping
* functions take a long time. However, statistically, under
* random hash codes, this is not a common problem. Ideally, the
* frequency of nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average, given the resizing threshold
* of 0.75, although with a large variance because of resizing
* granularity. Ignoring variance, the expected occurrences of
* list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
* first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
* Lock contention probability for two threads accessing distinct
* elements is roughly 1 / (8 * #elements) under random hashes.
*
* Actual hash code distributions encountered in practice
* sometimes deviate significantly from uniform randomness. This
* includes the case when N > (1<<30), so some keys MUST collide.
* Similarly for dumb or hostile usages in which multiple keys are
* designed to have identical hash codes or ones that differs only
* in masked-out high bits. So we use a secondary strategy that
* applies when the number of nodes in a bin exceeds a
* threshold. These TreeBins use a balanced tree to hold nodes (a
* specialized form of red-black trees), bounding search time to
* O(log N). Each search step in a TreeBin is at least twice as
* slow as in a regular list, but given that N cannot exceed
* (1<<64) (before running out of addresses) this bounds search
* steps, lock hold times, etc, to reasonable constants (roughly
* 100 nodes inspected per operation worst case) so long as keys
* are Comparable (which is very common -- String, Long, etc).
* TreeBin nodes (TreeNodes) also maintain the same "next"
* traversal pointers as regular nodes, so can be traversed in
* iterators in the same way.
*
* The table is resized when occupancy exceeds a percentage
* threshold (nominally, 0.75, but see below). Any thread
* noticing an overfull bin may assist in resizing after the
* initiating thread allocates and sets up the replacement array.
* However, rather than stalling, these other threads may proceed
* with insertions etc. The use of TreeBins shields us from the
* worst case effects of overfilling while resizes are in
* progress. Resizing proceeds by transferring bins, one by one,
* from the table to the next table. However, threads claim small
* blocks of indices to transfer (via field transferIndex) before
* doing so, reducing contention. A generation stamp in field
* sizeCtl ensures that resizings do not overlap. Because we are
* using power-of-two expansion, the elements from each bin must
* either stay at same index, or move with a power of two
* offset. We eliminate unnecessary node creation by catching
* cases where old nodes can be reused because their next fields
* won't change. On average, only about one-sixth of them need
* cloning when a table doubles. The nodes they replace will be
* garbage collectable as soon as they are no longer referenced by
* any reader thread that may be in the midst of concurrently
* traversing table. Upon transfer, the old table bin contains
* only a special forwarding node (with hash field "MOVED") that
* contains the next table as its key. On encountering a
* forwarding node, access and update operations restart, using
* the new table.
*
* Each bin transfer requires its bin lock, which can stall
* waiting for locks while resizing. However, because other
* threads can join in and help resize rather than contend for
* locks, average aggregate waits become shorter as resizing
* progresses. The transfer operation must also ensure that all
* accessible bins in both the old and new table are usable by any
* traversal. This is arranged in part by proceeding from the
* last bin (table.length - 1) up towards the first. Upon seeing
* a forwarding node, traversals (see class Traverser) arrange to
* move to the new table without revisiting nodes. To ensure that
* no intervening nodes are skipped even when moved out of order,
* a stack (see class TableStack) is created on first encounter of
* a forwarding node during a traversal, to maintain its place if
* later processing the current table. The need for these
* save/restore mechanics is relatively rare, but when one
* forwarding node is encountered, typically many more will be.
* So Traversers use a simple caching scheme to avoid creating so
* many new TableStack nodes. (Thanks to Peter Levart for
* suggesting use of a stack here.)
*
* The traversal scheme also applies to partial traversals of
* ranges of bins (via an alternate Traverser constructor)
* to support partitioned aggregate operations. Also, read-only
* operations give up if ever forwarded to a null table, which
* provides support for shutdown-style clearing, which is also not
* currently implemented.
*
* Lazy table initialization minimizes footprint until first use,
* and also avoids resizings when the first operation is from a
* putAll, constructor with map argument, or deserialization.
* These cases attempt to override the initial capacity settings,
* but harmlessly fail to take effect in cases of races.
*
* The element count is maintained using a specialization of
* LongAdder. We need to incorporate a specialization rather than
* just use a LongAdder in order to access implicit
* contention-sensing that leads to creation of multiple
* CounterCells. The counter mechanics avoid contention on
* updates but can encounter cache thrashing if read too
* frequently during concurrent access. To avoid reading so often,
* resizing under contention is attempted only upon adding to a
* bin already holding two or more nodes. Under uniform hash
* distributions, the probability of this occurring at threshold
* is around 13%, meaning that only about 1 in 8 puts check
* threshold (and after resizing, many fewer do so).
*
* TreeBins use a special form of comparison for search and
* related operations (which is the main reason we cannot use
* existing collections such as TreeMaps). TreeBins contain
* Comparable elements, but may contain others, as well as
* elements that are Comparable but not necessarily Comparable for
* the same T, so we cannot invoke compareTo among them. To handle
* this, the tree is ordered primarily by hash value, then by
* Comparable.compareTo order if applicable. On lookup at a node,
* if elements are not comparable or compare as 0 then both left
* and right children may need to be searched in the case of tied
* hash values. (This corresponds to the full list search that
* would be necessary if all elements were non-Comparable and had
* tied hashes.) On insertion, to keep a total ordering (or as
* close as is required here) across rebalancings, we compare
* classes and identityHashCodes as tie-breakers. The red-black
* balancing code is updated from pre-jdk-collections
* (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)
* based in turn on Cormen, Leiserson, and Rivest "Introduction to
* Algorithms" (CLR).
*
* TreeBins also require an additional locking mechanism. While
* list traversal is always possible by readers even during
* updates, tree traversal is not, mainly because of tree-rotations
* that may change the root node and/or its linkages. TreeBins
* include a simple read-write lock mechanism parasitic on the
* main bin-synchronization strategy: Structural adjustments
* associated with an insertion or removal are already bin-locked
* (and so cannot conflict with other writers) but must wait for
* ongoing readers to finish. Since there can be only one such
* waiter, we use a simple scheme using a single "waiter" field to
* block writers. However, readers need never block. If the root
* lock is held, they proceed along the slow traversal path (via
* next-pointers) until the lock becomes available or the list is
* exhausted, whichever comes first. These cases are not fast, but
* maximize aggregate expected throughput.
*
* Maintaining API and serialization compatibility with previous
* versions of this class introduces several oddities. Mainly: We
* leave untouched but unused constructor arguments refering to
* concurrencyLevel. We accept a loadFactor constructor argument,
* but apply it only to initial table capacity (which is the only
* time that we can guarantee to honor it.) We also declare an
* unused "Segment" class that is instantiated in minimal form
* only when serializing.
*
* Also, solely for compatibility with previous versions of this
* class, it extends AbstractMap, even though all of its methods
* are overridden, so it is just useless baggage.
*
* This file is organized to make things a little easier to follow
* while reading than they might otherwise: First the main static
* declarations and utilities, then fields, then main public
* methods (with a few factorings of multiple public methods into
* internal ones), then sizing methods, trees, traversers, and
* bulk operations.
*/
以上内容摘自 java.util.concurrent.ConcurrentHashMap 的Overview 部分,下面是基于这段的一个简单翻译
Design golds
- The primary design goal of this hash table is to maintain concurrent readability (typically method get(), but also iterators and related methods) while minimizing update contention(竞争).
- Secondary goals are to keep space consumption(消耗) about the same or better tha java.util.HashMap, and to support high initial insertion rates on an empty table by many threads.
Node
/*
* This map usually acts as a binned (bucketed) hash table. Each
* key-value mapping is held in a Node. Most nodes are instances
* of the basic Node class with hash, key, value, and next
* fields. However, various subclasses exist: TreeNodes are
* arranged in balanced trees, not lists. TreeBins hold the roots
* of sets of TreeNodes. ForwardingNodes are placed at the heads
* of bins during resizing. ReservationNodes are used as
* placeholders while establishing values in computeIfAbsent and
* related methods. The types TreeBin, ForwardingNode, and
* ReservationNode do not hold normal user keys, values, or
* hashes, and are readily distinguishable during search etc
* because they have negative hash fields and null key and value
* fields. (These special nodes are either uncommon or transient,
* so the impact of carrying around some unused fields is
* insignificant.)
*/
其底层实现形式也是bucket数组的形式,且<key, value>存储在Node中。Node存在以下子类,包括TreeBin、TreeNode、ForwardingNode、ReservationNode,后三种一般有其特殊用途。
/*
* The table is lazily initialized to a power-of-two size upon the
* first insertion. Each bin in the table normally contains a
* list of Nodes (most often, the list has only zero or one Node).
* Table accesses require volatile/atomic reads, writes, and
* CASes. Because there is no other way to arrange this without
* adding further indirections, we use intrinsics
* (sun.misc.Unsafe) operations.
*/
ConcurrentHashMap使用延迟初始化策略,在第一次insert时,才分配一个2^n大小的bucket数组,一般的bucket为list形式,通过Node中next属性来实现链表(通常链表为空或者只有一个)。
通过volatie 和CAS 来控制其原子性,底层使用的是sun.misc.Unsafe
hash conflict
/*
* We use the top (sign) bit of Node hash fields for control
* purposes -- it is available anyway because of addressing
* constraints. Nodes with negative hash fields are specially
* handled or ignored in map methods.
*/
使用bit的高位来存储的目的是为了减少冲突
/*
* Insertion (via put or its variants) of the first node in an
* empty bin is performed by just CASing it to the bin. This is
* by far the most common case for put operations under most
* key/hash distributions. Other update operations (insert,
* delete, and replace) require locks. We do not want to waste
* the space required to associate a distinct lock object with
* each bin, so instead use the first node of a bin list itself as
* a lock. Locking support for these locks relies on builtin
* "synchronized" monitors.
*
* Using the first node of a list as a lock does not by itself
* suffice though: When a node is locked, any update must first
* validate that it is still the first node after locking it, and
* retry if not. Because new nodes are always appended to lists,
* once a node is first in a bin, it remains first until deleted
* or the bin becomes invalidated (upon resizing).
*/
当在一个空bin中insert第一个node时,其使用CAS操作来同步,而对于其他的update操作(insert,delete, and replace)则需要使用锁来同步。而在每一个bucket中,一般使用此bin中第一个node作为这个bin的锁,锁住整个bucket。(因为新放入bin的node总会添加到list的末尾,故除了delete掉第一个节点或resize数组之外,这个节点总是此bin的第一个node,具有稳定性)。
/*
* The main disadvantage of per-bin locks is that other update
* operations on other nodes in a bin list protected by the same
* lock can stall, for example when user equals() or mapping
* functions take a long time. However, statistically, under
* random hash codes, this is not a common problem. Ideally, the
* frequency of nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average, given the resizing threshold
* of 0.75, although with a large variance because of resizing
* granularity. Ignoring variance, the expected occurrences of
* list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
* first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
* Lock contention probability for two threads accessing distinct
* elements is roughly 1 / (8 * #elements) under random hashes.
*/
当使用锁,锁住bin节点的第一个节点的时候, 最大的问题是其他访问这个bin需要等待(例如equals 和mapping 的过程花费了很长时间 ) 然而, 其实就像上面提到过的通常链表为空或者只有一个. 因为冲突的概率很低
/*
* Actual hash code distributions encountered in practice
* sometimes deviate significantly from uniform randomness. This
* includes the case when N > (1<<30), so some keys MUST collide.
* Similarly for dumb or hostile usages in which multiple keys are
* designed to have identical hash codes or ones that differs only
* in masked-out high bits. So we use a secondary strategy that
* applies when the number of nodes in a bin exceeds a
* threshold. These TreeBins use a balanced tree to hold nodes (a
* specialized form of red-black trees), bounding search time to
* O(log N). Each search step in a TreeBin is at least twice as
* slow as in a regular list, but given that N cannot exceed
* (1<<64) (before running out of addresses) this bounds search
* steps, lock hold times, etc, to reasonable constants (roughly
* 100 nodes inspected per operation worst case) so long as keys
* are Comparable (which is very common -- String, Long, etc).
* TreeBin nodes (TreeNodes) also maintain the same "next"
* traversal pointers as regular nodes, so can be traversed in
* iterators in the same way.
*/
在实际使用中,哈希码分布有时会明显偏离均匀随机性。像一些错误使用,使用相同的hash codes,或低位不同会导致该问题.为了解决该问题,引入了红黑树
resized
/*
* The table is resized when occupancy exceeds a percentage
* threshold (nominally, 0.75, but see below). Any thread
* noticing an overfull bin may assist in resizing after the
* initiating thread allocates and sets up the replacement array.
* However, rather than stalling, these other threads may proceed
* with insertions etc. The use of TreeBins shields us from the
* worst case effects of overfilling while resizes are in
* progress. Resizing proceeds by transferring bins, one by one,
* from the table to the next table. However, threads claim small
* blocks of indices to transfer (via field transferIndex) before
* doing so, reducing contention. A generation stamp in field
* sizeCtl ensures that resizings do not overlap. Because we are
* using power-of-two expansion, the elements from each bin must
* either stay at same index, or move with a power of two
* offset. We eliminate unnecessary node creation by catching
* cases where old nodes can be reused because their next fields
* won't change. On average, only about one-sixth of them need
* cloning when a table doubles. The nodes they replace will be
* garbage collectable as soon as they are no longer referenced by
* any reader thread that may be in the midst of concurrently
* traversing table. Upon transfer, the old table bin contains
* only a special forwarding node (with hash field "MOVED") that
* contains the next table as its key. On encountering a
* forwarding node, access and update operations restart, using
* the new table.
*/
重新扩容当负载因子超过0.75 , 扩容通常是power-of-two ,渐进式的方式,多次扩容(通过transferIndex) ,
使用字段sizeCtl来确保不会重复扩容.
这种扩容通常是相同的索引,或者移动power-of-two 位置(1/6的概率需要cloning) 旧表只包含forwarding node(具有散列字段“MOVED”)做为新表的key
/*
* Each bin transfer requires its bin lock, which can stall
* waiting for locks while resizing. However, because other
* threads can join in and help resize rather than contend for
* locks, average aggregate waits become shorter as resizing
* progresses. The transfer operation must also ensure that all
* accessible bins in both the old and new table are usable by any
* traversal. This is arranged in part by proceeding from the
* last bin (table.length - 1) up towards the first. Upon seeing
* a forwarding node, traversals (see class Traverser) arrange to
* move to the new table without revisiting nodes. To ensure that
* no intervening nodes are skipped even when moved out of order,
* a stack (see class TableStack) is created on first encounter of
* a forwarding node during a traversal, to maintain its place if
* later processing the current table. The need for these
* save/restore mechanics is relatively rare, but when one
* forwarding node is encountered, typically many more will be.
* So Traversers use a simple caching scheme to avoid creating so
* many new TableStack nodes. (Thanks to Peter Levart for
* suggesting use of a stack here.)
*/
调整大小时,每个bin移动的时候都需要对bin锁定。但是,因为其他线程可以加入并帮助调整大小而不是争用锁,总耗时会更多(多线程帮助扩容)。扩容时还必须确保旧表和新表中的所有可访问的bin都可以被任何遍历使用更多细节看TableStack
、 Traverser
/*
* The element count is maintained using a specialization of
* LongAdder. We need to incorporate a specialization rather than
* just use a LongAdder in order to access implicit
* contention-sensing that leads to creation of multiple
* CounterCells. The counter mechanics avoid contention on
* updates but can encounter cache thrashing if read too
* frequently during concurrent access. To avoid reading so often,
* resizing under contention is attempted only upon adding to a
* bin already holding two or more nodes. Under uniform hash
* distributions, the probability of this occurring at threshold
* is around 13%, meaning that only about 1 in 8 puts check
* threshold (and after resizing, many fewer do so).
*/
使用LongAdder
来统计元素的个数
TreeBins
/*
* TreeBins use a special form of comparison for search and
* related operations (which is the main reason we cannot use
* existing collections such as TreeMaps). TreeBins contain
* Comparable elements, but may contain others, as well as
* elements that are Comparable but not necessarily Comparable for
* the same T, so we cannot invoke compareTo among them. To handle
* this, the tree is ordered primarily by hash value, then by
* Comparable.compareTo order if applicable. On lookup at a node,
* if elements are not comparable or compare as 0 then both left
* and right children may need to be searched in the case of tied
* hash values. (This corresponds to the full list search that
* would be necessary if all elements were non-Comparable and had
* tied hashes.) On insertion, to keep a total ordering (or as
* close as is required here) across rebalancings, we compare
* classes and identityHashCodes as tie-breakers. The red-black
* balancing code is updated from pre-jdk-collections
* (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)
* based in turn on Cormen, Leiserson, and Rivest "Introduction to
* Algorithms" (CLR).
*
* TreeBins also require an additional locking mechanism. While
* list traversal is always possible by readers even during
* updates, tree traversal is not, mainly because of tree-rotations
* that may change the root node and/or its linkages. TreeBins
* include a simple read-write lock mechanism parasitic on the
* main bin-synchronization strategy: Structural adjustments
* associated with an insertion or removal are already bin-locked
* (and so cannot conflict with other writers) but must wait for
* ongoing readers to finish. Since there can be only one such
* waiter, we use a simple scheme using a single "waiter" field to
* block writers. However, readers need never block. If the root
* lock is held, they proceed along the slow traversal path (via
* next-pointers) until the lock becomes available or the list is
* exhausted, whichever comes first. These cases are not fast, but
* maximize aggregate expected throughput.
*/
TreeBins , On insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers.
TreeBins还需要额外的锁定机制。 尽管在更新期间readers总是可以进行列表遍历,但是树遍历不是,主要是因为可以改变根节点和/或其链接的树旋转。使用读写锁实现,读不会阻塞
baggage
/*
* Maintaining API and serialization compatibility with previous
* versions of this class introduces several oddities. Mainly: We
* leave untouched but unused constructor arguments refering to
* concurrencyLevel. We accept a loadFactor constructor argument,
* but apply it only to initial table capacity (which is the only
* time that we can guarantee to honor it.) We also declare an
* unused "Segment" class that is instantiated in minimal form
* only when serializing.
*
* Also, solely for compatibility with previous versions of this
* class, it extends AbstractMap, even though all of its methods
* are overridden, so it is just useless baggage.
*
*/
一些兼容API , concurrencyLevel ,loadFactor,Segment (1.7的实现分段锁) , 继承了AbstractMap,但重写了所有方法
Sources Code
/*
* This file is organized to make things a little easier to follow
* while reading than they might otherwise: First the main static
* declarations and utilities, then fields, then main public
* methods (with a few factorings of multiple public methods into
* internal ones), then sizing methods, trees, traversers, and
* bulk operations.
*/
ConcurrentHashMap 的源码可读性非常高,按照如上方式组织代码
main declarations
//默认bin数组的大小,必须为2的n次方
private static final int DEFAULT_CAPACITY = 16;
//默认装填因子
private static final float LOAD_FACTOR = 0.75f;
//当bin中大于8个node时,将此bin由list转换为tree
static final int TREEIFY_THRESHOLD = 8;
//tree退化
static final int UNTREEIFY_THRESHOLD = 6;
//大于64个bucket了,才会考虑tree化
static final int MIN_TREEIFY_CAPACITY = 64;
//to place bounds on some sizings
static final int NCPU = Runtime.getRuntime().availableProcessors();
/** For serialization compatibility. 兼容*/
private static final ObjectStreamField[] serialPersistentFields = {
new ObjectStreamField("segments", Segment[].class),
new ObjectStreamField("segmentMask", Integer.TYPE),
new ObjectStreamField("segmentShift", Integer.TYPE)
};
//兼容
private static final int DEFAULT_CONCURRENCY_LEVEL = 16;
/*
* Encodings for Node hash fields. See above for explanation.
*/
static final int MOVED = -1; // hash for forwarding nodes
static final int TREEBIN = -2; // hash for roots of trees
static final int RESERVED = -3; // hash for transient reservations
static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash
Node
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
volatile V val;
volatile Node<K,V> next;
Node(int hash, K key, V val, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.val = val;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return val; }
public final int hashCode() { return key.hashCode() ^ val.hashCode(); }
public final String toString(){ return key + "=" + val; }
public final V setValue(V value) {
throw new UnsupportedOperationException();
}
public final boolean equals(Object o) {
Object k, v, u; Map.Entry<?,?> e;
return ((o instanceof Map.Entry) &&
(k = (e = (Map.Entry<?,?>)o).getKey()) != null &&
(v = e.getValue()) != null &&
(k == key || k.equals(key)) &&
(v == (u = val) || v.equals(u)));
}
/**
* Virtualized support for map.get(); overridden in subclasses.
*/
Node<K,V> find(int h, Object k) {
Node<K,V> e = this;
if (k != null) {
do {
K ek;
if (e.hash == h &&
((ek = e.key) == k || (ek != null && k.equals(ek))))
return e;
} while ((e = e.next) != null);
}
return null;
}
}
Static utilities
/**
* Spreads (XORs) higher bits of hash to lower and also forces top
* bit to 0. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int spread(int h) {
return (h ^ (h >>> 16)) & HASH_BITS;
}
/**
* Returns a power of two table size for the given desired capacity.
* See Hackers Delight, sec 3.2
*/
private static final int tableSizeFor(int c) {
int n = c - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
/**
* Returns x's Class if it is of the form "class C implements
* Comparable<C>", else null.
*/
static Class<?> comparableClassFor(Object x) {
if (x instanceof Comparable) {
Class<?> c; Type[] ts, as; Type t; ParameterizedType p;
if ((c = x.getClass()) == String.class) // bypass checks
return c;
if ((ts = c.getGenericInterfaces()) != null) {
for (int i = 0; i < ts.length; ++i) {
if (((t = ts[i]) instanceof ParameterizedType) &&
((p = (ParameterizedType)t).getRawType() ==
Comparable.class) &&
(as = p.getActualTypeArguments()) != null &&
as.length == 1 && as[0] == c) // type arg is c
return c;
}
}
}
return null;
}
/**
* Returns k.compareTo(x) if x matches kc (k's screened comparable
* class), else 0.
*/
@SuppressWarnings({"rawtypes","unchecked"}) // for cast to Comparable
static int compareComparables(Class<?> kc, Object k, Object x) {
return (x == null || x.getClass() != kc ? 0 :
((Comparable)k).compareTo(x));
}
Main public methods
Construct
/**
* Creates a new, empty map with the default initial table size (16).
*/
public ConcurrentHashMap() {
}
/**
* Creates a new, empty map with an initial table size
* accommodating the specified number of elements without the need
* to dynamically resize.
*/
public ConcurrentHashMap(int initialCapacity) {
if (initialCapacity < 0)
throw new IllegalArgumentException();
int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
MAXIMUM_CAPACITY :
tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
this.sizeCtl = cap;
}
/**
* Creates a new map with the same mappings as the given map.
*/
public ConcurrentHashMap(Map<? extends K, ? extends V> m) {
this.sizeCtl = DEFAULT_CAPACITY;
putAll(m);
}
/**
* Creates a new, empty map with an initial table size based on
* the given number of elements ({@code initialCapacity}) and
* initial table density ({@code loadFactor}).
*
* @param initialCapacity the initial capacity. The implementation
* performs internal sizing to accommodate this many elements,
* given the specified load factor.
* @param loadFactor the load factor (table density) for
* establishing the initial table size
* @throws IllegalArgumentException if the initial capacity of
* elements is negative or the load factor is nonpositive
*
* @since 1.6
*/
public ConcurrentHashMap(int initialCapacity, float loadFactor) {
this(initialCapacity, loadFactor, 1);
}
/**
* Creates a new, empty map with an initial table size based on
* the given number of elements ({@code initialCapacity}), table
* density ({@code loadFactor}), and number of concurrently
* updating threads ({@code concurrencyLevel}).
*
* @param initialCapacity the initial capacity. The implementation
* performs internal sizing to accommodate this many elements,
* given the specified load factor.
* @param loadFactor the load factor (table density) for
* establishing the initial table size
* @param concurrencyLevel the estimated number of concurrently
* updating threads. The implementation may use this value as
* a sizing hint.
* @throws IllegalArgumentException if the initial capacity is
* negative or the load factor or concurrencyLevel are
* nonpositive
*/
public ConcurrentHashMap(int initialCapacity,
float loadFactor, int concurrencyLevel) {
if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
throw new IllegalArgumentException();
if (initialCapacity < concurrencyLevel) // Use at least as many bins
initialCapacity = concurrencyLevel; // as estimated threads
long size = (long)(1.0 + (long)initialCapacity / loadFactor);
int cap = (size >= (long)MAXIMUM_CAPACITY) ?
MAXIMUM_CAPACITY : tableSizeFor((int)size);
this.sizeCtl = cap;
}
Size
public int size() {
long n = sumCount();
return ((n < 0L) ? 0 :
(n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
(int)n);
}
/* ---------------- Counter support -------------- */
/**
* A padded cell for distributing counts. Adapted from LongAdder
* and Striped64. See their internal docs for explanation.
*/
@sun.misc.Contended static final class CounterCell {
volatile long value;
CounterCell(long x) { value = x; }
}
final long sumCount() {
CounterCell[] as = counterCells; CounterCell a;
long sum = baseCount;
if (as != null) {
for (int i = 0; i < as.length; ++i) {
if ((a = as[i]) != null)
sum += a.value;
}
}
return sum;
}
Get
public boolean containsKey(Object key) {
return get(key) != null;
}
public V get(Object key) {
Node<K,V>[] tab;
Node<K,V> e, p;
int n, eh;
K ek;
//通过位运算取得bit高位来使key分布更均匀,当key=null时,会抛出空指针异常
int h = spread(key.hashCode());
//e 取得bin中的first node
if ((tab = table) != null && (n = tab.length) > 0 && (e = tabAt(tab, (n - 1) & h)) != null) {
if ((eh = e.hash) == h) {
if ((ek = e.key) == key || (ek != null && key.equals(ek)))
return e.val;
}
else if (eh < 0)
return (p = e.find(h, key)) != null ? p.val : null;
//遍历链表
while ((e = e.next) != null) {
if (e.hash == h && ((ek = e.key) == key || (ek != null && key.equals(ek))))
return e.val;
}
}
return null;
}
Put
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
if (key == null || value == null) throw new NullPointerException();
int hash = spread(key.hashCode());
int binCount = 0;
for (Node<K,V>[] tab = table;;) {
Node<K,V> f; int n, i, fh;
//延迟初始化
if (tab == null || (n = tab.length) == 0)
tab = initTable();
//当向empty bin 中添加node时,并不使用锁,而使用CAS操作来添加
else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
if (casTabAt(tab, i, null,
new Node<K,V>(hash, key, value, null)))
break;
}
//处于扩容状态,帮助扩容
else if ((fh = f.hash) == MOVED)
tab = helpTransfer(tab, f);
else {
V oldVal = null;
//锁住bin节点
synchronized (f) {
if (tabAt(tab, i) == f) {
if (fh >= 0) {
binCount = 1;
for (Node<K,V> e = f;; ++binCount) {
K ek;
if (e.hash == hash &&
((ek = e.key) == key ||
(ek != null && key.equals(ek)))) {
oldVal = e.val;
if (!onlyIfAbsent)
e.val = value;
break;
}
Node<K,V> pred = e;
if ((e = e.next) == null) {
pred.next = new Node<K,V>(hash, key,
value, null);
break;
}
}
}
else if (f instanceof TreeBin) {
Node<K,V> p;
binCount = 2;
if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
value)) != null) {
oldVal = p.val;
if (!onlyIfAbsent)
p.val = value;
}
}
}
}
//先插入再转换成tree
if (binCount != 0) {
if (binCount >= TREEIFY_THRESHOLD)
treeifyBin(tab, i);
if (oldVal != null)
return oldVal;
break;
}
}
}
addCount(1L, binCount);
return null;
}
可以看到ConcurrentHashMap 为了提高性能做了很多事情
- 只有哈希冲突才会锁住bin,否则使用CAS
- 多线程帮助扩容
- 当链表过长树化,提高查询速度,防止锁住时间过长
More details to continues …..
转载请注明来源,欢迎对文章中的引用来源进行考证,欢迎指出任何有错误或不够清晰的表达。可以在下面评论区评论,也可以邮件至 951488791@qq.com