How can we code when we want to get unique values from an array? Which one is the fastest? I will show the performance result at the end.
Distinct an array of primitive values
The arrays that we will use are the following. Number and string array. We need to handle an object array differently, so let’s start with a simple one.
const primitiveNumbers = [1, 2, 3, 2, 4, 5, 6, 3, 1];
// unique values -> [ 1, 2, 3, 4, 5, 6 ]
const primitiveStrings = [
"aaa", // duplicated 1
"bbb",
"ddd", // duplicated 2
"aaa", // duplicated 1
"eee",
"ccc",
"ddd", // duplicated 2
];
// unique values -> [ 'aaa', 'bbb', 'ddd', 'eee', 'ccc' ]
We need to execute several functions to see if those results are the same. I prepare the generic function to make it easy.
type UniqFunc<T> = (array: T[]) => T[];
function test<T>(array: T[], funcs: UniqFunc<T>[]) {
for (const func of funcs) {
const start = performance.now();
const result = func(array);
const elapsedTime = performance.now() - start;
console.log(`${func.name}, time: ${Math.round(elapsedTime)}`);
console.log(result);
}
}
const funcsForPrimitive = [
uniqByForOf,
uniqByForEach,
uniqByReduce,
uniqByFilter,
uniqBySetWithArrayFrom,
uniqBySetWithSpread,
];
test(primitiveNumbers, funcsForPrimitive);
test(primitiveStrings, funcsForPrimitive);
All functions are defined as UniqFunc
type that requires array
variable. We can specify anything as far as it’s an array but I will use only number/string/object in this post.test
function measure how long it takes to complete the function’s work. funcsForPrimitive
stores functions that I will show in this post.
Comparing values one by one
The first idea is to loop the array and check whether the item has already been shown or not. Add the value if it has not been in result
variable.
function uniqByObject<T>(array: T[]) {
const result: T[] = [];
for (const item of array) {
if (!result.includes(item)) {
result.push(item);
}
}
return result;
}
If you prefer forEach you can write it like this below.
function uniqByForEach<T>(array: T[]) {
const result: T[] = [];
array.forEach((item) => {
if (!result.includes(item)) {
result.push(item);
}
})
return result;
}
Using Array.prototype.reduce
There are other ways if you don’t want to define a middle-man that stores results.
function uniqByReduce<T>(array: T[]): T[] {
return array.reduce((acc: T[], cur: T) => {
if (!acc.includes(cur)) {
acc.push(cur);
}
return acc;
}, [])
}
The first arg acc
of reduce
function callback is an accumulator that will be the return value. The second arg cur is the current value. What this does is basically the same as previous examples.
Using Array.prototype.filer
It can be written with one line if using the filter function.
function uniqByFilter<T>(array: T[]) {
return array.filter((value, index) => array.indexOf(value) === index);
}
The behavior of indexOf
is following.
The indexOf() method returns the first index at which a given element can be found in the array, or -1 if it is not present.
This table shows the relationships.
index | value | indexOf | result |
---|---|---|---|
0 | 1 | 0 | true |
1 | 2 | 1 | true |
2 | 3 | 2 | true |
3 | 2 | 1 | false |
4 | 4 | 4 | true |
5 | 5 | 5 | true |
6 | 6 | 6 | true |
7 | 3 | 2 | false |
8 | 1 | 0 | false |
Since indexOf
function returns the first index the result of indexOf
is different from the current index of the array.
Using Map
Map offers key-value object. If it receives the same key name it updates the value. It means that we don’t have to check the values in there if using the value as a key.
function uniqByMap<T>(array: T[]): T[] {
const map = new Map();
for (const item of array) {
map.set(item, item);
}
return Array.from(map.values());
}
map.values()
returns IterableIterator
type but not array. It should be converted to Array type. That’s why Array.from()
is used here.
Using Set object
I think it’s the easiest way to do it. What we have to do is just put the array into the constructor. Set object stores unique values. It removes the duplicates.
function uniqBySetWithArrayFrom<T>(array: T[]): T[] {
return Array.from(new Set(array));
}
function uniqBySetWithSpread<T>(array: T[]): T[] {
return [...new Set(array)];
}
Unfortunately, it can’t be used for an object because it refers to a reference but not the values in the object.
Distinct an array of key-value object
The examples that I showed above can’t be used for an object. An object isn’t as simple as a primitive value because it refers to a reference. The following two objects have the same values but the result is false.
const obj1 = {foo: 1, hoge: 2};
const obj2 = {foo: 1, hoge: 2};
console.log(obj1 === obj2);
// false
If the same reference is assigned to another variable the result becomes true.
const obj1 = {foo: 1, hoge: 2};
const obj3 = obj1
console.log(obj1 === obj3);
// true
The objects are defined somewhere in the address space. The reference value of the object is the address. obj1
and obj2
are respectively stored in different addresses obj1 === obj2
becomes false.
To compare objects we need to check the values one by one. We will use the following object.
const objects = [
{ name: "aaa", price: 50 },
{ name: "bbb", price: 30 }, // duplicated 1
{ name: "ccc", price: 80 }, // duplicated 2
{ name: "aaa", price: 15 },
{ name: "bbb", price: 30 }, // duplicated 1
{ name: "ccc", price: 80 }, // duplicated 2
];
// Unique values
// [
// { name: 'aaa', price: 50 },
// { name: 'bbb', price: 30 },
// { name: 'ccc', price: 80 },
// { name: 'aaa', price: 15 }
// ]
Using lodash.isEqual
The way for object is basically the same as the one for primitive values. Loop the array and check the value one by one. The difference is to use lodash.isEqual
in order to compare all primitive values in the object.
function uniqForObject<T>(array: T[]): T[] {
const result: T[] = [];
for (const item of array) {
const found = result.some((value) => isEqual(value, item));
if (!found) {
result.push(item);
}
}
return result;
}
This function can be used for primitive values as well.
Performance comparison
Let’s compare the functions’ performance! I used Node.js version 14.13.0 for the experiment. This is the code for it.
type UniqFunc<T> = (array: T[]) => T[];
function test<T>(array: T[], funcs: UniqFunc<T>[], loopCount = 1) {
const times: number[] = [];
for (const func of funcs) {
const start = performance.now();
for (let i = 0; i < loopCount; i++) {
const result = func(array);
}
const elapsedTime = performance.now() - start;
times.push(Math.round(elapsedTime));
}
return times;
}
function generatePrimitiveArray(length: number, range: number): number[] {
const result = [];
for (let i = 0; i < length; i++) {
const value = Math.floor(Math.random() * range);
result.push(value);
}
return result;
}
function measure() {
const settings = [
{ arrayLen: 10, loopRangeS: 10000, loopRangeE: 200000, step: 5000 },
{ arrayLen: 10, loopRangeS: 100000, loopRangeE: 500000, step: 50000 },
{ arrayLen: 100, loopRangeS: 100000, loopRangeE: 500000, step: 50000 },
{ arrayLen: 1000, loopRangeS: 100000, loopRangeE: 500000, step: 50000 },
];
for (const setting of settings) {
console.log(`length: ${setting.arrayLen}`);
const array = generatePrimitiveArray(setting.arrayLen, 10);
const titles = funcsForPrimitive.map(x => x.name).join(",");
console.log(`loop count,${titles}`);
for (let i = setting.loopRangeS; i <= setting.loopRangeE; i += setting.step) {
const times = test(array, funcsForPrimitive, i);
const data = times.join(", ");
console.log(`${i}, ${data}`);
}
console.log();
}
}
measure();
The value range is set to 0 – 9. The same array is used in the loop for all functions to compare the performance precisely. If you want to try it yourself you can clone my repository or just copy and paste the code. You can change the range if you change the second argument value of generatePrimitiveArray
function.
I tried the following code in order to make it faster.
async function testAsync<T>(array: T[], funcs: UniqFunc<T>[], loopCount = 1) {
const promises = funcs.map((func) => execute(array, func, loopCount));
const times = await Promise.all(promises);
return times.map(x => Math.round(x));
}
async function execute<T>(array: T[], func: UniqFunc<T>, loopCount: number) {
const start = performance.now();
for (let i = 0; i < loopCount; i++) {
const result = func(array);
}
return performance.now() - start;
}
However, it didn’t improve it because promise works on a single thread. It can improve the performance for I/O-related work for example but not for CPU-intensive work. To improve the performance for CPU-intensive work we need another thread. We can make it faster by worker thread but I don’t do it here.
You might want to check this post if you are not familiar with Promise.
Array Length 10
loop count | ForOf | ForEach | Reduce | Filter | Map | SetWith ArrayFrom | SetWith Spread | ForObject |
---|---|---|---|---|---|---|---|---|
100000 | 40 | 25 | 24 | 32 | 74 | 37 | 59 | 92 |
150000 | 47 | 45 | 48 | 54 | 88 | 56 | 49 | 57 |
200000 | 36 | 35 | 27 | 39 | 82 | 67 | 68 | 75 |
250000 | 40 | 40 | 30 | 39 | 107 | 85 | 93 | 114 |
300000 | 58 | 50 | 45 | 55 | 123 | 104 | 102 | 120 |
350000 | 64 | 45 | 39 | 55 | 149 | 129 | 124 | 138 |
400000 | 73 | 66 | 59 | 67 | 171 | 151 | 140 | 164 |
450000 | 75 | 68 | 65 | 82 | 189 | 192 | 155 | 176 |
500000 | 83 | 77 | 77 | 94 | 191 | 196 | 172 | 190 |
The vertical line shows time (ms). uniqByReduce
is top and second is uniqByForEach
.
Array Length 10 loop 10000 to 200000
The result for the loop count of less than 200000 looks not good. I tried it again with a smaller loop count increment.
loop count | ForOf | ForEach | Reduce | Filter | Map | SetWith ArrayFrom | SetWith Spread | ForObject |
---|---|---|---|---|---|---|---|---|
10000 | 9 | 17 | 10 | 5 | 9 | 9 | 4 | 14 |
15000 | 2 | 2 | 2 | 2 | 7 | 7 | 7 | 8 |
20000 | 3 | 2 | 3 | 5 | 13 | 7 | 7 | 12 |
25000 | 4 | 4 | 3 | 5 | 11 | 9 | 12 | 13 |
30000 | 7 | 6 | 5 | 5 | 13 | 11 | 10 | 15 |
35000 | 6 | 5 | 6 | 8 | 14 | 12 | 12 | 18 |
40000 | 7 | 6 | 9 | 8 | 27 | 15 | 13 | 20 |
45000 | 10 | 8 | 6 | 7 | 19 | 16 | 19 | 29 |
50000 | 11 | 8 | 11 | 10 | 20 | 17 | 17 | 25 |
55000 | 8 | 7 | 13 | 15 | 25 | 20 | 18 | 31 |
60000 | 14 | 11 | 9 | 10 | 24 | 21 | 20 | 39 |
65000 | 12 | 9 | 10 | 11 | 27 | 23 | 22 | 38 |
70000 | 12 | 10 | 13 | 13 | 27 | 29 | 24 | 41 |
75000 | 15 | 11 | 10 | 15 | 29 | 27 | 24 | 40 |
80000 | 13 | 11 | 14 | 14 | 32 | 27 | 26 | 41 |
85000 | 16 | 12 | 12 | 17 | 40 | 31 | 29 | 48 |
90000 | 18 | 17 | 13 | 16 | 35 | 31 | 29 | 54 |
95000 | 18 | 14 | 15 | 18 | 38 | 32 | 32 | 49 |
100000 | 16 | 17 | 16 | 17 | 43 | 35 | 33 | 58 |
105000 | 19 | 16 | 15 | 17 | 44 | 38 | 35 | 69 |
110000 | 18 | 15 | 16 | 18 | 45 | 38 | 37 | 65 |
115000 | 20 | 16 | 18 | 23 | 46 | 40 | 38 | 60 |
120000 | 22 | 18 | 16 | 22 | 47 | 43 | 40 | 72 |
125000 | 23 | 23 | 17 | 20 | 48 | 45 | 47 | 69 |
130000 | 23 | 20 | 23 | 23 | 57 | 45 | 43 | 77 |
135000 | 22 | 19 | 19 | 26 | 55 | 47 | 44 | 77 |
140000 | 25 | 19 | 19 | 22 | 55 | 50 | 46 | 95 |
145000 | 25 | 21 | 24 | 24 | 61 | 52 | 48 | 76 |
150000 | 27 | 20 | 21 | 35 | 66 | 51 | 50 | 82 |
155000 | 30 | 22 | 25 | 25 | 64 | 54 | 53 | 95 |
160000 | 32 | 22 | 28 | 26 | 69 | 56 | 52 | 102 |
165000 | 28 | 25 | 27 | 35 | 71 | 58 | 62 | 109 |
170000 | 33 | 25 | 25 | 27 | 73 | 59 | 56 | 97 |
175000 | 29 | 26 | 40 | 29 | 70 | 64 | 57 | 91 |
180000 | 32 | 26 | 26 | 38 | 75 | 64 | 59 | 113 |
185000 | 35 | 27 | 28 | 30 | 81 | 65 | 63 | 119 |
190000 | 31 | 25 | 26 | 35 | 76 | 66 | 64 | 122 |
195000 | 36 | 32 | 32 | 39 | 77 | 69 | 63 | 105 |
200000 | 34 | 27 | 30 | 39 | 83 | 73 | 73 | 133 |
The difference is small. Let’s see the result with the log version.
Top is uniqByForEach
but 2nd to 4th are similar. They are uniqByReduce
, uniqByForOf
and uniqByFilter
.
Array Length 100
loop count | ForOf | ForEach | Reduce | Filter | Map | SetWith ArrayFrom | SetWith Spread | ForObject |
---|---|---|---|---|---|---|---|---|
100000 | 258 | 219 | 165 | 256 | 197 | 205 | 200 | 838 |
150000 | 271 | 227 | 244 | 419 | 290 | 303 | 286 | 1235 |
200000 | 362 | 305 | 314 | 563 | 397 | 388 | 388 | 1553 |
250000 | 444 | 391 | 398 | 680 | 480 | 485 | 494 | 2095 |
300000 | 522 | 454 | 494 | 820 | 551 | 600 | 574 | 2418 |
350000 | 613 | 540 | 567 | 938 | 644 | 664 | 674 | 2770 |
400000 | 714 | 641 | 628 | 1073 | 734 | 787 | 772 | 3291 |
450000 | 797 | 676 | 758 | 1207 | 855 | 876 | 860 | 3586 |
500000 | 849 | 768 | 817 | 1319 | 901 | 965 | 983 | 4085 |
uniqForObject
is too slow to compare. Top is uniqForEach
and second is uniqByReduce
.
Array Length 1000
loop count | ForOf | ForEach | Reduce | Filter | Map | SetWith ArrayFrom | SetWith Spread |
---|---|---|---|---|---|---|---|
100000 | 2772 | 2806 | 2783 | 3552 | 2050 | 1991 | 1985 |
150000 | 4190 | 4022 | 3785 | 5137 | 3118 | 2964 | 2968 |
200000 | 5423 | 5180 | 4988 | 6853 | 4139 | 3957 | 3945 |
250000 | 6874 | 6514 | 6439 | 8829 | 5195 | 4944 | 4939 |
300000 | 8046 | 7905 | 7822 | 10299 | 6414 | 5940 | 5916 |
350000 | 9619 | 9038 | 8858 | 12164 | 7337 | 6960 | 6907 |
400000 | 11228 | 10540 | 10051 | 13789 | 8664 | 7917 | 7897 |
450000 | 12106 | 11852 | 11403 | 15548 | 9414 | 8887 | 8894 |
500000 | 13623 | 13509 | 12710 | 17154 | 10404 | 9893 | 9902 |
Wow, uniqBySetWithSpread
or uniqBySetWithArrayFrom
is top here! There are almost the same. Next is uniqByMap
Conclusion
We learned many ways to remove duplicated values from an array. We need to use a library such as lodash to compare two objects deeply but we can easily implement primitive values without a library.
uniqForEach
, uniqByReduce
, uniqByFilter
or uniqByForOf
is good enough for most cases. Using Set
object is the best if you need to work with a big array and need to consider the performance.
Check the following post if you also want to know how to remove an element.
Comments