How to construct a list of lengths efficiently
up vote
17
down vote
favorite
Say I have a sorted list of integers
RandomInteger[{1, 100000}, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], {m, 10000}]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
add a comment |
up vote
17
down vote
favorite
Say I have a sorted list of integers
RandomInteger[{1, 100000}, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], {m, 10000}]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
1
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe anEmpiricalDistribution
orBinCounts
already can accomplish what you want?
– Thies Heidecke
Nov 13 at 13:08
add a comment |
up vote
17
down vote
favorite
up vote
17
down vote
favorite
Say I have a sorted list of integers
RandomInteger[{1, 100000}, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], {m, 10000}]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
Say I have a sorted list of integers
RandomInteger[{1, 100000}, 10000] // Sort // Short
I want to construct another list whose $m$-th element is the number of elements in the original list that are less than or equal to $m$:
Table[Length@Select[%, LessEqualThan[m]], {m, 10000}]
This is terribly inefficient, but for some reason I cannot come up with a better a approach. What's a better way to accomplish this? This seems to be a fairly standard exercise, so there should be plenty of duplicates, but I can find none.
I am probably missing a key word...
list-manipulation performance-tuning filtering
list-manipulation performance-tuning filtering
edited Nov 13 at 12:30
Alexey Popkov
38k4104260
38k4104260
asked Nov 13 at 1:00
AccidentalFourierTransform
4,9581941
4,9581941
1
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe anEmpiricalDistribution
orBinCounts
already can accomplish what you want?
– Thies Heidecke
Nov 13 at 13:08
add a comment |
1
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe anEmpiricalDistribution
orBinCounts
already can accomplish what you want?
– Thies Heidecke
Nov 13 at 13:08
1
1
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
or BinCounts
already can accomplish what you want?– Thies Heidecke
Nov 13 at 13:08
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
or BinCounts
already can accomplish what you want?– Thies Heidecke
Nov 13 at 13:08
add a comment |
4 Answers
4
active
oldest
votes
up vote
20
down vote
accepted
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],{m,10000}];//AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
{
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
},
With[{near = Nearest[unique->"Index", Range @ Length @ s][[All,1]]},
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[{1,100000},10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
Ah, great answer, as usual.
– AccidentalFourierTransform
Nov 13 at 1:19
Can you please check my answer because my laptop is very slow
– J42161217
Nov 13 at 3:05
add a comment |
up vote
13
down vote
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
Beat me to it - BinCounts is the way... +1
– ciao
Nov 13 at 6:49
1
Hey @ciao, you are back?!!
– kglr
Nov 13 at 6:53
Sorry @Henrik; thanks for the edit.
– kglr
Nov 13 at 10:23
1
Short and fast. No need sorting.. Excellent!!... +1
– Okkes Dulgerci
Nov 13 at 15:14
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
– ciao
Nov 13 at 22:42
add a comment |
up vote
12
down vote
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;;10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread the same code is way faster
r6 = Flatten[
Join[{Table[0, s[[1]] - 1]},
MapThread[
Table, {Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
1
These are the timing on my laptop r1={0.444354, Null},r2={39.456, Null},r3={0.157123, Null} True
– Okkes Dulgerci
Nov 13 at 3:55
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
– J42161217
Nov 13 at 4:01
add a comment |
up vote
5
down vote
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === {}),
rules[[2]],
With[{spopt = SystemOptions["SparseArrayOptions"]},
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
add a comment |
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
20
down vote
accepted
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],{m,10000}];//AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
{
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
},
With[{near = Nearest[unique->"Index", Range @ Length @ s][[All,1]]},
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[{1,100000},10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
Ah, great answer, as usual.
– AccidentalFourierTransform
Nov 13 at 1:19
Can you please check my answer because my laptop is very slow
– J42161217
Nov 13 at 3:05
add a comment |
up vote
20
down vote
accepted
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],{m,10000}];//AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
{
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
},
With[{near = Nearest[unique->"Index", Range @ Length @ s][[All,1]]},
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[{1,100000},10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
Ah, great answer, as usual.
– AccidentalFourierTransform
Nov 13 at 1:19
Can you please check my answer because my laptop is very slow
– J42161217
Nov 13 at 3:05
add a comment |
up vote
20
down vote
accepted
up vote
20
down vote
accepted
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],{m,10000}];//AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
{
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
},
With[{near = Nearest[unique->"Index", Range @ Length @ s][[All,1]]},
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[{1,100000},10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
You can use the usual UnitStep
+ Total
tricks:
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
r2 = Table[Length@Select[s,LessEqualThan[m]],{m,10000}];//AbsoluteTiming
r1 === r2
{0.435358, Null}
{41.4357, Null}
True
Update
As @J42161217 points out, you can take advantage of the fact that the data is sorted to speed things up. He used Differences
. Here is a version that uses Nearest
instead:
mincounts[s_] := With[
{
unique = DeleteDuplicates@Nearest[s->"Element",s][[All,-1]],
counts = Prepend[0] @ DeleteDuplicates@Nearest[s->"Index",s][[All,-1]]
},
With[{near = Nearest[unique->"Index", Range @ Length @ s][[All,1]]},
counts[[1+near-UnitStep[unique[[near]]-Range@Length@s-1]]]
]
]
Comparison:
SeedRandom[1];
s=RandomInteger[{1,100000},10000]//Sort;
(* my first answer *)
r1 = Table[Total[UnitStep[m-s]], {m,10000}]; //AbsoluteTiming
(* J42161217's answer *)
r2 = Flatten[
Join[
{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i, Length[Select[s, # <= 10000 &]]}]
]
][[;;10000]]; // AbsoluteTiming
(* using Nearest *)
r3 = mincounts[s]; //AbsoluteTiming
r1 === r2 === r3
{0.432897, Null}
{0.122198, Null}
{0.025923, Null}
True
edited Nov 13 at 4:11
answered Nov 13 at 1:11
Carl Woll
65.1k285171
65.1k285171
Ah, great answer, as usual.
– AccidentalFourierTransform
Nov 13 at 1:19
Can you please check my answer because my laptop is very slow
– J42161217
Nov 13 at 3:05
add a comment |
Ah, great answer, as usual.
– AccidentalFourierTransform
Nov 13 at 1:19
Can you please check my answer because my laptop is very slow
– J42161217
Nov 13 at 3:05
Ah, great answer, as usual.
– AccidentalFourierTransform
Nov 13 at 1:19
Ah, great answer, as usual.
– AccidentalFourierTransform
Nov 13 at 1:19
Can you please check my answer because my laptop is very slow
– J42161217
Nov 13 at 3:05
Can you please check my answer because my laptop is very slow
– J42161217
Nov 13 at 3:05
add a comment |
up vote
13
down vote
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
Beat me to it - BinCounts is the way... +1
– ciao
Nov 13 at 6:49
1
Hey @ciao, you are back?!!
– kglr
Nov 13 at 6:53
Sorry @Henrik; thanks for the edit.
– kglr
Nov 13 at 10:23
1
Short and fast. No need sorting.. Excellent!!... +1
– Okkes Dulgerci
Nov 13 at 15:14
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
– ciao
Nov 13 at 22:42
add a comment |
up vote
13
down vote
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
Beat me to it - BinCounts is the way... +1
– ciao
Nov 13 at 6:49
1
Hey @ciao, you are back?!!
– kglr
Nov 13 at 6:53
Sorry @Henrik; thanks for the edit.
– kglr
Nov 13 at 10:23
1
Short and fast. No need sorting.. Excellent!!... +1
– Okkes Dulgerci
Nov 13 at 15:14
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
– ciao
Nov 13 at 22:42
add a comment |
up vote
13
down vote
up vote
13
down vote
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
BinCounts
and Accumulate
combination is faster than all the methods posted so far:
r4 = Accumulate @ BinCounts[s, {1, 1 + 10000, 1}]; // RepeatedTiming // First
0.00069
versus Henrik Schumacher's mySparseArray
, Carl Woll's mincounts
and J42161217's Differences
-based method:
r5 = Accumulate[mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[
1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00081
r3 = mincounts[s]; // RepeatedTiming // First
0.016
r2 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
0.149
r2 == r3 == r4 == r5
True
edited Nov 13 at 10:10
Henrik Schumacher
44.9k265130
44.9k265130
answered Nov 13 at 6:31
kglr
171k8194399
171k8194399
Beat me to it - BinCounts is the way... +1
– ciao
Nov 13 at 6:49
1
Hey @ciao, you are back?!!
– kglr
Nov 13 at 6:53
Sorry @Henrik; thanks for the edit.
– kglr
Nov 13 at 10:23
1
Short and fast. No need sorting.. Excellent!!... +1
– Okkes Dulgerci
Nov 13 at 15:14
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
– ciao
Nov 13 at 22:42
add a comment |
Beat me to it - BinCounts is the way... +1
– ciao
Nov 13 at 6:49
1
Hey @ciao, you are back?!!
– kglr
Nov 13 at 6:53
Sorry @Henrik; thanks for the edit.
– kglr
Nov 13 at 10:23
1
Short and fast. No need sorting.. Excellent!!... +1
– Okkes Dulgerci
Nov 13 at 15:14
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
– ciao
Nov 13 at 22:42
Beat me to it - BinCounts is the way... +1
– ciao
Nov 13 at 6:49
Beat me to it - BinCounts is the way... +1
– ciao
Nov 13 at 6:49
1
1
Hey @ciao, you are back?!!
– kglr
Nov 13 at 6:53
Hey @ciao, you are back?!!
– kglr
Nov 13 at 6:53
Sorry @Henrik; thanks for the edit.
– kglr
Nov 13 at 10:23
Sorry @Henrik; thanks for the edit.
– kglr
Nov 13 at 10:23
1
1
Short and fast. No need sorting.. Excellent!!... +1
– Okkes Dulgerci
Nov 13 at 15:14
Short and fast. No need sorting.. Excellent!!... +1
– Okkes Dulgerci
Nov 13 at 15:14
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
– ciao
Nov 13 at 22:42
@kglr - Been here all along, just busy herding a couple of startups, but I read several times a week, and usually learn something new each time.
– ciao
Nov 13 at 22:42
add a comment |
up vote
12
down vote
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;;10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread the same code is way faster
r6 = Flatten[
Join[{Table[0, s[[1]] - 1]},
MapThread[
Table, {Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
1
These are the timing on my laptop r1={0.444354, Null},r2={39.456, Null},r3={0.157123, Null} True
– Okkes Dulgerci
Nov 13 at 3:55
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
– J42161217
Nov 13 at 4:01
add a comment |
up vote
12
down vote
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;;10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread the same code is way faster
r6 = Flatten[
Join[{Table[0, s[[1]] - 1]},
MapThread[
Table, {Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
1
These are the timing on my laptop r1={0.444354, Null},r2={39.456, Null},r3={0.157123, Null} True
– Okkes Dulgerci
Nov 13 at 3:55
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
– J42161217
Nov 13 at 4:01
add a comment |
up vote
12
down vote
up vote
12
down vote
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;;10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread the same code is way faster
r6 = Flatten[
Join[{Table[0, s[[1]] - 1]},
MapThread[
Table, {Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
I think this is at least x3 faster than Mr. Carl Woll's answer
Can anybody compare my timing?
r3 = Flatten[Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;;10000]]; // AbsoluteTiming
{0.157123, Null}
Using MapThread the same code is way faster
r6 = Flatten[
Join[{Table[0, s[[1]] - 1]},
MapThread[
Table, {Range[t = Length[Select[s, # <= 10000 &]]],
Differences[s][[1 ;; t]]}]]][[;; 10000]]; // AbsoluteTiming
r6===r3
{0.008387, Null}
True
edited Nov 13 at 12:57
answered Nov 13 at 3:04
J42161217
2,983219
2,983219
1
These are the timing on my laptop r1={0.444354, Null},r2={39.456, Null},r3={0.157123, Null} True
– Okkes Dulgerci
Nov 13 at 3:55
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
– J42161217
Nov 13 at 4:01
add a comment |
1
These are the timing on my laptop r1={0.444354, Null},r2={39.456, Null},r3={0.157123, Null} True
– Okkes Dulgerci
Nov 13 at 3:55
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
– J42161217
Nov 13 at 4:01
1
1
These are the timing on my laptop r1={0.444354, Null},r2={39.456, Null},r3={0.157123, Null} True
– Okkes Dulgerci
Nov 13 at 3:55
These are the timing on my laptop r1={0.444354, Null},r2={39.456, Null},r3={0.157123, Null} True
– Okkes Dulgerci
Nov 13 at 3:55
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
– J42161217
Nov 13 at 4:01
Hey, thanks for checking. I will use your timing. Thanks for the confirmation
– J42161217
Nov 13 at 4:01
add a comment |
up vote
5
down vote
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === {}),
rules[[2]],
With[{spopt = SystemOptions["SparseArrayOptions"]},
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
add a comment |
up vote
5
down vote
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === {}),
rules[[2]],
With[{spopt = SystemOptions["SparseArrayOptions"]},
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
add a comment |
up vote
5
down vote
up vote
5
down vote
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === {}),
rules[[2]],
With[{spopt = SystemOptions["SparseArrayOptions"]},
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
s = Sort[RandomInteger[{1, 100000}, 10000]];
Let us just imagine for the moment that the target list r
is supposed to have length 100000
(we can truncate it afterwards). Then for each entry i
in the list s
, the list r
needs to have a jump of height 1
at position i
. The jumps are the "derivative" of r
(in a discrete sense) and the antiderivative can be obtained with Accumulate
. In order to get the list of jumps, we need additive matrix assembly.
This can be done with this function:
mySparseArray[rules_, dims_, f_: Total, background_: 0.] :=
If[(Head[rules] === Rule) && (rules[[1]] === {}),
rules[[2]],
With[{spopt = SystemOptions["SparseArrayOptions"]},
Internal`WithLocalSettings[
SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> f}],
SparseArray[rules, dims, background],
SetSystemOptions[spopt]]
]
]
So, in total, r
can be obtained as follows:
r4 = Accumulate[
mySparseArray[Partition[s, 1] -> 1, {s[[-1]]}, Total, 0][[1 ;; Length[s]]]
]; // RepeatedTiming // First
0.00055
For comparison:
r3 = Flatten[
Join[{Table[0, s[[1]] - 1]},
Table[Table[i, Differences[s][[i]]], {i,
Length[Select[s, # <= 10000 &]]}]]][[;; 10000]]; //
RepeatedTiming // First
r3 == r4
0.116
True
edited Nov 13 at 23:11
answered Nov 13 at 6:32
Henrik Schumacher
44.9k265130
44.9k265130
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f185874%2fhow-to-construct-a-list-of-lengths-efficiently%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
What do you want to do with the table? What's your original problem? It sounds like you want to build some kind of CDF table manually. Maybe an
EmpiricalDistribution
orBinCounts
already can accomplish what you want?– Thies Heidecke
Nov 13 at 13:08