zakruti.com » IT - Software » freeCodeCamp.org

Data Structures For Python Developers (w/ Flask) - Course

video description

Rating: 4.0; Vote: 1

Learn about basic data structure concepts by building a Flask API with Python. - Code: https://github.com/selikapro/FlaskDS Philippe: Please add a note that your -custom_hash- function is completely dysfunctional, with a severe statistic overponderation on the LAST character(s) of ANY key. This occurs because you multiply by ord(i), which is the byte value of the last scanned character: imagine this last byte is a big-enough power of 2, when you multiply by it, and then take the -modulo table_size-; if this table_size is a power of 2, you always get 0 in the final hash (which means that all these keys will fill the collision list on the 1st bucket of the hash table, which then degenerates into a single long linked list, and so you loose ALL the interest of hash tables in terms of CPU time)! So your example function is mostly hashing only the end of strings (it has many more collisions for very different contents of the beginning of input key strings than there are collisions for the same variations at end of input key strings).
The same happens for EVERY table_size value, when it can be a multiple of a byte value. Note also that the byte value 0 of characters found in keys will cancell all accumulated hash value from the beginin of the string (independantly of the content and length of this start of string).
In a course about hash tables, the -custom hashing function- MUST respect a flattened distribution of its output bits, for every possible table size (see Knuth for modular hashing functions): every implementer of hash tables MUST learn how to build CORRECT hashing functions... this was not your case, sorry, but please don't tell others to imitate when you did!
HERE IS THE FIX: Instead of multiplying by ord(i) in function custom_hash, you MUST multiply by a large enough ODD integer constant, and preferably a prime, so that it cannot be a multiple of table_size (e.g. 8171=largest 13-bit prime, 16301=largest 14-bit prime, 32707=largest 15-bit prime): this will lways work with any table_size lower than this prime constant (and will work very well for hashing series of bytes of variable length, like text strings). Many simple examples in Java, C or C++ just use one of 17, 67, 127, 129 or 257 as the prime multiplier for their simple modular hashing function. [Multiplying by integer constants of the form (2-n-1) or (2-n+1) is very simple to compute on the most modest CPUs without an efficient multiplication instruction and without any -barrel-shifter-, just by using a few integer additions]. In all cases, using a good constant multiplier will even be FASTER then your code using a variable multiplier.
As well a good string hashing function should use the total length of the source string as one of its input [if this requires scanning the string, like in C/C++ null-terminated strings used by str-() functions in the classic C library, just hash bytes and count them until you find the terminating null byte, then use the count as an additional value to add to the result, before it is truncated to modulo the table_size; but you may terminate the scan early by only scanning and hashing the first 256 bytes at most for practical cases, to limit the time to compute the hash on very large keys, such as the binary contents of files like images, or videos, but in general those files should have already metadata containing a strong enough hash should as SHA1 or SHA256].
Note that hashing string can even be faster by reading groups of bytes into single 32-bit or 64-bit integer value before adding this value to the hash and multiplying the result it by the prime constant (this saves the cost of multiplications, which may be slower than packing group of bytes into a 32-bit or 34-bit value, possibly accelerated when all strings are allocated in blocks allocated on 32-bit or 64-bit address boundaries, and when their allocated buffer always includes extra -ward- null bytes stored at end of the allocated memory, instead of having any odd buffer sizes, in order to avoid memory access exceptions when reading a few bytes past the end of buffers: these ward bytes must however be initialized to zeroes, and may eventually be protected against write accesses after their initialization, if there's ever some hardware support for it).
Hash tables in default simple implementations used in C++ standard libraries, Java, .Net/CLI, PHP, Lua, Python, Lisp... all use a Knuth modular algorithm by multiplying by a good prime (-good primes- should be fixed-width integer constants having roughly the same number of bits set to -1- as the number of bits set to -0-, considering also that the least significant bit is necessarily a -1-, because all primes larger than 2 are necessarily odd integers).
you may use more flattened distributions using secure digests, including MD5 or SHA1 (even if they are deprecated for strong security usages, in favor of SHA2 algorithms like SHA256, SHA384, SHA512), but at the price of being a slower to compute (even if there are good implementations in most OSes and libraries, already using accelerated instructions of CPUs or GPUs); some CPUs also include their own accelerated digest-computing instructions (which may use something else than modular arithmetic based on additions/multiplication of integers, e.g. using products of polynoms, or accelerated FFT transforms, or one of the common computing or bit-shuffling passes used in wellknown security digests: computing a strong hash of arbitrarily long strings is also highly parallelizable, e.g. by using a good -MAC- function for recombining partial hashes computed in parallel from different segments of the input text, which is then processed in a Merkle tree)...-
Thanks for this correction!
Date: 2022-03-14

← UML Diagrams Full Course (Unified Modeling Language)

OpenGL Course - Create 3D and 2D Graphics With C++ →

Related videos

AM4 Lives: AMD Ryzen 5 5500X3D CPU Review & Benchmarks

• Gamers Nexus

Private DIY Servers Are Illegal Black Markets of Piracy - The ESA's Shady Ties

• Gamers Nexus

How to Speed Up Your Android Phone (Best Settings)

• Geeks Tutorial

End of Real Photography

• PiXimperfect

Corsair, Get It Together

• Gamers Nexus

How to Speed Up Windows 11 (Best Settings)

• Geeks Tutorial

Comments and reviews: 9

Wai
Does this also work for -_search_recursive-? I let the recursion go all the way down to None, which meant an additional node to check, but less IF statements
def _search_recursive(self, blog_post_id, node):
# base case: if node is None, return false
if node is None:
return False
node_id = node.data[-id-]
# on match, return node.data
if int(blog_post_id) == int(node_id):
return node.data
# if target id < current_node.data.id, search left
if (blog_post_id < node_id):
return self._search_recursive(blog_post_id, node.left)
# if target id > current_node.data.id, search right
if (blog_post_id > node_id):
return self._search_recursive(blog_post_id, node.right)
reply

Anthony
Thank you for showing an implementation of LinkedList through Data Structures, but ....
As an Technical Architect or Development lead,
I would never ask a programmer to implement a Linked List Sort, Bubble Sort or any sort algorithm.. So much easier to use a Collections Library to execute sort. chances are they handle out-of-memory conditions for large Datasets.
Is the goal is too sort, chances are the sorting in the Database, even SQLLite, will have a better -memory and other resource:, faster implementation of sorting.
even if it requires research and education, it should product a better implementation.
reply

theonlycatonice
1:09:21 the insert_beginning method seems wrong - it duplicates the first node added at the beginning
(empty list, insert_beginning)
None-
data1 -> data1 -> None
If you place the last 2 lines under the else after the if it works as expected
(empty list, insert_beginning)
None-
data1 -> None
reply

Bug
02:36:29 We dont have to check - if node.left.data[-id-] == blog_post_id - or - if node.right.data[-id-] == blog_post_id -. Because when we call node.left or node.right recursively it is checked at the first condition as it is the main node of that function.
reply

Colm
Does anybody know why I might be getting a -TypeError: 'NoneType' object is not subscriptable- when I try to create a blog post? There seems to be something wrong with the add_key_value function but I wrote it the same way as in the video.
reply

Abdirahman
This is such an amazing course. It opened my eyes on how to actually use data structures in real life taught. Very rare resource and I have successfully completed it. Thank you.
reply

Valentin
I have a question : Do we use a BST to store all blog posts because it's taking o(log(n)) to find a specific post instead of o(n) eventually ? Please, feel free to correct me :)
reply

phalyce
I'm getting -TypeError: Object of type Node is not JSON serializable- when using Postman to request the -get_all_users_descending- route, anyone else? using python 3.8
reply

Add a review, comment