On Twitter I had a thread going this year in which I tried to reflect on the bugs I found throughout the year: how to avoid that kind of bug, what can be learned from it, and so on. I'm porting the idea over to here (I'm still both here and on Twitter, we'll see how that goes).
Recently I fixed a bug in PyPy's time.strftime. It was using a unicode helper function that takes as arguments a byte buffer containing a utf-8 encoded string, as well as the number of code points in it. strftime was using this API wrong and passing the number of bytes instead.
After finding the bug we tried to make this API more robust by adding a check to the function that counts the code points in the byte buffer and complains if that count differs from the second argument. This check shouldn't be on by default for performance reasons, but it's on during testing.
The reason the bug went unnoticed for so long is that if you test only with ASCII characters everything works, because the number of bytes equals the number of code points in that case. Lesson: write tests with a wider range of characters.
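To make that concrete, here's a minimal illustration (plain Python, not the PyPy internals) of how the two counts diverge as soon as non-ASCII characters appear:

```python
s = "café"
b = s.encode("utf-8")
print(len(s))  # 4 code points
print(len(b))  # 5 bytes: 'é' needs two bytes in utf-8
# Passing len(b) where a helper expects the code point count only goes
# unnoticed as long as the input is pure ASCII, where the numbers agree.
```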
Another bug, this time in itertools.tee: tee has an optimization that uses the __copy__ method of the iterator if it has one, instead of carefully using its generic implementation. However, PyPy got it wrong and copied the *iterable* instead of the iterator.
https://foss.heptapod.net/pypy/pypy/-/issues/3852
This works in simple tests, but in more complicated situations it gives nonsense.
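Here's a small sketch (plain Python, no PyPy internals) of why the distinction matters:

```python
import itertools

it = iter([1, 2, 3, 4])
next(it)                 # consume the 1; the iterator now points at 2
a, b = itertools.tee(it)
print(list(a), list(b))  # [2, 3, 4] [2, 3, 4]: tee must copy iterator state
# Copying the *iterable* instead would restart from 1, silently replaying
# already-consumed elements. Tests that never advance the iterator before
# calling tee can't tell the difference.
```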
🪲, also present in CPython. On Linux, if you pass MSG_TRUNC as a flag to socket.recv (which calls recv in its implementation) it will return the size of the packet, not the number of bytes written into the output buffer.
https://foss.heptapod.net/pypy/pypy/-/issues/3864
This confused the logic in socket.recv: it led to an assertion error in PyPy (trying to read too many characters from the output buffer) and garbled characters in CPython. Fixed in PyPy by not reading more than the buffer size from the buffer in that case.
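A minimal Linux-only sketch of the behavior (using a UNIX datagram socketpair for brevity; the report is about socket.recv in general):

```python
import socket

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b"x" * 100)

# The C-level recv() returns the real packet size (100), even though only
# 10 bytes fit into the buffer. Code that trusts the return value then
# reads past the valid data: an assertion error on PyPy (pre-fix),
# garbage bytes on CPython.
data = b.recv(10, socket.MSG_TRUNC)
```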
CPython bug: https://github.com/python/cpython/issues/69121
someone could fix this! probably not super hard.
I learned again that I know nothing about network programming :-(
Fixed a bug in PyPy's 3.9 parser (based on the new PEG parsing approach introduced in CPython 3.9). The parser would report a valid generator expression in a function call as lacking parentheses, but only if there was another syntax error further down in the file. E.g.:
f(x for x in y)
if a:
pass
This would report the error on line 1 (which is actually fine) instead of line 3.
The bug was an oversight: an 'if' was left out of the logic when porting from CPython. It shows that error cases are often not tested enough.
Seems I neglected my bug thread a little bit! I ran into two interesting bugs this week that I wanted to write about.
One is threading-related: a Python program with a bunch of threads crashes on PyPy, but not on CPython.
It turns out the project was missing a lock around this kind of code:
request_id = self._next_id
self._next_id += 1
and was handing out the same request id multiple times.
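The fix is the standard one: guard the read-and-increment pair with a lock. A minimal sketch (class and attribute names are made up):

```python
import threading

class RequestIds:  # hypothetical class, for illustration only
    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0

    def allocate(self):
        with self._lock:  # makes the read-then-increment pair atomic
            request_id = self._next_id
            self._next_id += 1
        return request_id
```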
This bug was possible on CPython too, but happened rarely there. On PyPy it happened very reliably, due to PyPy's higher performance and slight differences in when the GIL is released. This is a pattern we see regularly: a latent threading bug that only manifests on PyPy.
Since then I've also learned that you can use itertools.count() as an atomic counter.
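That gives an even simpler, lock-free version of the sketch above:

```python
import itertools

_ids = itertools.count()

def next_request_id():
    # count.__next__ is implemented at the interpreter level and runs
    # atomically under the GIL, so no two threads get the same value
    return next(_ids)
```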
The second bug was in the HPy implementation of PyPy. We aren't quite sure we've understood it 100%, but it looks like we managed to confuse ourselves with metaclasses. Say we have a metaclass that is created from an HPy extension, i.e. C code. If that metaclass is instantiated, its instances are themselves also types.
In one code path we were reading a slot from the newly instantiated type, as opposed to the metatype. Most code doesn't have C-defined metaclasses, but numpy does, leading to a crash.
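A pure-Python sketch of the setup (the actual bug was in C-level slot lookup; the names here are illustrative):

```python
class Meta(type):
    pass

class X(metaclass=Meta):    # instantiating the metaclass yields a type
    pass

assert isinstance(X, Meta)  # X is an instance of the metaclass...
assert isinstance(X, type)  # ...and at the same time a type itself

# The buggy code path read a slot from X (the freshly created type)
# where it should have read it from Meta (the metatype).
```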
Amusingly enough, soon after writing about a CPython bug involving class versioning ( https://mastodon.social/@cfbolz/111708848209751692 ), I found one in PyPy too:
If you have an instance x of a class X, and both the instance and the class have an attribute f, reading x.f will return the instance attribute if the class attribute X.f is not a data descriptor (a data descriptor is something like a property). The lookup x.f will be cached in the interpreter, to not have to do any dictionary lookups when it is performed repeatedly.
But there was a case of missing cache invalidation: we can *make* X.f a data descriptor later, by adding methods to its class after the fact. If the x.f cache had already been filled before that, the lookup kept returning the wrong, stale result.
The fix is to only fill the cache if X.f is an instance of an immutable class.
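Here's a pure-Python sketch of the hazard the cache has to deal with (no caching shown; the names are made up):

```python
class Desc:
    def __get__(self, obj, owner=None):
        return "from class"

class X:
    f = Desc()

x = X()
x.__dict__["f"] = "from instance"
print(x.f)  # "from instance": Desc has no __set__, so the instance wins

# Now make Desc a data descriptor by adding __set__ to it after the fact:
Desc.__set__ = lambda self, obj, value: None
print(x.f)  # "from class": data descriptors take precedence
# A cache filled before the mutation would keep answering "from instance".
```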
Feels hard to learn something from this, apart from the fact that Python's object model is a sprawling mess 🤷‍♀️
Another week, another bug: PyPy's JIT assumes that property objects are immutable, but they can totally be mutated, by calling their `__init__` method again later. This led to miscompiles where the old property getter was still called in already JIT-compiled code.
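A short demonstration of the mutation (plain Python; works because accessing a property on the class returns the property object itself):

```python
class A:
    x = property(lambda self: 1)

a = A()
print(a.x)                    # 1

A.x.__init__(lambda self: 2)  # re-initialize the *same* property object
print(a.x)                    # 2 -- but JIT-compiled code that had baked
                              # in the old getter would still return 1
```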
Fixing it (without losing performance) involves changing the fields of the property to be "quasi-immutable". This means they can be mutated, but the JIT should assume that this happens very rarely. If it does happen, all the callers of the property will get invalidated and recompiled. I found the bug when commenting on a CPython issue where they plan to add similar mechanisms to CPython: https://github.com/faster-cpython/ideas/issues/645#issuecomment-1905491165
@cfbolz Seeing PyPy and faster-cpython folks interacting is the stuff of dreams :)
PyPy has a lot of history with experiments and optimizations that could be useful to faster-cpython development. I see it's been a long time since papers and talks were added to extradoc (and even then, they focus on novel ideas). Would writing down how PyPy does the things faster-cpython is investigating help them?
I also wonder whether faster-cpython's research could help with speeding up PyPy's interpreter.
@danzin we don't really write papers any more, not enough academics left in PyPy 🤷‍♀️. Also, the core technology is really stable and hasn't been changing that much (and the small changes that do happen often end up on the blog).
Using some of the techniques that CPython has been applying to make its interpreter faster in order to speed up PyPy's warmup is a possibility, but it would require significant effort (the Faster CPython team is pretty big, PyPy's really isn't).
Ouch, @mgorny found and reported that PyPy's unicode .expandtabs method was simply giving wrong results in non-ASCII situations :-(. While fixing it, it also turned out to be quadratic?! Fixed now, but I was impressed by the badness.
@hpk @mgorny digging deeper, it was written by Guenter Jantzen (whom I don't know) in 2003, so it was basically quadratic from the first implementation on, and that property survived a whole lot of refactorings
https://github.com/pypy/pypy/commit/65ff28c60376#diff-58edf75816640e8633647c2f5d9c50814f490608c19c8f31079e25761ccfaa21L623
This is what I get for posting about finding bugs on social media: I now get invited to be a reviewer for scam journals on pesticide research.
actually I was wrong! I didn't get the review invitation to the pesticide journal because of my posting about bugs!
I got it due to bad pattern matching in some system. The name of the tool the paper describes has a very small edit distance to the string "PyPy".
The description of a bug where the JIT would trace indefinitely: https://mastodon.social/@cfbolz/113142067087813944
Found a PyPy JIT bug that happens when you compile Python code that repeatedly accesses a huge list at fixed offsets >= 2**15. It led to the JIT failing with an assertion error.
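A hypothetical sketch of the shape of code that triggers it (the exact reproducer isn't in this post; the 2**15 threshold is from the description above):

```python
lst = [0] * 40000

def hot_loop():
    total = 0
    for _ in range(100000):  # run long enough to get JIT-compiled
        total += lst[32768]  # constant offset 32768 == 2**15 doesn't fit
                             # a signed 16-bit field, tripping the assert
    return total
```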
I wrote the code that caused the crash. It was likely caused by me misunderstanding that the method `append_int` only works for... shorts 🤦‍♀️
I tried just now to find out who called the method `append_int`. Of course it turns out that was also me. Oh well.
Another longer thread about a JIT deoptimization bug is here: https://mastodon.social/@cfbolz/113980112206754464