July 20, 2016
Additional Participants: Alexandre Belloni, Andrew Lunn, Chris Mason, Christian Borntraeger, Daniel Vetter, Dan Williams, David Woodhouse, Dmitry Torokhov, Geert Uytterhoeven, Greg KH, Guenter Roeck, James Bottomley, Jani Nikula, Jan Kara, Jason Cooper, Jiri Kosina, Johannes Berg, Jonathan Cameron, Jonathan Corbet, Josh Boyer, Justin Forbes, Laurent Pinchart, Luis de Bethencourt, Luis R. Rodriguez, Mark Brown, Olof Johansson, Rafael J. Wysocki, Shuah Khan, Stephen Hemminger, Steven Rostedt, Sudip Mukherjee, Takashi Iwai, Theodore Ts'o, Timothy Bird, Tony Luck, Trond Myklebust, Vinod Koul, Vlastimil Babka, and Zefan Li.
People tagged: Fengguang Wu and Greg Kroah-Hartman.
Jiri Kosina raised this perennial topic, suggesting a move from “random people pointing to patches that should go to stable” to “maintainers sending pull requests”, noting increasing numbers of regressions in the stable trees. Guenter Roeck seconded this sentiment, noting that his new employer is sufficiently unhappy with stable releases to stop using them. Guenter would therefore like to see increased quality of the -stable trees. Takashi Iwai agreed that this is a worthy topic, calling out the large number of stable branches, amount of testing, and details for workflow as good subtopics. Jani Nikula also indicated interest.
Speaking of subtopics:
Tony Luck believes that the maintainers should send a list of commit IDs to be cherry-picked instead of a pull request, noting that it is rare to need a fixup patch or a full-up backport. Jiri agreed that a cherry-pick list might well be better than a pull request, and is mainly interested in seeing a well-defined set of people who are responsible for sending -stable patches. Jiri believes that if this change puts too much load on maintainers, then that is a sign that more maintainers are needed, or that fewer patches should be sent to -stable. Guenter agreed that the current process is letting too many patches into -stable. Johannes Berg countered with a hybrid model, where anyone can CC stable, but some sort of explicit ack from the maintainer would also be required. Vlastimil Babka argues that the responsible person need not be a maintainer, just someone designated for the job. However, Vlastimil also suggests that the responsible person be required to actually check the patch against each applicable stable kernel version.
Jiri pointed out that CCing the maintainer along with a CC stable didn't necessarily result in that maintainer putting much thought into whether the patch was in fact suitable for inclusion in -stable. Mark Brown agreed, noting that the -stable review patchbombs were often too noisy to be useful. James Bottomley pointed out that David Miller does curate -stable patches for networking, and suggested involving similar maintainers in the discussion. Jiri Kosina added that if maintainers are overwhelmed, offloading to Greg makes no sense, but fixing the maintainer workflow for that subsystem could make quite a bit of sense.
Ted wondered if a patch queue checked into git might not work better, given that people seem to be just cherry-picking specific patches. Justin Forbes pointed to stable-queue.git as an example. Mark Brown believes that the upstream commit IDs are helpful, and that a queue of patches can be easily generated from the current -stable trees for those who want that. Olof Johansson, Guenter Roeck, Geert Uytterhoeven, Dmitry Torokhov, and James Bottomley agreed, and Mark added that full git trees are required by many testing frameworks. Ted noted that quilt allows patches to be dropped completely, while git normally keeps commits, which are replaced and/or modified by later commits. The git approach thus can make it hard to figure out which commits you should take. Dmitry replied that only bleeding-edge community distros use pure stable. Others have work on top of stable, so “quilt rebases” don't work. Additional discussion ensued.
Andrew Lunn asked if there was numerical evidence supporting the notion that maintainer control of -stable submissions actually improved matters. [ Ed. note: And in an election year! Sacrilege!!! ] Rafael J. Wysocki independently raised this same point, and added that he suspects that it is difficult to judge the “regression potential” of a given patch up front. Rafael also suspects that the policy of avoiding reverting from stable unless/until mainline reverts is problematic, given that mainline might choose to apply a fix patch instead of reverting. Dmitry Torokhov agreed with Rafael, suggesting that broken -stable patches be reverted, then reapplied with the fix when/if mainline gets around to fixing them.
Dmitry Torokhov called out the fact that “Everyone and their dog has a stable release nowadays”, resulting in yet another maintainership scalability problem. Ted Ts'o noted that this was in part due to the ease with which a new stable tree can be set up, and that this is OK as long as people understand what the intent is.
Rafael reviewed the original motivation of -stable:
“So going back to the origins of -stable, the problem it was invented to address at that time, IIRC, was that people started to perceive switching over to the kernels released by Linus as risky, because it was hard to get fixes for bugs found in them. The idea at that time was to collect the fixes (and fixes only) in a "stable" tree, so that whoever decided to use the latest kernel released by Linus could get them readily, but without burdening maintainers with having their own "stable" branches and similar. And that was going to last until the next kernel release from Linus, at which point a new "stable" tree was to be started.”
Rafael argued that the -stable trees do in fact fulfil this goal, especially the more recent ones, and suggested that the issues were not with -stable as conceived, but rather with attempts to treat -stable as long-term-stable. Jiri Kosina agreed that many of his concerns with -stable were in fact more applicable to long-term stable trees.
Trond Myklebust noted that we don't have a good set of regression tests, and wondered if it was time for another rousing round of “how do we regression-test the kernel” discussion. Dan Williams voted in favor. James pointed out that many of the regressions were in device drivers, which are not well served by generic regression tests. James suggests that both Reviewed-by and Tested-by tags be required for all device-driver changes, thus at least ensuring that the patch has run on the corresponding device. Trond pointed out that this only applied to mainline, because testing and review on mainline does not necessarily imply regression freedom for any given stable branch. Although James agreed that mainline testing is no guarantee of -stable regression freedom, he also pointed out that filtering out the patches that are broken in both environments is a good thing, and that people lacking the hardware are going to have some difficulty testing the corresponding driver. Trond argued that unit testing could nevertheless help ensure that local constraints are being respected.
Rafael wondered if James had hard statistics backing up his statements, but agreed that hardware is needed to really test driver patches. James replied that his data was strictly anecdotal, but said that if he is suspicious of a patch, he marks it as such, for example:
cc: stable@vger.kernel.org # delay until 4.8-rc1
James also believes that we should discuss stable practices separately from testing. Trond noted that there is overlap between the two topics, but agreed that they could be discussed separately. In turn, James proposed that the stable-workflow topic begin with one of the maintainers who does their own tree, continue on with stable regressions, and end with a debate on the appropriate numbers and types of stable trees. Trond agreed with James's list of topics.
Jani Nikula believes that a cc: stable tag should only be used in cases where the patch clearly fixes a bug that is known to be present in the relevant stable kernels; however, Jani does not see such tags as a guarantee that the patch is appropriate for those kernels. Mark Brown argues that more stable-process Q/A is needed, and wishes to avoid discouraging stable tagging. Rafael J. Wysocki agreed with Mark, arguing that “stable” does not necessarily mean “no regressions”, but rather “here is the stuff to take into consideration”. Jani Nikula felt that Rafael's interpretation matches current reality, but suspects that it does not match Documentation/stable_kernel_rules.txt. Greg KH questioned Jani's perceived disconnect between reality and Documentation/stable_kernel_rules.txt, and asked for hard data on regressions in stable trees. Jani Nikula suggested that the rules could be clarified, but Greg KH believes that people really do understand them, and asked for specific examples of real problems. Examples with varying levels of detail and realism were put forward here, here, and here, with ensuing discussion.
Jiri Kosina gave this example, which involved a “fix” to a bug that didn't actually exist in the -stable release in question. James Bottomley argued that the problem in this case really was with upstream review rather than -stable review. Jiri Kosina countered that the fix was perfectly valid upstream, but broken in the older stable releases. Jiri also said that he wished to call attention to the lack of stable review, agreeing with James's characterization of “yes, I already reviewed this in upstream”. James Bottomley argued that expecting all submitters to be familiar with all stable versions was unrealistic, suggesting that Jiri apply only those stable patches with a cc stable and a fixes tag, a suggestion that Steven Rostedt agreed with. Jiri Kosina liked that suggestion, and also liked the idea of an explicit version range. However, Greg KH resisted the notion that fixes tags be required, noting that there are whole subsystems that never mark anything for stable, even given the current easy rules. Much discussion on various corner cases ensued, including device-ID additions, exactly what constitutes a fix, best practices vs. hard requirements, paths to enlightenment, and motivational tools, including the obligatory Dilbert comic.
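For reference, the tag combination discussed above looks something like the following in a patch's changelog; the commit ID, subject, and version range here are made-up examples:

    Fixes: 123456789abc ("subsystem: original buggy commit subject")
    Cc: stable@vger.kernel.org # 4.4.x-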
Testing and Review, Or Lack Thereof
Jason Cooper suspects that most of the regressions are build or boot failures, which he believes should be amenable to automated testing. Mark Brown agrees. Guenter Roeck also agrees with the value of automated testing, but isn't convinced that there are enough test machines, with the possible exception of the x86 0day build robot. Guenter also suspects that it is not that the stable trees have gotten worse, but rather that improved test automation gives us a better idea of just how bad things have always been. Guenter provided an example of a simple fix that resulted in no fewer than five follow-up commits. James Bottomley agreed with Guenter, noting that patches that are perceived to be trivial might receive less review than they deserve, thus letting regressions slip through. David Woodhouse called out commit fa731ac7ea0 as a fix that introduces a bug, and is suspicious of patches that claim to fix compiler warnings.
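As a rough illustration of the sort of automated build testing under discussion, something like the following shell fragment catches many build regressions before a release goes out; the architecture list and cross-compiler prefixes are assumptions that would need to match whatever toolchains are actually installed:

    #!/bin/sh
    # Build-test a stable release candidate for a few architectures.
    # Cross-compiler prefixes below are assumptions; adjust to taste.
    for target in "x86_64:" "arm:arm-linux-gnueabi-" "arm64:aarch64-linux-gnu-"; do
            arch=${target%%:*}
            cross=${target#*:}
            make ARCH="$arch" CROSS_COMPILE="$cross" O="build-$arch" defconfig || exit 1
            make ARCH="$arch" CROSS_COMPILE="$cross" O="build-$arch" -j"$(nproc)" \
                    || echo "Build failure on $arch"
    done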
Guenter pointed out that there actually is quite a bit of Q/A in the community, but that this Q/A is not sufficiently robust and that there are way too many -stable trees to spread this Q/A over. Jiri Kosina agreed, noting that in contrast, Linus's tree benefits from seriously crowdsourced testing. Guenter was unwilling to give up so easily, arguing for consistent and thorough testing of all stable kernels. Ted Ts'o does limited testing on 3.10, 3.14, 4.1, and 4.4, with 3.14.73 being a bit of a problem child. Ted also stated that any effort to shrink the number of -stable trees should look at who is using them, given that he has sometimes been discouraged by the fact that getting a fix into -stable doesn't help the end user, particularly in cases where vendors ignore -stable. Mark Brown stated that one reason that SOC vendors' BSPs diverge from mainline is that the BSPs contain under-development code not yet ready for inclusion. Mark also countered the “too many stable kernels” argument with “if people want to run a given kernel version it's nice for them to have a place to collaborate and share fixes.”
Takashi Iwai noted that there have been cases where the fix was in fact correct for mainline, and introduced regressions only in older kernels, which Takashi believes indicates a need for better validation of stable trees. Jiri Kosina agreed, and wondered if Fengguang's 0day test robot might be able to help, but expressed a concern that stable-tree testing might be quite bursty, resulting in 0day overload. Guenter replied that Greg KH keeps a nice even workflow, but isn't sure about other -stable tree maintainers. Zefan Li suggested setting up a stable branch within a tree that Fengguang's 0day test robot already tests. Guenter indicated that if a given -stable tree was being tested within some specific git tree, then he would like to pick up -stable updates from that git tree.
Rafael noted that it is not just a matter of getting the review and testing done -- it must also get done within rather tight timeframes. [ Ed note - Yay! Real-time review and testing! ] Jason Cooper suggested that examples of regressions induced by stable patches be used to drive the discussion. Jon Corbet pointed out that there was in fact such data here.
Sudip Mukherjee suggested a stable-tree-next approach to increase testing. Jiri Kosina wondered how applicable -next's merge-a-gazillion-trees approach would be to -stable trees, which pick and choose patches.
Vinod Koul argued that, just as patches sent upstream are tested by their submitters, so too should -stable patches be tested by their submitters, not just once, but against each stable tree that the patch applies to. Ted Ts'o said that such a policy would result in submitters never CCing stable. Guenter agreed with Ted, arguing that testing has to happen in the stable tree. Vinod argued that the submitter was more likely to have the relevant hardware, and thus was in the best position to do testing, at least for device drivers. Ted pointed out that even submitting fixes upstream was difficult in many work environments, and that requiring additional testing was therefore not likely to have an overall positive effect. Luis de Bethencourt agreed with Ted that raising the barrier to entry for patch submission would be counterproductive, and wondered if increased sharing of infrastructure and information among stable-branch maintainers would help. Vinod agreed that these were good counterarguments, but asked what his opponents thought a good solution might look like. Some suggested solutions regarding hardware availability may be found here.
David Woodhouse agreed that testing is valuable, and suggested that there be an expectation that new code be submitted with test cases, noting that even device drivers can sometimes be tested using tools based on MMIO tracing and playback. Guenter Roeck is concerned that requiring test cases could reduce the number of contributions, noting that upstreaming is already unpopular in many circles, even without a test-case requirement. David Woodhouse argued for an expectation as opposed to a hard requirement, and further argued that having test infrastructure in place would make test cases easier to create. David also suggested that test-case creation might be a good proving ground for newbies. Guenter Roeck agreed with this approach, particularly with test cases as newbie proving grounds.
Laurent Pinchart liked the idea of test infrastructure. Not to be outdone, Tim Bird called out the Fuego framework, showing some of its workings. For his part, Greg KH expressed appreciation for a large number of testing services that he relies on in his stable-tree work. Further discussion added yet more paint to the cc stable bikeshed, detailed some of the test services that Greg called out, and speculated on how many clones of Dave Miller there were, given all the -stable work he gets done.
Steven Rostedt suggested that tests requiring specific hardware should provide some sort of “unsupported” indication when that hardware is not available. Although Mark Brown agreed that this approach can be useful, he is concerned that device-driver bugs would go unnoticed. Steven suspects that if no one has a given piece of hardware, then lack of testing isn't so much of a problem. Mark would like a clear distinction between tests that anyone can run and those requiring specific hardware, in order to improve test reproducibility. Steven suggested a separate directory in kselftests for hardware-dependent tests, and Luis R. Rodriguez suggested use of soft Kconfig entries to check for the presence of the required device drivers. Alexandre Belloni noted a third class of tests, those requiring hardware but able to run against a wide variety of devices, calling out real-time-clock tests as an example, but noting that such tests usually change the system time. Steven would prefer to restrict kselftests to non-destructive tests, but speculated on the possibility of saving and restoring the system time. Laurent Pinchart suggested that use of standard test frameworks could enable out-of-tree tests.
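A minimal sketch of the “unsupported” idea for hardware-dependent selftests might look like the following; the device path picks up Alexandre's RTC example, and the skip exit code (kselftest's KSFT_SKIP value of 4) is an assumption about how such a convention could be expressed:

    #!/bin/sh
    # Skip cleanly, rather than fail, when the required hardware is absent.
    RTC=/dev/rtc0
    if [ ! -c "$RTC" ]; then
            echo "SKIP: $RTC not present, cannot run RTC selftests"
            exit 4   # kselftest skip code (assumption: KSFT_SKIP convention)
    fi
    # ... the actual hardware-dependent RTC tests would run here ...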
Dmitry Torokhov suggested that there should be multiple flavors of -stable trees, one for security, another for core fixes, and a third for hardware support and device drivers. Ted Ts'o indicated some support for separating out hardware. Rafael is concerned that multiple flavors would multiply confusion, and Takashi Iwai agreed that flavors might not be all that helpful. Rafael also believes that the probability of regression correlates well with the complexity of the patch. Dmitry agreed that general-purpose distros would want to take all fixes, but that embedded devices might well want to be more choosy, especially those working with rare devices that receive very little testing. Dmitry was also a bit skeptical of Rafael's complexity-regression correlation.
Dan Williams pointed out that it is possible to do significant testing without hardware, calling out the ACPI NVDIMM Firmware Interface Table (NFIT) tables, which are tested by tools/testing/nvdimm/. Dan agrees that not all bugs can be found this way, but believes that it is nevertheless a useful approach. Guenter notes that hardware can be supplied (for example, via hardware testbeds) or emulated using things like qemu. Christian Borntraeger seconded qemu, and further suggested that some sort of “make test” (presumably using qemu) be required to work everywhere. Christian would also like “make test” failures to trigger -stable reverts even in the absence of a corresponding revert of the upstream patch.
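On the qemu front, a boot smoke test of a freshly built stable candidate can be as simple as the sketch below; the initramfs, timeout, and console settings are illustrative assumptions rather than anyone's actual test rig:

    #!/bin/sh
    # Boot the just-built kernel under qemu and flag an obvious failure.
    timeout 120 qemu-system-x86_64 -nographic -m 1G \
            -kernel arch/x86/boot/bzImage \
            -initrd rootfs.cpio.gz \
            -append "console=ttyS0 panic=-1" \
            -no-reboot | tee boot.log
    grep -q "Kernel panic" boot.log && echo "BOOT FAILURE"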
Dmitry Torokhov says that community-based distros use the more recent stable trees, not the older trees. Jason Cooper wondered what sorts of regressions cause people to give up on -stable. Olof Johansson recounted experiences with a group that maintained their own out-of-tree driver instead of relying on either mainline or stable, due to bugs being introduced into those trees. Olof believes that the specific problems have since been solved, but points out that once a group of developers has been burnt by -stable, they will be extremely reluctant to even consider using -stable ever again. That said, that group did continue using -stable as a source of fixes, so that their first reaction to finding a bug was to check -stable for a fix. Olof concludes by noting that -stable was of substantial value to this group despite the fact that they did not use it in the conventional sense. He followed up, stating that his experiences were with drivers frequently used on x86 laptops.
Ted Ts'o thanked Olof for the “color commentary”, and wondered if other system-on-a-chip (SoC) vendors were using -stable, but noted that such vendors normally lose interest in any given device once they stop shipping it. Ted also pointed out that trawling -stable in response to bugs won't locate important security fixes. Olof does not believe that optimizing -stable workflow for SoC vendors will be useful. Olof agreed that groups working on embedded systems often do miss out on CVEs, but said that larger groups can track CVEs or have representatives on the security lists. Stephen Hemminger added that Brocade regularly merges stable kernels into their code base without serious issues, but notes that in Brocade's case there are few vendor-specific changes. Stephen suspects that we are only hearing from the unhappy users, but nevertheless believes that it would be good to reduce the number of unhappy users.
As noted earlier, Greg KH put forward the LTSI Test Project. Alex Shi liked the backporting effort, and would like to see more backporting, but feels some upstream focus is required. Olof Johansson suggested that moving to newer kernel versions would provide more eyes and less need for backporting. Olof also noted that LTSI is a different beast than is -stable because LTSI includes feature backports in addition to backported bug fixes. Olof also suspects that the goals of limiting the number of features backported and increasing the size of a given tree's community are in conflict.
Alex Shi believes that the number of feature backports can be limited by carefully chosen backporting criteria, and points out that LTSI has relatively few feature backports. Greg KH wanted to know more about Alex Shi's tradeoffs between LTSI and upstream. Alex replied that industry needs more features backported to LTS, for example, ARM PCIe, opp v2, writebacks, and cgroups, as was done for Linaro's stable kernel (LSK) 4.1. In fact, Alex believes that new features should sometimes be developed on LTS because mainline maintainers often cannot do adequate testing. Ard Biesheuvel argues that LSK is needed not in general, but rather because arm64 support is still immature, so that LSK is not all that relevant to systems used in production. Alex agreed that LSK is not a good model for stable kernel trees intended for production use, but believes that LSK is nevertheless a good proof point for the need to backport more features, including features not directly related to arm64.
Mark Brown noted that there has been significant pushback against LTSI, for example, some embedded vendors were concerned about conflicts between their internal work and work done on LTSI. Greg KH was puzzled by this, given that embedded vendors were the ones pushing for LTSI in the first place. Greg also wondered about the exact nature of the conflicts. Mark Brown said that some Linaro members wanted LSK instead of LTSI, and that the inclusion of board support and vendor-specific drivers was a problem for some of these members, who then had to merge changes in LTSI with changes in their internal trees. Greg KH understands that the specific LTSI tree might be a problem for some people, but would still like more collaboration among people working on long-term support trees like LTSI and LSK.
Guenter Roeck would like to know what motivates companies to use and to not use LTSI (and Mark Brown agrees, though Greg KH suspects that it would not be all that relevant to most attendees). Guenter added that his problem with LTSI is that it is a collection of patches rather than a git tree, which leads us into the next section.
Steven Rostedt believes that the passage of time is key, so that older stable trees have more pressure for complex fixes and features. Steven also notes that bugs mutate over time, for example, as timing changes.
Greg KH feels that sets of patches are most appropriate for maintaining stable kernels, and that they are especially helpful in letting people know just how far they are deviating from mainline, something that is hidden when using git trees (Greg also wants to see patches from Mark Brown, who agreed to supply two). Greg also pointed out that there are scripts to pull the quilt series into git, for those who like git.
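One such path from quilt to git is git quiltimport, which applies each patch in a quilt series as a separate commit, preserving the patch descriptions; the repository paths, base tag, and queue directory in this sketch are assumptions based on the usual stable-queue layout:

    cd linux-stable
    git checkout -b queue-4.7 v4.7
    git quiltimport --patches ../stable-queue/queue-4.7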
Guenter Roeck countered that git was quite useful to him in a former life, where they maintained a few hundred patches on top of mainline, tracking a stable kernel. Guenter has seen serious problems in projects that attempted to do this using sets of patches, and would not like to use quilt for active development. Greg noted that enterprise distros use quilt, which indicates that developing on top of quilt cannot be all that hard.
NeilBrown (of SUSE) agrees that enterprise distros use quilt, but also points out that they use git for development using an upstream-first approach (as does Red Hat, although mileage may differ for embedded distros). Neil does not like the idea of using quilt for development, and noted that given that Greg wants quilt for maintaining stable trees and that Guenter wants git for development, perhaps they are actually in violent agreement.
Greg agreed with Neil's assessment, and said that Geert was working on producing a git tree for LTSI so that people wanting git and LTSI could have the same commit ID for a given patch. Jiri Kosina said that SUSE already automatically generates git trees from quilt patch series, and gave the relevant URLs. For extra credit, SUSE maintains its quilt series in git. Greg asked how this handled updating a patch in the middle of a quilt series, and Jiri gave an example.
Geert Uytterhoeven noted that git branch and git rebase can be used to update patches in the middle of a series while leaving the old series intact. Then git format-patch can be used to regenerate the quilt series.
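A sketch of one way to do that, with made-up branch and tag names: keep the old series intact on its own branch, fix the offending commit on a copy using an interactive rebase, then regenerate the quilt series from the result.

    git checkout -b series-v2 series-v1       # old series stays intact on series-v1
    git rebase -i v4.7                        # mark the commit to fix as "edit"
    # ... fix it up, then: git commit --amend && git rebase --continue ...
    git format-patch -o patches-v2 v4.7..series-v2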
James Bottomley added that git cherry can be used to identify patches that are present in one series but not another, and that git cherry-pick can be used to pull those patches into the series that lacks them. James admitted that stgit might give a better user experience.
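A minimal sketch of that cherry/cherry-pick flow, with made-up branch names; git cherry lists commits that have no equivalent on the target branch, and the -x option (discussed below) records the upstream commit ID in each new commit message:

    git checkout stable-4.4
    # "+ <sha>" lines from git cherry are commits on stable-4.7 with no
    # equivalent on stable-4.4; pick them over one at a time.
    git cherry stable-4.4 stable-4.7 | sed -n 's/^+ //p' |
            xargs -r -n1 git cherry-pick -x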
NeilBrown noted that if git cherry-pick added the upstream commit ID to the commit log, it would be possible to very nearly emulate quilt commands in git; however, this could be confused by conflict-resolution changes. James Bottomley suggested that the same techniques used by git to detect file moves might be applied to overcome problems introduced by conflict-resolution changes. James also noted that the -x argument to git cherry-pick records the upstream commit ID, as did Dmitry Torokhov. NeilBrown read the manpage and learned that -x records the upstream commit ID only in the absence of conflicts, which does not work for his use cases. Dmitry Torokhov agreed that the manpage in fact said that, but that -x really does work when there are conflicts, and also helpfully documents what the conflicts were in the commit log.
Geert Uytterhoeven pointed out that you can get the effect of git cherry-pick by giving the --onto argument to git rebase. James Bottomley suggested -i instead of (or perhaps in addition to) --onto, but likes the fact that git cherry-pick is scriptable. James points out that git cherry is needed either way. Geert avoids git rebase -i exactly because it is not scriptable, but notes that git rebase automates the filtering that is done manually (or by the script) in the case of git cherry and git cherry-pick.
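For comparison, the rebase-based equivalent of a cherry-pick run looks roughly like this, with made-up refs: transplant the commits between base and topic-4.7 onto the stable-4.4 branch.

    # Replays base..topic-4.7 on top of stable-4.4, leaving HEAD on the result.
    git rebase --onto stable-4.4 base topic-4.7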
Vlastimil Babka suggests that it might be possible to use git rebase's --edit-todo and --continue arguments to more closely emulate quilt commands. Laurent Pinchart called out git rebase --continue as doing the right thing, either continuing if conflicts were handled correctly or complaining if not. Daniel Vetter called out the drm/i915 maintainership tools, which are discussed at length here.